GPU Accelerated Signal Processing in OpenStack
John Paul Walters, Computer Scientist, USC Information Sciences Institute
jwalters@isi.edu
Outline
§ Motivation
§ OpenStack Background
§ Heterogeneous OpenStack
§ GPU Performance across hypervisors
§ Current Status
§ Future work
Motivation
§ Scientific workloads demand increasing performance with greater power efficiency
  – Architectures have been driven towards specialization and heterogeneity
§ Infrastructure-as-a-Service (IaaS) clouds can democratize access to the latest, most powerful accelerators
  – Then why are most of today's clouds homogeneous?
  – Of the major providers, only Amazon offers virtual machine access to GPUs in the public cloud
Cloud Computing and GPUs
§ GPU passthrough has historically been hard
  – Specific to particular GPUs, hypervisors, host OS
  – Legacy VGA BIOS support, etc.
§ Today we can access GPUs through most of the major hypervisors
  – KVM, VMWare ESXi, Xen, LXC (see the passthrough sketch below)
§ Combine this with a heterogeneous cloud
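For a concrete sense of what GPU passthrough looks like under KVM, the libvirt domain XML below hands one PCI function (a GPU) to a guest. This is a minimal sketch; the PCI address is a placeholder and must match the GPU reported by lspci on the host.

  <!-- minimal sketch: assign host PCI device 0000:84:00.0 (placeholder address) to the guest -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
    </source>
  </hostdev>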
OpenStack Background
§ OpenStack founded by Rackspace and NASA
§ In use by Rackspace, HP, and others for their public clouds
§ Open source with hundreds of participating companies
§ In use for both public and private clouds
§ Current stable release: OpenStack Havana
  – OpenStack Icehouse to be released in April
[Chart: Google Trends searches for common open source IaaS projects (openstack, cloudstack, opennebula, eucalyptus cloud)]
OpenStack Architecture
Image source: http://docs.openstack.org/training-guides/content/module001-ch004-openstack-architecture.html
Supporting GPUs
§ We're pursuing multiple approaches for GPU support in OpenStack
  – LXC support for container-based VMs
  – Xen support for fully virtualized guests
  – KVM support for fully virtualized guests, SR-IOV
§ Also compare against VMWare ESXi
§ Our OpenStack work currently supports GPU-enabled LXC containers (see the configuration sketch below)
  – Xen prototype implementation as well
§ Given widespread support for GPUs across hypervisors, does hypervisor choice impact performance?
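As an illustration of the container-based approach, a GPU-enabled LXC container is usually built by whitelisting the NVIDIA character devices in the container's device cgroup and bind-mounting the device nodes. The lines below are a minimal sketch of such a container config; the exact set of /dev/nvidia* nodes depends on the driver version.

  # sketch of LXC container config entries exposing the host GPU (driver-dependent)
  lxc.cgroup.devices.allow = c 195:* rwm                                     # NVIDIA char devices
  lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
  lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file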
GPU Performance Across Hypervisors
§ 2 CPU architectures, 2 GPU architectures, 4 hypervisors
  – Sandy Bridge + Kepler, Westmere + Fermi
  – KVM, Xen, LXC, VMWare ESXi
§ Standardize on a common CentOS 6.4 base system for comparison
  – Same 2.6.32-358.23.2 kernel across all guests and LXC
Hardware Setup
                Sandy Bridge + Kepler    Westmere + Fermi
CPU (cores)     2x E5-2670 (16)          2x X5660 (12)
Clock Speed     2.6 GHz                  2.6 GHz
RAM             48 GB                    192 GB
NUMA Nodes      2                        2
GPU             1x K20m                  2x C2075
Hypervisor Configuration
Hypervisor            Linux Kernel       Linux Distro
KVM                   3.12               Arch 2013.10.01
Xen 4.3.0-7           3.12 (dom0)        Arch 2013.10.01
VMWare ESXi 5.5.0     N/A                N/A
LXC                   2.6.32-358.23.2    CentOS 6.4
Benchmarks
§ 3 benchmarks
  – SHOC OpenCL: signal processing
  – GPU-LIBSVM: big data, machine learning
  – HOOMD: molecular dynamics, GPUDirect
§ Virtual machines: CentOS 6.4 with 2.6.32-358.23.2 kernel, 20 GB RAM, and 1 CPU socket
  – Control for NUMA effects (see the pinning sketch below)
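One common way to control for NUMA effects, assuming numactl is installed on the host, is to pin the guest (or the benchmark process inside an LXC container) to a single socket and its local memory. The node number and the ./run_benchmark command below are placeholders.

  # confine a benchmark or VM process to NUMA node 0 (CPUs and memory)
  numactl --cpunodebind=0 --membind=0 ./run_benchmark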
K20 Results - SHOC
[Chart: SHOC performance for common signal processing kernels; relative performance (~0.96-1.01) for KVM, Xen, LXC, and VMWare]
K20 Results – SHOC Outliers
[Chart: SHOC OpenCL Level 1 and Level 2 outliers; relative performance (~0.94-1.06) for KVM, Xen, LXC, and VMWare]
C2075 Results - SHOC
[Chart: SHOC performance for common signal processing kernels; relative performance (~0.82-1.02) for KVM, Xen, LXC, and VMWare]
C2075 Results – SHOC Outliers
[Chart: SHOC OpenCL Level 1 and Level 2 outliers; relative performance (~0.6-1.1) for KVM, Xen, LXC, and VMWare]
SHOC Observations
§ Overall, both the Fermi and Kepler systems perform near-native
  – This is especially true for KVM and LXC
§ Xen on the C2075 system shows some overhead
  – Likely because Xen couldn't activate large page tables
§ Some unexpected performance improvement for Kepler Spmv
K20 Results – GPU-LIBSVM
[Chart: GPU-LIBSVM relative performance vs. number of training instances (1800, 3600, 4800, 6000); ~0.88-1.02 for KVM, Xen, LXC, and VMWare]
C2075 Results – GPU-LIBSVM
[Chart: GPU-LIBSVM relative performance vs. number of training instances (1800, 3600, 4800, 6000); ~0-1.4 for KVM, Xen, LXC, and VMWare]
GPU-LIBSVM Observations
§ Unexpected performance improvement for KVM on both systems
  – Most pronounced on the Westmere/Fermi platform
§ This is due to the use of transparent hugepages (THP)
  – Back the entire guest memory with hugepages
  – Improves TLB performance
§ Disabling hugepages on the Westmere/Fermi platform reduces performance to 80-87% of the base system (see the THP sketch below)
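For reference, THP can be inspected and toggled on the KVM host through the standard sysfs knob shown below; whether guest memory actually ends up backed by hugepages also depends on khugepaged and memory fragmentation, so this is only a sketch of the experiment.

  # check the current THP policy on the host
  cat /sys/kernel/mm/transparent_hugepage/enabled     # e.g. [always] madvise never

  # disable / re-enable THP (as root) to reproduce the comparison above
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/enabled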
Multi-GPU with GPUDirect
§ Many real applications extend beyond a single node's capabilities
§ Test multi-node performance with InfiniBand SR-IOV and GPUDirect
§ 2 Sandy Bridge nodes equipped with K20 GPUs
  – ConnectX-3 IB with SR-IOV enabled (see the sketch below)
  – Ported Mellanox OFED 2.1-1 to the 3.13 kernel
  – KVM hypervisor
§ Test with HOOMD, a commonly used GPUDirect-enabled particle dynamics simulator
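As a rough sketch of the InfiniBand side, SR-IOV virtual functions on ConnectX-3 are typically enabled through mlx4_core module options on the host (after SR-IOV has been enabled in the HCA firmware); the VF count below is illustrative.

  # /etc/modprobe.d/mlx4_core.conf  (illustrative values)
  options mlx4_core num_vfs=4 probe_vf=1

  # after reloading the driver, virtual functions show up as additional PCI devices
  lspci | grep Mellanox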
GPUDirect Advantage
Image source: http://old.mellanox.com/content/pages.php?pg=products_dyn&product_family=116
HOOMD Performance
[Chart: HOOMD Lennard-Jones liquid MD relative performance vs. number of particles (16k-512k); ~0.92-1.02]
Current Status
§ Source code is available now
  – https://github.com/usc-isi/nova
§ Includes support for heterogeneity
  – GPU-enabled LXC instances (boot example below)
  – Bare-metal provisioning
  – Architecture-aware scheduler
  – Prototype Xen with GPU passthrough implementation
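From the user's side, requesting a GPU-enabled LXC instance looks like any other nova boot, with a flavor that the architecture-aware scheduler maps to a GPU-capable host; the flavor and image names below are hypothetical.

  # hypothetical flavor and image names
  nova boot --flavor gpu.lxc.large --image centos-6.4-cuda my-gpu-instance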
Future Work
§ Primary focus: multi-node
  – Greater range of applications, larger systems
§ Integrate GPU passthrough support for KVM
  – This might come free with the existing OpenStack PCI passthrough work (see the sketch below)
§ NUMA support
  – This work assumes perfect NUMA mapping
  – OpenStack should be NUMA-aware
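For a sense of what reusing the existing PCI passthrough work could look like, the Havana-era configuration exposes devices through a whitelist and alias in nova.conf plus a flavor extra spec; the vendor/product IDs, alias, and flavor name below are illustrative (10de is NVIDIA's PCI vendor ID, the product ID depends on the GPU).

  # nova.conf on the compute node (illustrative IDs)
  pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "1028"}
  pci_alias = {"vendor_id": "10de", "product_id": "1028", "name": "k20"}

  # request one GPU through a flavor extra spec
  nova flavor-key gpu.passthrough set "pci_passthrough:alias"="k20:1"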