
CUDA in the Cloud – Enabling HPC Workloads in OpenStack

John Paul Walters, Computer Scientist, USC Information Sciences Institute
[email protected]

With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)


Outline

§  Introduction
§  OpenStack Background
§  Supporting Heterogeneity
§  Supporting GPUs
§  Performance Results
§  Current Status
§  Future Work


Introduction

§  Cloud computing is traditionally seen as a resource for IT
–  Web servers, databases

§  More recently, researchers have begun to leverage the public cloud as an HPC resource
–  An AWS virtual cluster ranks 102nd on the Top500 list

§  Major difference between HPC and IT in the cloud:
–  Types of resources, heterogeneity

§  Our contribution: we're developing heterogeneous HPC extensions for the OpenStack cloud computing platform

Why A Heterogeneous Cloud?

§  Typical cloud infrastructures focus on commodity hardware
–  Choose number of CPUs, memory, and disk
–  Often not a good match for technical computing

§  Advantages of heterogeneity
–  Leverage GPUs and other accelerators
–  Power efficiency, performance
–  Process technical workloads

§  Move away from batch/grid schedulers
–  Give users access to customizable instances
–  Dedicated virtual machines

Benefits of Heterogeneity

[Figures: SGI UV100 rendering 1926 objects (CPU: 10^8 samples, GPU: 10^10 samples); Tilera vs. x86 video transcoding (136.2 seconds vs. 139.5 seconds)]

Each architecture performs well on different applications.


OpenStack Background

§  OpenStack founded by Rackspace and NASA
§  In use by Rackspace, HP, and others for their public clouds
§  Open source with hundreds of participating companies
§  In use for both public and private clouds
§  Current stable release: OpenStack Folsom
–  OpenStack Grizzly to be released in April

[Figure: Google Trends searches for common open source IaaS projects: openstack, cloudstack, opennebula, eucalyptus cloud]


OpenStack Architecture

[Figure: OpenStack Folsom architecture diagram. Image source: Ken Pepple (http://ken.pepple.info/openstack/2012/09/25/openstack-folsom-architecture/)]

Supporting Heterogeneity

•  Current cloud services have limited resource options
–  m1.large = 7.5 GB RAM, 2 virtual cores, 850 GB disk
–  x86 and x86_64 only

•  Users should be able to customize instance types to their needs (see the sketch below)
–  E.g., 16 GB RAM, 4 virtual cores, SSE4.2, 2x Kepler GPUs
–  Hypervisor type
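To make the idea concrete, here is a minimal sketch (not the project's actual code) of creating a custom instance type with python-novaclient and tagging it with hardware requirements. The credentials, endpoint, flavor name, and extra_spec keys are illustrative placeholders that an architecture-aware scheduler could match against hosts.

```python
# Sketch: create a customizable instance type (flavor) and tag it with
# hardware requirements via python-novaclient. Credentials, endpoint, flavor
# name, and extra_spec keys are illustrative placeholders.
from novaclient import client

nova = client.Client("2", "admin", "secret", "demo",
                     "http://controller:5000/v2.0")

# Custom instance type: 16 GB RAM, 4 virtual cores, 40 GB disk
flavor = nova.flavors.create(name="cg1.custom", ram=16384, vcpus=4, disk=40)

# Extra specs a scheduler could use to place the instance on a matching host
flavor.set_keys({
    "cpu_arch": "x86_64",      # target ISA
    "cpu_features": "sse4.2",  # required CPU feature
    "gpus": "2",               # number of GPUs to expose
    "gpu_arch": "kepler",      # GPU generation
})
```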


Supporting GPUs

§  We're pursuing two approaches for GPU support in OpenStack
–  LXC support for container-based VMs
•  Good for non-virtualization-friendly GPUs
•  Near-native performance
•  Can't run non-Linux guests
–  Xen support for fully virtualized guests
•  Can run Windows and other non-Linux guests
•  Some overhead, especially for PCIe transfers (see the passthrough sketch below)

§  Our OpenStack work currently supports GPU-enabled LXC containers
–  Xen support is in development; preliminary results are shown today
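For the fully virtualized path, the underlying mechanism is PCI passthrough of the GPU into the guest. Below is a minimal sketch of that general mechanism using the libvirt Python bindings; the connection URI, guest name, and PCI address are illustrative, and the project's actual Xen integration may take a different route.

```python
# Sketch of GPU PCI passthrough via the libvirt Python bindings. The URI,
# guest name, and PCI address are illustrative placeholders; this shows the
# general hostdev mechanism, not the project's exact Xen code path.
import libvirt

GPU_HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("xen:///")        # or "qemu:///system" for KVM
dom = conn.lookupByName("gpu-guest")  # illustrative guest name
dom.attachDevice(GPU_HOSTDEV_XML)     # GPU appears as a PCI device in the guest
conn.close()
```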


Integrating GPU Support into OpenStack

§  Our GPU work is based on OpenStack's libvirt module
–  Libvirt supports KVM/QEMU and LXC
–  Libvirt-based Xen support is experimental

§  After instances boot, the LXC containers' GPUs are whitelisted and their major/minor device IDs are passed into the VM (see the sketch below)

§  GPU instances are launched just like any other:
euca-run-instances -k <my key> -t cg1.small <machine image>
–  cg1 instance types represent GPUs
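A minimal sketch of the whitelisting step, assuming a cgroup v1 devices controller: NVIDIA GPUs use character-device major 195 (/dev/nvidia0..N, with /dev/nvidiactl at minor 255), and a container gains access when "c major:minor rwm" rules are written to its devices.allow file. The cgroup path below is an illustrative placeholder, not the path OpenStack actually uses.

```python
# Sketch: whitelist NVIDIA device nodes for an LXC container by writing
# "c <major>:<minor> rwm" rules to its devices cgroup (cgroup v1).
import os

CGROUP = "/sys/fs/cgroup/devices/lxc/instance-00000001"  # illustrative path
NVIDIA_MAJOR = 195  # major number for NVIDIA character devices

def allow_device(major, minor):
    """Grant the container read/write/mknod access to one device node."""
    with open(os.path.join(CGROUP, "devices.allow"), "w") as f:
        f.write("c %d:%d rwm" % (major, minor))

allow_device(NVIDIA_MAJOR, 255)  # /dev/nvidiactl
for gpu in range(4):             # /dev/nvidia0 .. /dev/nvidia3 (Tesla S2050)
    allow_device(NVIDIA_MAJOR, gpu)
```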


Test Results: LXC and Xen

§  3 data sets (all using CUDA 5)
–  NVIDIA CUDA samples
–  SHOC multi-GPU
–  BFS/Graph500 multi-GPU

Hardware

LXC tests
§  2x Xeon E5520
§  96 GB RAM
§  1x Tesla S2050 (4 GPUs)
§  RHEL 6.1
–  2.6.38.2 kernel (non-stock)

Xen tests
§  2x Xeon X5660
§  192 GB RAM
§  2x NVIDIA Tesla C2075
§  CentOS 6.3 (with Xen 4.1.2)
–  2.6.32-279 kernel (guest, stock)


LXC CUDA 5 Samples

[Figure: LXC CUDA samples, performance relative to base (higher is better)]


LXC Bandwidth Results

[Figures: LXC vs. base, host-to-device and device-to-host bandwidth (pinned); bandwidth in MB/s vs. data size in KB]

LXC mirrors base bandwidth
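For reference, the pinned host-to-device bandwidth compared above can be measured along the lines of NVIDIA's bandwidthTest sample. The PyCUDA sketch below is illustrative; the transfer size and iteration count are arbitrary choices, not the settings used for these results.

```python
# Minimal PyCUDA sketch of a pinned host-to-device bandwidth measurement,
# similar in spirit to NVIDIA's bandwidthTest sample.
import time
import numpy as np
import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()
try:
    nbytes = 32 * 1024 * 1024                           # 32 MB per transfer
    host_buf = cuda.pagelocked_empty(nbytes, np.uint8)  # pinned host memory
    dev_buf = cuda.mem_alloc(nbytes)                    # device buffer

    cuda.memcpy_htod(dev_buf, host_buf)                 # warm-up copy
    iters = 50
    start = time.time()
    for _ in range(iters):
        cuda.memcpy_htod(dev_buf, host_buf)
    ctx.synchronize()
    elapsed = time.time() - start

    print("Host-to-device (pinned): %.1f MB/s" % (iters * nbytes / elapsed / 1e6))
finally:
    ctx.pop()
```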


LXC SHOC Results

[Figure: LXC SHOC benchmarks, 1 node, 4 GPUs; performance relative to base (higher is better)]


LXC Graph500 Results

[Figure: BFS/Graph500, 2 and 4 GPUs, 1 node; median TEPS vs. problem size (16 to 22); series: LXC 2 GPUs, Base 2 GPUs, LXC 4 GPUs, Base 4 GPUs]

TEPS: Traversed Edges per Second
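As a reminder of what the metric means: a Graph500 problem of scale S has 2^S vertices and roughly 16 x 2^S edges (edge factor 16), and TEPS for one BFS is the number of edges traversed divided by the BFS time; the plotted value is the median over many BFS runs. A small sketch with made-up timings:

```python
# Sketch of the TEPS metric: traversed edges / BFS time, reported as the
# median over many BFS runs. Scale and timings below are made-up examples.
from statistics import median

def teps(edges_traversed, seconds):
    return edges_traversed / seconds

scale = 20
edges = 16 * (2 ** scale)              # Graph500 edge factor of 16
bfs_times = [0.52, 0.49, 0.61, 0.55]   # seconds per BFS (illustrative)

print("median TEPS: %.2e" % median(teps(edges, t) for t in bfs_times))
```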


Xen CUDA 5 Samples

[Figure: Xen CUDA samples, performance relative to base (higher is better)]


Xen Bandwidth Results

[Figures: Xen vs. base, host-to-device and device-to-host bandwidth (pinned); bandwidth in MB/s vs. data size in KB]


Xen SHOC Benchmarks

[Figure: Xen SHOC benchmarks, 1 node, 2 GPUs; performance relative to base (higher is better)]


Xen Graph500 Results

[Figure: BFS/Graph500, 2 GPUs, 1 node; median TEPS vs. problem size (16 to 22); series: Xen VM, Base]

TEPS: Traversed Edges per Second


Current Status

§  Source code is available now
–  https://github.com/usc-isi/nova

§  Includes support for heterogeneity
–  GPU-enabled LXC instances
–  Bare-metal provisioning
–  Architecture-aware scheduler

§  We're working towards integrating additional support into the OpenStack Grizzly and H releases


Future Work

§  Add support for high-speed networking as a resource
–  InfiniBand via SR-IOV
–  Enable features like GPUDirect

§  Add support for the Kepler architecture

§  Merge the Xen support work into upstream OpenStack