The Top Six Advantages of CUDA-Ready Clusters

Ian Lumb

Bright Evangelist

GTC Express Webinar

January 21, 2015

“We scientists are time-constrained,” said Dr. Yamanaka. “Our priority is our research, not managing our clusters. Bright [Cluster Manager] is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts, or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits.”

http://www.brightcomputing.com/News-Tokyo-Institute-of-Technology-Gordon-Bell-Prize-Winner-Uses-Bright-Cluster-Manager-to-Develop-Applications-for-One-of-the-Worlds-Fastest-Supercomputers

CUDA-Ready Clusters

1. You focus on coding – not infrastructure & toolchains

2. You’re always in sync – with GPUs + CUDA

3. You cross-develop with confidence and ease

• Maintaining and using highly customized environments

4. You choose and combine in programming GPUs

• CUDA or OpenCL or OpenACC

• … and combine with MPI

5. You have converged HPC + Big Data Analytics

• You have access to Hadoop alongside HPC

6. You seamlessly utilize ‘The Cloud’

• You extend into AWS, deploy OpenStack, …

CUDA-ready clusters are GPU developer-ready

1. You focus on coding – not infrastructure & toolchains

Bright Cluster Manager – CUDA Environment

[Architecture diagram; recoverable labels:]

• Interfaces: Cluster Management Shell, Cluster Management GUI, User Portal
• SSL / SOAP / X509 / IPtables
• Cluster Management Daemon
• Hardware: Disk, Ethernet, Interconnect, IPMI / iLO, PDU, CPU, GPUs, Memory
• Workload managers: Slurm, PBS Pro, Torque/Maui, Torque/MOAB, Grid Engine, LSF
• Monitoring, Automation, Health Checks, Management, Provisioning
• Compilers, Libraries, Debuggers, Profilers
• SLES / RHEL / CentOS / SL

Unified Memory

http://info.brightcomputing.com/Blog/bid/196783/Bright-Cluster-Manager-Integrates-Support-for-CUDA-6

NVIDIA GPU Boost

Modernized monitoring for HPC clusters

http://insidehpc.com/2014/11/monitoring-hpc-clusters-modernized/

Cluster Health Management

Provide a problem-free environment for running jobs

Four elements

1. Cluster management automation

2. Regular health checks

3. Pre-job health checks

4. Hardware stability & performance tests

All elements above are configurable and extensible

2. You’re always in sync – with GPUs + CUDA

Syncing with GPUs + CUDA …

Innovation characterizes the entire history and evolution of GPU programmability through CUDA

• BUT … this introduces challenges and opportunities …

Bright Computing’s approach leverages

• People
  – Proactively maintaining business and technical relationships
• Process
  – ‘Hands-on engineering’ begins with release candidates
  – Preliminary to fully productized implementations
• Product
  – Bright Cluster Manager released once or twice per year
  – Updates flow continuously …

http://info.brightcomputing.com/blog/cuda-6.5-something-for-nothing

http://www.brightcomputing.com/News-Bright-Cluster-Manager-Adds-Support-for-the-NVIDIA-Tesla-K80-Dual-GPU-Accelerator

3. You cross-develop with confidence and ease

• Maintaining and using highly customized environments

Available Versions of the CUDA Toolkit

Using CUDA 6.0

4. You choose and combine in programming GPUs

• CUDA or OpenCL or OpenACC

• … and combine with MPI

HPC Development Environment

• Compilers (GNU, Intel*, AMD, Portland*, etc.)
• Debuggers and profilers (GNU, TAU, Allinea, TotalView)
• MPI libraries (OpenMPI, MPICH, MPICH-MX, MVAPICH)
• Other libraries (threading libraries, OpenMP, Global Arrays, HDF5, IPP, TBB, NetCDF, PETSc, etc.)
• Mathematical libraries (ACML, MKL*, FFTW, GMP, GotoBLAS, ScaLAPACK, etc.)

Environment modules

Programming GPUs

CUDA

OpenCL

OpenACC

MPI

Tools

• CUDA-GDB
• nvidia-smi
• CUDA Utility Library
• Examples
• 3rd Party
  – Allinea
  – Rogue Wave

CUDA Development Environment
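
When CUDA is combined with MPI, each MPI rank on a node usually has to pick one of that node's GPUs. The sketch below illustrates one common convention, round-robin assignment by node-local rank. It is not taken from the slides: the helper names `gpu_for_local_rank` and `pin_gpu` are hypothetical, and restricting the process through the `CUDA_VISIBLE_DEVICES` environment variable is an assumption about how the selection would be applied.

```python
import os
from typing import MutableMapping, Optional


def gpu_for_local_rank(local_rank: int, gpus_per_node: int) -> int:
    """Map a node-local MPI rank to a GPU index, round-robin.

    Hypothetical helper for illustration; not part of CUDA, MPI or Bright.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    return local_rank % gpus_per_node


def pin_gpu(local_rank: int, gpus_per_node: int,
            env: Optional[MutableMapping[str, str]] = None) -> str:
    """Expose a single GPU to this process via CUDA_VISIBLE_DEVICES.

    Assumption: this runs before the process initializes the CUDA runtime,
    because the variable is only read at initialization.
    """
    target = os.environ if env is None else env
    device = str(gpu_for_local_rank(local_rank, gpus_per_node))
    target["CUDA_VISIBLE_DEVICES"] = device
    return device
```

Launchers commonly expose the node-local rank through an environment variable, for example `OMPI_COMM_WORLD_LOCAL_RANK` under Open MPI or `SLURM_LOCALID` under Slurm; which one applies depends on how the job is started.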

5. You have converged HPC + Big Data Analytics

• You have access to Hadoop alongside HPC

HPC and Hadoop

Use GPUs for HPC and Big Data Analytics

Introduce GPUs into Hadoop clusters

Make use of Hadoop services

6. You seamlessly utilize ‘The Cloud’

• You extend into AWS, deploy OpenStack, …

GPUs in the Cloud? The Top Four Reasons

1. You can realize possibilities using the cloud

• You can scale up and scale out

2. You still realize the promise of GPU programmability

• … via HPC in the cloud

3. Your use of the cloud is transparent

• You’ve found ways to ‘hide’ latency

• Constraints apply for MPI apps

4. Your go-to apps still work in the cloud

http://info.brightcomputing.com/Blog/bid/196290/The-Top-4-Reasons-You-Should-Try-Cloud-Based-GPUs-for-HPC

Scenario I — “Cluster on Demand”

[Diagram: head node with compute nodes node001, node002 and node003]

Cloud Utilization

Scenario II — “Cluster Extension”

[Diagram: head node with compute nodes node001, node002 and node003, extended by node004 through node007]

Cloud Utilization

Case Study: TUAT (1)

The Customer

• Engages in materials-science research
• Compares computational models with physical experiments
• High-resolution, 3D phase-field modeling at large scales using GPUs

The Challenge

• Make available the latest innovations in GPU technology without distracting focus from research

Case Study: TUAT (2)

The Solution

• Laboratory GPU cluster designed and implemented by HPCTech Corp.
• Bright Cluster Manager deployed by HPCTech
• Use Bright to fully manage the entire CUDA environment – including regular updates
• Use modules environment via Bright to manage multiple CUDA environments
• Prototype simulations using laboratory HPC cluster
  – Includes debugging and tuning code
• Execute large-scale simulations using TSUBAME

The Results …

[Figure: snapshots at calculation steps 25000, 150000 and 275000; scale bar 51 μm; carbon concentration scale 0.01 to 0.38 wt.%]

Caption: Snapshots of austenite-to-ferrite transformation behavior in Fe-C alloy simulated by a multi-phase-field method. Upper and lower panels show time evolution of ferrite grains and carbon concentration during the phase transformation. The simulation was performed on 512 × 512 × 256 computational grids using 8 GPUs in the lab cluster. (Prof. A. Yamanaka, TUAT)

[Plot: elapsed time (×1000 s, axis from 0 to 5) versus number of GPUs (128, 256, 512)]

Caption: Performance of multiple-GPU computation of multi-phase-field simulation of austenite-to-ferrite transformation in Fe-C alloy. The performance was measured by performing the simulations on the TSUBAME2.5 supercomputer at Tokyo Institute of Technology. The number of computational grids, crystal grains and calculation steps were 512³, 4068 and 10⁵, respectively. (Prof. A. Yamanaka, TUAT, priv. comm.)

http://www.tuat.ac.jp/~yamanaka/
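
A plot of elapsed time against GPU count for a fixed problem size is usually summarized as speedup and parallel efficiency relative to the smallest run. The sketch below shows that arithmetic only. The function name `strong_scaling` is hypothetical, and the slide's measured timings are not recoverable from this transcript, so any numbers passed to it here would be placeholders rather than TSUBAME2.5 results.

```python
from typing import Dict, Tuple


def strong_scaling(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {gpus: (speedup, efficiency)} relative to the smallest GPU count.

    timings maps a GPU count to the elapsed time (any consistent unit)
    for the same problem size. Hypothetical helper for illustration.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    if any(t <= 0 for t in timings.values()):
        raise ValueError("elapsed times must be positive")
    base_gpus = min(timings)
    base_time = timings[base_gpus]
    result = {}
    for gpus in sorted(timings):
        speedup = base_time / timings[gpus]   # how much faster than the baseline
        ideal = gpus / base_gpus              # speedup under perfect scaling
        result[gpus] = (speedup, speedup / ideal)
    return result
```

An efficiency of 1.0 means doubling the GPUs halved the elapsed time; values below 1.0 indicate overheads such as communication between GPUs.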

Case Study: TUAT (3)

“We scientists are time-constrained,” said Dr. Yamanaka. “Our priority is our research, not managing our clusters. Bright is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts, or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits.”

Q & A

Ian Lumb, ian.lumb@brightcomputing.com

http://www.brightcomputing.com/

Additional Slides

Cluster Health Management

Goal: provide a problem-free environment for running jobs

Four elements

1. Cluster management automation
2. Regular health checks
   • Actions that return PASS, FAIL or UNKNOWN
   • Can be associated with a settable severity and a message
   • Can launch an action based on any response value
3. Pre-job health checks
   • Let the workload manager hold the job very briefly
   • Check the health of each reserved node
   • If unhealthy, take the node offline, inform the system administrator
   • Let the workload manager reschedule the job to a different set of nodes
4. Hardware stability & performance tests
   • Very wide range of tests
   • May include disk overwrites and reboot(s)

All elements above are configurable and extensible
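
The health-check behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bright Cluster Manager's API: the names `Result`, `HealthCheck` and `prejob_check` are hypothetical, a probe that raises an exception is treated as UNKNOWN, and only a FAIL result marks a node as unhealthy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class HealthCheck:
    """A check with a settable severity, a message and per-result actions."""
    name: str
    probe: Callable[[str], Result]            # takes a node name
    severity: int = 10
    message: str = ""
    actions: Dict[Result, Callable[[str], None]] = field(default_factory=dict)

    def run(self, node: str) -> Result:
        try:
            result = self.probe(node)
        except Exception:
            result = Result.UNKNOWN           # assumption: a crashed probe is UNKNOWN
        action = self.actions.get(result)
        if action is not None:
            action(node)                      # launch the action tied to this result
        return result


def prejob_check(nodes: List[str],
                 checks: List[HealthCheck]) -> Tuple[List[str], List[str]]:
    """Split reserved nodes into (healthy, offline) before a job starts.

    Assumption: a node is unhealthy if any check returns FAIL.
    """
    healthy, offline = [], []
    for node in nodes:
        results = [check.run(node) for check in checks]
        (offline if Result.FAIL in results else healthy).append(node)
    return healthy, offline
```

In this sketch the caller would hand the `offline` list to the administrator and ask the workload manager to reschedule the job onto a different set of nodes; how that is done depends on the workload manager in use.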

Bright API
