hpcc mid-morning break high performance computing on a gpu cluster dirk colbry, ph.d. research...

12
HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Upload: stephany-parker

Post on 16-Dec-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

HPCC Mid-Morning Break

High Performance Computing on a GPU cluster

Dirk Colbry, Ph.D.

Research Specialist

Institute for Cyber Enabled Discovery

Page 2: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

What is a GPU?

• Graphics Processing Unit• Originally designed to make

Video Games• Uses many processing cores to

parallelize the math required for real time game play.

• Early researchers made general programs that looked like graphics so they could run in the GPU.

• In 2006 nVidia released the CUDA programming interface to allow users to easily make scalable general purpose programs that run on the GPU (GPGPU).

Page 3: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

GPU vs CPU

Page 4: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

CPU and GPU working together

Page 5: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Running on the GPU

• Program Starts on the CPU Copy data to GPU (slow-ish) Run kernel threads on GPU (very fast) Copy results back to CPU (slow-ish)

• There are a lot of clever ways to fully utilize both the GPU and CPU.

Page 6: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Pros and Cons

• Benefits Lots of processing

cores. Works with the CPU

as a co-processor Very fast local

memory bandwidth Large online

community of developers

• Drawbacks Can be difficult to

program. Memory Transfers

between GPU and CPU are costly (time).

Cores typically run the same code.

Page 7: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

gfx-000 Test hardware

• Single Quad core 2.4 Ghz Intel Processor.

• 8GB of CPU RAM• Three Nvidia GTX 280 Video cards:

1GB of ram per card 240 CUDA processing Cores per card 1.3 GHz Processor Clock Speed

• Total of 724 cores on a single machine

Page 8: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Installed Software on gfx-000

• Cuda toolkit 2.2 and 2.3 For programming in c/c++ and fortran

• cublas – Cuda version of blas libraries• cufft – Cuda version of fft libraries• pycuda – Python Cuda Interface• Zephyr – Molecular Dynamics Program

optimized for GPUs

Page 9: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Other Available Software

• OpenCL c/c++ interface

• Jacket Matlab GPU wrapper

• Lattice Boltzmann pde solver

• OpenVIDIA Machine Vision

• Many Many others

• Cuda Zone ~90 thousand cuda

developers. Lots of software

examples Developer Forms Tutorials

• http://www.nvidia.com/object/cuda_home.html

Page 10: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

New GPU Cluster Buy-In

• Rack Units: 1U• CPU: 2x Intel Xeon E5530 Quad-Core 2.40GHZ• Memory: 18GB of Ram• Hard drive: 250GB disk for OS and Local

Scratch• Network: Ethernet only, (no Infiniband support)• GPU: Two Nvidia Tesla M1060 GPUs• Support: Four year, next business day hardware

support• Cost: $5,224

Page 11: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

Each Nvidia Tesla M1060

• Number of Streaming Processor Cores 240• Frequency of processor cores 1.3 GHz• Single Precision peak floating point performance 933 gigaflops• Double Precision peak floating point performance 78 gigaflops• Dedicated Memory 4 GB GDDR3• Memory Speed 800 MHz• Memory Interface 512-bit • Memory Bandwidth 102 GB/sec• System Interface PCIe

Page 12: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery

What are we buying

• 240 cores * 2 GPUs + 4 cores * 2 CPUs = 488 Cores / node

• 31 Nodes (minimum) * 488 Cores / node = 15,128 cores in our new cluster

• However, 20 of these nodes are dedicated buy-in nodes so only 5368 cores will be available in the general cluster