Carlo Nardone Sr. Solution Architect, NVIDIA EMEA INTRODUCTION TO PARALLEL AND GPU COMPUTING


Post on 18-Aug-2020




AGENDA

1 Intro

2 Processors Trends

3 Multiprocessors

4 GPU Computing


PARALLEL COMPUTING, ILLUSTRATED


FROM MULTICORE TO MANYCORE


PARALLEL COMPUTING IS OLD

Source: Tim Mattson


AGENDA

2 Processors Trends


MOORE’S LAW

Source: U.Delaware CISC879 / J.Dongarra


MICROPROCESSOR FREQUENCY TREND

source: CA5 Hennessy/Patterson


THE POWER WALL


MICROPROCESSORS PERFORMANCE TREND

source: CA5 Hennessy/Patterson


THE PROCESSOR-MEMORY GAP

source: CA5 Hennessy & Patterson


ARITHMETIC INTENSITY

Arithmetic intensity is the number of floating-point operations needed to run a program divided by the number of bytes accessed in main memory [Williams et al. 2009]. For some kernels, such as dense matrix operations, arithmetic intensity scales with problem size, but for many kernels it is independent of problem size.

Source: CA5 Hennessy & Patterson
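
To make the definition concrete, the sketch below estimates arithmetic intensity for two kernels. It rests on simplifying assumptions that are not in the slides: double-precision (8-byte) operands, every operand moved to or from main memory exactly once, and the textbook operation counts of 2n for a vector update y = a*x + y and 2n^3 for a dense n x n matrix multiply. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision operands


def axpy_intensity(n: int) -> float:
    """FLOPs per byte for y = a*x + y on vectors of length n."""
    flops = 2 * n                            # one multiply and one add per element
    bytes_moved = 3 * n * BYTES_PER_DOUBLE   # read x, read y, write y
    return flops / bytes_moved


def matmul_intensity(n: int) -> float:
    """FLOPs per byte for C = A * B with n x n matrices, assuming ideal reuse."""
    flops = 2 * n ** 3                           # n multiply-adds per output element
    bytes_moved = 3 * n * n * BYTES_PER_DOUBLE   # read A and B once, write C once
    return flops / bytes_moved


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, axpy_intensity(n), matmul_intensity(n))
```

Under these assumptions the vector update stays at 1/12 FLOP per byte regardless of n, while the matrix multiply grows as n/12, which is the contrast the caption draws between size-independent kernels and dense matrix kernels.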


BANDWIDTH AND THE ROOFLINE MODEL

Source: HPCA5
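
The roofline bound itself can be sketched in a few lines: attainable performance is the smaller of the peak floating-point rate and the product of memory bandwidth and arithmetic intensity. The machine numbers used here (1000 GFLOP/s peak, 200 GB/s bandwidth) are made up for illustration and do not describe any processor in these slides. The function names are hypothetical.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    PEAK = 1000.0      # GFLOP/s, illustrative value only
    BANDWIDTH = 200.0  # GB/s, illustrative value only
    for ai in (0.25, 1.0, 5.0, 20.0):
        print(ai, attainable_gflops(ai, PEAK, BANDWIDTH))
    print("ridge point:", ridge_point(PEAK, BANDWIDTH))
```

Kernels whose intensity lies below the ridge point are limited by memory bandwidth. Kernels above it are limited by peak compute.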


AGENDA

3 Multiprocessors


FLYNN’S TAXONOMY: SISD

source: CA5 Hennessy & Patterson


FLYNN’S TAXONOMY: SIMD

source: CA5 Hennessy & Patterson


FLYNN’S TAXONOMY: SIMD (2)

source: CA5 Hennessy & Patterson


FLYNN’S TAXONOMY

source: CA5 Hennessy & Patterson


FLYNN’S TAXONOMY: MISD (?)

source: CA5 Hennessy & Patterson


FLYNN’S TAXONOMY: MIMD

source: CA5 Hennessy & Patterson


MULTIPROCESSORS

source: CA5 Hennessy & Patterson


MULTIPROCESSOR SMP CLASS

source: CA4 Hennessy & Patterson


MULTIPROCESSOR DISTRIBUTED MEMORY CLASS

source: CA4 Hennessy & Patterson


X86 ARCHITECTURE: INTEL NEHALEM


AGENDA

4 GPU Computing


LOW LATENCY OR HIGH THROUGHPUT?

CPU

• Optimised for low-latency access to cached data sets

• Control logic for out-of-order and speculative execution

GPU

• Optimised for data-parallel, throughput computation

• Architecture tolerant of memory latency

• More transistors dedicated to computation


GPU UNIFIED DESIGN


GPGPU COMPUTING, B.C. ERA


WHY ARE GPUS ATTRACTIVE FOR HPC?

• Massively multithreaded manycore chips

• High flops count (SP and later DP)

• High memory bandwidth, (later) including ECC

• Lots of programming languages

• Lots of programming tools

• Widely used in Computational Physics


GPGPU COMPUTING, A.C. ERA

(Fermi)


GROWTH IN GPU COMPUTING

Metric                      2008       2015
CUDA Downloads              150,000    3 Million
Academic Papers             4,000      60,000
Universities Teaching       60         800
Supercomputing Teraflops    77         54,000
Tesla GPUs                  6,000      450,000
CUDA Apps                   27         319



ACCELERATED, HETEROGENEOUS COMPUTING

CPU Optimized for Serial Tasks

GPU Accelerator Optimized for Parallel Tasks

10x Performance & 5x Energy Efficiency for HPC


GEFORCE 8: FIRST MODERN GPU ARCHITECTURE


NVIDIA KEPLER GK110 PROCESSOR


KEPLER SMX DIAGRAM


GPUs Accelerate Science

Speedup   Application             Institution
146X      Medical Imaging         U of Utah
36X       Molecular Dynamics      U of Illinois, Urbana
18X       Video Transcoding       Elemental Tech
50X       Matlab Computing        AccelerEyes
100X      Astrophysics            RIKEN
149X      Financial Simulation    Oxford
47X       Linear Algebra          Universidad Jaime
20X       3D Ultrasound           Techniscan
130X      Quantum Chemistry       U of Illinois, Urbana
30X       Gene Sequencing         U of Maryland


SUPERCOMPUTING ERA


JETSON TK1

THE WORLD’S 1st EMBEDDED SUPERCOMPUTER

Development Platform for Embedded Computer Vision, Robotics, Medical

Tegra K1 SoC

Quad core A15 + Kepler GPU

192 CUDA Enabled cores

326 Gflops @ 5 Watt

$192


JETSON TK1: UNLOCKING NEW APPLICATIONS

Computer Vision

Robotics

Automotive

Medicine

Avionics


US TO BUILD TWO FLAGSHIP SUPERCOMPUTERS

IBM POWER9 CPU + NVIDIA Volta GPU

NVLink High Speed Interconnect

>40 TFLOPS per Node, >3,400 Nodes

2017

SUMMIT: 150-300 PFLOPS Peak Performance

SIERRA: > 100 PFLOPS Peak Performance

Major Step Forward on the Path to Exascale


GPU ROADMAP

[Chart: SGEMM/W (axis 0 to 72) by year (2008 to 2018) for the Tesla, Fermi, Kepler, Maxwell, Pascal and Volta architectures. Pascal: Mixed Precision, 3D Memory, NVLink, 2x SGEMM/W.]


PASCAL GPU FEATURING NVLINK AND STACKED MEMORY

NVLINK

• GPU high speed interconnect

• 80-200 GB/s

3D Stacked Memory

• 4x Higher Bandwidth (~1 TB/s)

• 3x Larger Capacity

• 4x More Energy Efficient per bit


THANK YOU

[email protected]

+39 335 5828197