CUDA
Presented By:
S7 CS
Guided By:
04/10/2023
Contents
1. What is CUDA?
2. Execution Model
3. Implementation
4. Applications
Seminar ‘11 CUDA
What is CUDA?
CUDA – Compute Unified Device Architecture
A hardware and software architecture for computing on the GPU
Developed by Nvidia in 2007
GPU
Performs massive numbers of tasks simultaneously and quickly by using several ALUs
The ALUs are programmable through graphics APIs
What is CUDA? (Contd.)
With CUDA there is no need to map computations onto graphics APIs
CUDA makes number crunching very fast
CUDA is well suited to highly parallel algorithms and large datasets
Consists of a heterogeneous programming model and a software environment
Hardware and software models
An extension of the C programming language
Designed to enable heterogeneous computation: CPU and GPU working together
CUDA Kernels & Threads
Device = GPU: executes the parallel portions of an application as kernels
Host = CPU: executes the serial portions of an application
Kernel = a function that runs on the device
Only one kernel executes at a time
Many threads execute each kernel
Host and device each have their own memory
Host and device are connected by PCI Express x16
Arrays of Parallel Threads
A CUDA kernel is executed by an array of threads
All threads run the same code
Each thread has an ID that it uses to compute memory addresses
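A minimal sketch of such a kernel (the kernel name, the scaling operation, and the launch parameters are illustrative, not from the slides): every thread runs the same code and uses its built-in ID to pick the element it works on.

```cuda
// Hypothetical kernel: doubles each element of an array.
__global__ void scale(float *data, int n)
{
    // Combine block and thread IDs into a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: the grid may overshoot n
        data[i] = 2.0f * data[i];  // each thread handles one element
}

// Host side: launch enough 256-thread blocks to cover n elements.
// scale<<<(n + 255) / 256, 256>>>(d_data, n);
```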
Thread Batching
Thread cooperation is valuable:
Share results to avoid redundant computation
Share memory accesses
Thread block = a group of threads
Threads in a block cooperate using shared memory and synchronization
Thread ID is calculated as x + y*Dx (for a 2-dimensional block)
(x, y) – thread index
(Dx, Dy) – block size
Thread Batching (Contd.)
Thread ID is x + y*Dx + z*Dx*Dy (for a 3-dimensional block)
(x, y, z) – thread index
(Dx, Dy, Dz) – block size
Grid = Group of thread blocks
Thread Batching (Contd.)
Blocks also have IDs, calculated from the block index in the same way as thread IDs
Threads in different blocks cannot cooperate
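A short sketch (the kernel name and output layout are hypothetical): blocks get their IDs from the built-in `blockIdx`, computed from the block index exactly as a thread ID is computed from the thread index.

```cuda
// Hypothetical kernel: each thread writes its block's ID to an output array.
__global__ void who_am_i(int *out)
{
    // 2-D grid: block ID = bx + by*gridDim.x, same form as the thread ID.
    int block_id  = blockIdx.x + blockIdx.y * gridDim.x;
    int thread_id = threadIdx.x + threadIdx.y * blockDim.x;
    int threads_per_block = blockDim.x * blockDim.y;
    out[block_id * threads_per_block + thread_id] = block_id;
    // Note: no synchronization primitive spans blocks, which is why
    // threads in different blocks cannot safely cooperate.
}
```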
Transparent Scalability
Hardware is free to schedule thread blocks on any processor
A kernel scales across parallel multiprocessors
CUDA Architectures

Architecture codename          G80          GT200        Fermi
Release year                   2006         2008         2010
Number of transistors          681 million  1.4 billion  3.0 billion
Streaming Multiprocessors (SM) 16           30           16
Streaming Processors (per SM)  8            8            32
Streaming Processors (total)   128          240          512
Shared memory (per SM)         16 KB        16 KB        Configurable 48 KB or 16 KB
L1 cache (per SM)              None         None         Configurable 16 KB or 48 KB
8 & 10 Series Architecture
[Block diagrams of the G80 and GT200 architectures]
Kernel Memory Access
Per-thread: registers and local memory
Per-block: shared memory
Per-device: global memory
Physical Memory Layout
“Local” memory resides in device DRAM
Use registers and shared memory to minimize local memory use
Host can read and write global memory but not shared memory
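As a sketch of how these memory tiers appear in code (the kernel name and the block size of 256 are assumptions, not from the slides):

```cuda
// Hypothetical kernel: sums each 256-element chunk of `in` into `out`.
__global__ void sum_block(const float *in, float *out)
{
    __shared__ float tile[256];          // per-block shared memory (on-chip)
    int i = threadIdx.x;                 // held in a register (per-thread)
    tile[i] = in[blockIdx.x * 256 + i];  // read from global memory (device DRAM)
    __syncthreads();                     // wait for every thread in the block
    if (i == 0) {
        float s = 0.0f;
        for (int j = 0; j < 256; j++) s += tile[j];
        out[blockIdx.x] = s;             // write result back to global memory
    }
}
```

The host can fill `in` and read `out` with cudaMemcpy, since both live in global memory; it can never touch `tile`, which exists only on-chip for the lifetime of each block.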
Execution Model
Threads are executed by thread processors
Thread blocks are executed by multiprocessors
A kernel is launched as a grid of thread blocks
CUDA Software Development
Compiling CUDA Code
The nvcc compiler compiles .cu files, splitting the code into Nvidia device assembly and host C++ code.
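A typical invocation might look like this (the file name is an example; nvcc must be installed from the CUDA toolkit):

```shell
nvcc -o app app.cu    # compile device + host code and link into one binary
nvcc -ptx app.cu      # emit only the device assembly (PTX) for inspection
```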
Example

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda_runtime.h>

int main(void) {
    float *a_h, *b_h;   // host data
    float *a_d, *b_d;   // device data
    int N = 15, nBytes, i;
    nBytes = N * sizeof(float);
    a_h = (float*)malloc(nBytes);
    b_h = (float*)malloc(nBytes);
    cudaMalloc((void**)&a_d, nBytes);
    cudaMalloc((void**)&b_d, nBytes);
    for (i = 0; i < N; i++) a_h[i] = 100.f + i;
    cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);
    for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);
    free(a_h); free(b_h);
    cudaFree(a_d); cudaFree(b_d);
    return 0;
}
[Diagram: a_h and b_h reside in host memory; a_d and b_d reside in device memory]
Applications
Oil & gas
Finance
Medical
Biophysics
Numerics
Audio, video, and imaging
Advantages
Provides shared memory
Cost-effective
The gaming industry's demand for graphics cards has driven substantial research and investment into improving GPUs
Transparent scalability
Drawbacks
Despite having hundreds of "cores", CUDA devices are not as flexible as CPUs
Not as effective for personal computers
Future Scope
Implementation of CUDA in GPUs from other companies
More and more streaming processors can be included
CUDA support in a wider variety of programming languages
Conclusion
CUDA has brought significant innovations to the High Performance Computing world.
It has simplified the development of general-purpose parallel applications.
These applications now have enough computational power to produce proper results in a short time.
References
1. "CUDA by Example: An Introduction to General-Purpose GPU Programming" by Jason Sanders and Edward Kandrot.
2. "Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series)" by David B. Kirk and Wen-mei W. Hwu.
3. "GPU Computing Gems Emerald Edition (Applications of GPU Computing Series)" by Wen-mei W. Hwu.
4. "The Cost To Play: CUDA Programming" by Douglas Eadline, Ph.D., Linux Magazine, February 17, 2010.
5. "Nvidia Announces CUDA x86" by Cristian, Tech Connect Magazine, September 21, 2010.
6. CUDA Programming Guide, ver. 1.1, http://www.nvidia.com/object/cuda_develop.html
7. TESLA GPU Computing Technical Brief, http://www.nvidia.com/object/tesla_product_literature.html
8. G80 architecture reviews and specification, http://www.nvidia.com/page/8800_reviews.html, http://www.nvidia.com/page/8800_tech_specs.html
9. Beyond3D G80: Architecture and GPU Analysis, http://www.beyond3d.com/content/reviews/1
10. Graphics adapters supporting CUDA, http://www.nvidia.com/object/cuda_learn_products.html
Questions?