CUDA
Presented By:
S7 CS
Guided By:
04/10/2023
Contents
1. What is CUDA?
2. Execution Model
3. Implementation
4. Applications
Seminar ‘11 CUDA
What is CUDA?
CUDA – Compute Unified Device Architecture
A hardware and software architecture for computing on the GPU
Developed by Nvidia in 2007
GPU
Performs massive numbers of tasks simultaneously and quickly by using several ALUs
The ALUs are programmable through graphics APIs
What is CUDA? (Contd.)
With CUDA there is no need to map computations onto graphics APIs
CUDA makes number crunching very fast
CUDA is well suited to highly parallel algorithms and large datasets
Consists of a heterogeneous programming model and a software environment
Hardware and software models
An extension of the C programming language
Designed to enable heterogeneous computation: CPU and GPU working together
CUDA Kernels & Threads
Device = GPU: executes the parallel portions of an application as kernels
Host = CPU: executes the serial portions of an application
Kernel = a function that runs on the device
Only one kernel executes at a time
Many threads execute each kernel
Host and device each have their own memory
Host and device are connected by PCI Express x16
Arrays of Parallel Threads
A CUDA kernel is executed by an array of threads
All threads run the same code
Each thread has an ID that it uses to compute memory addresses
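A minimal sketch of such a kernel (the kernel name, the scaling operation, and the launch parameters are illustrative, not from the slides): every thread runs the same code and uses its built-in ID to pick the element it works on.

```cuda
// Hypothetical kernel: doubles each element of an array.
__global__ void scale(float *data, int n)
{
    // Combine block and thread IDs into a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: the grid may overshoot n
        data[i] = 2.0f * data[i];  // each thread handles one element
}

// Host side: launch enough 256-thread blocks to cover n elements.
// scale<<<(n + 255) / 256, 256>>>(d_data, n);
```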
Thread Batching
Thread cooperation is valuable:
Share results to avoid redundant computation
Share memory accesses
Thread block = a group of threads
Threads in a block cooperate using shared memory and synchronization
Thread ID is calculated as x + y*Dx (for a 2-dimensional block)
(x, y) – thread index
(Dx, Dy) – block size
Thread Batching (Contd.)
Thread ID is x + y*Dx + z*Dx*Dy (for a 3-dimensional block)
(x, y, z) – thread index
(Dx, Dy, Dz) – block size
Grid = Group of thread blocks
Thread Batching (Contd.)
Blocks also have IDs, calculated from the block index in the same way as thread IDs
Threads in different blocks cannot cooperate
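A short sketch (the kernel name and output layout are hypothetical): blocks get their IDs from the built-in `blockIdx`, computed from the block index exactly as a thread ID is computed from the thread index.

```cuda
// Hypothetical kernel: each thread writes its block's ID to an output array.
__global__ void who_am_i(int *out)
{
    // 2-D grid: block ID = bx + by*gridDim.x, same form as the thread ID.
    int block_id  = blockIdx.x + blockIdx.y * gridDim.x;
    int thread_id = threadIdx.x + threadIdx.y * blockDim.x;
    int threads_per_block = blockDim.x * blockDim.y;
    out[block_id * threads_per_block + thread_id] = block_id;
    // Note: no synchronization primitive spans blocks, which is why
    // threads in different blocks cannot safely cooperate.
}
```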
Transparent Scalability
Hardware is free to schedule thread blocks on any processor
A kernel scales across parallel multiprocessors
CUDA Architectures

Architecture codename          G80          GT200        Fermi
Release year                   2006         2008         2010
Number of transistors          681 million  1.4 billion  3.0 billion
Streaming Multiprocessors (SM) 16           30           16
Streaming Processors (per SM)  8            8            32
Streaming Processors (total)   128          240          512
Shared memory (per SM)         16 KB        16 KB        Configurable 48 KB or 16 KB
L1 cache (per SM)              None         None         Configurable 16 KB or 48 KB
8 & 10 Series Architecture
[Block diagrams of the G80 and GT200 architectures]
Kernel Memory Access
Per-thread: registers and local memory
Per-block: shared memory
Per-device: global memory
Physical Memory Layout
“Local” memory resides in device DRAM
Use registers and shared memory to minimize local memory use
Host can read and write global memory but not shared memory
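As a sketch of how these memory tiers appear in code (the kernel name and the block size of 256 are assumptions, not from the slides):

```cuda
// Hypothetical kernel: sums each 256-element chunk of `in` into `out`.
__global__ void sum_block(const float *in, float *out)
{
    __shared__ float tile[256];          // per-block shared memory (on-chip)
    int i = threadIdx.x;                 // held in a register (per-thread)
    tile[i] = in[blockIdx.x * 256 + i];  // read from global memory (device DRAM)
    __syncthreads();                     // wait for every thread in the block
    if (i == 0) {
        float s = 0.0f;
        for (int j = 0; j < 256; j++) s += tile[j];
        out[blockIdx.x] = s;             // write result back to global memory
    }
}
```

The host can fill `in` and read `out` with cudaMemcpy, since both live in global memory; it can never touch `tile`, which exists only on-chip for the lifetime of each block.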
Execution Model
Threads are executed by thread processors
Thread blocks are executed by multiprocessors
A kernel is launched as a grid of thread blocks
CUDA Software Development
Compiling CUDA Code
The nvcc compiler compiles .cu files, splitting the code into Nvidia device assembly and host C++ code.
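A typical invocation might look like this (the file name is an example; nvcc must be installed from the CUDA toolkit):

```shell
nvcc -o app app.cu    # compile device + host code and link into one binary
nvcc -ptx app.cu      # emit only the device assembly (PTX) for inspection
```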
Example

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda_runtime.h>

int main(void) {
    float *a_h, *b_h;   // host data
    float *a_d, *b_d;   // device data
    int N = 15, nBytes, i;
    nBytes = N * sizeof(float);
    a_h = (float*)malloc(nBytes);
    b_h = (float*)malloc(nBytes);
    cudaMalloc((void**)&a_d, nBytes);
    cudaMalloc((void**)&b_d, nBytes);
    for (i = 0; i < N; i++) a_h[i] = 100.f + i;
    cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);
    for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);
    free(a_h); free(b_h);
    cudaFree(a_d); cudaFree(b_d);
    return 0;
}
[Diagram: a_h and b_h reside in host memory; a_d and b_d reside in device memory]
Applications
Oil & gas
Finance
Medical
Biophysics
Numerics
Audio, video, and imaging
Advantages
Provides shared memory
Cost-effective
The gaming industry's demand for graphics cards has driven substantial research and investment into improving GPUs
Transparent scalability
Drawbacks
Despite having hundreds of "cores", CUDA devices are not as flexible as CPUs
Not as effective for personal computers
Future Scope
Implementation of CUDA in GPUs from other companies
More and more streaming processors can be included
CUDA support in a wider variety of programming languages
Conclusion
CUDA has brought significant innovations to the High Performance Computing world.
It has simplified the development of general-purpose parallel applications.
These applications now have enough computational power to produce proper results in a short time.
References
1. "CUDA by Example: An Introduction to General-Purpose GPU Programming" by Jason Sanders and Edward Kandrot.
2. "Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series)" by David B. Kirk and Wen-mei W. Hwu.
3. "GPU Computing Gems Emerald Edition (Applications of GPU Computing Series)" by Wen-mei W. Hwu.
4. "The Cost To Play: CUDA Programming" by Douglas Eadline, Ph.D., Linux Magazine, February 17, 2010.
5. "Nvidia Announces CUDA x86" by Cristian, Tech Connect Magazine, September 21, 2010.
6. CUDA Programming Guide, ver. 1.1, http://www.nvidia.com/object/cuda_develop.html
7. TESLA GPU Computing Technical Brief, http://www.nvidia.com/object/tesla_product_literature.html
8. G80 architecture reviews and specification, http://www.nvidia.com/page/8800_reviews.html, http://www.nvidia.com/page/8800_tech_specs.html
9. Beyond3D G80: Architecture and GPU Analysis, http://www.beyond3d.com/content/reviews/1
10. Graphics adapters supporting CUDA, http://www.nvidia.com/object/cuda_learn_products.html
Questions?