Evolution of GPUs, Tools, Command-line compiler – cuda.nik.uni-obuda.hu/doc/Introduction.pdf
GPU Programminghttp://cuda.nik.uni-obuda.hu
Szénási Sándor, [email protected]
GPU Education Center of Óbuda University
Introduction
Evolution of GPUs, Tools, Command-line compiler
Evolution of GPU architectures
Hardware overview
CUDA Framework
Compile our first program
Visual Studio Support
INTRODUCTION
Today’s GPUs
GeForce GTX Titan X
• 3072 CUDA Cores
• 12 GB Memory
• 6.2 TFLOPS
• Price: $1100 (2016)
• Power consumption: 250W
For comparison
Intel ASCI Red
• No. 1 system from June 1997 to June 2000
• 9632 cores (Pentium processors upgraded to Pentium II Xeon processors)
• 1212 GB Memory
• 3.2 TFLOPS
• Price: $46,000,000 (1997)
• Power consumption: 850 kW
Hardware 3D support
Initial period
• Screen resolution
• Number of colors
• Refresh frequency
3D graphics
• Games
• GFX application
• CAD applications
• Etc.
Hardware support
• General functions for all kinds of applications
◦ Mapping the 3D scene to the 2D plane
◦ Texturing
• Well parallelizable tasks
Early hardware design
GPU
• A Graphical Processing Unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the building of images in a frame buffer intended for output to a display [1].
• Modern GPUs are very efficient at manipulating computer graphics, especially in 3D rendering. These functions are usually available through some standard APIs, like:
◦ OpenGL (www.opengl.org)
◦ Direct3D (www.microsoft.com)
Shaders
• A shader is a computer program or a hardware unit that is used to do shading (the production of appropriate levels of light and darkness and colors) [2]
• Older graphics cards utilize separate processing units for the main tasks:
◦ Vertex shader – the purpose is to transform each vertex’s 3D position in the virtual space to a 2D position on the screen
◦ Pixel shader – the purpose is to compute colour and lightness information for all pixels on the screen (based on textures, lighting etc.)
◦ Geometry shader – the purpose is to generate new primitives and modify the existing ones
Optimization of shaders
• Older graphics cards utilize separate processing units for each shader type
• It is hard to optimize the number of the different shaders, because different tasks need different shaders [3]:
◦ Task 1: the geometry is quite simple, the light conditions are complex
◦ Task 2: the geometry is complex, the texturing is simple
Unified Shader
• Later shader models reduced the differences between the physical processing units (see SM 2.x and SM 3.0)
• Nowadays graphics cards usually contain only one kind of processing unit that is capable of every task. These units are flexibly schedulable to a variety of tasks
• The Unified Shader Model uses a consistent instruction set across all shader types. All shaders have almost the same capabilities – they can read textures and data buffers and perform the same set of arithmetic instructions [4]
Shader models
General purpose programming
Parallelization of the hardware
• The Unified Shader Model means that the GPU uses the same processor core to implement all functions. These are simple processing units with a small instruction set
• Therefore graphics card manufacturers can increase the number of execution units. Nowadays a GPU usually has ~1000 units
Side-effect of Unified Shader Model
• Consequently GPUs have massive computing power. It is worth utilizing this computing power beyond the area of computer graphics:
GPGPU: General-Purpose Computing on Graphics Processor Units
• At first it was hard to develop software components for graphics cards: the developer had to use the native language of the shaders
• Nowadays, the graphics card manufacturers support the software developers with convenient development frameworks:
◦ Nvidia CUDA
◦ ATI Stream
◦ OpenCL
Real-world applications
There are several CUDA-based applications
• GPU computing applications developed on the CUDA architecture by programmers, scientists, and researchers around the world
• See more at CUDA Community Showcase
Computational power of GPUs
GPUs have enormous computational power
• mainly in the field of single precision arithmetic (light green)
Evolution of GPU architectures
Hardware overview
CUDA Framework
Compile our first program
Visual Studio Support
INTRODUCTION
CPU architecture – Latency Oriented Design
Powerful ALU(s)
• Reduce operation latency
Large cache(s)
• Convert long latency memory access to short latency cache access
Sophisticated control
• Superscalar architecture
• Branch prediction
• Data forwarding
GPU architecture – throughput oriented design
Small cache area
• Depends on architecture
Simple control
• No branch prediction/data forwarding
Energy efficient ALUs
• We can use a lot of them
Fast context switching
• The number of threads must be greater than the number of units
First programmable cards
Geforce8 series (G80, …)
• Release date: 2006
• Tesla microarchitecture
• First card using USM
• GeForce 8800 GTX
◦ 128 SP
◦ 768 MB RAM
◦ 518 GFLOPS
Geforce9 series (G92, …)
• Release date: 2008
Geforce200 series (GT200, …)
• Release date: 2008-2009
• 2nd gen Tesla microarchitecture
• Double precision support
• GeForce 260 GTX, 280GTX
Nvidia GeForce GTX280 card
Architecture
• Based on the GT200
• 10 Texture/Processor Cluster
• Each TPC contains 3 Streaming Multiprocessors
• Each SM contains 8 CUDA cores
• 10 × 3 × 8 = 240 CUDA cores
Fermi architecture
Geforce400 series (GF100, …)
• Release date: 2010
• Fermi microarchitecture
• 16 SM × 32 SP/SM = 512 SP
• Disabled SMs
• Tesla line
◦ ECC
◦ Double precision speed
Geforce500 series (GF110, ...)
• Fermi based models
• Less power consumption
Kepler architecture
Geforce600 series (GK104, …)
• Release date: 2012
• Kepler microarchitecture
• Performance/watt
• SM → SMX
• Graphics Processing Cluster
Geforce700 series
• Kepler based models
• Dynamic parallelism
Maxwell architecture
Geforce900 series (GM107, …)
• Release date: 2014
• Codename: Maxwell
• Integrated ARM CPU
• SMX → SMM
GPU execution model – throughput oriented design
Example: summing vectors
• We need as many threads as the size of the vectors
• Each thread will sum one item from vector a and b and store the result to the third vector (c)
• We call this a GPU kernel
Prerequisites
• To achieve high performance, these tasks must be more or less independent
• There should be enough tasks to utilize all execution units of the GPU
Context switching mechanism
• The GPU will group all threads into blocks and run these blocks using one of the multiprocessors
◦ Because of architectural limitations
◦ To hide latency (one MP can run multiple blocks)
• These blocks are totally independent
◦ There is no communication between blocks
◦ We cannot schedule the running order of these blocks
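The vector-sum kernel described above could be sketched in CUDA C roughly as follows (the names are illustrative, not taken from the course material):

```cuda
#include <cuda_runtime.h>

// Each thread sums one item from vector a and b and stores the result in c.
__global__ void addVectors(const float *a, const float *b, float *c, int n) {
    // Global thread index: block offset plus thread offset within the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may contain more threads than items
        c[i] = a[i] + b[i];
}
```

It would be launched with enough blocks to cover all n elements, e.g. `addVectors<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);`, so the number of threads is at least the size of the vectors.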
Decision based on the execution model
CPU algorithms
• Can be faster for sequential code
• Easy to implement
GPU algorithms
• Can be faster for parallel code
• We need a lot of threads to utilize all processing power
• These threads must be more or less independent within one block
• All blocks must be totally independent
Heterogeneous architecture
• Both units have advantages and disadvantages
• It is worth using both the CPU(s) and GPU(s)
• All GPUs are co-processors of the CPU
SIMT execution
Sources of parallelism (SIMD < SIMT < SMT) [25]
• In SIMD, elements of short vectors are processed in parallel
• In SMT, instructions of several threads are run in parallel
• SIMT is somewhere in between - an interesting hybrid between vector processing and hardware threading
SIMD commands
• In the case of the well-known SIMD commands, the developer must ensure that all the operands are in the right place and format. In the case of SIMT execution, the execution units can reach different addresses in the global memory
• It is possible to use conditions with SIMT execution
• The branches of the condition will be executed sequentially
• This leads to bad performance
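A minimal, hypothetical kernel illustrating this branch divergence (threads of one warp take different branches, so the hardware serializes them):

```cuda
__global__ void divergent(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Even and odd threads of the same warp take different branches,
    // so the two branches are executed sequentially (divergence).
    if (i % 2 == 0)
        data[i] = data[i] * 2.0f;   // even-indexed threads
    else
        data[i] = data[i] + 1.0f;   // odd-indexed threads
}
```

Restructuring the condition so that whole warps take the same branch avoids this serialization.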
Memory architecture
We can distinguish the following memory areas:
• Host Memory
• Device Memory of GPU1, GPU2, …
Transfer necessity
• All devices can work only with data in the appropriate memory areas.
• We can use transfer functions to copy data
◦ Host Memory to Device Memory / Device Memory to Host Memory
◦ Device Memory to Device Memory
• These functions significantly degrade the performance
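The transfer functions above correspond to the CUDA runtime API roughly as in this sketch (illustrative names, error checking omitted):

```cuda
#include <cuda_runtime.h>

void copyExample(const float *hostIn, float *hostOut, int n) {
    float *dev;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&dev, bytes);                         // allocate device memory
    cudaMemcpy(dev, hostIn, bytes, cudaMemcpyHostToDevice);   // Host Memory to Device Memory
    // ... launch kernels working on 'dev' here ...
    cudaMemcpy(hostOut, dev, bytes, cudaMemcpyDeviceToHost);  // Device Memory to Host Memory
    cudaFree(dev);
}
```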
Overview
GPU advantages
• Outstanding peak computing capacity
• Favourable price/performance ratio
• Scalable with the ability of multi-GPU development
• Dynamic development (partly due to the gaming industry)
GPU disadvantages
• GPU execution units are less independent than CPU cores. The peak performance is available only in some special (especially data-parallel) tasks
→ Use a heterogeneous architecture and use the GPU only for the data-parallel tasks
• Graphics cards have a separate memory region and GPUs cannot access the system memory. Therefore we usually need some memory transfers before the real processing
→ Optimize the number of these memory transfers. In some cases these transfers render the whole GPU solution unusable
• GPGPU programming is a new area, therefore the tools are less mature and the development time and cost are significantly higher
Evolution of GPU architectures
Hardware overview
CUDA Framework
Compile our first program
Visual Studio Support
INTRODUCTION
CUDA environment
• CUDA (Compute Unified Device Architecture) is the parallel compute engine in NVIDIA graphics processing units (GPUs), accessible to software developers through industry-standard programming languages
• Free development framework, downloadable for all developers
• Similar to C / C++ programming languages
Releases
• 2007. June. – CUDA 1.0
• 2008. Aug. – CUDA 2.0
• 2010. March. – CUDA 3.0
• …
• 2015. Jan. – CUDA 7.0
Supported GPUs
• Nvidia GeForce series
• Nvidia GeForce mobile series
• Nvidia Quadro series
• Nvidia Quadro mobile series
• Nvidia Tesla series
Required components
• An appropriate CUDA-compatible NVIDIA graphics driver
• CUDA compiler – to compile .cu programs
• CUDA debugger – to debug GPU code
• CUDA profiler – to profile GPU code
• CUDA SDK – sample applications, documentation
Download CUDA
• CUDA components are available from:
https://developer.nvidia.com/cuda-downloads
NVIDIA Developer Zone
CUDA platform overview
Main components
• The CUDA C language is based on the C/C++ languages (host and device code), but there are other alternatives (Fortran etc.)
• The CUDA environment contains some function libraries that simplify programming (FFT, BLAS)
• Hardware abstraction mechanism hides the details of the GPU architecture
◦ It simplifies the high-level programming model
◦ It makes it easy to change the GPU architecture in the future
Separate host and device code
• Programmers can mix GPU code with general-purpose code for the host CPU
• Common C/C++ source code with different compiler forks for CPUs and GPUs
• The developer can choose the compiler of the host code
CUDA development
Analysing the task
• Unlike with traditional programs, in addition to selecting the right solution we also have to find the well-parallelizable parts of the algorithm
• The ratio of parallelizable/non-parallelizable parts can be a good indicator of whether it is worth creating a parallel version or not
• Sometimes we have to optimize the original solution (decrease the number of memory transfers/kernel executions) or create an entirely new one
Implementing the C/C++ code
• In practice we have only one source file, but it contains both the CPU and the GPU source code:
◦ Sequential parts for the CPU
◦ Data Parallel parts for the GPU
Compiling and linking
• The CUDA framework contains several utilities, therefore compiling and linking means only the execution of the nvcc compiler
Evolution of GPU architectures
Hardware overview
CUDA Framework
Compile our first program
Visual Studio Support
INTRODUCTION
Compile the given code
• Use the "nvcc" compiler
Run the executable
Compiling & running the example
nvcc --output-file first.exe --compiler-bindir "c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" --generate-code arch=compute_12,code=sm_12 first.cu
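A minimal first.cu that the command above could compile might look like this (an illustrative sketch, not the course's actual example):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello() { }   // empty kernel, just to exercise the toolchain

int main() {
    hello<<<1, 1>>>();        // launch one block containing one thread
    cudaDeviceSynchronize();  // wait for the kernel to finish
    printf("Kernel finished\n");
    return 0;
}
```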
Main parameters of the nvcc compiler
Usage of the compiler
• Default path (in the case of a CUDA 7.0 x64 Windows installation): "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe"
• Usage: nvcc [options] <inputfile>
Specifying the compilation phase
• --compile (-c) – compile each .c/.cc/.cpp/.cxx/.cu input file into an object file
• --link (-link) – this option specifies the default behaviour: compile and link all inputs
• --lib (-lib) – compile all inputs into object files (if necessary) and add the results to the specified output library file
• --run (-run) – this option compiles and links all inputs into an executable, and executes it
• --ptx (-ptx) – compile all .cu/.gpu input files to device-only .ptx files
• --cubin (-cubin) – compile all .cu/.ptx/.gpu input files to device-only .cubin files
Main parameters of the nvcc compiler (2)
Setting directory information
• --output-directory <directory> (-odir) – specify the directory of the output file
• --output-file <file> (-o) – specify name and location of the output file. Only a single input file is allowed when this option is present in nvcc non-linking/archiving mode
• --compiler-bindir <directory> (-ccbin) – specify the directory in which the compiler executable (Microsoft Visual Studio cl, or a gcc derivative) resides. By default, this executable is expected in the current executable search path
• --include-path <include-path> (-I) – specify include search paths
• --library <library> (-l) – specify libraries to be used in the linking stage. The libraries are searched for on the library search paths that have been specified using option '-L'
• --library-path <library-path> (-L) – specify library search paths
Main parameters of the nvcc compiler (3)
Options for steering GPU code generation
• --generate-code arch=<arch>,code=<code> – specify the nvcc behaviour with respect to code generation. The architecture specifies the name of the class of NVIDIA GPU architectures for which the CUDA input files must be compiled. The code specifies the names of NVIDIA GPUs to generate code for. nvcc will embed a compiled code image in the resulting executable for each specified 'code' architecture. Currently supported compilation architectures are:
◦ virtual architectures: 'compute_11', 'compute_12', 'compute_13', 'compute_20', 'compute_30', 'compute_32', 'compute_35', 'compute_37', 'compute_50', etc.
◦ GPU architectures: 'sm_11', 'sm_12', 'sm_13', 'sm_20', 'sm_21', 'sm_30', 'sm_32', 'sm_35', 'sm_37', 'sm_50', etc.
Main parameters of the nvcc compiler (4)
Miscellaneous options for guiding the compiler driver:
• --profile (-pg) – instrument generated code/executable for use by gprof (Linux only)
• --debug (-g) – generate debug information for host code
• --optimize <level> (-O) – specify optimization level for host code
• --verbose (-v) – list the compilation commands generated by this compiler driver, but do not suppress their execution
• --keep (-keep) – keep all intermediate files that are generated during internal compilation steps
• --host-compilation <language> – specify C vs. C++ language for host code in CUDA source files. Allowed values for this option: 'C', 'C++', 'c', 'c++'. Default value: 'C++'
• --machine <bits> (-m) – specify 32 vs. 64 bit architecture. Allowed values for this option: 32, 64. Default value: 64
Overview of compilation
• first.cu is processed by nvcc.exe, which splits it into host code (first.cpp1.ii) and device code (first.ptx)
• The host code is compiled by cl.exe into first.obj
• The device code is compiled by ptxas.exe into first_sm_10.cubin
• The object file, the cubin image, and the libraries are linked together into first.exe
Evolution of GPU architectures
Hardware overview
CUDA Framework
Compile our first program
Visual Studio Support
INTRODUCTION
Visual Studio capabilities
Visual Studio support
• Latest CUDA versions support Visual Studio 2013
• After installing CUDA some new functions appear in Visual Studio
◦ New project wizard
◦ Custom build rules
◦ CUDA syntax highlighting
◦ etc.
New project wizard
• Select File/New/Project/Visual C++/CUDA[64]/CUDAWinApp
• Click “Next” on the welcome screen
Build Customization Files
• Right click on project name, and select “Build Customization”
• Select the appropriate one based on the CUDA version
• In case of older Visual Studio editions, check “Custom Build Rules”
CUDA related project properties
• Select project and click on “Project properties” and click on “CUDA Build Rule”
• There are several options in multiple tabs (debug symbols, GPU arch., etc.)
• These are the same options as discussed in the nvcc compiler options section
• The “Command Line” tab shows the actual compiling parameters