Introduction to C++ AMP: Accelerated Massive Parallelism
Marc Grégoire, Software Architect, [email protected], http://www.nuonsoft.com/, http://www.nuonsoft.com/blog/. Disclaimer: This presentation was made based on the released Visual C++ 11 Preview.
![Page 1: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/1.jpg)
Introduction to C++ AMP
Accelerated Massive Parallelism
It is time to start taking advantage of the computing power of GPUs…
Marc Grégoire
Software Architect
[email protected]
http://www.nuonsoft.com/
http://www.nuonsoft.com/blog/
January 23rd, 2012
Disclaimer: This presentation was made based on the released Visual C++ 11 Preview
![Page 2: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/2.jpg)
Agenda
Introduction
Demo: N-Body Simulation
Technical
  The C++ AMP Technology
  Coding Demo: Mandelbrot
Summary
Resources
![Page 3: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/3.jpg)
The Power of Heterogeneous Computing
146X: Interactive visualization of volumetric white matter connectivity
36X: Ionic placement for molecular dynamics simulation on GPU
19X: Transcoding HD video stream to H.264
17X: Simulation in Matlab using .mex file CUDA function
100X: Astrophysics N-body simulation
149X: Financial simulation of LIBOR model with swaptions
47X: GLAME@lab: An M-script API for linear algebra operations on GPU
20X: Ultrasound medical imaging for cancer diagnostics
24X: Highly optimized object oriented molecular dynamics
30X: Cmatch exact string matching to find similar proteins and gene sequences
![Page 4: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/4.jpg)
CPUs vs GPUs today
CPU
  Low memory bandwidth
  Higher power consumption
  Medium level of parallelism
  Deep execution pipelines
  Random accesses
  Supports general code
  Mainstream programming
GPU
  High memory bandwidth
  Lower power consumption
  High level of parallelism
  Shallow execution pipelines
  Sequential accesses
  Supports data-parallel code
  Niche programming
images source: AMD
![Page 5: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/5.jpg)
Tomorrow…
CPUs and GPUs coming closer together…
…nothing settled in this space yet, things still in motion…
C++ AMP is designed as a mainstream solution not only for today, but also for tomorrow
images source: AMD
![Page 6: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/6.jpg)
C++ AMP
Part of Visual C++ (available now in the Visual C++ 11 preview)
Complete Visual Studio integration
STL-like library for multidimensional data
Builds on Direct3D
performance, portability, productivity
![Page 7: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/7.jpg)
C++ AMP
Abstracts accelerators
  Current version supports DirectX 11 GPUs
  Could support others like FPGAs, off-site cloud computing…
Supports a heterogeneous mix of accelerators
  Example: C++ AMP can use both an NVidia and an AMD GPU in your system at the same time
![Page 8: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/8.jpg)
Demo… N-Body Simulation
![Page 9: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/9.jpg)
N-Body Simulation Demo
![Page 10: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/10.jpg)
Agenda
Introduction
Demo: N-Body Simulation
Technical
  The C++ AMP Technology
  Coding Demo: Mandelbrot
Summary
Resources
![Page 11: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/11.jpg)
C++ AMP – Basics
Everything is in the concurrency namespace
concurrency::array_view
  Wraps a user-allocated buffer so that C++ AMP can use it
concurrency::array
  Allocates a buffer that can be used by C++ AMP
C++ AMP automatically transfers data between those buffers and memory on the accelerators
![Page 12: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/12.jpg)
C++ AMP – array_view
Read/write buffer of given dimensionality, with elements of given type: array_view<type, dim>
Read-only buffer: array_view<const type, dim>
Write-only buffer: array_view<writeonly<type>, dim>
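The idea of wrapping a user-allocated buffer can be illustrated without C++ AMP at all. The following is a minimal sketch in standard C++, not the C++ AMP API: `View2D` and `fill_and_sum` are hypothetical names introduced here, and the row-major layout is an assumption. It only shows how a non-owning view gives two-dimensional access to caller-owned memory, and how a `const` element type yields a read-only view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the idea behind array_view (NOT the C++ AMP class):
// a non-owning, two-dimensional window onto memory the caller allocated.
template <typename T>
class View2D {
public:
    View2D(std::size_t rows, std::size_t cols, T* data)
        : rows_(rows), cols_(cols), data_(data) {}

    // Assumed row-major addressing: element (row, col) lives at row * cols + col.
    T& operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    T* data_;  // not owned; the caller keeps the buffer alive
};

// Fills the buffer through a writable view, then sums it through a read-only view.
inline int fill_and_sum(std::vector<int>& buffer, std::size_t rows, std::size_t cols) {
    View2D<int> writable(rows, cols, buffer.data());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            writable(r, c) = static_cast<int>(r * cols + c);

    // With T = const int, operator() returns const int&, so writes do not compile.
    View2D<const int> readonly(rows, cols, buffer.data());
    int sum = 0;
    for (std::size_t r = 0; r < readonly.rows(); ++r)
        for (std::size_t c = 0; c < readonly.cols(); ++c)
            sum += readonly(r, c);
    return sum;
}
```

Calling `fill_and_sum` on a six-element vector with 2 rows and 3 columns writes the values 0 through 5 into the caller's vector; the view itself never allocates or copies.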
![Page 13: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/13.jpg)
C++ AMP – parallel_for_each()
concurrency::parallel_for_each(grid, lambda);
grid is the compute domain over which you want to perform arithmetic
  For array_view objects, use the array_view grid property
The lambda is the code to execute on the accelerator
  A restriction should be applied to the lambda, restrict(direct3d), which is a compile-time check to validate the lambda for execution on direct3d accelerators
  Accepts 1 parameter: the index into the compute domain (= grid)
  Same dimensionality as the grid, so if grid is 2D, index is also 2D: index<2> idx
  Access the two dimensions as idx[0] and idx[1]
Example:
    array_view<writeonly<int>, 2> a(height, width, pBuffer);
    parallel_for_each(a.grid, [=](index<2> idx) restrict(direct3d) { /* … */ });
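To make the "one lambda invocation per point of the compute domain" model concrete, here is a sequential CPU emulation in standard C++. It is a sketch only: `Index2`, `for_each_index` and `make_gradient` are hypothetical names, not part of C++ AMP, and a real accelerator would run the invocations in parallel rather than in a nested loop. As in the slide, `idx[0]` is taken to be the row and `idx[1]` the column.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 2D index playing the role of index<2>:
// idx[0] is the row, idx[1] is the column.
using Index2 = std::array<int, 2>;

// Sequential emulation: invokes the kernel once for every point of a
// height x width compute domain. An accelerator would do this in parallel.
template <typename Kernel>
void for_each_index(int height, int width, Kernel kernel) {
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            kernel(Index2{row, col});
}

// Example kernel: each invocation writes exactly one element, chosen by its index.
inline std::vector<int> make_gradient(int height, int width) {
    std::vector<int> buffer(static_cast<std::size_t>(height) * width);
    int* out = buffer.data();
    // Capture by value, mirroring the [=] capture used in the slide.
    for_each_index(height, width, [=](Index2 idx) {
        out[idx[0] * width + idx[1]] = idx[0] * 10 + idx[1];
    });
    return buffer;
}
```

Because every invocation touches only the element selected by its own index, the order in which invocations run does not matter, which is what makes this style of kernel suitable for data-parallel hardware.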
![Page 14: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/14.jpg)
C++ AMP – parallel_for_each() – lambda
Several restrictions apply to the code in the lambda:
  Can only call other restrict(direct3d) functions
  Must capture everything by value, except concurrency::array objects
  No recursion, no virtual functions, no pointers to functions, no pointers to pointers, no goto, no try/catch/throw statements, no global variables, no static variables, no dynamic_cast, no typeid, no asm, no varargs, …
For restrict(direct3d), see http://blogs.msdn.com/b/nativeconcurrency/archive/2011/09/05/restrict-a-key-new-language-feature-introduced-with-c-amp.aspx and MSDN
![Page 15: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/15.jpg)
C++ AMP – parallel_for_each() – lambda
The lambda executes in parallel with CPU code that follows parallel_for_each() until a synchronization point is reached
Synchronization:
  Manually, when calling array_view::synchronize()
    Good idea, because you can handle exceptions gracefully
  Automatically, when for example the array_view goes out of scope
    Bad idea, errors will be ignored silently because destructors are not allowed to throw exceptions
  Automatically, when CPU code observes the array_view
    Not recommended, because you might lose error information if there is no try/catch block catching exceptions at that point
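The difference between explicit and destructor-driven synchronization can be sketched in standard C++. Everything below is hypothetical and does not use C++ AMP: `PendingWork` merely stands in for a view with outstanding accelerator work, and the failure is simulated with a flag. The sketch shows that an explicit `synchronize()` call lets the caller catch the error, while a destructor has to swallow it.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a view with pending accelerator work (not C++ AMP).
class PendingWork {
public:
    explicit PendingWork(bool will_fail) : will_fail_(will_fail) {}

    // Explicit synchronization point: a failure surfaces here as an exception.
    void synchronize() {
        if (synchronized_) return;
        synchronized_ = true;
        if (will_fail_) throw std::runtime_error("accelerator work failed");
    }

    // Implicit synchronization on scope exit. A destructor must not let an
    // exception escape, so any failure is swallowed and the caller never sees it.
    ~PendingWork() {
        try {
            synchronize();
        } catch (...) {
            // Error information is lost here.
        }
    }

private:
    bool will_fail_;
    bool synchronized_ = false;
};

// Explicit synchronize(): the error reaches the catch block.
inline std::string run_with_explicit_sync(bool will_fail) {
    try {
        PendingWork work(will_fail);
        work.synchronize();
        return "ok";
    } catch (const std::runtime_error& ex) {
        return ex.what();
    }
}

// Synchronization left to the destructor: the error is silently dropped.
inline std::string run_with_implicit_sync(bool will_fail) {
    try {
        {
            PendingWork work(will_fail);
        }  // destructor runs here
        return "ok";
    } catch (const std::runtime_error& ex) {
        return ex.what();
    }
}
```

With a simulated failure, `run_with_explicit_sync(true)` returns the error message, whereas `run_with_implicit_sync(true)` returns "ok" even though the work failed.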
![Page 16: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/16.jpg)
C++ AMP – accelerator / accelerator_view
The concurrency::accelerator and concurrency::accelerator_view classes can be used to query for information on the installed accelerators
get_accelerators() returns a vector of accelerators in the system
![Page 17: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/17.jpg)
Coding Demo… Mandelbrot
![Page 18: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/18.jpg)
Mandelbrot – Single-Threaded
    for (int y = -halfHeight; y < halfHeight; ++y) {
        // Formula: zi = z^2 + z0
        float Z0_i = view_i + y * zoomLevel;
        for (int x = -halfWidth; x < halfWidth; ++x) {
            float Z0_r = view_r + x * zoomLevel;
            float Z_r = Z0_r;
            float Z_i = Z0_i;
            float res = 0.0f;
            for (int iter = 0; iter < maxiter; ++iter) {
                float Z_rSquared = Z_r * Z_r;
                float Z_iSquared = Z_i * Z_i;
                if (Z_rSquared + Z_iSquared > escapeValue) {
                    // We escaped
                    res = iter + 1 - log(log(sqrt(Z_rSquared + Z_iSquared))) * invLogOf2;
                    break;
                }
                Z_i = 2 * Z_r * Z_i + Z0_i;
                Z_r = Z_rSquared - Z_iSquared + Z0_r;
            }
            unsigned __int32 result = RGB(res * 50, res * 50, res * 50);
            pBuffer[(y + halfHeight) * m_nBuffWidth + (x + halfWidth)] = result;
        }
    }
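The per-pixel work in the loop can be isolated into a small function, which makes it easier to see what each iteration of the outer loops computes. The sketch below is standard C++ and rests on assumptions the slide does not state: the escape threshold is taken to be 4.0 (a radius of 2, squared) and `invLogOf2` is taken to be 1 / ln 2. The function name `smooth_escape` is hypothetical.

```cpp
#include <cmath>

// Returns 0 for points that stay bounded for max_iter iterations; otherwise a
// fractional ("smooth") iteration count, as in the slide's formula.
// Assumptions: escape_value defaults to 4.0f and inv_log_of_2 is 1 / ln(2).
inline float smooth_escape(float c_r, float c_i, int max_iter,
                           float escape_value = 4.0f) {
    const float inv_log_of_2 = 1.0f / std::log(2.0f);
    float z_r = c_r;
    float z_i = c_i;
    for (int iter = 0; iter < max_iter; ++iter) {
        const float z_r2 = z_r * z_r;
        const float z_i2 = z_i * z_i;
        if (z_r2 + z_i2 > escape_value) {
            // Escaped: subtract a correction based on how far past the threshold we are.
            return iter + 1 - std::log(std::log(std::sqrt(z_r2 + z_i2))) * inv_log_of_2;
        }
        // z = z^2 + c, split into imaginary and real parts.
        // The squares were computed before either component is updated.
        z_i = 2 * z_r * z_i + c_i;
        z_r = z_r2 - z_i2 + c_r;
    }
    return 0.0f;
}
```

A point such as (0, 0) never escapes and yields 0, while a point far outside the set, such as (2, 2), escapes on the first test and yields a value just below 1.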
![Page 19: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/19.jpg)
Mandelbrot – PPL
    parallel_for(-halfHeight, halfHeight, 1, [&](int y) {
        // Formula: zi = z^2 + z0
        float Z0_i = view_i + y * zoomLevel;
        for (int x = -halfWidth; x < halfWidth; ++x) {
            float Z0_r = view_r + x * zoomLevel;
            float Z_r = Z0_r;
            float Z_i = Z0_i;
            float res = 0.0f;
            for (int iter = 0; iter < maxiter; ++iter) {
                float Z_rSquared = Z_r * Z_r;
                float Z_iSquared = Z_i * Z_i;
                if (Z_rSquared + Z_iSquared > escapeValue) {
                    // We escaped
                    res = iter + 1 - log(log(sqrt(Z_rSquared + Z_iSquared))) * invLogOf2;
                    break;
                }
                Z_i = 2 * Z_r * Z_i + Z0_i;
                Z_r = Z_rSquared - Z_iSquared + Z0_r;
            }
            unsigned __int32 result = RGB(res * 50, res * 50, res * 50);
            pBuffer[(y + halfHeight) * m_nBuffWidth + (x + halfWidth)] = result;
        }
    });
![Page 20: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/20.jpg)
Mandelbrot – C++ AMP
    array_view<writeonly<unsigned __int32>, 2> a(m_nBuffHeight, m_nBuffWidth, pBuffer);
    parallel_for_each(a.grid, [=](index<2> idx) restrict(direct3d) {
        // Formula: zi = z^2 + z0
        int x = idx[1] - halfWidth;
        int y = idx[0] - halfHeight;
        float Z0_i = view_i + y * zoomLevel;
        float Z0_r = view_r + x * zoomLevel;
        float Z_r = Z0_r;
        float Z_i = Z0_i;
        float res = 0.0f;
        for (int iter = 0; iter < maxiter; ++iter) {
            float Z_rSquared = Z_r * Z_r;
            float Z_iSquared = Z_i * Z_i;
            if (Z_rSquared + Z_iSquared > escapeValue) {
                // We escaped
                res = iter + 1 - fast_log(fast_log(fast_sqrt(Z_rSquared + Z_iSquared))) * invLogOf2;
                break;
            }
            Z_i = 2 * Z_r * Z_i + Z0_i;
            Z_r = Z_rSquared - Z_iSquared + Z0_r;
        }
        unsigned __int32 result = RGB(res * 50, res * 50, res * 50);
        a[idx] = result;
    });
    a.synchronize();
![Page 21: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/21.jpg)
Mandelbrot – C++ AMP
Wrap C++ AMP code inside a try-catch block!
    try
    {
        array_view<writeonly<unsigned __int32>, 2> a(m_nBuffHeight, m_nBuffWidth, pBuffer);
        parallel_for_each(a.grid, [=](index<2> idx) restrict(direct3d) {
            ...
        });
        a.synchronize();
    }
    catch (const Concurrency::runtime_exception& ex)
    {
        MessageBoxA(NULL, ex.what(), "Error", MB_ICONERROR);
    }
![Page 22: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/22.jpg)
Summary
C++ AMP allows anyone to make use of parallel hardware
Easy-to-use
High-level abstractions in C++ (not C)
In-depth support of C++ AMP in Visual Studio, including the debugger
Abstracts multi-vendor hardware
Intention is to make C++ AMP an open specification
![Page 23: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/23.jpg)
Resources
Daniel Moth's blog (PM of C++ AMP), lots of interesting C++ AMP related posts
  http://www.danielmoth.com/Blog/
MSDN Native parallelism blog (team blog)
  http://blogs.msdn.com/b/nativeconcurrency/
MSDN Dev Center for Parallel Computing
  http://msdn.com/concurrency
MSDN Forums to ask questions
  http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads
![Page 24: Introduction to C++ AMP A ccelerated M assive P arallelism](https://reader033.vdocuments.net/reader033/viewer/2022061612/568147b8550346895db4fd56/html5/thumbnails/24.jpg)
Questions?
I would like to thank Daniel Moth from Microsoft for his input on this presentation.