cuda and opencl --- development interfaces for - university of iowa

44
Dec. 23 Harbin Engineering University CUDA and CUDA and OpenCL OpenCL --- --- Development Interfaces for Development Interfaces for Multicore Programming Multicore Programming Dr. Jun Ni, Ph.D. Dr. Jun Ni, Ph.D. Associate Professor of Radiology, Biomedical Associate Professor of Radiology, Biomedical Engineering, Mechanical Engineering, Computer Engineering, Mechanical Engineering, Computer Science Science The University of Iowa, Iowa City, Iowa, USA The University of Iowa, Iowa City, Iowa, USA

Upload: others

Post on 12-Feb-2022

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Dec. 23 Harbin Engineering University

CUDA and CUDA and OpenCLOpenCL

------ Development Interfaces for Development Interfaces for

Multicore ProgrammingMulticore Programming

Dr. Jun Ni, Ph.D.Dr. Jun Ni, Ph.D.Associate Professor of Radiology, Biomedical Associate Professor of Radiology, Biomedical

Engineering, Mechanical Engineering, Computer Engineering, Mechanical Engineering, Computer ScienceScience

The University of Iowa, Iowa City, Iowa, USAThe University of Iowa, Iowa City, Iowa, USA

Page 2: CUDA and OpenCL --- Development Interfaces for - University of Iowa

OutlineOutline

CUDA, CUDA, NavidaNavida--based Programming based Programming environment for GPGPUenvironment for GPGPUOpenCLOpenCL, open source programming , open source programming environment for multicore processor systems environment for multicore processor systems for Cell/BE and GPU for Cell/BE and GPU

Page 3: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

CUDACUDA (an acronym for (an acronym for Compute Unified Device Compute Unified Device ArchitectureArchitecture) )

a parallel computing architecture developed by a parallel computing architecture developed by NVIDIANVIDIACUDA is the computing engine in NVIDIA CUDA is the computing engine in NVIDIA graphics graphics processing unitsprocessing units ((GPUsGPUs))

accessible to software developers through industry standard accessible to software developers through industry standard programming languagesprogramming languages

Programmers use 'C for CUDA' (C with NVIDIA Programmers use 'C for CUDA' (C with NVIDIA extensions)extensions)

compiled through a compiled through a PathScalePathScale Open64Open64 C C compilercompilerto code algorithms for execution on the GPUto code algorithms for execution on the GPU

Page 4: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

CUDA architecture supports a range of computational interfacesCUDA architecture supports a range of computational interfacesOpenCLOpenCLDirectComputeDirectCompute

Third party wrappers Third party wrappers PythonPythonFortranFortranJavaJavaMatlabMatlab

The latest drivers all contain the necessary CUDA components The latest drivers all contain the necessary CUDA components CUDA works with all NVIDIA CUDA works with all NVIDIA GPUsGPUs G8X series onwards:G8X series onwards:

GeForceGeForceQuadroQuadroTeslaTesla

Page 5: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

CUDA gives developers access to the native CUDA gives developers access to the native instruction set and memory of the instruction set and memory of the parallel parallel computationalcomputational elements in CUDA elements in CUDA GPUsGPUs. .

Page 6: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

Using CUDA, the latest NVIDIA Using CUDA, the latest NVIDIA GPUsGPUs effectively effectively become become open architecturesopen architectures like like CPUsCPUs. . Unlike CPUs, Unlike CPUs, GPUsGPUs have a parallel "manyhave a parallel "many--core" core" architecture, architecture, each core capable of running thousands of each core capable of running thousands of threads simultaneouslythreads simultaneously -- if an application is suited to if an application is suited to this kind of an architecture, the GPU can offer large this kind of an architecture, the GPU can offer large performance benefitsperformance benefitsMultiMulti--threads computing becomes more efficient within threads computing becomes more efficient within multicore architecturemulticore architecture

Page 7: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

In the In the computer gamingcomputer gaming industry, in addition to industry, in addition to graphics rendering, graphics cards are used in graphics rendering, graphics cards are used in game game physics calculationsphysics calculations (physical effects like debris, smoke, (physical effects like debris, smoke, fire, fluids)fire, fluids)

PhysXPhysXBulletBullet

CUDA has also been used to accelerate nonCUDA has also been used to accelerate non--graphical graphical applications by an applications by an order of magnitudeorder of magnitude or moreor more

computational biologycomputational biologyCryptographyCryptographyOther fieldsOther fields

Page 8: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Introduction to CUDAIntroduction to CUDA

Example:Example:BOINCBOINC distributed computingdistributed computing clientclient

CUDA provides both a low level CUDA provides both a low level APIAPI and a higher level and a higher level APIAPIThe initial CUDA The initial CUDA SDKSDK was made public on 15 was made public on 15 February 2007, for February 2007, for Microsoft WindowsMicrosoft Windows and and LinuxLinuxMac OS XMac OS X support was later added in version 2.0. support was later added in version 2.0. which supersedes the beta released February 14, 2008which supersedes the beta released February 14, 2008

Page 9: CUDA and OpenCL --- Development Interfaces for - University of Iowa

AdvantagesAdvantages

CUDA has several advantages over traditional general CUDA has several advantages over traditional general purpose computation on purpose computation on GPUsGPUs (GPGPU) using (GPGPU) using graphics APIs.graphics APIs.

Scattered reads Scattered reads –– code can read from arbitrary addresses in code can read from arbitrary addresses in memory. memory. Shared memoryShared memory –– CUDA exposes a fast CUDA exposes a fast shared memoryshared memoryregion (16KB in size) that can be shared amongst threads.region (16KB in size) that can be shared amongst threads.This can be used as a userThis can be used as a user--managed cache, enabling higher managed cache, enabling higher bandwidth than is possible using texture lookupsbandwidth than is possible using texture lookupsFaster downloads and Faster downloads and readbacksreadbacks to and from the GPU to and from the GPU Full support for integer and bitwise operations, including Full support for integer and bitwise operations, including integer texture lookups.integer texture lookups.

Page 10: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

It uses a recursionIt uses a recursion--free, functionfree, function--pointerpointer--free subset of free subset of the C language, plus some simple extensions. the C language, plus some simple extensions. However, a single process must run spread across However, a single process must run spread across multiple disjoint memory spaces, unlike other C multiple disjoint memory spaces, unlike other C language runtime environments. language runtime environments. Texture rendering is not supported. Texture rendering is not supported. For For double precisiondouble precision (only supported in newer (only supported in newer GPUsGPUslike GTX 260like GTX 260[12][12]) there are some deviations from the ) there are some deviations from the IEEE 754IEEE 754 standard: roundstandard: round--toto--nearestnearest--even is the only even is the only supported rounding mode for reciprocal, division, and supported rounding mode for reciprocal, division, and square root. square root.

Page 11: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

In In single precisionsingle precision, , DenormalsDenormals and and signallingsignallingNaNsNaNs are not supportedare not supportedOnly two IEEE Only two IEEE roundingrounding modes are supported modes are supported (chop and round(chop and round--toto--nearest even)nearest even)Those are specified on a perThose are specified on a per--instruction basis instruction basis rather than in a control word rather than in a control word Precision of division/square root is slightly Precision of division/square root is slightly lower than single precision. lower than single precision.

Page 12: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

The bus bandwidth and latency between the The bus bandwidth and latency between the CPU and the GPU may be a bottleneck. CPU and the GPU may be a bottleneck. Threads should be running in groups of at least Threads should be running in groups of at least 32 for best performance, with total number of 32 for best performance, with total number of threads numbering in the thousands. threads numbering in the thousands.

Page 13: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

Branches in the program code do not impact Branches in the program code do not impact performance significantly, provided that each of performance significantly, provided that each of 32 threads takes the same execution path; the 32 threads takes the same execution path; the SIMDSIMD execution model becomes a significant execution model becomes a significant limitation for any inherently divergent task (e.g. limitation for any inherently divergent task (e.g. traversing a traversing a space partitioningspace partitioning data structure data structure during during raytracingraytracing). ).

Page 14: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

Unlike Unlike OpenCLOpenCL, CUDA, CUDA--enabled enabled GPUsGPUs are only are only available from NVIDIA (available from NVIDIA (GeForceGeForce 8 series and 8 series and above, above, QuadroQuadro and and TeslaTesla). ). AMD/ATI only supports AMD/ATI only supports GPGPU GPGPU on its highon its high--end chipsend chips

Page 15: CUDA and OpenCL --- Development Interfaces for - University of Iowa
Page 16: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Example C++ loads a texture from Example C++ loads a texture from an image into an array on the GPU: an image into an array on the GPU:

cudaArraycudaArray* * cu_arraycu_array; texture<float, 2> ; texture<float, 2> textex; ; // Allocate array // Allocate array cudaChannelFormatDesccudaChannelFormatDesc

description = description = cudaCreateChannelDesccudaCreateChannelDesc<float>(); <float>(); cudaMallocArray(&cu_arraycudaMallocArray(&cu_array, &description, width, height); // Copy image data to array , &description, width, height); // Copy image data to array cudaMemcpy(cu_arraycudaMemcpy(cu_array, image, width*height*, image, width*height*sizeofsizeof((floatfloat), ), cudaMemcpyHostToDevicecudaMemcpyHostToDevice); ); // Bind the array to the texture // Bind the array to the texture cudaBindTextureToArray(texcudaBindTextureToArray(tex, , cu_arraycu_array); ); // Run kernel // Run kernel dim3 blockDim(16, 16, 1); dim3 blockDim(16, 16, 1); dim3 dim3 gridDim(widthgridDim(width

/ / blockDim.xblockDim.x, height / , height / blockDim.yblockDim.y, 1); , 1); kernel<<< kernel<<< gridDimgridDim, , blockDimblockDim, 0 >>>(, 0 >>>(d_odatad_odata, width, height); , width, height); cudaUnbindTexture(texcudaUnbindTexture(tex); ); __global__ __global__ voidvoid

kernel(kernel(floatfloat* * odataodata, , intint

height, height, intint

width) { width) { unsignedunsigned

intint

x = x = blockIdx.xblockIdx.x**blockDim.xblockDim.x

+ + threadIdx.xthreadIdx.x; ; unsignedunsigned

intint

y = y = blockIdx.yblockIdx.y**blockDim.yblockDim.y

+ + threadIdx.ythreadIdx.y; ; floatfloat

c = tex2D(tex, x, y); c = tex2D(tex, x, y); odata[yodata[y**width+xwidth+x] = c;] = c;} }

Page 17: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Example in Python: the product of Example in Python: the product of two arrays on the GPUtwo arrays on the GPU

importimport

pycuda.driverpycuda.driver

asas

drvdrv

importimport

numpynumpy

importimport

pycuda.autoinitpycuda.autoinit

mod = mod = drv.SourceModuledrv.SourceModule(""" __global__ (""" __global__

void void multiply_them(floatmultiply_them(float

**destdest, float *a, float *b) , float *a, float *b) { const { const intint

i = i = threadIdx.xthreadIdx.x; ; dest[idest[i] = ] = a[ia[i] * ] * b[ib[i]; ];

} """) } """) multiply_themmultiply_them

= = mod.get_function("multiply_themmod.get_function("multiply_them") ")

a = numpy.a = numpy.randomrandom.randn(400).astype(numpy.float32) .randn(400).astype(numpy.float32) b = numpy.b = numpy.randomrandom.randn(400).astype(numpy.float32) .randn(400).astype(numpy.float32) destdest

= = numpy.zeros_like(anumpy.zeros_like(a) )

multiply_themmultiply_them( ( drv.Out(destdrv.Out(dest), ), drv.In(adrv.In(a), ), drv.In(bdrv.In(b), block=(400,1,1)) ), block=(400,1,1)) printprint

destdest--a*b a*b

Page 18: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Example in Python: matrix Example in Python: matrix multiplication in GPUmultiplication in GPU

importimport

numpynumpyfromfrom

pycublaspycublas

importimport

CUBLASMatrixCUBLASMatrix

A = A = CUBLASMatrixCUBLASMatrix( ( numpy.mat([[1,2,3],[4,5,6]],numpy.float32) ) numpy.mat([[1,2,3],[4,5,6]],numpy.float32) )

B = B = CUBLASMatrixCUBLASMatrix( ( numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) ) numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) )

C = A*B C = A*B printprint

C.np_matC.np_mat() ()

Page 19: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Language BindingsLanguage Bindings

Unlike Unlike OpenCLOpenCL, CUDA, CUDA--enabled enabled GPUsGPUs are only are only available from NVIDIA (available from NVIDIA (GeForceGeForce 8 series and 8 series and above, above, QuadroQuadro and and TeslaTesla). ). AMD/ATI only supports AMD/ATI only supports GPGPU GPGPU on its highon its high--end chipsend chips

Page 20: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Limitations and ChallengesLimitations and Challenges

PythonPython -- PyCUDAPyCUDAJavaJava -- jCUDAjCUDA, , JCublasJCublas, , JCufftJCufft..NET NET -- CUDA.NET CUDA.NET

Page 21: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

NVIDIA Clears Water Muddied by NVIDIA Clears Water Muddied by LarrabeeLarrabee Shane Shane McGlaunMcGlaun (Blog) (Blog) -- August 5, 2008 August 5, 2008 -- DailyTechDailyTechFirst First OpenCLOpenCL demo on a GPUdemo on a GPU at at YouTubeYouTube (requires (requires Adobe FlashAdobe Flash) ) DirectComputeDirectCompute Ocean Demo Running on NVIDIA CUDAOcean Demo Running on NVIDIA CUDA--enabled GPUenabled GPU at at YouTubeYouTube (requires (requires Adobe FlashAdobe Flash) ) GiorgosGiorgos VasiliadisVasiliadis, , SpirosSpiros AntonatosAntonatos, , MichalisMichalis PolychronakisPolychronakis, , EvangelosEvangelos P. P. MarkatosMarkatosand Sotiris Ioannidis (September 2008, Boston, MA, USA). "and Sotiris Ioannidis (September 2008, Boston, MA, USA). "GnortGnort: High Performance : High Performance Network Intrusion Detection Using Graphics ProcessorsNetwork Intrusion Detection Using Graphics Processors" (PDF). " (PDF). Proceedings of the 11th Proceedings of the 11th International Symposium On Recent Advances In Intrusion DetectioInternational Symposium On Recent Advances In Intrusion Detection (RAID)n (RAID). . http://www.ics.forth.gr/dcs/Activities/papers/gnort.raid08.pdfhttp://www.ics.forth.gr/dcs/Activities/papers/gnort.raid08.pdf. . Schatz, M.C., Schatz, M.C., TrapnellTrapnell, C., , C., DelcherDelcher, A.L., , A.L., VarshneyVarshney, A. (2007). ", A. (2007). "HighHigh--throughput throughput sequence alignment using Graphics Processing Units.sequence alignment using Graphics Processing Units.". ". BMC BioinformaticsBMC Bioinformatics 8:4748:474: 474. : 474. doidoi::10.1186/147110.1186/1471--21052105--88--474474. . http://www.biomedcentral.com/1471http://www.biomedcentral.com/1471--2105/8/4742105/8/474. . ManavskiManavski, , SvetlinSvetlin A.; Giorgio Valle (2008). "A.; Giorgio Valle (2008). "CUDA compatible GPU cards as efficient CUDA compatible GPU cards as efficient hardware accelerators for Smithhardware accelerators for Smith--Waterman sequence alignmentWaterman sequence alignment". ". BMC BioinformaticsBMC Bioinformatics9(Suppl 2):S109(Suppl 2):S10: S10. : S10. doidoi::10.1186/147110.1186/1471--21052105--99--S2S2--S10S10. . http://www.biomedcentral.com/1471http://www.biomedcentral.com/1471--2105/9/S2/S102105/9/S2/S10. .

Page 22: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

PyritPyrit -- Google Code Google Code http://http://code.google.com/p/pyritcode.google.com/p/pyrit//Use your NVIDIA GPU for scientific computingUse your NVIDIA GPU for scientific computing, BOINC official site , BOINC official site (December 18 2008) (December 18 2008) NVIDIA CUDA Software Development Kit (CUDA SDK) NVIDIA CUDA Software Development Kit (CUDA SDK) -- Release Notes Release Notes Version 2.0 for MAC OSXVersion 2.0 for MAC OSXCUDA 1.1 CUDA 1.1 -- Now on Mac OS XNow on Mac OS X-- (Posted on Feb 14, 2008) (Posted on Feb 14, 2008) Silberstein, Mark (2007). "Silberstein, Mark (2007). "Efficient computation of SumEfficient computation of Sum--products on products on GPUsGPUs" " (PDF). (PDF). http://http://www.technion.ac.il/~marks/docs/SumProductPaper.pdfwww.technion.ac.il/~marks/docs/SumProductPaper.pdf. . [[http://www.herikstad.net/2009/05/cudahttp://www.herikstad.net/2009/05/cuda--andand--doubledouble--precisionprecision--floating.htmlfloating.htmlCUDA and double precision floating point numbers] CUDA and double precision floating point numbers] ""CUDACUDA--Enabled ProductsEnabled Products". ". CUDA ZoneCUDA Zone. NVIDIA Corporation. . NVIDIA Corporation. http://http://www.nvidia.com/object/cuda_learn_products.htmlwww.nvidia.com/object/cuda_learn_products.html. Retrieved 2008. Retrieved 2008--1111--03. 03. CUDACUDA--Enabled GPU ProductsEnabled GPU Productshttp://http://www.nvidia.com/object/fermi_architecture.htmlwww.nvidia.com/object/fermi_architecture.html The Next The Next Generation CUDA Architecture, Code Named Fermi . Generation CUDA Architecture, Code Named Fermi .

Page 23: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Useful LinksUseful Links

NvidiaNvidia CUDACUDA official websiteofficial websiteNvidiaNvidia NexusNexus, NVIDIA. , NVIDIA. NvidiaNvidia CUDA developer registration for professional developers and resCUDA developer registration for professional developers and researchersearchersNvidiaNvidia CUDA GPU Computing developer forumsCUDA GPU Computing developer forumsA conversation with JenA conversation with Jen--HsunHsun Huang, CEO Huang, CEO NvidiaNvidia Charlie RoseCharlie Rose, February 5, 2009 , February 5, 2009 http://www.gpu4vision.orghttp://www.gpu4vision.org Scientific Publications, Videos and Software using CUDA Scientific Publications, Videos and Software using CUDA Beyond3D Beyond3D –– Introducing CUDA Introducing CUDA Nvidia'sNvidia's Vision for GPU ComputingVision for GPU Computing March 10, 2007 March 10, 2007 University of Illinois University of Illinois NvidiaNvidia CUDA CourseCUDA Course taught by taught by WenWen--meimei HwuHwu and and David KirkDavid Kirk, , Spring 2009 Spring 2009 CUDA: Breaking the Intel & AMD DominanceCUDA: Breaking the Intel & AMD DominanceA CUDAA CUDA--based Engine for MATLABbased Engine for MATLABNVidiaNVidia CUDA Tutorial SlidesCUDA Tutorial Slides (from (from DoDDoD HPCMP2009) HPCMP2009) AscalaphAscalaph Liquid GPULiquid GPU molecular dynamicsmolecular dynamics. . CUDA implementation for multiCUDA implementation for multi--core processorscore processors

Page 24: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Useful LinksUseful Links

Integrate CUDA with Visual C++Integrate CUDA with Visual C++, September 26, 2008 , September 26, 2008 CUDA.NET CUDA.NET -- .NET library for CUDA, Linux/Windows compliant.NET library for CUDA, Linux/Windows compliantUsing NVIDIA GPU for scientific computingUsing NVIDIA GPU for scientific computing with with BOINCBOINC software software CUDA.CS.MSU.SU Russian CUDA developer communityCUDA.CS.MSU.SU Russian CUDA developer communityGPUmatGPUmat: MATLAB(R) GPU Toolbox based on CUDA: MATLAB(R) GPU Toolbox based on CUDAEnable Enable IntellisenseIntellisense for CUDA in Visual Studio 2008for CUDA in Visual Studio 2008, April 29, 2009 , April 29, 2009 CUDA Tutorials for high performance computingCUDA Tutorials for high performance computingAn introduction to CUDAAn introduction to CUDA (French) (French) NVidiaNVidia CUDA Tutorial & ExamplesCUDA Tutorial & Examples (from ISC2009) (from ISC2009) GPUBrasil.comGPUBrasil.com, First website on GPGPU in Portuguese , First website on GPGPU in Portuguese [2][2], Implementation of CUDA in Cloth simulation , Implementation of CUDA in Cloth simulation DDJ CUDA seriesDDJ CUDA series, First in the Doctor Dobb's Journal series teaching CUDA , First in the Doctor Dobb's Journal series teaching CUDA (over 13 articles by Rob Farber) (over 13 articles by Rob Farber)

Page 25: CUDA and OpenCL --- Development Interfaces for - University of Iowa

OpenCLOpenCL

––

CrossCross--platform platform IntrerfaceIntrerface for Multicore Programmingfor Multicore Programming

OpenCLOpenCL ((OpenOpen CComputing omputing LLanguage)anguage)a framework for writing programs that execute across a framework for writing programs that execute across heterogeneousheterogeneous platforms consisting of platforms consisting of CPUsCPUs, , GPUsGPUs, and , and other processors. other processors. OpenCLOpenCL includes a language (based on includes a language (based on C99C99) for writing ) for writing kernelskernels (functions that execute on (functions that execute on OpenCLOpenCL devices)devices)APIs are used to define and then control the platforms. APIs are used to define and then control the platforms. OpenCLOpenCL provides provides parallel computingparallel computing using taskusing task--based and based and datadata--based parallelism.based parallelism.

Page 26: CUDA and OpenCL --- Development Interfaces for - University of Iowa

OpenCLOpenCL

––

CrossCross--platform platform IntrerfaceIntrerface for Multicore Programmingfor Multicore Programming

OpenCLOpenCL is analogous to the open industry is analogous to the open industry standards standards OpenGLOpenGL and and OpenALOpenAL, for 3D , for 3D graphics and computer audio, respectively.graphics and computer audio, respectively.OpenCLOpenCL extends the power of the GPU beyond extends the power of the GPU beyond graphics (graphics (GPGPUGPGPU). ). OpenCLOpenCL is managed by the is managed by the nonnon--profitprofittechnology technology consortiumconsortium KhronosKhronos GroupGroup..

Page 27: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

OpenCLOpenCL was initially developed by was initially developed by Apple Inc.Apple Inc., , which holds trademark rights, and refined into which holds trademark rights, and refined into an initial proposal in collaboration with technical an initial proposal in collaboration with technical teams at teams at AMDAMD, , IBMIBM, , IntelIntel, and , and NvidiaNvidia. . Apple submitted this initial proposal to the Apple submitted this initial proposal to the KhronosKhronos Group. Group.

Page 28: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On June 16, 2008 the On June 16, 2008 the KhronosKhronos Compute Working Compute Working Group was formed with representatives from CPU, Group was formed with representatives from CPU, GPU, embeddedGPU, embedded--processor, and software companies.processor, and software companies.This group worked for five months to finish the This group worked for five months to finish the technical details of the specification for technical details of the specification for OpenCLOpenCL 1.0 by 1.0 by November 18, 2008. This technical specification was November 18, 2008. This technical specification was reviewed by the reviewed by the KhronosKhronos members and approved for members and approved for public release on December 8, 2008.public release on December 8, 2008.

Page 29: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

OpenCLOpenCL 1.0 has been released with 1.0 has been released with Mac OS X v10.6 Mac OS X v10.6 ("Snow Leopard")("Snow Leopard"). According to an Apple press release:. According to an Apple press release:Snow Leopard further extends support for modern Snow Leopard further extends support for modern hardware with Open Computing Language (hardware with Open Computing Language (OpenCLOpenCL), ), which lets any application tap into the vast gigaflops of which lets any application tap into the vast gigaflops of GPU computing power previously available only to GPU computing power previously available only to graphics applications. graphics applications. OpenCLOpenCL is based on the C programming language and is based on the C programming language and has been proposed as an open standard.has been proposed as an open standard.

Page 30: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

AMD has decided to support AMD has decided to support OpenCLOpenCL (and (and DirectXDirectX 11) instead of the now deprecated 11) instead of the now deprecated Close to MetalClose to Metal in its in its Stream frameworkStream framework..RapidMindRapidMind announced their adoption of announced their adoption of OpenCLOpenCL underneath their development underneath their development platform, in order to support platform, in order to support GPUsGPUs from from multiple vendors with one interface.multiple vendors with one interface.

Page 31: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On December 9, 2008, On December 9, 2008, NvidiaNvidia announced its announced its intention to add full support for the intention to add full support for the OpenCLOpenCL1.0 specification to its GPU Computing Toolkit1.0 specification to its GPU Computing ToolkitOn October 30, 2009, IBM released its first On October 30, 2009, IBM released its first OpenCLOpenCL implementationimplementationOpenCLOpenCL specification is under development at specification is under development at KhronosKhronos, which is open to any interested , which is open to any interested company to join.company to join.

Page 32: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On December 10, 2008, AMD and On December 10, 2008, AMD and NvidiaNvidia held the first public held the first public OpenCLOpenCL demonstration, a 75demonstration, a 75--minute presentation at minute presentation at SiggraphSiggraphAsia 2008Asia 2008. .

AMD showed a CPUAMD showed a CPU--accelerated accelerated OpenCLOpenCL demo explaining the scalability demo explaining the scalability of of OpenCLOpenCL on one or more cores while on one or more cores while NvidiaNvidia showed a GPUshowed a GPU--accelerated demo.accelerated demo.

On March 26, 2009, at On March 26, 2009, at GDC 2009GDC 2009, AMD and , AMD and HavokHavokdemonstrated the first working implementation for demonstrated the first working implementation for OpenCLOpenCLaccelerating accelerating HavokHavok ClothCloth on AMD on AMD RadeonRadeon HD 4000 seriesHD 4000 seriesGPU.GPU.On April 20, 2009, On April 20, 2009, NvidiaNvidia announced the release of its announced the release of its OpenCLOpenCLdriver and driver and SDKSDK to developers participating in its to developers participating in its OpenCLOpenCL Early Early Access Program.Access Program.

Page 33: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On August 5, 2009, AMD unveiled the first development tools On August 5, 2009, AMD unveiled the first development tools for its for its OpenCLOpenCL platform as part of its ATI Stream SDK v2.0 platform as part of its ATI Stream SDK v2.0 Beta Program.Beta Program.On August 28, 2009, Apple released On August 28, 2009, Apple released Mac OS XMac OS X Snow Leopard, Snow Leopard, which contains a full implementation of which contains a full implementation of OpenCLOpenCL..OpenCLOpenCL in Snow Leopard will initially be supported onin Snow Leopard will initially be supported on

ATI ATI RadeonRadeon HD 4850, ATI HD 4850, ATI RadeonRadeon HD 4870 and NVIDIA's HD 4870 and NVIDIA's GeforceGeforce8600M GT, 8600M GT, GeForceGeForce 8800 GS, 8800 GS, GeForceGeForce 8800 GT, 8800 GT, GeForceGeForce 8800 GTS, 8800 GTS, GeforceGeforce 9400M, 9400M, GeForceGeForce 9600M GT, 9600M GT, GeForceGeForce GT 120, GT 120, GeForceGeForce GT GT 130, 130, GeForceGeForce GTX 285, GTX 285, QuadroQuadro FX 4800, and FX 4800, and QuadroQuadro FX 5600.FX 5600.

On September 28, 2009, NVIDIA released its own On September 28, 2009, NVIDIA released its own OpenCLOpenCLdrivers and SDK implementation.drivers and SDK implementation.

Page 34: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On October 13, 2009, AMD released the fourth beta of On October 13, 2009, AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete the ATI Stream SDK 2.0, which provides a complete OpenCLOpenCL implementation on both implementation on both R700/R800R700/R800 GPUsGPUsand and SSE3SSE3 capable CPUs. The SDK is available for both capable CPUs. The SDK is available for both Linux and Windows.Linux and Windows.On November 26, 2009, NVIDIA released drivers for On November 26, 2009, NVIDIA released drivers for OpenCLOpenCL 1.0 (rev 48).1.0 (rev 48).The Apple, The Apple, NvidiaNvidia, , RapidMindRapidMind and Mesa Gallium3D and Mesa Gallium3D implementations of implementations of OpenCLOpenCL are all based on the are all based on the LLVMLLVM Compiler technology and use the Compiler technology and use the ClangClangCompiler as its frontend.Compiler as its frontend.

Page 35: CUDA and OpenCL --- Development Interfaces for - University of Iowa

History of History of OpenCLOpenCL

DevelopmentsDevelopments

On December 10, 2009, On December 10, 2009, VIAVIA released their first released their first product supporting product supporting OpenCLOpenCL 1.0 1.0 --ChromotionHDChromotionHD 2.0 video processor included in 2.0 video processor included in VN1000 chipset.VN1000 chipset.On December 21, 2009, AMD released the On December 21, 2009, AMD released the production version of the ATI Stream SDK 2.0 production version of the ATI Stream SDK 2.0 which provides which provides OpenCLOpenCL 1.0 support for 1.0 support for R800R800GPUsGPUs and beta support for and beta support for R700R700 GPUsGPUs..

Page 36: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Example of FFTExample of FFT

// create a compute context with GPU device // create a compute context with GPU device context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL,

NULL, NULL); NULL, NULL); // create a work// create a work--queue queue queue = queue = clCreateWorkQueue(contextclCreateWorkQueue(context, NULL, NULL, 0); , NULL, NULL, 0); // allocate the buffer memory objects // allocate the buffer memory objects memobjs[0] = memobjs[0] = clCreateBuffer(contextclCreateBuffer(context, CL_MEM_READ_ONLY | , CL_MEM_READ_ONLY |

CL_MEM_COPY_HOST_PTR, CL_MEM_COPY_HOST_PTR, sizeofsizeof((floatfloat)*2*)*2*num_entriesnum_entries, , srcAsrcA););memobjs[1] = memobjs[1] = clCreateBuffer(contextclCreateBuffer(context, CL_MEM_READ_WRITE, , CL_MEM_READ_WRITE,

sizeofsizeof((floatfloat)*2*)*2*num_entriesnum_entries, NULL); , NULL); // create the compute program // create the compute program programprogram

= = clCreateProgramFromSource(contextclCreateProgramFromSource(context, 1, &fft1D_1024_kernel_src, NULL); , 1, &fft1D_1024_kernel_src, NULL);

// build the compute // build the compute program executable program executable clBuildProgramExecutable(programclBuildProgramExecutable(program, , falsefalse, NULL, NULL); , NULL, NULL);

Page 37: CUDA and OpenCL --- Development Interfaces for - University of Iowa

Example of FFTExample of FFT

// create the compute // create the compute kernel kernel kernelkernel

= = clCreateKernel(programclCreateKernel(program, "fft1D_1024"); , "fft1D_1024"); // create N// create N--D range object with workD range object with work--item dimensions item dimensions global_work_size[0] = global_work_size[0] = num_entriesnum_entries; ; local_work_size[0] = 64; local_work_size[0] = 64; range = range = clCreateNDRangeContainer(contextclCreateNDRangeContainer(context, 0, 1, , 0, 1, global_work_sizeglobal_work_size, , local_work_sizelocal_work_size); ); // set the // set the argsargs

values values clSetKernelArg(kernelclSetKernelArg(kernel, 0, (, 0, (voidvoid

*)&memobjs[0], *)&memobjs[0], sizeofsizeof(cl_mem(cl_mem), NULL); ), NULL); clSetKernelArg(kernelclSetKernelArg(kernel, 1, (, 1, (voidvoid

*)&memobjs[1], *)&memobjs[1], sizeofsizeof(cl_mem(cl_mem), NULL); ), NULL); clSetKernelArg(kernelclSetKernelArg(kernel, 2, NULL, , 2, NULL, sizeofsizeof((floatfloat)*(local_work_size[0]+1)*16, NULL); )*(local_work_size[0]+1)*16, NULL); clSetKernelArg(kernelclSetKernelArg(kernel, 3, NULL, , 3, NULL, sizeofsizeof((floatfloat)*(local_work_size[0]+1)*16, NULL); )*(local_work_size[0]+1)*16, NULL); // execute kernel // execute kernel clExecuteKernel(queueclExecuteKernel(queue, kernel, NULL, range, NULL, 0, NULL); , kernel, NULL, range, NULL, 0, NULL);

Page 38: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

KhronosKhronos Group (2008Group (2008--0606--16). "16). "KhronosKhronos Launches Launches Heterogeneous Computing InitiativeHeterogeneous Computing Initiative". Press release. ". Press release. http://www.khronos.org/news/press/releases/khronos_launchhttp://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/es_heterogeneous_computing_initiative/.. Retrieved 2008Retrieved 2008--0606--18. 18. ""OpenCLOpenCL gets touted in Texasgets touted in Texas". MacWorld. 2008". MacWorld. 2008--1111--20. 20. http://www.macworld.com/article/136921/2008/11/opencl.hthttp://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2ml?lsrc=top_2. Retrieved 2009. Retrieved 2009--0606--12. 12. KhronosKhronos Group (2008Group (2008--1212--08). "08). "The The KhronosKhronos Group Releases Group Releases OpenCLOpenCL 1.0 Specification1.0 Specification". Press release. ". Press release. http://www.khronos.org/news/press/releases/the_khronos_grhttp://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/oup_releases_opencl_1.0_specification/. Retrieved 2009. Retrieved 2009--0606--12. 12. Apple Inc. (2008Apple Inc. (2008--0606--09). "09). "Apple Previews Mac OS X Snow Apple Previews Mac OS X Snow Leopard to DevelopersLeopard to Developers". Press release. ". Press release. http://www.apple.com/pr/library/2008/06/09snowleopard.hthttp://www.apple.com/pr/library/2008/06/09snowleopard.htmlml. Retrieved 2008. Retrieved 2008--0606--09. 09.

Page 39: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

AMD (2008AMD (2008--0808--06). "06). "AMD Drives Adoption of Industry Standards in GPGPU AMD Drives Adoption of Industry Standards in GPGPU Software DevelopmentSoftware Development". Press release. ". Press release. http://www.amd.com/ushttp://www.amd.com/us--en/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.htmlen/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.html. Retrieved 2008. Retrieved 2008--0808--14. 14. ""AMD Backs AMD Backs OpenCLOpenCL, Microsoft DirectX 11, Microsoft DirectX 11". ". eWeekeWeek. 2008. 2008--0808--06. 06. http://www.eweek.com/c/a/Desktopshttp://www.eweek.com/c/a/Desktops--andand--Notebooks/AMDNotebooks/AMD--BackingBacking--OpenCLOpenCL--andand--MicrosoftMicrosoft--DirectXDirectX--11/11/. Retrieved 2008. Retrieved 2008--0808--14. 14. ""HPCWireHPCWire: : RapidMindRapidMind Embraces Open Source and Standards ProjectsEmbraces Open Source and Standards Projects". ". HPCWireHPCWire. . 20082008--1111--10. 10. http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Opehttp://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_an_Source_and_Standards_Projects.htmlnd_Standards_Projects.html.. Retrieved 2008Retrieved 2008--1111--11. 11. NvidiaNvidia (2008(2008--1212--09). "09). "NVIDIA Adds NVIDIA Adds OpenCLOpenCL To Its Industry Leading GPU To Its Industry Leading GPU Computing ToolkitComputing Toolkit". Press release. ". Press release. http://www.nvidia.com/object/io_1228825271885.htmlhttp://www.nvidia.com/object/io_1228825271885.html. Retrieved 2008. Retrieved 2008--1212--10. 10. ""OpenCLOpenCL Development Kit for Linux on PowerDevelopment Kit for Linux on Power". ". alphaWorksalphaWorks. 2009. 2009--1010--30. 30. http://http://www.alphaworks.ibm.com/tech/openclwww.alphaworks.ibm.com/tech/opencl. Retrieved 2009. Retrieved 2009--1010--30. 30. ""OpenCLOpenCL Demo, AMD CPUDemo, AMD CPU". 2008". 2008--1212--10. 10. http://http://www.youtube.com/watch?vwww.youtube.com/watch?v==sLv_fhQlqissLv_fhQlqis. Retrieved 2009. Retrieved 2009--0303--28. 28.

Page 40: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

""OpenCLOpenCL Demo, NVIDIA GPUDemo, NVIDIA GPU". 2008". 2008--1212--10. 10. http://http://www.youtube.com/watch?vwww.youtube.com/watch?v=PJ1jydg8mLg=PJ1jydg8mLg. Retrieved 2009. Retrieved 2009--0303--28. 28. ""AMD and AMD and HavokHavok demo demo OpenCLOpenCL accelerated physicsaccelerated physics". PC Perspective. 2009". PC Perspective. 2009--0303--26. 26. http://http://www.pcper.com/comments.php?nidwww.pcper.com/comments.php?nid=6954=6954. Retrieved 2009. Retrieved 2009--0303--28. 28. ""NVIDIA Releases NVIDIA Releases OpenCLOpenCL Driver To DevelopersDriver To Developers". NVIDIA. 2009". NVIDIA. 2009--0404--20. 20. http://www.nvidia.com/object/io_1240224603372.htmlhttp://www.nvidia.com/object/io_1240224603372.html. Retrieved 2009. Retrieved 2009--0404--27. 27. ""AMD does reverse GPGPU, announces AMD does reverse GPGPU, announces OpenCLOpenCL SDK for x86SDK for x86". ". ArsArs TechnicaTechnica. 2009. 2009--0808--05. 05. http://arst.ch/5tehttp://arst.ch/5te. Retrieved 2009. Retrieved 2009--0808--06. 06. Dan Dan MorenMoren; Jason Snell (2009; Jason Snell (2009--0606--08). "08). "Live Update: WWDC 2009 KeynoteLive Update: WWDC 2009 Keynote". ". macworld.commacworld.com. MacWorld. . MacWorld. http://www.macworld.com/article/140897/2009/06/keynote.htmlhttp://www.macworld.com/article/140897/2009/06/keynote.html. Retrieved 2009. Retrieved 2009--0606--12. 12. ""Mac OS X Snow Leopard Mac OS X Snow Leopard –– Technical specifications and system requirementsTechnical specifications and system requirements". Apple Inc. ". Apple Inc. 20092009--0606--08. 08. http://http://www.apple.com/macosx/specs.htmlwww.apple.com/macosx/specs.html. Retrieved 2009. Retrieved 2009--0808--25. 25. ""ATI Stream Software Development Kit (SDK) v2.0 Beta ProgramATI Stream Software Development Kit (SDK) v2.0 Beta Program". ". http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/defahttp://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx#oneult.aspx#one.. Retrieved 2009Retrieved 2009--1010--14. 14.

Page 41: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

""Apple entry on LLVM Users pageApple entry on LLVM Users page". ". http://http://llvm.org/Users.html#Applellvm.org/Users.html#Apple. Retrieved 2009. Retrieved 2009--0808--29. 29. ""NvidiaNvidia entry on LLVM Users pageentry on LLVM Users page". ". http://http://llvm.org/Users.htmlllvm.org/Users.html. Retrieved 2009. Retrieved 2009--0808--06. 06. ""RapidmindRapidmind entry on LLVM Users pageentry on LLVM Users page". ". http://http://llvm.org/Users.htmlllvm.org/Users.html. Retrieved 2009. Retrieved 2009--1010--01. 01. ""Zack Zack Rusin'sRusin's blog post about the Mesa Gallium3D blog post about the Mesa Gallium3D OpenCLOpenCLimplementationimplementation". ". http://zrusin.blogspot.com/2009/02/opencl.htmlhttp://zrusin.blogspot.com/2009/02/opencl.html. Retrieved . Retrieved 20092009--1010--01. 01. http://www.via.com.tw/en/resources/pressroom/pressrelease.jhttp://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_nosp?press_release_no=4327=4327

Page 42: CUDA and OpenCL --- Development Interfaces for - University of Iowa

ReferencesReferences

""ATI Stream SDK v2.0 with ATI Stream SDK v2.0 with OpenCLOpenCL™™ 1.0 Support1.0 Support". ". http://http://developer.amd.com/gpu/ATIStreamSDK/Pages/defaultdeveloper.amd.com/gpu/ATIStreamSDK/Pages/default.aspx.aspx. Retrieved 2009. Retrieved 2009--1010--23. 23. ""OpenCLOpenCL". SIGGRAPH2008. 2008". SIGGRAPH2008. 2008--0808--14. 14. http://s08.idav.ucdavis.edu/munshihttp://s08.idav.ucdavis.edu/munshi--opencl.pdfopencl.pdf. Retrieved 2008. Retrieved 2008--0808--14. 14. ""Fitting FFT onto G80 ArchitectureFitting FFT onto G80 Architecture" (PDF). " (PDF). VasilyVasily VolkovVolkov and and Brian Brian KazianKazian, UC Berkeley CS258 project report. May 2008. , UC Berkeley CS258 project report. May 2008. http://www.cs.berkeley.edu/~kubitron/courses/cs258http://www.cs.berkeley.edu/~kubitron/courses/cs258--S08/projects/reports/project6_report.pdfS08/projects/reports/project6_report.pdf. Retrieved 2008. Retrieved 2008--1111--14. 14. ""OpenCLOpenCL on FFTon FFT". Apple. 16 Nov 2009. ". Apple. 16 Nov 2009. https://developer.apple.com/mac/library/samplecode/OpenCLhttps://developer.apple.com/mac/library/samplecode/OpenCL_FFT/index.html_FFT/index.html.. Retrieved 2009Retrieved 2009--1212--07. 07.

Page 44: CUDA and OpenCL --- Development Interfaces for - University of Iowa

LinksLinks

The Open Toolkit libraryThe Open Toolkit library crosscross--platform C# OpenGL, platform C# OpenGL, OpenALOpenALand and OpenCLOpenCL wrapper for Mono/.Net wrapper for Mono/.Net GPU Modeling and Development in GPU Modeling and Development in OpenCLOpenCLFirst public demonstration of First public demonstration of OpenCLOpenCL by NVIDIA on by NVIDIA on December 12, 2008 at December 12, 2008 at SiggraphSiggraph Asia Asia First public demonstration of First public demonstration of OpenCLOpenCL by AMD on December by AMD on December 12, 2008 at 12, 2008 at SiggraphSiggraph Asia Asia OpenCLOpenCL: What you need to know: What you need to know –– article published in article published in Macworld, August 2008 Macworld, August 2008 HPCWireHPCWire: : OpenCLOpenCL on the Fast Trackon the Fast TrackOpenCLOpenCL explainedexplained as part of a larger article on Snow Leopard as part of a larger article on Snow Leopard ((MacOSMacOS X 10.6), published at X 10.6), published at ArsTechnicaArsTechnica in August 2009. in August 2009.