introduction to opencl - tu wien

Post on 04-Apr-2022


Introduction to OpenCL

Ezio Bartocci

Vienna University of Technology

Overview

•  Overview of OpenCL for NVIDIA GPUs

•  API and Languages

•  Sample codes walkthrough

•  OpenCL Information and Resources

OpenCL – Open Computing Language

•  OpenCL is an open, royalty-free standard built around a C-based kernel language

•  It is a framework designed for parallel programming of heterogeneous systems using GPUs, CPUs, FPGAs, DSPs, and other processors, including embedded and mobile devices

•  It was initially developed by Apple and is now supported by NVIDIA, Intel, AMD, IBM, and others (all members of the OpenCL working group)

•  It is managed by the Khronos Group

OpenCL versions and history (1)

OpenCL 1.0 (2008)

•  OpenCL 1.0 was first shipped with Mac OS X Snow Leopard

OpenCL 1.1 (2010)

•  The Khronos Group added significant functionality for enhanced parallel programming flexibility, functionality, and performance, including:

•  New data types, including 3-component vectors and additional image formats;

•  Handling commands from multiple host threads and processing buffers across multiple devices;

•  Operations on regions of a buffer, including read, write, and copy of 1D, 2D, or 3D rectangular regions;

•  Enhanced use of events to drive and control command execution;

•  Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;

•  Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.

OpenCL versions and history (2)

OpenCL 1.2 (2011)

•  The most notable features include:

•  Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.

•  Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.

•  Enhanced image support: OpenCL 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.

•  Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of the underlying hardware. Examples include video encoding/decoding and digital signal processors.

•  DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.

NVIDIA OpenCL Support

Operating systems

•  Windows (XP, Vista, 8), 32/64-bit

•  Linux (Ubuntu, RHEL, etc.), 32/64-bit

•  Mac OS X Snow Leopard

Supported development tools

•  GCC for Linux

•  Visual Studio for Windows

Drivers and JIT compiler

•  These are usually provided with the GPU drivers (e.g. the CUDA drivers)

NVIDIA SDK

•  It contains example applications, the specification, the programming manual, and the best practices guide.

OpenCL Language & API

Platform Layer API (called from the host)

•  An abstraction layer for diverse computational resources

•  Query, select, and initialize compute devices

•  Create compute contexts and work-queues

Runtime API (called from the host)

•  Launch compute kernels

•  Set the kernel execution configuration

•  Manage scheduling, compute, and memory resources

OpenCL Language

•  Write compute kernels that run on a compute device

•  C-based cross-platform programming interface

•  Subset of ISO C99 with language extensions

•  Includes a rich set of built-in functions

•  Can be compiled just in time (JIT) or offline

OpenCL Programming Model

NDRange (N-Dimensional Range)

•  N can be 1, 2, or 3

•  It defines the global index space over which the kernel instances execute

Work-item

•  A single kernel instance in the index space

•  Each work-item executes the same compute kernel, but on different data

•  Work-items have unique global IDs from the index space

•  It can be related to the concept of a thread in CUDA

Work-group

•  Work-items are further grouped into work-groups

•  Each work-group has a unique work-group ID

•  Each work-item has a unique local ID within its work-group

•  It can be related to the concept of a block of threads in CUDA
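The relationship between global IDs, work-group IDs, and local IDs can be sketched in plain host-side C, with no OpenCL runtime involved. The names `work_item_ids`, `decompose_global_id`, and `round_up_global_size` are hypothetical helpers introduced only for illustration; inside a real kernel the same values would come from the built-ins `get_global_id()`, `get_group_id()`, and `get_local_id()`. The sketch assumes a one-dimensional NDRange with a zero global offset.

```c
#include <assert.h>
#include <stddef.h>

/* IDs of one work-item in a 1D NDRange (hypothetical helper type). */
typedef struct {
    size_t global_id; /* position in the whole index space      */
    size_t group_id;  /* which work-group the work-item is in   */
    size_t local_id;  /* position inside that work-group        */
} work_item_ids;

/* Split a global ID into a work-group ID and a local ID.
   Assumes a zero global offset, so that
   global_id == group_id * local_size + local_id. */
static work_item_ids decompose_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id  = global_id / local_size;
    ids.local_id  = global_id % local_size;
    return ids;
}

/* Round a problem size up to the next multiple of the work-group size.
   OpenCL 1.x expects the global size to be divisible by the local size,
   so hosts commonly pad the NDRange this way. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}
```

For example, with a work-group size of 32, the work-item with global ID 70 falls in work-group 2 with local ID 6. When the NDRange is padded, the kernel usually has to skip work-items whose global ID is beyond the real problem size.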

OpenCL Memory Model

Figure: a compute device (e.g. a GPU) contains compute units 1 to N. Each compute unit runs a work-group of work-items 1 to M; every work-item has its own private memory, and every compute unit has its own local memory. All compute units share a global/constant memory data cache, backed by global memory in the compute device memory.

•  Private memory: read/write access, for a single work-item only

•  Local memory: read/write access, for the entire work-group

•  Constant memory: read access, for the entire ND-range (all work-items, all work-groups)

•  Global memory: read/write access, for the entire ND-range (all work-items, all work-groups)

Basic Program Structure

Host program

Platform layer

•  Query compute devices

•  Create contexts

Runtime

•  Create memory objects associated with contexts

•  Compile and create kernel program objects

•  Issue commands to the command-queue

•  Synchronize commands

•  Clean up OpenCL resources

Compute kernel (runs on the device), written in the OpenCL language

•  C code with some restrictions and extensions

Buffer objects

•  1D collection of objects (like C arrays)

•  Scalar and vector types, and user-defined structures

•  They are accessed via pointers in the compute kernel
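How a kernel reaches a buffer through a pointer can be illustrated with a vector addition written as ordinary C. This is only a sequential stand-in, not OpenCL host code: `vec_add_work_item` and `run_vec_add` are hypothetical names, the loop replaces the parallel launch of a 1D NDRange, and the plain arrays stand in for buffer objects.

```c
#include <assert.h>
#include <stddef.h>

/* Body of a vector-add kernel, written as ordinary C.
   In OpenCL C the signature would look roughly like
     __kernel void vec_add(__global const float *a,
                           __global const float *b,
                           __global float *c)
   and gid would be obtained from get_global_id(0). */
static void vec_add_work_item(const float *a, const float *b, float *c,
                              size_t gid)
{
    c[gid] = a[gid] + b[gid]; /* each work-item handles one element */
}

/* Sequential stand-in for launching a 1D NDRange of n work-items.
   A real device would execute the work-items in parallel. */
static void run_vec_add(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < n; ++gid) {
        vec_add_work_item(a, b, c, gid);
    }
}
```

The point of the sketch is the indexing pattern: the buffer is a flat 1D array, and the global ID selects the element a work-item owns.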

Image objects

•  2D or 3D textures, frame-buffers, or images

•  Must be addressed through built-in functions

Sampler objects

•  Describe how to sample an image in the kernel

•  Addressing modes

•  Filtering modes
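What an addressing mode and a filtering mode do can be sketched in plain C. The example below imitates the behaviour of a sampler configured with clamp-to-edge addressing and nearest filtering on unnormalized coordinates (in OpenCL these correspond to `CLK_ADDRESS_CLAMP_TO_EDGE` and `CLK_FILTER_NEAREST`). The function names and the single-channel, row-major `float` image layout are assumptions made for illustration; a real kernel would call a built-in such as `read_imagef()` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Addressing mode: coordinates outside the image are clamped
   to the nearest valid texel index in [0, size - 1]. */
static int clamp_to_edge(int i, int size)
{
    if (i < 0) return 0;
    if (i > size - 1) return size - 1;
    return i;
}

/* Floor of a float as an int, without needing the math library.
   Assumes v is within the range of int. */
static int floor_to_int(float v)
{
    int i = (int)v; /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

/* Filtering mode: nearest filtering returns the texel that
   contains the coordinate, with no interpolation. */
static float sample_nearest(const float *image, int width, int height,
                            float x, float y)
{
    assert(image != NULL && width > 0 && height > 0);
    int ix = clamp_to_edge(floor_to_int(x), width);
    int iy = clamp_to_edge(floor_to_int(y), height);
    return image[(size_t)iy * (size_t)width + (size_t)ix];
}
```

With a linear filtering mode the sampler would instead blend neighbouring texels, which is why the choice is part of the sampler rather than of the image itself.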

OpenCL Language Highlights

Function qualifiers

•  The “__kernel” qualifier declares a function as a kernel

Address space qualifiers

•  “__global”, “__local”, “__constant”, “__private”

Work-item functions

•  get_work_dim()

•  get_global_id(), get_local_id(), get_group_id(), get_local_size()

Image functions

•  Images must be accessed through built-in functions

•  Reads are performed through sampler objects, which are created on the host or defined in the kernel source

Synchronization functions

•  Barriers: all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue
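Why barriers matter can be seen in a work-group sum reduction, sketched here as a sequential emulation in plain C. Each inner loop plays the role of all work-items in the group executing one phase, and the end of the loop marks the point where a kernel would call `barrier(CLK_LOCAL_MEM_FENCE)`. `GROUP_SIZE` and `work_group_sum` are hypothetical names, and the local array only stands in for `__local` memory.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8 /* hypothetical work-group size, a power of two */

/* Sequential emulation of a work-group sum reduction over
   GROUP_SIZE input values. */
static float work_group_sum(const float *input)
{
    assert(input != NULL);
    float local_mem[GROUP_SIZE]; /* stands in for __local memory */

    /* Phase 0: every work-item copies one element into local memory. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        local_mem[lid] = input[lid];
    }
    /* A kernel needs a barrier here: no work-item may start adding
       until every copy above has completed. */

    /* Tree reduction: the number of active work-items halves each phase. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            local_mem[lid] += local_mem[lid + stride];
        }
        /* A kernel needs a barrier here as well, so that the next phase
           reads the partial sums written in this one. */
    }
    return local_mem[0];
}
```

In a real kernel every work-item in the group must reach each barrier, including those that are idle in a given phase; placing the barrier inside a branch taken by only some work-items is a common source of hangs.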
