Introduction to OpenCL - TU Wien

Ezio Bartocci
Vienna University of Technology


Post on 04-Apr-2022



Overview

•  Overview of OpenCL for NVIDIA GPUs

•  API and Languages

•  Sample code walkthrough

•  OpenCL Information and Resources


OpenCL – Open Computing Language

•  OpenCL is an open, royalty-free C-language extension

•  It is a framework designed for parallel programming of heterogeneous systems using GPUs, CPUs, FPGAs, DSPs and other processors, including embedded mobile devices

•  It was initially introduced by Apple and is now supported by NVIDIA, Intel, AMD, IBM and other members of the OpenCL working group

•  Managed by the Khronos Group

OpenCL versions and history (1)

OpenCL 1.0 (2008)
•  OpenCL 1.0 was released with Mac OS X Snow Leopard

OpenCL 1.1 (2010)

The Khronos Group added significant functionality for enhanced parallel programming flexibility and performance, including:
•  New data types, including 3-component vectors and additional image formats
•  Handling commands from multiple host threads and processing buffers across multiple devices
•  Operations on regions of a buffer, including read, write and copy of 1D, 2D or 3D rectangular regions
•  Enhanced use of events to drive and control command execution
•  Additional OpenCL built-in C functions such as integer clamp, shuffle and asynchronous strided copies
•  Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events

OpenCL versions and history (2)

OpenCL 1.2 (2011)

Most notable features include:
•  Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
•  Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
•  Enhanced image support: 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
•  Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of the underlying hardware. Examples include video encoding/decoding and digital signal processors.
•  DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.

NVIDIA OpenCL Support

Operating systems
•  Windows (XP, Vista, 8), 32/64 bit
•  Linux (Ubuntu, RHEL, etc.), 32/64 bit
•  Mac OS X Snow Leopard

Supported IDEs and compilers
•  GCC for Linux
•  Visual Studio for Windows

Drivers and JIT compiler
•  They are usually provided with the GPU drivers (e.g. the CUDA drivers)

NVIDIA SDK
•  It contains example applications, the specification, the programming manual and the best practices guide

OpenCL Language & API

Platform Layer API (called from the host)
•  An abstraction layer for diverse computational resources
•  Query, select and initialize compute devices
•  Create compute contexts and work-queues

Runtime API (called from the host)
•  Launch compute kernels
•  Set kernel execution configuration
•  Manage scheduling, compute, and memory resources

OpenCL Language
•  Write compute kernels that run on a compute device
•  C-based cross-platform programming interface
•  Subset of ISO C99 with language extensions
•  Includes a rich set of built-in functions
•  Can be compiled just-in-time (JIT) or offline


OpenCL Programming Model


NDRange – N-Dimensional Range
•  N can be 1, 2 or 3
•  It defines the global index space in which the kernel instances execute

Work-item
•  A single kernel instance in the index space
•  Each work-item executes the same compute kernel, but on different data
•  Work-items have unique global IDs from the index space
•  It can be related to the concept of a thread in CUDA

Work-group
•  Work-items are further grouped into work-groups
•  Each work-group has a unique work-group ID
•  Work-items have a unique local ID within a work-group
•  It can be related to the concept of a block of threads in CUDA
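The relationship between global IDs, local IDs and work-group IDs can be sketched in plain C. This is a minimal host-side illustration, not OpenCL API code: the helper names (`global_id_1d`, `num_groups_1d`, `linear_index_2d`) are hypothetical, and the sketch assumes a zero global offset and a global size that is a multiple of the work-group size.

```c
#include <stddef.h>

/* Hypothetical helper: global ID of a work-item along one dimension,
   assuming a zero global offset. */
size_t global_id_1d(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Hypothetical helper: number of work-groups along one dimension,
   assuming global_size is a multiple of local_size. */
size_t num_groups_1d(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Hypothetical helper: flatten a 2D global ID (gx, gy) into a row-major
   array index, where global_size_x is the NDRange extent in dimension 0. */
size_t linear_index_2d(size_t gx, size_t gy, size_t global_size_x)
{
    return gy * global_size_x + gx;
}
```

For example, with a work-group size of 64, the work-item with local ID 5 in work-group 2 has global ID 2 * 64 + 5 = 133. Under the same assumptions, this is the value that get_global_id(0) would report inside a kernel.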

OpenCL Memory Model

[Figure: a compute device (e.g. GPU) with compute units 1 to N. Each compute unit runs a work-group of work-items 1 to M. Every work-item has its own private memory, every compute unit has a local memory, and all compute units share a global/constant memory data cache backed by the global memory in the compute device memory.]

•  Private memory: read/write access, for a single work-item only
•  Local memory: read/write access, for an entire work-group
•  Constant memory: read access, for the entire ND-range (all work-items, all work-groups)
•  Global memory: read/write access, for the entire ND-range (all work-items, all work-groups)
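The different scopes of these memory regions can be illustrated with a small host-side emulation in plain C. This is only a sketch under stated assumptions, not device code: the function name `emulate_group_sum` and the fixed work-group size `LOCAL_SIZE` are hypothetical, loops stand in for work-items that would run in parallel on a device, and the global size is assumed to be a multiple of `LOCAL_SIZE`.

```c
#include <stddef.h>

#define LOCAL_SIZE 4  /* hypothetical work-group size */

/* Host-side emulation: every work-group sums its slice of global_in and
   stores one result per group in global_out. */
void emulate_group_sum(const float *global_in, float *global_out,
                       size_t global_size)
{
    size_t num_groups = global_size / LOCAL_SIZE;

    for (size_t group = 0; group < num_groups; ++group) {
        /* "Local" memory: one array per work-group, visible to all of
           its work-items and to no other group. */
        float local_mem[LOCAL_SIZE];

        /* Step 1: each work-item reads one element of global memory
           into a "private" variable and publishes it to local memory. */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
            float private_val = global_in[group * LOCAL_SIZE + lid];
            local_mem[lid] = private_val;
        }

        /* Step 2 starts only after every work-item of the group has
           finished step 1. One work-item adds up the local values and
           writes the group's result back to global memory. */
        float sum = 0.0f;
        for (size_t i = 0; i < LOCAL_SIZE; ++i)
            sum += local_mem[i];
        global_out[group] = sum;
    }
}
```

In this emulation, `private_val` exists once per work-item, `local_mem` once per work-group, and the input and output arrays once for the whole ND-range, which mirrors the access scopes listed above.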

Basic Program Structure

Host program

Platform layer
•  Query compute devices
•  Create contexts

Runtime
•  Create memory objects associated with contexts
•  Compile and create kernel program objects
•  Issue commands to a command-queue
•  Synchronize commands
•  Clean up OpenCL resources

Compute kernel (runs on device)

OpenCL language
•  C code with some restrictions and extensions

Buffer objects
•  1D collection of objects (like C arrays)
•  Scalar and vector types, and user-defined structures
•  They are accessed via pointers in the compute kernel

Image objects
•  2D or 3D textures, frame-buffers or images
•  Must be addressed through built-in functions

Sampler objects
•  Describe how to sample an image in the kernel
•  Addressing modes
•  Filtering modes

OpenCL Language Highlights

Function qualifiers
•  The “__kernel” qualifier declares a function as a kernel

Address space qualifiers
•  “__global”, “__local”, “__constant”, “__private”

Work-item functions
•  get_work_dim()
•  get_global_id(), get_local_id(), get_group_id(), get_local_size()

Image functions
•  Images must be accessed through built-in functions
•  Image reads are performed through sampler objects, passed from the host or defined in the kernel source

Synchronization functions
•  Barriers: all work-items within a work-group must execute the barrier function before any work-item in the work-group can continue
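The qualifiers, work-item functions and barrier listed above can be seen together in a short kernel. The sketch below is an illustration and does not come from the original slides: the kernel name `group_sum` and its arguments are hypothetical. The OpenCL C source is kept in a C string, which is the form a host program would typically hand to clCreateProgramWithSource for just-in-time compilation.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source for a hypothetical kernel `group_sum`, held as a
   host-side string. Each work-group adds up its slice of `in` and
   writes one result to `out`. */
const char *const group_sum_src =
    "__kernel void group_sum(__global const float *in,\n"
    "                        __global float *out,\n"
    "                        __local float *scratch)\n"
    "{\n"
    "    size_t lid = get_local_id(0);    /* ID within the work-group */\n"
    "    size_t lsz = get_local_size(0);  /* work-items per work-group */\n"
    "\n"
    "    /* Each work-item copies one element into local memory. */\n"
    "    scratch[lid] = in[get_global_id(0)];\n"
    "\n"
    "    /* Every work-item of the group must reach this barrier. */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "\n"
    "    /* Work-item 0 adds up the values written by the whole group. */\n"
    "    if (lid == 0) {\n"
    "        float sum = 0.0f;\n"
    "        for (size_t i = 0; i < lsz; ++i)\n"
    "            sum += scratch[i];\n"
    "        out[get_group_id(0)] = sum;\n"
    "    }\n"
    "}\n";

/* Length of the source text, e.g. for the `lengths` argument of
   clCreateProgramWithSource. */
size_t group_sum_src_length(void)
{
    return strlen(group_sum_src);
}
```

The barrier sits outside the `if`, so every work-item in the group executes it, as the rule above requires. On the host, the size of the `__local` argument would be set with clSetKernelArg by passing the byte size and a NULL value, for example `local_size * sizeof(float)` for a work-group of `local_size` items.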