cjharris gpu computing opencl
Post on 03-Apr-2018
7/28/2019 Cjharris Gpu Computing Opencl
http://slidepdf.com/reader/full/cjharris-gpu-computing-opencl 1/61
Getting Started with OpenCL GPU Computing
iVEC Workshop
30th May - 1st June 2012
Open Compute Language (OpenCL)
OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices.
OpenCL is being created by the Khronos Group.
Participating companies and institutions:
3DLABS, Activision Blizzard, AMD, Apple, ARM, Broadcom, Codeplay, Electronic Arts, Ericsson, Freescale, Fujitsu, GE, Graphic Remedy, HI, IBM, Intel, Imagination Technologies, Los Alamos National Laboratory, Motorola, Movidius, Nokia, NVIDIA, Petapath, QNX, Qualcomm, RapidMind, Samsung, Seaweed, S3, ST Microelectronics, Takumi, Texas Instruments, Toshiba and Vivante.
http://www.khronos.org/opencl/
How is OpenCL different from CUDA?

CUDA:
- core GPU computing on NVIDIA hardware
- optimised libraries for NVIDIA hardware
- better marketing
- slightly simpler API
- more readily available documentation

OpenCL:
- AMD implementation on AMD CPU/GPU and Intel CPU
- Intel implementation on Intel CPU
- IBM implementation on Intel/AMD/NVIDIA/Power
- Intel implementation on Intel MIPS
- portable, but not necessarily optimised code
OpenCL Platforms
Platform:
A host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform.
Diagram: a Platform comprising a Host and several Devices.
OpenCL Platforms : clGetPlatformIDs
The following routine is used to query the number of OpenCL platforms, and their corresponding IDs:

cl_int clGetPlatformIDs(cl_uint num_entries,
                        cl_platform_id* platforms,
                        cl_uint* num_platforms)

Arguments
num_entries : capacity of the memory pointed to by platforms
platforms : pointer to memory to store the returned platform IDs
num_platforms : returns the actual number of platform IDs available

Returns
Either CL_SUCCESS or an error code.
OpenCL Platforms : clGetPlatformIDs
Code Example:

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

// checkErr is defined on the next slide
void checkErr(cl_int clErr, const char* filename, int line);

int main(int argc, char** argv)
{
    // determine number of platforms
    cl_int clErr;
    cl_uint num_platforms;
    clErr = clGetPlatformIDs(0, NULL, &num_platforms);
    checkErr(clErr, __FILE__, __LINE__);
    printf("OpenCL Platforms found: %u\n", num_platforms);
    if (num_platforms < 1) { exit(0); }

    // get platform IDs
    cl_platform_id platforms[num_platforms];
    clErr = clGetPlatformIDs(num_platforms, platforms, NULL);
    checkErr(clErr, __FILE__, __LINE__);

    return 0;
}
OpenCL Errors
void checkErr(cl_int clErr, const char* filename, int line)
{
    if (clErr != CL_SUCCESS)
    {
        printf("OpenCL Error %i at line %i of %s\n", clErr, line, filename);
        exit(EXIT_FAILURE);
    }
}
You can find the error codes in cl.h :
/* Error Codes */
#define CL_SUCCESS                          0
#define CL_DEVICE_NOT_FOUND                 -1
#define CL_DEVICE_NOT_AVAILABLE             -2
#define CL_COMPILER_NOT_AVAILABLE           -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE    -4
#define CL_OUT_OF_RESOURCES                 -5
#define CL_OUT_OF_HOST_MEMORY               -6
#define CL_PROFILING_INFO_NOT_AVAILABLE     -7
#define CL_MEM_COPY_OVERLAP                 -8
#define CL_IMAGE_FORMAT_MISMATCH            -9
#define CL_IMAGE_FORMAT_NOT_SUPPORTED       -10
#define CL_BUILD_PROGRAM_FAILURE            -11
#define CL_MAP_FAILURE                      -12
...
Where is OpenCL on Fornax?
NVIDIA Implementation (for NVIDIA GPU):
module load cuda
/opt/centos6.1-modules/cuda/4.1.28/cuda/include/CL/cl.h
/opt/nodes.updates/login.cuda.lib/lib64/libOpenCL.so
/opt/nodes.updates/login.cuda.lib/lib/libOpenCL.so

AMD Implementation (for Intel CPU):
module load AMDAPP
/opt/centos6.1-modules/AMDAPP/2.5/include/CL/cl.h
/opt/centos6.1-modules/AMDAPP/2.5/lib/x86_64/libOpenCL.so
/opt/centos6.1-modules/AMDAPP/2.5/lib/x86/libOpenCL.so

Intel Implementation (for Intel CPU):
not installed - hasn't been requested
Compiling OpenCL on Fornax (NVIDIA)
Load required modules if necessary:
module load gcc
module load cuda
Command line compile:
gcc platform_id.c -o platform_id -lOpenCL
Better to use a Makefile:
default:
	gcc platform_id.c -o platform_id -lOpenCL
And compile with make:
make
Running OpenCL on Fornax (NVIDIA)
Change to scratch directory:
cd /scratch/projectname/username/programpath
One node PBS script subPlatformID:
#!/bin/bash
#PBS -W group_list=projectname
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -l select=1:ncpus=1:ngpus=1:mem=64gb
#PBS -l place=excl

module load cuda

cd /scratch/projectname/username/programpath
/home/username/programpath/platform_id
Submit with qsub:
qsub subPlatformID
Check queue, directory for output:
qstat
ls
cat subPlatformID.oXXXX subPlatformID.eXXXX
OpenCL Platforms : clGetPlatformInfo
The following OpenCL routine is used to query platforms:

cl_int clGetPlatformInfo(cl_platform_id platform,
                         cl_platform_info param_name,
                         size_t param_value_size,
                         void* param_value,
                         size_t* param_value_size_ret)

Arguments
platform : the platform being queried
param_name : CL_PLATFORM_PROFILE, CL_PLATFORM_VERSION, etc
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store the return value
param_value_size_ret : returns the size in bytes of the data being queried

Returns
Either CL_SUCCESS or an error code.
OpenCL Platforms : clGetPlatformInfo
Code Example:

...

// get platform info
int i;
for (i = 0; i < num_platforms; i++)
{
    size_t size;
    clErr = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, 0, NULL, &size);
    checkErr(clErr, __FILE__, __LINE__);
    char vendor[size];
    clErr = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, size, vendor, NULL);
    checkErr(clErr, __FILE__, __LINE__);
    printf("Platform %i: %s\n", i, vendor);
}

...
OpenCL Programming Task : Platform Query
Write a program that prints out:
- the number of OpenCL platforms
- the names of the OpenCL platforms
You can find a template in:
/scratch/courses01/templates/opencl_platform.c
You may find the following function definitions useful:
cl_int clGetPlatformIDs(cl_uint num_entries,
                        cl_platform_id* platforms,
                        cl_uint* num_platforms)

cl_int clGetPlatformInfo(cl_platform_id platform,
                         cl_platform_info param_name,
                         size_t param_value_size,
                         void* param_value,
                         size_t* param_value_size_ret)

param_name : CL_PLATFORM_PROFILE, CL_PLATFORM_VERSION, etc
OpenCL Devices
Device:
An OpenCL device consists of a global memory and a number of compute units, each in turn containing a number of processing elements and a local memory.
Diagram: a Device containing Global Memory and several Compute Units, each with processing elements (PEs) and local memory.
OpenCL Devices : clGetDeviceIDs
The following OpenCL routine is used to obtain the number of devices, and their IDs, available in a platform:

cl_int clGetDeviceIDs(cl_platform_id platform,
                      cl_device_type device_type,
                      cl_uint num_entries,
                      cl_device_id* devices,
                      cl_uint* num_devices)

Arguments
platform : platform ID of the desired platform
device_type : CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, etc
num_entries : size of the pointer allocation
devices : pointer to return device IDs
num_devices : pointer to return the number of devices

Returns
Either CL_SUCCESS or an error code.
OpenCL Devices : clGetDeviceIDs
Code Example:

...

// get number of devices
cl_uint num_devices;
clErr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
checkErr(clErr, __FILE__, __LINE__);
printf("\nOpenCL GPU Devices found: %u\n", num_devices);

// get device IDs
cl_device_id devices[num_devices];
clErr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices, NULL);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Devices : clGetDeviceInfo
The following OpenCL routine is used to query devices:

cl_int clGetDeviceInfo(cl_device_id device,
                       cl_device_info param_name,
                       size_t param_value_size,
                       void* param_value,
                       size_t* param_value_size_ret)

Arguments
device : the device to query
param_name : CL_DEVICE_NAME, and many more
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store the return value
param_value_size_ret : returns the size in bytes of the data being queried

Returns
Either CL_SUCCESS or an error code.
OpenCL Devices : clGetDeviceInfo
There is a long list of device properties; they are listed in the OpenCL specification document:

CL_DEVICE_TYPE
CL_DEVICE_VENDOR_ID
CL_DEVICE_MAX_COMPUTE_UNITS
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
CL_DEVICE_MAX_WORK_ITEM_SIZES
CL_DEVICE_MAX_WORK_GROUP_SIZE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
...
OpenCL Devices : clGetDeviceInfo
Code Example:

...

// get device info
for (i = 0; i < num_devices; i++)
{
    size_t size;
    clErr = clGetDeviceInfo(devices[i], CL_DEVICE_NAME, 0, NULL, &size);
    checkErr(clErr, __FILE__, __LINE__);
    char name[size];
    clErr = clGetDeviceInfo(devices[i], CL_DEVICE_NAME, size, name, NULL);
    checkErr(clErr, __FILE__, __LINE__);
    printf("\tDevice %i: %s\n", i, name);
}

...
OpenCL Programming Task : Device Query
Write a program that prints out:
- the names of the devices in the platform
You can find a template in:
/scratch/courses01/templates/opencl_device.c
You may find the following function definitions useful:
cl_int clGetDeviceIDs(cl_platform_id platform,
                      cl_device_type device_type,
                      cl_uint num_entries,
                      cl_device_id* devices,
                      cl_uint* num_devices)

device_type : CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, etc

cl_int clGetDeviceInfo(cl_device_id device,
                       cl_device_info param_name,
                       size_t param_value_size,
                       void* param_value,
                       size_t* param_value_size_ret)

param_name : CL_DEVICE_NAME, etc
OpenCL Context
Context:
An OpenCL context is a collection of OpenCL concepts that are associated with a group of devices, including Command Queues, Device Buffers, Programs and Kernels.
Diagram: a Context containing Devices, Device Buffers, a Program with Kernels, and a Command Queue.
OpenCL Context : clCreateContext
The following OpenCL routine is used to create contexts:

cl_context clCreateContext(cl_context_properties* properties,
                           cl_uint num_devices,
                           cl_device_id* devices,
                           void (CL_CALLBACK* pfn_notify)(const char* errinfo,
                                                          const void* private_info,
                                                          size_t cb,
                                                          void* user_data),
                           void* user_data,
                           cl_int* errcode_ret)

Arguments
properties : the desired properties of the context (more on the next slide)
num_devices : the number of devices in the context
devices : pointer to a list of IDs of the desired devices
pfn_notify : pointer to a callback function
user_data : pointer to user-defined data to be passed to the callback
errcode_ret : pointer to a value to return an error code

Returns
The requested OpenCL context, assuming no errors were returned.
OpenCL Context Properties
The cl_context_properties type is a zero-terminated list of context properties and their desired values. As a minimum, the corresponding platform should be provided:

...

// define desired context properties list
cl_context_properties properties[] = {CL_CONTEXT_PLATFORM,
                                      (cl_context_properties)platform,
                                      0};

// create context
cl_context context = clCreateContext(properties, 1, &device, NULL, NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Context : clReleaseContext
When a context is no longer required, it should be released:
cl_int clReleaseContext(cl_context context)

Arguments
context : the context to release

Returns
Either CL_SUCCESS or an error code

Code Example:

...

// release context
clErr = clReleaseContext(context);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Programming Task : Context
Write a program that:
- creates an OpenCL context
You can find a template in:
/scratch/courses01/templates/opencl_context.c
You may find the following function definitions useful:
cl_context clCreateContext(cl_context_properties* properties,
                           cl_uint num_devices,
                           cl_device_id* devices,
                           void (CL_CALLBACK* pfn_notify)(const char* errinfo,
                                                          const void* private_info,
                                                          size_t cb,
                                                          void* user_data),
                           void* user_data,
                           cl_int* errcode_ret)

cl_context_properties properties[] = {CL_CONTEXT_PLATFORM,
                                      (cl_context_properties)platform,
                                      0};
OpenCL Command Queue
Command Queue:
An OpenCL command queue provides a mechanism to queue commands that operate on the various objects of a context.
The command queue can either act as a simple First In First Out(FIFO) queue, or use events to create command dependencies.
OpenCL Command Queue : clCreateCommandQueue
The following OpenCL routine is used to create queues:

cl_command_queue clCreateCommandQueue(cl_context context,
                                      cl_device_id device,
                                      cl_command_queue_properties properties,
                                      cl_int* errcode_ret)

Arguments
context : the context for the command queue
device : the device that is the target of the commands
properties : CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, etc
errcode_ret : pointer to a value to return an error code

Returns
The requested OpenCL command queue
OpenCL Command Queue : clReleaseCommandQueue
The following OpenCL routine is used to release queues:

cl_int clReleaseCommandQueue(cl_command_queue command_queue)

Arguments
command_queue : the queue to release

Returns
Either CL_SUCCESS or an error code
OpenCL Command Queue
Code Example:

...

// create command queue
cl_command_queue queue = clCreateCommandQueue(context, device, 0, &clErr);
checkErr(clErr, __FILE__, __LINE__);

...

// release command queue
clErr = clReleaseCommandQueue(queue);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Programming Task : Command Queue
Write a program that:
- creates and releases a command queue
You can find a template in:
/scratch/courses01/templates/opencl_queue.c
You may find the following function definitions useful:

cl_command_queue clCreateCommandQueue(cl_context context,
                                      cl_device_id device,
                                      cl_command_queue_properties properties,
                                      cl_int* errcode_ret)

cl_int clReleaseCommandQueue(cl_command_queue command_queue)
OpenCL Buffers
Buffer:
An OpenCL buffer is a memory object that resides in device global memory. There are also many other types of memory objects that support various data structures.
Buffers are attached to contexts and are associated with devices.
Diagram: Device Buffers residing in a Device's Global Memory, alongside its Compute Units and their processing elements.
OpenCL Buffers : clCreateBuffer
The following OpenCL routine is used to create buffers:

cl_mem clCreateBuffer(cl_context context,
                      cl_mem_flags flags,
                      size_t size,
                      void* host_ptr,
                      cl_int* errcode_ret)

Arguments
context : the context for the buffer
flags : CL_MEM_READ_WRITE, CL_MEM_READ_ONLY, etc
size : size of the buffer in bytes
host_ptr : pointer to host memory to populate the buffer (optional)
errcode_ret : pointer to a value to return an error code

Returns
The requested OpenCL buffer, as a cl_mem object.
OpenCL Buffers : clReleaseMemObject
The following OpenCL routine is used to release buffers:

cl_int clReleaseMemObject(cl_mem memobject)

Arguments
memobject : the memory object to release

Returns
Either CL_SUCCESS or an error code
OpenCL Buffers : clEnqueueWriteBuffer
The following OpenCL routine is used to write data from host memory into a buffer:

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
                            cl_mem buffer,
                            cl_bool blocking_write,
                            size_t offset,
                            size_t size,
                            const void* ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event* event_wait_list,
                            cl_event* event)

Arguments
command_queue : the queue to enqueue the write to
buffer : the buffer to write to
blocking_write : whether this function blocks until the transfer is complete (CL_TRUE/CL_FALSE)
offset : how far into the buffer to begin writing
size : the size of the transfer in bytes
ptr : the location in host memory of the data
num_events_in_wait_list : number of events the write is dependent on
event_wait_list : list of events the write is dependent on
event : returns an event corresponding to this write

Returns
Either CL_SUCCESS or an error code
OpenCL Buffers : clEnqueueReadBuffer
The following OpenCL routine is used to read data from a buffer into host memory:

cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
                           cl_mem buffer,
                           cl_bool blocking_read,
                           size_t offset,
                           size_t size,
                           void* ptr,
                           cl_uint num_events_in_wait_list,
                           const cl_event* event_wait_list,
                           cl_event* event)

Arguments
command_queue : the queue to enqueue the read to
buffer : the buffer to read from
blocking_read : whether this function blocks until the transfer is complete (CL_TRUE/CL_FALSE)
offset : how far into the buffer to begin reading
size : the size of the transfer in bytes
ptr : the location in host memory to put the data
num_events_in_wait_list : number of events the read is dependent on
event_wait_list : list of events the read is dependent on
event : returns an event corresponding to this read

Returns
Either CL_SUCCESS or an error code
OpenCL Buffers
Code Example:

...

// create device buffer
cl_mem device_values = clCreateBuffer(context, CL_MEM_READ_WRITE, bsize,
                                      NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);

// write values to device buffer
clErr = clEnqueueWriteBuffer(queue, device_values, CL_TRUE, 0, bsize,
                             (void*)host_values, 0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

// read values from device buffer
clErr = clEnqueueReadBuffer(queue, device_values, CL_TRUE, 0, bsize,
                            (void*)host_values, 0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

// release device buffer
clErr = clReleaseMemObject(device_values);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Programming Task : Buffers
Write a program that:
- creates two arrays on the host, and populates one
- writes the populated array to a device buffer
- reads the device buffer back into the other array on the host

You can find a template in:

/scratch/courses01/templates/opencl_buffers.c

You may find the following function definitions useful:

cl_mem clCreateBuffer(cl_context context,
                      cl_mem_flags flags,
                      size_t size,
                      void* host_ptr,
                      cl_int* errcode_ret)

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
                            cl_mem buffer,
                            cl_bool blocking_write,
                            size_t offset,
                            size_t size,
                            const void* ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event* event_wait_list,
                            cl_event* event)
OpenCL Programs
Program:
An OpenCL program is a set of kernel sources, written as functions defined with the __kernel qualifier, and binaries compiled for specific device architectures.
OpenCL Programs : clCreateProgramWithSource
The following OpenCL routine is used to create programs:

cl_program clCreateProgramWithSource(cl_context context,
                                     cl_uint count,
                                     const char** strings,
                                     const size_t* lengths,
                                     cl_int* errcode_ret)

Arguments
context : the context for the program
count : the number of strings containing the source
strings : pointer to an array of pointers to the strings
lengths : pointer to an array of the string lengths (NULL if \0 terminated)
errcode_ret : pointer to a value to return an error code

Returns
The requested OpenCL program
OpenCL Programs : clCreateProgramWithSource
Options for program kernel source code:
1) Include the kernels as strings in the host source file
- have to code within quotes
- need to recompile to change kernel source
- guaranteed to have kernel source

2) Read in the kernels from files at runtime
- can code normally
- can change kernels without recompiling
- need to ensure path to files is correct
OpenCL Programs : clCreateProgramWithSource
Kernel Source String Example:
const char* source =
"__kernel void zeroValues(__global int* values, int imax)\n"
"{\n"
"    // thread index and total\n"
"    int idx = get_global_id(0);\n"
"    int idtotal = get_global_size(0);\n"
"\n"
"    // zero values\n"
"    int i;\n"
"    for(i=idx;i<imax;i+=idtotal)\n"
"    {\n"
"        values[i] = 0;\n"
"    }\n"
"}\n\0";
OpenCL Programs : clCreateProgramWithSource
Kernel Source File Example:
__kernel void zeroValues(__global int* values, int imax)
{
    // thread index and total
    int idx = get_global_id(0);
    int idtotal = get_global_size(0);

    // zero values
    int i;
    for (i = idx; i < imax; i += idtotal)
    {
        values[i] = 0;
    }
}
OpenCL Programs : clBuildProgram
Use clBuildProgram to compile and link the kernel source:

cl_int clBuildProgram(cl_program program,
                      cl_uint num_devices,
                      const cl_device_id* device_list,
                      const char* options,
                      void (CL_CALLBACK* pfn_notify)(cl_program program,
                                                     void* user_data),
                      void* user_data)

Arguments
program : the program to build
num_devices : the number of devices to target
device_list : list of devices to target
options : compiler flags
pfn_notify : pointer to a callback for when the build is done (blocking call if NULL)
user_data : data to provide in the callback

Returns
CL_SUCCESS or an error code
OpenCL Programs : clGetProgramBuildInfo
Use clGetProgramBuildInfo to get the compiler log:

cl_int clGetProgramBuildInfo(cl_program program,
                             cl_device_id device,
                             cl_program_build_info param_name,
                             size_t param_value_size,
                             void* param_value,
                             size_t* param_value_size_ret)

Arguments
program : the program that was built
device : the device that the kernels were compiled for
param_name : CL_PROGRAM_BUILD_LOG, etc
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store the return value
param_value_size_ret : returns the size in bytes of the data being queried

Returns
CL_SUCCESS or an error code
OpenCL Programs : clCreateKernel
Use clCreateKernel to define the kernel entry points:

cl_kernel clCreateKernel(cl_program program,
                         const char* kernel_name,
                         cl_int* errcode_ret)

Arguments
program : the program that was built
kernel_name : the name of the kernel function
errcode_ret : pointer to a value to return an error code

Returns
The OpenCL kernel corresponding to the kernel name
OpenCL Programs : clReleaseProgram, clReleaseKernel
Use clReleaseKernel to release the kernel:

cl_int clReleaseKernel(cl_kernel kernel)

Arguments
kernel : the kernel to release

Use clReleaseProgram to release the program:

cl_int clReleaseProgram(cl_program program)

Arguments
program : the program to release

Returns
Either CL_SUCCESS or an error code
OpenCL Programs and Kernels
Code Example:

// create program from source
cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);

// compile program
clErr = clBuildProgram(program, 1, &device, "", NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

// print build log
size_t size;
clErr = clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &size);
checkErr(clErr, __FILE__, __LINE__);
char build_log[size];
clErr = clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, size, build_log, NULL);
checkErr(clErr, __FILE__, __LINE__);
printf("\nBuild Log:\n\n%s\n\n", build_log);

// create kernel
cl_kernel kernel = clCreateKernel(program, "invertValues", &clErr);
checkErr(clErr, __FILE__, __LINE__);

// release kernel
clErr = clReleaseKernel(kernel);
checkErr(clErr, __FILE__, __LINE__);

// release program
clErr = clReleaseProgram(program);
checkErr(clErr, __FILE__, __LINE__);
OpenCL Programming Task : Programs and Kernels
Write and build a kernel that would:
- invert an array of integers valued 0-255

You can find a template in:

/scratch/courses01/templates/opencl_program.c

You may find the following function definitions useful:

cl_program clCreateProgramWithSource(cl_context context,
                                     cl_uint count,
                                     const char** strings,
                                     const size_t* lengths,
                                     cl_int* errcode_ret)

cl_int clBuildProgram(cl_program program,
                      cl_uint num_devices,
                      const cl_device_id* device_list,
                      const char* options,
                      void (CL_CALLBACK* pfn_notify)(cl_program program,
                                                     void* user_data),
                      void* user_data)

cl_kernel clCreateKernel(cl_program program,
                         const char* kernel_name,
                         cl_int* errcode_ret)
OpenCL Kernel Execution
To execute the kernel on the device, we must:
1) Set the Kernel Arguments
2) Determine the Thread Topology (NDRange)
3) Enqueue the Kernel Execution
OpenCL Kernels : clSetKernelArg
Use clSetKernelArg to specify the kernel arguments:

cl_int clSetKernelArg(cl_kernel kernel,
                      cl_uint arg_index,
                      size_t arg_size,
                      const void* arg_value)

Arguments
kernel : the kernel the argument belongs to
arg_index : the index of the argument
arg_size : the size of the argument
arg_value : a pointer to the value of the argument

Returns
CL_SUCCESS or an error code
OpenCL Setting Kernel Arguments
Code Example:

...

int imax = 1024;

...

// create device buffer
cl_mem device_values = clCreateBuffer ...
checkErr(clErr, __FILE__, __LINE__);

...

// set kernel arguments
clErr = clSetKernelArg(kernel, 0, sizeof(cl_mem), &device_values);
checkErr(clErr, __FILE__, __LINE__);
clErr = clSetKernelArg(kernel, 1, sizeof(int), &imax);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Thread Topology
OpenCL uses a scalable programming model: an NDRange of multiple workgroups that contain the workitems that will execute on the device.
Diagram: an NDRange containing Workgroups 0-2, each holding workitems W0-W3.
Each workitem has access to functions that return the dimensions of the NDRange and Workgroup, as well as its index within them.
uint get_work_dim()
size_t get_global_size(uint d)
size_t get_global_id(uint d)
size_t get_local_size(uint d)
size_t get_local_id(uint d)
OpenCL Thread Topology
The NDRange is divided into workgroups so that they canbe dynamically allocated to the compute units.
Diagram: eight workgroups (WG 0-7) of a multithreaded OpenCL program being scheduled onto devices: a 2x compute unit device runs them two at a time across CU 0 and CU 1, while a 4x compute unit device runs them four at a time.
OpenCL Thread Topology Implications
The workgroup size must consider the multiprocessor architecture, with some consideration for future changes.
Diagram: two workgroups (WG 0 and WG 1) running on a single compute unit (CU 0).

Just consider a few workgroups running on a single compute unit. What does the workgroup size affect?
OpenCL Thread Topology Implications
The major consideration in choosing the NDRange size is the number of compute units, with some consideration for future changes.
Just consider all workgroups running on all the compute units. What does the number of workgroups in the NDRange affect?
Diagram: eight workgroups (WG 0-7) spread across the compute units of a 4x compute unit device.
OpenCL Kernels : clEnqueueNDRangeKernel
Use clEnqueueNDRangeKernel to queue the kernel:

cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                              cl_kernel kernel,
                              cl_uint work_dim,
                              const size_t* global_work_offset,
                              const size_t* global_work_size,
                              const size_t* local_work_size,
                              cl_uint num_events_in_wait_list,
                              const cl_event* event_wait_list,
                              cl_event* event)

Arguments
command_queue : the queue to submit the kernel to
kernel : the kernel to submit
work_dim : the dimensions of the thread topology
global_work_offset : a pointer to an array of offsets to the global indices
global_work_size : a pointer to an array of sizes of the global NDRange
local_work_size : a pointer to an array of sizes of the local workgroup
num_events_in_wait_list : number of events the kernel execution is dependent on
event_wait_list : list of events the kernel execution is dependent on
event : returns an event corresponding to this kernel execution

Returns
CL_SUCCESS or an error code
OpenCL Kernel Execution
Code Example:

...

// enqueue kernel
cl_uint dim = 1;
size_t offset = 0;
size_t local_size = 128;
size_t global_size = 4*14*local_size;
clErr = clEnqueueNDRangeKernel(queue, kernel, dim, &offset, &global_size, &local_size,
                               0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

...
OpenCL Programming Task : Invert Kernel
Write a program that:
- generates an array of at least a thousand values, between 0 and 255
- prints the first few values of the array
- inverts the array on the GPU (subtract values from 256)
- prints the first few new values of the array

You can find template files at:

/scratch/courses01/templates/opencl_inverse.c

You may find the following definitions useful:

cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                              cl_kernel kernel,
                              cl_uint work_dim,
                              const size_t* global_work_offset,
                              const size_t* global_work_size,
                              const size_t* local_work_size,
                              cl_uint num_events_in_wait_list,
                              const cl_event* event_wait_list,
                              cl_event* event)
OpenCL Programming Task : Sum Kernel
Write a program that:
- generates an array of at least a million values, between 0 and 255
- sums the array using a loop on the CPU
- sums the array using the GPU
- prints the two results

Copy your invert code as a starting point.

Hints:
- each workitem can add some numbers together
- you can synchronize workitems by stopping the kernel
- you may need more than one device buffer allocation
- if your array is large enough, you may need to consider numerical precision.
Further OpenCL Concepts
C-extensions in the kernel language for vectors
Local memory for workitem communication within workgroups
Workgroup and device-level synchronisation
Coalescing global memory access
Branching issues
Memory stalls and arithmetic intensity
Overlapping kernels with host-device transfers
Pinned memory host-device transfers
Managing compute locality in algorithms
Graphical data-types and hardware acceleration
Graphics API interoperability
Mode switching
Using OpenCL events