Productive OpenCL Programming: An Introduction to OpenCL Libraries with ArrayFire COO Oded Green


Uploaded by amd-developer-central on 13-Jan-2015


DESCRIPTION

In this webinar presentation, ArrayFire COO Oded Green demonstrates best practices to help you quickly get started with OpenCL™ programming. Learn how to get the best performance from AMD hardware in various programming languages using ArrayFire. Oded discusses the latest advancements in the OpenCL™ ecosystem, including cutting-edge OpenCL™ libraries such as clBLAS, clFFT, clMAGMA and ArrayFire. Examples are shown in real code for common application domains. Watch the webinar here: http://bit.ly/1obT0M2 For more developer resources, visit: http://arrayfire.com/ http://developer.amd.com/ Follow us on Twitter: https://twitter.com/AMDDevCentral See info in the slides for more contact information and resource links!

TRANSCRIPT

Page 1: Productive OpenCL Programming An Introduction to OpenCL Libraries  with ArrayFire COO Oded Green

An Introduction to OpenCL Libraries

Productive OpenCL Programming

Page 2

● We make code run faster
  ○ Started in 2007 by Georgia Tech researchers
  ○ 1000s of paying customers

Page 3

● We build an acceleration library
  ○ for really cool science, engineering, and finance applications
  ○ for mobile computing

Page 4

Libraries are Great!

Page 5

Eliminate Hidden Costs

Page 6

Library Types

● Specialized GPU Libs
  ○ Targeted at a specific set of operators (functionality)
  ○ Optimized for specific systems
  ○ C-like interface
  ○ Raw pointer interface

● General GPU Libs
  ○ Manage GPU resources using containers
  ○ Applicable to a large set of applications and domains
  ○ Portable across multiple architectures
  ○ Higher-level functions
  ○ C++ interface (supports templates)

Page 7

Specialized GPU Libraries

● Fast Fourier Transforms
  ○ clFFT

● Random Number Generation
  ○ Random123

● Linear Algebra
  ○ clBLAS
  ○ MAGMA

● Signal and Image Processing
  ○ OpenCLIPP

Page 8

Specialized GPU Libraries

● C interface
  ○ Use pointers to reference data
● Memory management is the programmer's responsibility
● Mimic existing libraries
  ○ clBLAS ≈ BLAS
  ○ MAGMA ≈ BLAS + LAPACK
  ○ clFFT ≈ FFTW
● Simplifies GPU integration of specialized scientific libraries
  ○ Still requires setting up the GPU

Page 9

clFFT

● 1D, 2D and 3D transforms
● CPU and GPU backends
● Supports
  ○ Real and complex data types
  ○ Single and double precision
  ○ Execution of multiple transformations concurrently

Page 10

Random123

● Counter-based RNG
● Passes the TestU01 SmallCrush, Crush and BigCrush test batteries
● Four RNG families
  ○ Threefry
  ○ Philox
  ○ AESNI
  ○ ARS
● Not suitable for cryptography
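To make "counter-based" concrete, here is a minimal standard-C++ sketch of the Philox4x32-10 generator, written from the published round function rather than copied from the Random123 headers. The names `philox4x32_10`, `philox_round`, `mulhilo32` and `to_unit_float` are our own, and the code has not been checked against Random123's known-answer test vectors. What it illustrates is that the output is a pure function of a counter and a key, so each OpenCL work-item can generate its numbers independently, for example by using its global ID as the counter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch of Philox4x32-10 (not the Random123 header itself).
using Ctr4 = std::array<std::uint32_t, 4>;
using Key2 = std::array<std::uint32_t, 2>;

// Full 32x32 -> 64-bit multiply, split into high and low words.
inline void mulhilo32(std::uint32_t a, std::uint32_t b,
                      std::uint32_t& hi, std::uint32_t& lo) {
    const std::uint64_t p = static_cast<std::uint64_t>(a) * b;
    hi = static_cast<std::uint32_t>(p >> 32);
    lo = static_cast<std::uint32_t>(p);
}

// One round: two multiplies, then mix the words with the key via XOR.
inline Ctr4 philox_round(const Ctr4& c, const Key2& k) {
    std::uint32_t hi0, lo0, hi1, lo1;
    mulhilo32(0xD2511F53u, c[0], hi0, lo0);
    mulhilo32(0xCD9E8D57u, c[2], hi1, lo1);
    return {hi1 ^ c[1] ^ k[0], lo1, hi0 ^ c[3] ^ k[1], lo0};
}

// Output depends only on (counter, key); no state is carried between calls.
inline Ctr4 philox4x32_10(Ctr4 ctr, Key2 key) {
    for (int r = 0; r < 10; ++r) {
        if (r > 0) {               // bump the key (Weyl sequence) before rounds 2..10
            key[0] += 0x9E3779B9u;
            key[1] += 0xBB67AE85u;
        }
        ctr = philox_round(ctr, key);
    }
    return ctr;
}

// Map a 32-bit output word to a float in [0, 1) using its top 24 bits.
inline float to_unit_float(std::uint32_t x) {
    return static_cast<float>(x >> 8) * (1.0f / 16777216.0f);
}
```

Because nothing is carried between calls, skipping ahead is free: the block for counter value n is computed directly, without generating the n blocks before it.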

Page 11

MAGMA & clBLAS

● Implement many popular linear algebra routines
● Support
  ○ Real and complex data types
  ○ Single and double precision
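These libraries keep the classic BLAS calling convention. As a sketch of what that convention means, the plain-C++ reference below computes the SGEMM operation C = alpha * A * B + beta * C on column-major data passed as raw pointers with leading dimensions. `sgemm_ref` is a hypothetical name, and it omits the transpose flags and all OpenCL setup. As we understand clBLAS, its `clblasSgemm` takes the same M, N, K, alpha, beta and leading-dimension arguments but works on `cl_mem` buffers (with offsets) and enqueues onto caller-supplied command queues, which is why the GPU still has to be set up by the programmer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference SGEMM: C = alpha * A * B + beta * C, no transposes.
// A is M x K, B is K x N, C is M x N, all column-major behind raw pointers;
// lda/ldb/ldc are the distances (in elements) between consecutive columns.
void sgemm_ref(std::size_t M, std::size_t N, std::size_t K, float alpha,
               const float* A, std::size_t lda,
               const float* B, std::size_t ldb, float beta,
               float* C, std::size_t ldc) {
    assert(lda >= M && ldb >= K && ldc >= M);
    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t i = 0; i < M; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < K; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];  // A(i,p) * B(p,j)
            // BLAS convention: C is not read when beta == 0.
            C[i + j * ldc] = (beta == 0.0f) ? alpha * acc
                                            : alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```

Leading dimensions larger than the row count let the same routine operate on a sub-block of a bigger matrix without copying it.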

Page 12

OpenCLIPP

● Supports multiple image types
● Similar to Intel IPP
● Primitives
  ○ Arithmetic and logic
  ○ LUT
  ○ Morphology
  ○ Transform
  ○ Resize
  ○ Histogram
  ○ Many more…
● C and C++ interface

Page 13

General-Purpose GPU Libraries

● Bolt
● OpenCV
● ArrayFire

Images taken from: http://wordlesstech.com/2012/10/12/leatherman-oht-multi-tool/

Page 14

Bolt

● GPU library that resembles the C++ STL
  ○ STL-like data structures
  ○ Iterators
  ○ Fully interoperable with OpenCL
● Parallel vector operations
  ○ Reductions
  ○ Sorting
  ○ Prefix sum
● Customizable GPU kernels using functors
● Some functions only supported on AMD GPUs

Page 15

Bolt - Data Structures

● Built around the device_vector
● Supports the same data types as C++
  ○ bolt::cl::device_vector<float> data(2000000);
● Useful when performing multiple operations on a vector
● Can be passed into STL algorithms
  ○ Always interoperable
  ○ Data transfers will be costly

Page 16

Bolt - Algorithms

● Uses a C++ STL-like interface
  ○ Pass the begin and end iterators
● Accepts functors, which let you run custom operations on OpenCL devices
● Multiple backends
  ○ OpenCL, C++ AMP and TBB
  ○ Not all algorithms are implemented on all backends
● Works on both std::vector and device_vector
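Since Bolt mirrors the STL, the call shapes can be shown with ordinary host-side `std::` algorithms: begin/end iterators, plus a functor for custom work. The sketch below is standard C++ only, and the functor and function names are hypothetical. With Bolt, the corresponding calls would operate on a `device_vector` and run on an OpenCL device; check Bolt's documentation for the exact function names and headers rather than relying on this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Custom element-wise operation supplied as a functor: a * x + y.
struct SaxpyFunctor {
    float a;
    float operator()(float x, float y) const { return a * x + y; }
};

struct BoltStyleResults {
    std::vector<float> saxpy;   // transform with a functor
    float total;                // reduction
    std::vector<float> prefix;  // inclusive prefix sum
    std::vector<float> sorted;  // sort
};

BoltStyleResults run_examples(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    BoltStyleResults r;
    r.saxpy.resize(x.size());
    std::transform(x.begin(), x.end(), y.begin(), r.saxpy.begin(), SaxpyFunctor{2.0f});
    r.total = std::accumulate(r.saxpy.begin(), r.saxpy.end(), 0.0f);
    r.prefix.resize(x.size());
    std::partial_sum(r.saxpy.begin(), r.saxpy.end(), r.prefix.begin());
    r.sorted = r.saxpy;
    std::sort(r.sorted.begin(), r.sorted.end());
    return r;
}
```

Keeping the data in a device container across all four steps is what avoids the costly transfers mentioned on the previous slide.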

Page 17

OpenCV

● Open source computer vision library
● C++ interface with many language wrappers
● Hundreds of CV functions

Page 18

OpenCV ArrayFire Interop

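● Helper functions
  ○ https://github.com/arrayfire-community/arrayfire_opencv.git

Mat R;
Rodrigues(poses(Rect(0, 0, 1, 3)), R); // rotation vector (rows 0-2 of column 0) -> 3x3 rotation matrix
af::array af_R = mat_to_array(R);      // OpenCV Mat -> ArrayFire array using the helper functions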

Page 19

ArrayFire - Data Structures

● Built around a flexible data structure named "array"
  ○ Lightweight wrapper around the data on the compute device
  ○ Manages the data and basic metadata such as size, type and dimensions
● You can transfer data into an array using constructors
● Column major

float hA[6] = {0, 1, 2, 3, 4, 5};
array A(2, 3, hA); // 2 x 3 array, filled column by column from hA
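Column-major order is easy to get wrong when moving data in from row-major code such as OpenCV's `cv::Mat`. The sketch below, in standard C++ with hypothetical helper names, shows the index rule and a row-major to column-major reordering. By that rule the slide's `array A(2, 3, hA)` holds the matrix [0 2 4; 1 3 5]. We have not checked how the `arrayfire_opencv` helpers implement their own conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major: element (row, col) of an nrows x ncols matrix sits at
// offset row + col * nrows, so consecutive host values fill a column first.
inline std::size_t col_major_index(std::size_t row, std::size_t col, std::size_t nrows) {
    return row + col * nrows;
}

// Reorder a row-major buffer (rows stored contiguously) into column-major
// order before handing it to a column-major container.
std::vector<float> row_to_col_major(const std::vector<float>& src,
                                    std::size_t nrows, std::size_t ncols) {
    assert(src.size() == nrows * ncols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < nrows; ++r)
        for (std::size_t c = 0; c < ncols; ++c)
            dst[col_major_index(r, c, nrows)] = src[r * ncols + c];
    return dst;
}
```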

Page 20

ArrayFire - Indexing

#include <arrayfire.h>
using namespace af;

void af_example()
{
    float f[8] = {1, 2, 4, 8, 16, 32, 64, 128};
    array a(2, 4, f);                     // 2 rows x 4 columns, filled column by column from f
    array sumSecondCol = sum(a(span, 1)); // reduce-sum over the second column: 4 + 8
    af_print(sumSecondCol);               // value: 12
}

Page 21

ArrayFire Example - swap R and B

Using ArrayFire:

array tmp = img(span,span,0);        // save the R channel
img(span,span,0) = img(span,span,2); // R channel gets the values of B
img(span,span,2) = tmp;              // B channel gets the values of R

Can also do it this way:

array swapped = join(2, img(span,span,2),  // blue
                        img(span,span,1),  // green
                        img(span,span,0)); // red

Or simply:

array swapped = img(span, span, seq(2, 0, -1)); // channels 2, 1, 0 (seq(begin, end, step))
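The last form works because it indexes the channel dimension in reverse order, so the whole B plane lands where R was and vice versa. A standard-C++ sketch of the same reordering on a planar, column-major H × W × C buffer follows; the layout mirrors the `img(span,span,k)` indexing above, and `reverse_channels` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reverse the channel order of a planar, column-major h x w x c image:
// channel plane k of the output is channel plane (c - 1 - k) of the input.
std::vector<float> reverse_channels(const std::vector<float>& img,
                                    std::size_t h, std::size_t w, std::size_t c) {
    assert(img.size() == h * w * c);
    const std::size_t plane = h * w;  // elements per channel plane
    std::vector<float> out(img.size());
    for (std::size_t k = 0; k < c; ++k)
        for (std::size_t i = 0; i < plane; ++i)
            out[k * plane + i] = img[(c - 1 - k) * plane + i];
    return out;
}
```

With a planar layout each channel is one contiguous block, so the swap is three block copies; with an interleaved RGBRGB layout it would instead touch every pixel individually.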

Page 22

ArrayFire Functions

Using ArrayFire:

array img = loadImage("image.jpg", false); // load a grayscale image from disk to the device
array img_T = img.T();                     // transpose

Page 23

Original

Page 24

Grayscale

Page 25

Box filter blur

Page 26

Gaussian blur

Page 27

Image Negative

Page 28

Erosion

ArrayFire

// erode an image, 8-neighbor connectivity
array mask8 = constant(1, 3, 3);
array img_out8 = erode(img_in, mask8);

// erode an image, 4-neighbor connectivity
const float h_mask4[] = { 0.0f, 1.0f, 0.0f,
                          1.0f, 1.0f, 1.0f,
                          0.0f, 1.0f, 0.0f };
array mask4 = array(3, 3, h_mask4);
array img_out4 = erode(img_in, mask4);
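Erosion replaces each pixel with the minimum over the neighborhood selected by the mask, which is why the all-ones 3×3 mask gives 8-neighbor connectivity and the cross-shaped mask gives 4-neighbor connectivity. The reference below is plain C++ on a column-major image; `erode3x3` is a hypothetical name, and skipping neighbors that fall outside the image is an assumption of this sketch, not a statement about ArrayFire's border handling.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Grayscale erosion of a column-major h x w image with a 3x3 mask
// (also column-major): output = min over pixels under nonzero mask entries.
std::vector<float> erode3x3(const std::vector<float>& in, long h, long w,
                            const float mask[9]) {
    assert(static_cast<long>(in.size()) == h * w);
    std::vector<float> out(in.size());
    for (long c = 0; c < w; ++c)
        for (long r = 0; r < h; ++r) {
            float m = std::numeric_limits<float>::infinity();
            for (long dc = -1; dc <= 1; ++dc)
                for (long dr = -1; dr <= 1; ++dr) {
                    if (mask[(dr + 1) + 3 * (dc + 1)] == 0.0f) continue;  // not in the element
                    const long rr = r + dr, cc = c + dc;
                    if (rr < 0 || cc < 0 || rr >= h || cc >= w) continue; // border: skip (assumption)
                    const float v = in[rr + cc * h];
                    if (v < m) m = v;
                }
            out[r + c * h] = m;
        }
    return out;
}
```

Dilation is the same loop with the minimum replaced by a maximum.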

Page 29

Erosion

Page 30

Filtering

ArrayFire

array R1 = convolve(img, ker);        // 1-, 2- and 3-D convolution filter
array R2 = convolve(fcol, frow, img); // separable convolution
array R3 = filter(img, ker);          // 2-D correlation filter
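Separable convolution pays off when the 2-D kernel is the outer product of a column filter and a row filter: two 1-D passes then give the same result as the full 2-D convolution, at K_h + K_w instead of K_h × K_w multiplies per pixel. The standard-C++ sketch below checks that equivalence on a column-major image. Zero padding and a same-size output are assumptions of the sketch, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pixel (r, c) of a column-major h x w image, or 0 outside it (zero padding).
static float at(const std::vector<float>& img, long h, long w, long r, long c) {
    if (r < 0 || c < 0 || r >= h || c >= w) return 0.0f;
    return img[static_cast<std::size_t>(r + c * h)];
}

// Direct same-size 2-D convolution with the kernel K(i, j) = col[i] * row[j].
std::vector<float> conv2_full(const std::vector<float>& img, long h, long w,
                              const std::vector<float>& col, const std::vector<float>& row) {
    assert(static_cast<long>(img.size()) == h * w);
    const long kh = static_cast<long>(col.size()), kw = static_cast<long>(row.size());
    std::vector<float> out(img.size());
    for (long c = 0; c < w; ++c)
        for (long r = 0; r < h; ++r) {
            float acc = 0.0f;
            for (long i = 0; i < kh; ++i)
                for (long j = 0; j < kw; ++j)
                    acc += col[i] * row[j] * at(img, h, w, r + kh / 2 - i, c + kw / 2 - j);
            out[r + c * h] = acc;
        }
    return out;
}

// Same result in two 1-D passes: convolve every column with `col`,
// then every row of the intermediate with `row`.
std::vector<float> conv2_separable(const std::vector<float>& img, long h, long w,
                                   const std::vector<float>& col, const std::vector<float>& row) {
    assert(static_cast<long>(img.size()) == h * w);
    const long kh = static_cast<long>(col.size()), kw = static_cast<long>(row.size());
    std::vector<float> tmp(img.size()), out(img.size());
    for (long c = 0; c < w; ++c)
        for (long r = 0; r < h; ++r) {
            float acc = 0.0f;
            for (long i = 0; i < kh; ++i) acc += col[i] * at(img, h, w, r + kh / 2 - i, c);
            tmp[r + c * h] = acc;
        }
    for (long c = 0; c < w; ++c)
        for (long r = 0; r < h; ++r) {
            float acc = 0.0f;
            for (long j = 0; j < kw; ++j) acc += row[j] * at(tmp, h, w, r, c + kw / 2 - j);
            out[r + c * h] = acc;
        }
    return out;
}
```

With floating-point data the two results can differ in the last bits because the sums are evaluated in a different order; the test uses small integers so they match exactly.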

Page 31

Histograms

ArrayFire

int nbins = 256;
array hist = histogram(img, nbins);
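A uniform histogram maps each value v to bin floor((v - min) / (max - min) × nbins). The sketch below takes the range explicitly; when only `nbins` is passed, as on the slide, we assume the range comes from the data's minimum and maximum. Dropping out-of-range values and folding v == max into the last bin are choices of this sketch, and `histogram_ref` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Uniform-bin histogram over [minval, maxval].
std::vector<unsigned> histogram_ref(const std::vector<float>& data, unsigned nbins,
                                    float minval, float maxval) {
    assert(nbins > 0 && maxval > minval);
    std::vector<unsigned> hist(nbins, 0u);
    const float scale = nbins / (maxval - minval);
    for (float v : data) {
        if (v < minval || v > maxval) continue;        // out-of-range values are dropped
        unsigned bin = static_cast<unsigned>((v - minval) * scale);
        bin = std::min(bin, nbins - 1);                 // v == maxval goes to the last bin
        ++hist[bin];
    }
    return hist;
}
```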

Page 32

Transforms

ArrayFire

array half = resize(0.5, img);                       // scale to half size
array rot90 = rotate(img, af::Pi / 2);               // rotate by 90 degrees
array warped = approx2(img, xLocations, yLocations); // sample img at the given locations
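`approx2` samples the image at arbitrary, generally fractional, locations, which is what makes it usable for warps. A minimal standard-C++ sketch of bilinear sampling follows, assuming linear interpolation and an off-grid value of 0; `approx2_linear` is a hypothetical name, and positions are given as (row, column) pairs into a column-major image.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinear sampling of a column-major h x w image at (rows[k], cols[k]).
// Points outside the image get `off_grid`.
std::vector<float> approx2_linear(const std::vector<float>& img, std::size_t h, std::size_t w,
                                  const std::vector<float>& rows, const std::vector<float>& cols,
                                  float off_grid = 0.0f) {
    assert(img.size() == h * w && rows.size() == cols.size());
    auto px = [&](std::size_t r, std::size_t c) { return img[r + c * h]; };
    std::vector<float> out(rows.size());
    for (std::size_t k = 0; k < rows.size(); ++k) {
        const float y = rows[k], x = cols[k];
        if (!(y >= 0 && x >= 0 && y <= h - 1.0f && x <= w - 1.0f)) {
            out[k] = off_grid;
            continue;
        }
        const std::size_t r0 = static_cast<std::size_t>(std::floor(y));
        const std::size_t c0 = static_cast<std::size_t>(std::floor(x));
        const std::size_t r1 = std::min(r0 + 1, h - 1), c1 = std::min(c0 + 1, w - 1);
        const float fy = y - r0, fx = x - c0;  // fractional offsets in [0, 1)
        const float top = (1 - fx) * px(r0, c0) + fx * px(r0, c1);
        const float bot = (1 - fx) * px(r1, c0) + fx * px(r1, c1);
        out[k] = (1 - fy) * top + fy * bot;
    }
    return out;
}
```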

Page 33

Image smoothing

ArrayFire

array S = bilateral(I, sigma_r, sigma_c);       // edge-preserving bilateral filter
array M = meanShift(I, sigma_r, sigma_c, iter); // mean-shift smoothing
array R = medfilt(img, 3, 3);                   // 3x3 median filter

// Gaussian blur
array gker = gaussianKernel(ncols, ncols);
array res = convolve(img, gker);
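The Gaussian kernel used for blurring is a sampled, normalized Gaussian, and its 2-D form is the outer product of a 1-D kernel with itself, which is what makes the separable convolution on the Filtering slide applicable to it. This standard-C++ sketch takes sigma explicitly; how ArrayFire chooses sigma when only the kernel size is passed is not covered here, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized 1-D Gaussian of odd length n and standard deviation sigma.
std::vector<float> gaussian1d(std::size_t n, float sigma) {
    assert(n % 2 == 1 && sigma > 0);
    std::vector<float> g(n);
    const float mid = static_cast<float>(n / 2);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = static_cast<float>(i) - mid;
        g[i] = std::exp(-d * d / (2 * sigma * sigma));
        sum += g[i];
    }
    for (float& v : g) v /= sum;  // weights sum to 1, so blurring preserves brightness
    return g;
}

// n x n kernel (column-major) as the outer product g * g^T.
std::vector<float> gaussian2d(std::size_t n, float sigma) {
    const std::vector<float> g = gaussian1d(n, sigma);
    std::vector<float> k(n * n);
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r)
            k[r + c * n] = g[r] * g[c];
    return k;
}
```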

Page 34

FFT

ArrayFire

array R1 = fft2(I);       // 2-D FFT (see also fft and fft3 for 1-D and 3-D)
array R2 = fft2(I, M, N); // 2-D FFT with the input zero-padded to M x N
array R3 = ifft2(fft2(I, M, N) * fft2(K, M, N)); // convolution via element-wise product of spectra
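The last line relies on the convolution theorem: the element-wise product of two spectra transforms back to a circular convolution, so M and N need to be at least the image size plus the kernel size minus one in each dimension to avoid wrap-around. The 1-D sketch below shows the idea with a naive O(n²) DFT standing in for a real FFT; it is illustrative standard C++, not ArrayFire or clFFT code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive DFT: sign = -1 for the forward transform, +1 for the inverse
// (the inverse also divides by n).
cvec dft(const cvec& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc = 0.0;
        for (std::size_t t = 0; t < n; ++t)
            acc += x[t] * std::polar(1.0, sign * 2 * pi * double(k * t % n) / double(n));
        X[k] = sign > 0 ? acc / double(n) : acc;
    }
    return X;
}

// Linear convolution via the convolution theorem: zero-pad both inputs to
// a.size() + b.size() - 1, multiply spectra element-wise, transform back.
std::vector<double> fft_convolve(const std::vector<double>& a, const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    const std::size_t n = a.size() + b.size() - 1;
    cvec A(n), B(n);
    for (std::size_t i = 0; i < a.size(); ++i) A[i] = a[i];
    for (std::size_t i = 0; i < b.size(); ++i) B[i] = b[i];
    A = dft(A, -1);
    B = dft(B, -1);
    for (std::size_t k = 0; k < n; ++k) A[k] *= B[k];  // element-wise product
    const cvec c = dft(A, +1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = c[i].real();
    return out;
}
```

Padding to less than a.size() + b.size() - 1 would make the tail of the result wrap around onto its head, which is the circular-convolution artifact the M x N padding on the slide avoids.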

Page 35

ArrayFire Capabilities

● Hundreds of parallel functions for multi-disciplinary work
  ○ Image processing
  ○ Machine learning
  ○ Graphics
  ○ Sets
● Support for multiple languages
  ○ C/C++, Fortran, Java and R
● Linux, Windows, Mac OS X

Page 36

ArrayFire Capabilities

● OpenGL-based graphics
● JIT
  ○ Combines multiple operations into one kernel
● GFOR: data-parallel loop
  ○ Allows concurrent execution over multiple data sets (for example, images)

Page 37

ArrayFire Functions

● Supports hundreds of parallel functions
  ○ Building blocks
    ■ Reductions
    ■ Scan
    ■ Set operations
    ■ Sorting
    ■ Statistics
    ■ Basic matrix manipulation

Images taken from: http://technogems.blogspot.com/2011/06/sorting-included-files-by-importance.html and http://www.cmsoft.com.br/tutorialOpenCL/CLMatrixMultExplanationSubMatrixes.png

Page 38

ArrayFire Functions

● Hundreds of highly optimized parallel functions
  ○ Signal/image processing
    ■ Convolution
    ■ FFT
    ■ Histograms
    ■ Interpolation
    ■ Connected components
  ○ Linear algebra
    ■ Matrix multiply
    ■ Linear system solving
    ■ Factorization

Page 39

GFOR: What is it?

• Data-parallel for loop, e.g.

for (int i = 0; i < 3; i++) C(span,span,i) = matmul(A(span,span,i), B);

gfor (seq i, 3) C(span,span,i) = matmul(A(span,span,i), B);

Serial matrix-vector multiplications (3 kernel launches)

Parallel matrix-vector multiplications (1 kernel launch)

Page 40

Example: Matrix Multiply

• Data-parallel for loop, e.g.

[Figure: iteration i = 1 computes C(,,1) = A(,,1) * B]

for (int i = 0; i < 3; i++) C(span,span,i) = matmul(A(span,span,i), B);

Serial matrix-vector multiplications (3 kernel launches)

Page 41

Example: Matrix Multiply

• Data-parallel for loop, e.g.

for (int i = 0; i < 3; i++) C(span,span,i) = matmul(A(span,span,i), B);

[Figure: iterations i = 1 and i = 2 compute C(,,1) = A(,,1) * B and then C(,,2) = A(,,2) * B, one after the other]

Serial matrix-vector multiplications (3 kernel launches)

Page 42

Example: Matrix Multiply

• Data-parallel for loop, e.g.

for (int i = 0; i < 3; i++) C(span,span,i) = matmul(A(span,span,i), B);

[Figure: iterations i = 1, 2 and 3 compute C(,,1) = A(,,1) * B, C(,,2) = A(,,2) * B and C(,,3) = A(,,3) * B, one after the other]

Serial matrix-vector multiplications (3 kernel launches)

Page 43

Example: Matrix Multiply

gfor (seq i, 3) C(span,span,i) = matmul(A(span,span,i), B);

Parallel matrix multiplications (1 kernel launch)

[Figure: simultaneous iterations i = 1:3 compute C(,,1) = A(,,1) * B, C(,,2) = A(,,2) * B and C(,,3) = A(,,3) * B at the same time]

Page 44

Example: Matrix Multiply

[Figure: simultaneous iterations i = 1:3 drawn as one stacked product, C(,,1:3) = A(,,1:3) * B]

Think of GFOR as compiling one stacked kernel that contains all iterations.

gfor (seq i, 3) C(span,span,i) = matmul(A(span,span,i), B);

Parallel matrix multiplications (1 kernel launch)
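A CPU analogy may help: the serial version below makes one call per slice, while the batched version covers every (slice, row, column) output element in a single flat index space, which is roughly what one stacked kernel launch does. This is plain C++ with hypothetical names and column-major n × n slices; it does not show how ArrayFire actually generates its GFOR kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// n x n column-major matrices stacked along a third dimension:
// slice s starts at offset s * n * n.
using Stack = std::vector<float>;

// C(:,:,s) = A(:,:,s) * B for one slice (think: one kernel launch).
static void matmul_slice(const Stack& A, const std::vector<float>& B, Stack& C,
                         std::size_t n, std::size_t s) {
    const std::size_t off = s * n * n;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < n; ++p) acc += A[off + i + p * n] * B[p + j * n];
            C[off + i + j * n] = acc;
        }
}

// "for" version: one call per slice.
Stack serial_loop(const Stack& A, const std::vector<float>& B, std::size_t n, std::size_t slices) {
    assert(A.size() == slices * n * n && B.size() == n * n);
    Stack C(A.size());
    for (std::size_t s = 0; s < slices; ++s) matmul_slice(A, B, C, n, s);
    return C;
}

// "gfor"-style version: one pass whose flat index covers all slices at once,
// as each thread of a single stacked kernel would.
Stack batched(const Stack& A, const std::vector<float>& B, std::size_t n, std::size_t slices) {
    assert(A.size() == slices * n * n && B.size() == n * n);
    Stack C(A.size());
    for (std::size_t idx = 0; idx < slices * n * n; ++idx) {  // flat "thread id"
        const std::size_t s = idx / (n * n), rem = idx % (n * n);
        const std::size_t i = rem % n, j = rem / n;
        float acc = 0.0f;
        for (std::size_t p = 0; p < n; ++p) acc += A[s * n * n + i + p * n] * B[p + j * n];
        C[idx] = acc;
    }
    return C;
}
```

For small matrices the batched form matters most: a single launch can keep the whole device busy, where each of the separate launches would leave it mostly idle.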

Page 45

JIT Code Generation

● Run-time kernel generation
● Combines multiple element-wise operations into one kernel
● Reduces kernel launch overhead
● Intermediate data not allocated
● Improves cache performance
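As an illustration of what fusion saves, the sketch evaluates (a * b + 1)^2 element-wise twice in standard C++: once as three separate passes with full-size temporaries, and once as the single loop a JIT could emit for the same expression. The names are hypothetical, and this shows the principle only, not ArrayFire's code generator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: each element-wise operation is its own pass ("kernel")
// and materializes a full-size temporary.
std::vector<float> unfused(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> t1(a.size()), t2(a.size()), out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) t1[i] = a[i] * b[i];     // pass 1
    for (std::size_t i = 0; i < a.size(); ++i) t2[i] = t1[i] + 1.0f;    // pass 2
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = t2[i] * t2[i];  // pass 3
    return out;
}

// Fused: one pass, intermediates stay in registers, no temporaries allocated.
std::vector<float> fused(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] * b[i] + 1.0f;
        out[i] = t * t;
    }
    return out;
}
```

The fused loop reads each input once and writes the output once, whereas the unfused version also writes and re-reads two temporaries, which is where the cache and memory savings listed on the slide come from.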

Page 46

Success Stories

Field | Application | Speedup
Academia | Power Systems Simulations | 35x
Finance | Option Pricing | 52x
Government | Radar Image Formation | 45x
Life Sciences | Pathology Advances | > 100x
Manufacturing | Tomography of Vegetation | 10x
Media & Computer Vision | Digital Holography | 17x
Oil & Gas | Ground Water Simulations | > 20x

Page 47

Future capabilities

● We are interested in Big Data applications
● Create capabilities for
  ○ Streaming video
  ○ Large numbers of images
  ○ Machine learning
  ○ Data analysis
  ○ Dynamic data
● Faster rendering utilities for Big Data

Page 48

Comments on Open Source

● https://github.com/arrayfire-community

Page 50

Look us up

www.ArrayFire.com

For language wrappers and examples: https://github.com/ArrayFire