Post on 24-Feb-2016

Introduction to OpenCL*

Ohad Shacham

Intel Software and Services Group

Thanks to Elior Malul, Arik Narkis, and Doron Singer

Copyright© 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos

Evolution of OpenCL*

Sequential Programs

void scalar_mul(int n, const float *a, const float *b, float *c)
{
    int i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

int main()
{
    // read input
    scalar_mul(…);
    return 0;
}

Evolution of OpenCL*

Multi-threaded Programs

void scalar_mul(int n, const float *a, const float *b, float *c)
{
    int i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

int main()
{
    // read input
    pthread_create(…, scalar_mul);  // pseudocode: a second thread multiplies one half
    scalar_mul(n/2, …);             // the main thread multiplies the other half
    pthread_join(…);
    return 0;
}
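The slide leaves the thread arguments elided. As a minimal sketch of the same idea, assuming POSIX threads, the version below splits the arrays into two halves and runs one half on a worker thread. The `struct slice` type and the names `scalar_mul_thread` and `scalar_mul_two_threads` are hypothetical helpers introduced for illustration, not part of the original deck.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper type: one contiguous slice of the three arrays. */
struct slice {
    int n;
    const float *a;
    const float *b;
    float *c;
};

/* Same loop as scalar_mul on the slide. */
static void scalar_mul(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* pthread entry point: unpack the slice and run the loop on it. */
static void *scalar_mul_thread(void *arg)
{
    const struct slice *s = arg;
    scalar_mul(s->n, s->a, s->b, s->c);
    return NULL;
}

/* Multiply n elements using the calling thread plus one worker thread.
 * Returns 0 on success, or the pthread error code. */
static int scalar_mul_two_threads(int n, const float *a, const float *b, float *c)
{
    int half = n / 2;
    struct slice upper = { n - half, a + half, b + half, c + half };
    pthread_t worker;

    int err = pthread_create(&worker, NULL, scalar_mul_thread, &upper);
    if (err != 0)
        return err;

    scalar_mul(half, a, b, c);          /* lower half on this thread */
    return pthread_join(worker, NULL);  /* wait for the upper half */
}
```

Even in this small case the split, the argument packing, and the join are written by hand, and the code would have to change again to use more than two cores.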

Problems – concurrent programs

• Writing concurrent programs is hard
• Concurrent algorithms
• Threads
• Work balancing
• Need to update programs when adding new cores to the system
• Data races, livelocks, deadlocks
• Solving bugs in concurrent programs is harder

Evolution of OpenCL*

Vector instruction utilization

#include <xmmintrin.h>  // SSE intrinsics

// assumes n is a multiple of 4 and a, b, c are 16-byte aligned
void scalar_mul(int n, const float *a, const float *b, float *c)
{
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 a_vec = _mm_load_ps(a + i);
        __m128 b_vec = _mm_load_ps(b + i);
        __m128 c_vec = _mm_mul_ps(a_vec, b_vec);
        _mm_store_ps(c + i, c_vec);
    }
}

int main()
{
    // read input
    scalar_mul(…);
    return 0;
}

Problems – vector instructions usage

• Utilizing vector instructions is also not a trivial task
• Vendor dependent code
• Usage is not future proof
– New efficient instructions
– Wider vector registers

GPGPU

GPGPU stands for General-Purpose computation on Graphics Processing Units (GPUs). GPUs are high-performance many-core processors that can be used to accelerate a wide range of applications.

(www.gpgpu.org)

Photo taken from: http://folding.stanford.edu/English/FAQ-NVIDIA

GPU utilization

• Many cores can be utilized for computation
• GPUs become programmable - GPGPU
– CUDA*
• Problems
– Each vendor has its own language
– Requires tweaking to get performance
– How can I run both on CPUs and GPUs?

What do we need?

• Heterogeneous
– Automatically utilizes all available processing units
• Portable
• High Performance
– Utilize hardware characteristics
• Future Proof
• Abstract concurrency from the user

OpenCL* – heterogeneous computing

[Figure: diagram of OpenCL* heterogeneous computing]

Diagram based on deck presented in OpenCL* BOF at SIGGRAPH 2010 by Neil Trevett, NVIDIA, OpenCL* Chair

OpenCL* in a nutshell

An OpenCL* application consists of two parts:

• A set of APIs in C that allows compiling and running OpenCL* “Kernels”
• Code that is executed on the device by the OpenCL* runtime

Data parallelism

A fundamental pattern in high-performance parallel algorithms

Applying the same computation logic across multiple data elements

[Figure: a sequential loop (i = 0, then i = i + 1) computing C[i] = A[i] * B[i] one element at a time, contrasted with N independent computations of C[i] = A[i] * B[i] for i = 0, 1, 2, 3, …, N-2, N-1]

Data parallelism usage

Client machines
• Video transcoding and editing
• Pro image editing
• Facial recognition

Workstations
• CAD tools
• 3D data content creation

Servers
• Science and simulations
• Medical imaging
• Oil & Gas
• Finance (e.g., Black-Scholes)
• …

OpenCL* kernel example

void array_mul(int n, const float *a, const float *b, float *c)
{
    int i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

__kernel void array_mul(__global const float *a,
                        __global const float *b,
                        __global float *c)
{
    int id = get_global_id(0);
    c[id] = a[id] * b[id];
}

OpenCL* kernel example

__kernel void array_mul(__global const float *a,
                        __global const float *b,
                        __global float *c)
{
    int id = get_global_id(0);
    c[id] = a[id] * b[id];
}

[Figure: arrays a, b and c, with get_global_id(0) selecting the element that each work item processes]
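To make the relationship between the loop and the kernel concrete, here is a minimal host-only sketch in plain C that emulates what a runtime does with the kernel: it runs the kernel body once per global id. The names `get_global_id_emulated`, `array_mul_work_item` and `run_array_mul` are hypothetical; a real OpenCL* runtime supplies `get_global_id` itself and may run the work items concurrently and in any order.

```c
#include <stddef.h>

/* Host-side stand-in for get_global_id(0); set by the loop below.
 * (Assumption for illustration only.) */
static size_t current_global_id;

static size_t get_global_id_emulated(void)
{
    return current_global_id;
}

/* Body of the array_mul kernel, written as ordinary C. */
static void array_mul_work_item(const float *a, const float *b, float *c)
{
    size_t id = get_global_id_emulated();
    c[id] = a[id] * b[id];
}

/* Hypothetical sequential "runtime": one kernel invocation per global id. */
static void run_array_mul(size_t global_size,
                          const float *a, const float *b, float *c)
{
    for (size_t id = 0; id < global_size; ++id) {
        current_global_id = id;
        array_mul_work_item(a, b, c);
    }
}
```

The loop counter of the sequential version has moved out of the kernel and into the runtime, which is what leaves the runtime free to distribute the iterations over cores and vector lanes.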

Execution Model

[Figure: the global index space is divided into work groups; each work group consists of work items]
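The relation between the three kinds of index can be written down directly. The sketch below assumes a one-dimensional range with no global offset; `struct work_item_ids` and `ids_for` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Indices that identify one work item in a 1-D range. */
struct work_item_ids {
    size_t group_id;   /* which work group the item belongs to */
    size_t local_id;   /* position inside that work group */
    size_t global_id;  /* position in the whole index space */
};

/* Decompose a global id for a given work-group size (local_size > 0),
 * assuming a zero global offset. */
static struct work_item_ids ids_for(size_t global_id, size_t local_size)
{
    struct work_item_ids ids;
    ids.group_id  = global_id / local_size;
    ids.local_id  = global_id % local_size;
    /* Recombining the two parts gives back the global id. */
    ids.global_id = ids.group_id * local_size + ids.local_id;
    return ids;
}
```

For example, with work groups of four items, global id 10 is local id 2 in work group 2.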

The OpenCL* model

• OpenCL* runtime is invoked on the host CPU (using the OpenCL* API)
– Choose target device(s) for parallel computation
• Data-parallel functions, called Kernels, are compiled (on the host)
– Compiled for specific target devices (CPU, GPU, etc.)
• Data chunks (called Buffers) are moved across devices
• Kernel “commands” are queued for execution on target devices
– Asynchronous execution

The OpenCL* C language

• Derived from ISO C99
• A few restrictions, e.g., no recursion and no function pointers
• Short vector types, e.g., float4, short2, int16
• Built-in functions – math (e.g., sin), geometric, common (e.g., min, clamp)

OpenCL* key features

Unified programming model for all devices
• Develop once, run everywhere

Designed for massive data-parallelism
• Implicitly takes care of threading and intrinsics for optimal performance

OpenCL* key features

Dynamic compilation model (Just In Time - JIT)
• Future proof, provided vendors update their implementations

Enables heterogeneous computing
• A clever application can use all resources of the platform simultaneously

Benefits to User

• Hardware abstraction
– Write once, run everywhere
– Cross devices, cross vendors
• Automatic parallelization
• Good tradeoff between development simplicity and performance
• Future proof optimizations
• Open standard
– Supported by many vendors

Benefits to Hardware Vendor

• Enables good hardware ‘time to market’
• Programming model enables good hardware utilization
• Applications are automatically portable and future proof
– JIT compilation

OpenCL* Cons

• Low level – based on C99
– No heap!
– Lean framework
• Expert tool
– In terms of correctness and performance
• OpenCL* is not performance portable
– Tweaking is needed for each vendor
– Future specs and implementations may require no tweaking?

Vector dot multiplication

void vectorDotMul(int* vecA, int* vecB, int size, int* result)
{
    *result = 0;
    for (int i = 0; i < size; ++i)
        *result += vecA[i] * vecB[i];
}

Single work item

[Figure: a single work item multiplies a vector of 1s by a vector of 2s element by element and accumulates the running sum 2, 4, 6, …, 16]

Vector dot multiplication in OpenCL*

__kernel void vectorDotMul(__global int* vecA, __global int* vecB,
                           int size, __global int* result)
{
    // only the first work item does any work
    if (get_global_id(0) == 0) {
        *result = 0;
        for (int i = 0; i < size; ++i)
            *result += vecA[i] * vecB[i];
    }
}

Single work group

[Figure: each work item in one work group multiplies its share of the 1s and 2s and produces a partial sum of 4; the partial sums are then added one after another: 4, 8, 12, 16]

__kernel void vectorDotMul(__global int* vecA, __global int* vecB,
                           int size, __global int* result)
{
    __local volatile int partialSum[MAX_SIZE];
    int id = get_local_id(0);
    int localSize = get_local_size(0);
    int work = size / localSize;
    int start = id * work;
    int end = start + work;

    // Work item calculation
    partialSum[id] = 0;
    for (int j = start; j < end; ++j)
        partialSum[id] += vecA[j] * vecB[j];

    barrier(CLK_LOCAL_MEM_FENCE);

    // Reduction (performed by the first work item only)
    if (id == 0) {
        *result = 0;
        for (int i = 0; i < localSize; ++i)
            *result += partialSum[i];
    }
}
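The two phases of this kernel, work item calculation followed by reduction, can be checked on the host with a plain C emulation. This is only a sketch: the function name `dot_single_work_group` and the value chosen for `MAX_SIZE` are assumptions, and the work items run one after another here instead of concurrently.

```c
#include <assert.h>

#define MAX_SIZE 64  /* assumed upper bound on the work-group size */

/* Emulate one work group of localSize work items computing a dot product.
 * Assumes size is a multiple of localSize. */
static int dot_single_work_group(const int *vecA, const int *vecB,
                                 int size, int localSize)
{
    int partialSum[MAX_SIZE];
    assert(localSize > 0 && localSize <= MAX_SIZE && size % localSize == 0);
    int work = size / localSize;

    /* Work item calculation: each value of id stands for one work item. */
    for (int id = 0; id < localSize; ++id) {
        int start = id * work;
        int end = start + work;
        partialSum[id] = 0;
        for (int j = start; j < end; ++j)
            partialSum[id] += vecA[j] * vecB[j];
    }

    /* Reaching this point plays the role of the barrier:
     * every partial sum is complete. */

    /* Reduction: what work item 0 does in the kernel. */
    int result = 0;
    for (int i = 0; i < localSize; ++i)
        result += partialSum[i];
    return result;
}
```

With eight 1s, eight 2s and four work items, each partial sum is 4 and the result is 16, matching the numbers in the figure.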

Efficient reduction

[Figure: the four partial sums of 4 are combined pairwise in a tree: 4 + 4 = 8 and 4 + 4 = 8, then 8 + 8 = 16]
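A pairwise (tree) reduction of the partial sums can be sketched in plain C as follows. The figure does not say which elements are paired, so the stride-halving scheme below is one common choice and an assumption of this sketch; `tree_reduce` is a hypothetical name. On a device, each iteration of the inner loop would be a separate work item and a barrier would separate the steps.

```c
#include <assert.h>

/* Reduce n partial sums in place; n must be a power of two.
 * Returns the total, which also ends up in partialSum[0]. */
static int tree_reduce(int *partialSum, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    /* log2(n) steps; each step halves the number of active elements. */
    for (int stride = n / 2; stride > 0; stride /= 2) {
        /* These additions are independent of each other within a step. */
        for (int id = 0; id < stride; ++id)
            partialSum[id] += partialSum[id + stride];
    }
    return partialSum[0];
}
```

For four partial sums this takes two steps instead of the four sequential additions performed by a single work item.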

Vectorization

• Processors provide vector units
– SIMD on CPUs
– Warp on GPUs
• Used to perform several operations in parallel
– Arithmetic operations
– Binary operations
– Memory operations

Loop vectorization

void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; ++i) {
        c[i] = a[i] * b[i];
    }
}

Loop vectorization

void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; i += 4) {
        c[i]   = a[i]   * b[i];
        c[i+1] = a[i+1] * b[i+1];
        c[i+2] = a[i+2] * b[i+2];
        c[i+3] = a[i+3] * b[i+3];
    }
}
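The unrolled loop on the slide is only correct when size is a multiple of 4. A minimal sketch of the usual remedy, a scalar tail loop for the leftover elements, is shown below; the name `mul_unrolled` is hypothetical.

```c
/* Element-wise multiply, four elements per iteration, plus a scalar tail. */
static void mul_unrolled(int size, const int *a, const int *b, int *c)
{
    int i = 0;

    /* Main loop: runs while a full group of four elements remains. */
    for (; i + 3 < size; i += 4) {
        c[i]     = a[i]     * b[i];
        c[i + 1] = a[i + 1] * b[i + 1];
        c[i + 2] = a[i + 2] * b[i + 2];
        c[i + 3] = a[i + 3] * b[i + 3];
    }

    /* Tail loop: the remaining 0 to 3 elements. */
    for (; i < size; ++i)
        c[i] = a[i] * b[i];
}
```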

Loop vectorization

#include <smmintrin.h>  // SSE4.1, needed for _mm_mullo_epi32

// assumes size is a multiple of 4 and a, b, c are 16-byte aligned
void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; i += 4) {
        __m128i a_vec = _mm_load_si128((const __m128i*)(a + i));
        __m128i b_vec = _mm_load_si128((const __m128i*)(b + i));
        __m128i c_vec = _mm_mullo_epi32(a_vec, b_vec);
        _mm_store_si128((__m128i*)(c + i), c_vec);
    }
}

Automatic loop vectorization

Is there a dependency between a, b, and c?

void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; ++i) {
        c[i] = a[i] * b[i];
    }
}

Automatic loop vectorization

[Figure: arrays c and b]

void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; ++i) {
        c[i] = a[i] * b[i];
    }
}

Automatic loop vectorization

[Figure: arrays c and b]

void mul(int size, int* a, int* b, int* c)
{
    for (int i = 0; i < size; i += 4) {
        c[i]   = a[i]   * b[i];
        c[i+1] = a[i+1] * b[i+1];
        c[i+2] = a[i+2] * b[i+2];
        c[i+3] = a[i+3] * b[i+3];
    }
}
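Why the dependency question matters can be shown with a small experiment in plain C. The sketch assumes the problematic case is c overlapping b shifted by one element; the function names are hypothetical. `mul_blocks_of_4` imitates a 4-wide vector loop by reading four inputs before writing four outputs.

```c
/* Straightforward element-by-element loop. */
static void mul_scalar(int size, const int *a, const int *b, int *c)
{
    for (int i = 0; i < size; ++i)
        c[i] = a[i] * b[i];
}

/* Imitates a 4-wide vector loop: load and multiply four elements,
 * then store four results. Assumes size is a multiple of 4. */
static void mul_blocks_of_4(int size, const int *a, const int *b, int *c)
{
    for (int i = 0; i < size; i += 4) {
        int prod[4];
        for (int k = 0; k < 4; ++k)      /* "vector load" and "multiply" */
            prod[k] = a[i + k] * b[i + k];
        for (int k = 0; k < 4; ++k)      /* "vector store" */
            c[i + k] = prod[k];
    }
}

/* Returns 1 if the two loops disagree when c overlaps b. */
static int overlap_changes_result(void)
{
    int a[4] = {2, 2, 2, 2};
    int buf1[5] = {1, 1, 1, 1, 1};
    int buf2[5] = {1, 1, 1, 1, 1};

    mul_scalar(4, a, buf1, buf1 + 1);       /* b = buf1, c = buf1 + 1 */
    mul_blocks_of_4(4, a, buf2, buf2 + 1);  /* same overlap */

    for (int i = 0; i < 5; ++i)
        if (buf1[i] != buf2[i])
            return 1;
    return 0;
}
```

In the scalar loop each store feeds the next iteration's load, so the results differ from the blocked version. A compiler that cannot rule out such overlap has to keep the sequential behaviour or add a runtime check; in C99 the `restrict` qualifier is one way for the programmer to promise that the arrays do not overlap.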

Automatic vectorization in OpenCL*

__kernel void mul(int size, __global int* a, __global int* b, __global int* c)
{
    int id = get_global_id(0);
    c[id] = a[id] * b[id];
}

Automatic vectorization in OpenCL*

for (int id = workGroupIdStart; id < workGroupIdEnd; ++id) {
    c[id] = a[id] * b[id];
}

Automatic vectorization in OpenCL*

for (int id = workGroupIdStart; id < workGroupIdEnd; id += 4) {
    c[id]   = a[id]   * b[id];
    c[id+1] = a[id+1] * b[id+1];
    c[id+2] = a[id+2] * b[id+2];
    c[id+3] = a[id+3] * b[id+3];
}

Automatic vectorization in OpenCL*

// a, b and c hold int, so the integer SSE intrinsics are used (SSE4.1 for _mm_mullo_epi32)
for (int id = workGroupIdStart; id < workGroupIdEnd; id += 4) {
    __m128i a_vec = _mm_load_si128((const __m128i*)(a + id));
    __m128i b_vec = _mm_load_si128((const __m128i*)(b + id));
    __m128i c_vec = _mm_mullo_epi32(a_vec, b_vec);
    _mm_store_si128((__m128i*)(c + id), c_vec);
}

Single work group

[Figure: each work item multiplies a contiguous chunk of the 1s and 2s and produces a partial sum of 4; the partial sums are combined into 8 and 8, then 16]

Vectorizer friendly

[Figure: the elements of the 1s and 2s vectors are assigned to work items in an interleaved fashion; each work item still produces a partial sum of 4, and the partial sums are combined into 8 and 8, then 16]

__kernel void vectorDotMul(__global int* vecA, __global int* vecB,
                           int size, __global int* result)
{
    __local volatile int partialSum[MAX_SIZE];
    int id = get_local_id(0);
    int localSize = get_local_size(0);

    // Work item calculation: work item id handles elements id, id + localSize, ...
    partialSum[id] = 0;
    for (int j = id; j < size; j += localSize)
        partialSum[id] += vecA[j] * vecB[j];

    barrier(CLK_LOCAL_MEM_FENCE);

    // Reduction (performed by the first work item only)
    if (id == 0) {
        *result = 0;
        for (int i = 0; i < localSize; ++i)
            *result += partialSum[i];
    }
}
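The strided assignment of elements to work items can be emulated on the host in plain C. In this sketch the outer loop walks over the steps of the kernel's loop and the inner loop over the work items, so that at every step work items 0, 1, 2, … touch consecutive elements, which is the access pattern a vectorizer can turn into one vector load. `dot_strided` and the value of `MAX_SIZE` are assumptions made for illustration.

```c
#include <assert.h>

#define MAX_SIZE 64  /* assumed upper bound on the work-group size */

/* Emulate one work group computing a dot product with strided accesses:
 * work item id handles elements id, id + localSize, id + 2*localSize, ... */
static int dot_strided(const int *vecA, const int *vecB,
                       int size, int localSize)
{
    int partialSum[MAX_SIZE];
    assert(localSize > 0 && localSize <= MAX_SIZE);

    for (int id = 0; id < localSize; ++id)
        partialSum[id] = 0;

    /* One iteration of the outer loop is one step of every work item. */
    for (int base = 0; base < size; base += localSize) {
        /* Neighbouring work items read neighbouring elements. */
        for (int id = 0; id < localSize && base + id < size; ++id)
            partialSum[id] += vecA[base + id] * vecB[base + id];
    }

    /* Reduction, as done by work item 0 in the kernel. */
    int result = 0;
    for (int i = 0; i < localSize; ++i)
        result += partialSum[i];
    return result;
}
```

The result is the same as with contiguous chunks per work item; only the order in which memory is touched changes.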

Predication

__kernel void mul(int size, __global int* a, __global int* b, __global int* c)
{
    int id = get_global_id(0);
    if (id > 6) {
        c[id] = a[id] * b[id];
    } else {
        c[id] = a[id] + b[id];
    }
}

Predication

// scalar loop over the work items, before vectorization
for (int id = workGroupIdStart; id < workGroupIdEnd; ++id) {
    if (id > 6) {
        c[id] = a[id] * b[id];
    } else {
        c[id] = a[id] + b[id];
    }
}

How can we vectorize the loop?

Predication

// branch replaced by a mask: both sides are computed, one is selected
for (int id = workGroupIdStart; id < workGroupIdEnd; ++id) {
    bool mask = (id > 6);
    int c1 = a[id] * b[id];
    int c2 = a[id] + b[id];

    c[id] = (mask) ? c1 : c2;
}
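That the predicated form computes the same values as the branching form can be checked with a small plain C sketch. Here the selection is written with bitwise operations on an all-ones or all-zeros mask, which is how a vector blend behaves per lane; the function names are hypothetical.

```c
/* Branching version: multiply for id > 6, add otherwise. */
static void mul_or_add_branch(int n, const int *a, const int *b, int *c)
{
    for (int id = 0; id < n; ++id) {
        if (id > 6)
            c[id] = a[id] * b[id];
        else
            c[id] = a[id] + b[id];
    }
}

/* Predicated version: compute both results, then select with a mask. */
static void mul_or_add_predicated(int n, const int *a, const int *b, int *c)
{
    for (int id = 0; id < n; ++id) {
        int mask = -(id > 6);   /* -1 (all bits set) if id > 6, else 0 */
        int c1 = a[id] * b[id];
        int c2 = a[id] + b[id];
        c[id] = (c1 & mask) | (c2 & ~mask);
    }
}
```

The predicated loop has no branch in its body, at the price of always executing both the multiplication and the addition.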

Predication

for (int id = workGroupIdStart; id < workGroupIdEnd; id += 4) {
    __m128i idVec = // vector of consecutive ids
    __m128i mask  = _mm_cmpgt_epi32(idVec, Vec6);   // Vec6: the value 6 in every lane
    __m128i a_vec = _mm_load_si128((const __m128i*)(a + id));
    __m128i b_vec = _mm_load_si128((const __m128i*)(b + id));

    __m128i c1_vec = _mm_mullo_epi32(a_vec, b_vec);
    __m128i c2_vec = _mm_add_epi32(a_vec, b_vec);
    // the blend takes its second operand where the mask is set, so c1 goes second
    __m128i c3_vec = _mm_blendv_epi8(c2_vec, c1_vec, mask);

    _mm_store_si128((__m128i*)(c + id), c3_vec);
}

General tweaking

• Consecutive memory accesses
– SIMD, WARP
• How can we vectorize with control flow?
• Can we somehow generate efficient code with control flow?
– Uniform CF
– CF that diverges at SIMD-size granularity
• Enough work groups to utilize the machine

Architecture tweaking

CPU
• Locality
• No local memory (also slow in some GPUs)
• Enough compute for a work group
• Overcome thread creation overhead

GPU
• Use local memory
• Avoid bank conflicts

Conclusion

• OpenCL* is an open standard that lets developers:
– Write the same code for any type of processor
– Use all existing resources of a platform in their application
• Automatic parallelism
• OpenCL* applications are automatically portable and forward compatible
• OpenCL* is still an expert tool
– OpenCL* is not performance portable
– Tweaking for each vendor should be done

Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.