IT Center der RWTH Aachen University

OpenMP 4.0 & Beyond

Christian Terboven

IT Center, RWTH Aachen University

Member of the OpenMP Language Committee (since 2006)

HPC at RWTH Aachen University

Tier 0 – PRACE

Tier 1 – Gauss Center

Tier 1.5 – JARA-HPC

Tier 2 – Gauss Alliance

IT Center: CSE Department

The IT Center is a central institution of RWTH Aachen University. It

supports all major processes at the university,

provides basic and individually tailored IT services for all university institutions, and

supports the Simulation Sciences as an important pillar of the RWTH strategy.

CSE Department:

Chair f. High Perf. Computing

Chair f. Immersive Viz.

New Features in OpenMP 4.0

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

Execution + Data Model

Data environment is lexically scoped

Data environment is destroyed at the closing curly brace

Allocated buffers/data are automatically released

Use the target construct to

transfer control from the host to the device

establish a data environment (if not yet done)

Host thread waits until the offloaded region has completed

[Figure: host and device memory (label pA) with the numbered steps of the construct below: 1 alloc(…) on the device, 2 to(…) from host to device, 3 execute { ... } on the device, 4 from(…) from device to host]

#pragma omp target \
    map(alloc:...) \
    map(to:...) \
    map(from:...)
{ ... }
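The four numbered steps can be tried out with a small, self-contained function. This is a minimal sketch, assuming a C compiler with OpenMP 4.0 offload support; without a device (or without -fopenmp) the region simply runs on the host with the same result. The function name affine_offload and its arithmetic are hypothetical, chosen only to exercise map(alloc:), map(to:) and map(from:).

```c
#include <stdlib.h>

/* Hypothetical example: out[i] = 2*x[i] + 1, computed in one target region.
   tmp only has to exist on the device, so it is mapped with alloc:
   nothing is copied in or out for it. Returns 0 on success. */
int affine_offload(float *x, float *out, int n)
{
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* (1) allocate tmp, (2) copy x to the device, (3) run the block,
       (4) copy out back to the host when the region ends */
    #pragma omp target map(alloc: tmp[0:n]) map(to: x[0:n]) map(from: out[0:n])
    {
        for (int i = 0; i < n; ++i)
            tmp[i] = 2.0f * x[i];
        for (int i = 0; i < n; ++i)
            out[i] = tmp[i] + 1.0f;
    }   /* the device copies of tmp, x and out are released here */

    free(tmp);
    return 0;
}
```

Because the data environment ends at the closing brace of the target region, the host sees the results in out immediately after the construct, with no explicit copy-back call.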

Device constructs

This is a very simple example, but it shows functionality similar to OpenACC 1.0. Device support has to be provided by the implementation.

#pragma omp target data device(0) map(alloc:tmp[:N]) \
        map(to:input[:N]) map(tofrom:res)  // tofrom: the reduction adds to res's initial value
{
    #pragma omp target device(0)
    #pragma omp parallel for
    for (i=0; i<N; i++)
        tmp[i] = some_computation(input[i], i);

    do_some_other_stuff_on_host();

    #pragma omp target device(0) map(tofrom:res)  // since OpenMP 4.5, scalars are firstprivate in target by default
    #pragma omp parallel for reduction(+:res)
    for (i=0; i<N; i++)
        res += final_computation(tmp[i], i);
}

[Figure annotations: execution alternates host, target, host, target, host; labels mark the offload, the data region, and array shaping and slicing]
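A runnable sketch of the slide's pattern, under stated assumptions: some_computation and final_computation are hypothetical stand-ins, the default device is used, and without an offload device the regions fall back to the host. Repeating the map clauses on the inner target constructs copies nothing again, because the data are already present in the enclosing data region.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the slide's helpers, compiled for the device too. */
#pragma omp declare target
static float some_computation(float v, int i) { return v * (float)i; }
static float final_computation(float t, int i) { return t + (float)i; }
#pragma omp end declare target

/* Two offloaded loops share one data region: tmp and input stay on the
   device between them, and only res travels back at the end. */
float two_phase(float *input, int N)
{
    float res = 0.0f;
    float *tmp = malloc((size_t)N * sizeof *tmp);
    if (tmp == NULL)
        return -1.0f;   /* sentinel for allocation failure in this sketch */

    #pragma omp target data map(alloc: tmp[0:N]) map(to: input[0:N]) map(tofrom: res)
    {
        #pragma omp target map(alloc: tmp[0:N]) map(to: input[0:N])
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            tmp[i] = some_computation(input[i], i);

        /* host-only work could run here; tmp is still resident on the device */

        #pragma omp target map(alloc: tmp[0:N]) map(tofrom: res)
        #pragma omp parallel for reduction(+:res)
        for (int i = 0; i < N; i++)
            res += final_computation(tmp[i], i);
    }
    free(tmp);
    return res;
}
```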

SAXPY: Serial (Host)

#include <stdlib.h>

int main(int argc, const char* argv[]) {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize x, y

    // Run SAXPY TWICE
    for (int i = 0; i < n; ++i){
        y[i] = a*x[i] + y[i];
    }
    // y is needed and modified on the host here
    for (int i = 0; i < n; ++i){
        y[i] = b*x[i] + y[i];
    }
    free(x); free(y); return 0;
}

SAXPY: OpenACC (NVIDIA GPGPU)

#include <stdlib.h>

int main(int argc, const char* argv[]) {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize x, y

    // Run SAXPY TWICE
    #pragma acc data copyin(x[0:n])
    {
        #pragma acc parallel copy(y[0:n])
        #pragma acc loop
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
        // y is needed and modified on the host here
        #pragma acc parallel copy(y[0:n])
        #pragma acc loop
        for (int i = 0; i < n; ++i){
            y[i] = b*x[i] + y[i];
        }
    }
    free(x); free(y); return 0;
}

SAXPY: OpenMP 4.0 (NVIDIA GPGPU)

#include <stdlib.h>

int main(int argc, const char* argv[]) {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize x, y

    // Run SAXPY TWICE
    #pragma omp target data map(to:x[0:n])
    {
        #pragma omp target map(tofrom:y[0:n])
        #pragma omp teams
        #pragma omp distribute parallel for  // distribute must be followed directly by a loop
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
        // y is needed and modified on the host here
        #pragma omp target map(tofrom:y[0:n])
        #pragma omp teams
        #pragma omp distribute parallel for
        for (int i = 0; i < n; ++i){
            y[i] = b*x[i] + y[i];
        }
    }
    free(x); free(y); return 0;
}
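The host-side step between the two loops can also be expressed with target update (listed in the construct table), keeping x and y resident for the whole data region instead of transferring y at every target construct. A sketch, using the OpenMP 4.0 combined construct target teams distribute parallel for; the function saxpy_twice and its host step (adding 1 to every y[i]) are hypothetical.

```c
/* y = a*x + y on the device, a host step on y, then y = b*x + y on the
   device. The map clauses on the inner target constructs find x and y
   already present, so they transfer nothing; data moves only at the
   target update directives and at the end of the data region. */
void saxpy_twice(int n, float a, float b, float *x, float *y)
{
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        #pragma omp target update from(y[0:n])   /* device -> host */
        for (int i = 0; i < n; ++i)               /* hypothetical host step */
            y[i] += 1.0f;
        #pragma omp target update to(y[0:n])     /* host -> device */

        #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = b * x[i] + y[i];
    }   /* y is copied back to the host here */
}
```

Compared with remapping y on each target construct, this makes every transfer explicit, which is easier to reason about once more than one host step is involved.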

SAXPY: OpenMP 4.0 (Intel MIC)

#include <stdlib.h>

int main(int argc, const char* argv[]) {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize x, y

    // Run SAXPY TWICE
    #pragma omp target data map(to:x[0:n])
    {
        #pragma omp target map(tofrom:y[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
        // y is needed and modified on the host here
        #pragma omp target map(tofrom:y[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i){
            y[i] = b*x[i] + y[i];
        }
    }
    free(x); free(y); return 0;
}

Constructs and Clauses: *MP vs. *ACC

OpenMP                      OpenACC       Remark
target                      parallel      Offload of computational work to the device (sync.)
teams, parallel             parallel      Creation of threads running in parallel
-                           kernels       Automatic parallelization by the compiler
target data                 data          Structured data management between host & device
distribute, do, for, simd   loop          Worksharing across parallel units
-                           host_data     Interoperability, e.g. with CUDA
-                           cache         Move objects closer in memory to execution units
target update               update        Data movement between host & device within a data environment
declare target              declare       Declaration of global/static/extern objects
-                           enter data    Unstructured data movement to the device
-                           exit data     Unstructured data movement from the device

Constructs and Clauses: *MP vs. *ACC

OpenMP                         OpenACC               Remark
task                           -                     Creation of explicit tasks for task parallelism
task depend                    async(int)            Asynchronous execution with dependencies
-                              wait                  Synchronization of streams
-                              async wait            Asynchronous waiting on a specific stream
parallel in parallel or team   parallel in parallel  Nested parallelism
-                              tile                  Strip-mining of data collections
-                              device_type           Device-specific tuning of clauses
atomic                         atomic                Atomic operations
sections, critical, …          -                     Non-iterative worksharing, critical sections, synchronization, control flow of threads, …

OpenACC is the best fit for GPU programming right now. OpenMP is much broader (as we will see) and supports more device types, but currently no commercial implementation supports GPUs.

Paper at Euro-Par 2014: S. Wienke, C. Terboven, J. C. Beyer and M. S. Müller: A Pattern-based Comparison of OpenACC and OpenMP for Accelerator Computing

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

SIMD me

If you know about vectorization, you will recognize what is going on here. If not, you are probably leaving a lot of performance on the table…

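#include <math.h>

// aligned() only applies to pointer/array arguments, so it is not used on min()
#pragma omp declare simd notinbranch
float min(float a, float b) {
    return a < b ? a : b;
}

// distance() is not shown on the slide; declared here so the call compiles
#pragma omp declare simd notinbranch
float distance(float x, float y);

void dist_update(float *a, float *b, float *y, int vlen) {
    float *ptr = b;
    #pragma omp parallel for simd safelen(16) linear(ptr:1) \
                aligned(a,b,y)
    for (int i = 0; i < vlen; i++) {
        y[i] = min(sqrtf(distance(a[i], 1.0f)), *ptr);
        ptr += 1;
    }
}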
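A smaller, self-contained variant of the same idea: two scalar functions declared SIMD-callable and composed inside a parallel for simd loop, so the compiler can generate vector versions of both. The names min_f, distsq and min_dist are hypothetical.

```c
/* SIMD-callable helpers: the compiler may create vector variants of them. */
#pragma omp declare simd notinbranch
static float min_f(float a, float b) { return a < b ? a : b; }

#pragma omp declare simd notinbranch
static float distsq(float x, float y) { return (x - y) * (x - y); }

/* d[i] = min((a[i] - b[i])^2, c[i]); iterations are split across threads,
   and each thread's chunk is vectorized. */
void min_dist(const float *a, const float *b, const float *c, float *d, int n)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        d[i] = min_f(distsq(a[i], b[i]), c[i]);
}
```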

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Cancellation Constructs

Two constructs:

Activate cancellation:

C/C++: #pragma omp cancel

Fortran: !$omp cancel

Check for cancellation:

C/C++: #pragma omp cancellation point

Fortran: !$omp cancellation point

Check for cancellation only at certain points

Avoid unnecessary overheads

Programmers need to reason about cancellation

Cleanup code needs to be added manually
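A minimal sketch of both constructs in a worksharing loop, using a hypothetical find_any helper. Cancellation is only active when the program runs with OMP_CANCELLATION=true; otherwise the cancel directives are ignored and the loop simply runs to completion with an equally valid result.

```c
/* Returns the index of some element equal to key, or -1 if there is none.
   With several matches, which of their indices is returned is unspecified. */
long find_any(const int *data, long n, int key)
{
    long found = -1;
    #pragma omp parallel for shared(found)
    for (long i = 0; i < n; ++i) {
        if (data[i] == key) {
            #pragma omp atomic write
            found = i;
            #pragma omp cancel for               /* activate cancellation */
        }
        #pragma omp cancellation point for       /* other threads check here */
    }
    return found;   /* the loop's implicit barrier makes found visible */
}
```

The cancellation point inside the loop body is what lets threads that never hit a match stop early; without it, they would only notice the cancellation at the end of the loop.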

Stopping midstream

This feature enables new classes of (irregular) problems to be exploited with OpenMP (tasks).

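binary_tree_t* search_tree(binary_tree_t* tree, int val) {
    binary_tree_t* found = NULL;   // shared with the child tasks below
    if (tree == NULL)
        return NULL;
    if (tree->value == val)
        found = tree;
    else {
        #pragma omp task shared(found)
        {
            binary_tree_t* found_left;
            found_left = search_tree(tree->left, val);
            if (found_left) {
                #pragma omp atomic write
                found = found_left;
                // needs OMP_CANCELLATION=true and an enclosing taskgroup
                #pragma omp cancel taskgroup
            }
        } // end omp task, followed by similar code for “right” side
        #pragma omp taskwait   // found is final only after the child tasks are done
    }
    return found;
}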
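For an end-to-end version, the right-hand task and a driver that opens the enclosing taskgroup can be added. This is a sketch: the node type binary_tree_t is not shown on the slide, so the struct below is a hypothetical definition, as is the driver search_tree_parallel.

```c
#include <stddef.h>

/* Hypothetical node type; the slide does not show binary_tree_t. */
typedef struct binary_tree {
    int value;
    struct binary_tree *left, *right;
} binary_tree_t;

static binary_tree_t *search_tree(binary_tree_t *tree, int val)
{
    binary_tree_t *found = NULL;
    if (tree == NULL)
        return NULL;
    if (tree->value == val)
        return tree;

    #pragma omp task shared(found)
    {
        binary_tree_t *f = search_tree(tree->left, val);
        if (f) {
            #pragma omp atomic write
            found = f;
            #pragma omp cancel taskgroup   /* discard not-yet-started searches */
        }
    }
    #pragma omp task shared(found)
    {
        binary_tree_t *f = search_tree(tree->right, val);
        if (f) {
            #pragma omp atomic write
            found = f;
            #pragma omp cancel taskgroup
        }
    }
    #pragma omp taskwait   /* both children are finished (or were discarded) */
    return found;
}

/* Driver: one thread starts the search, and all tasks it creates belong to
   the taskgroup that cancel taskgroup refers to. */
binary_tree_t *search_tree_parallel(binary_tree_t *tree, int val)
{
    binary_tree_t *found = NULL;
    #pragma omp parallel shared(found)
    #pragma omp single
    #pragma omp taskgroup
    found = search_tree(tree, val);
    return found;
}
```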

Cancellation Semantics

[Figure: Thread A, Thread B and Thread C executing a parallel region]

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Task Dependencies: clauses to express additional constraints on the scheduling of tasks w.r.t. their siblings

Talk the talk, task the Task

Note: variables in the depend clause do not necessarily have to indicate the data flow

void process_in_parallel() {
    #pragma omp parallel
    #pragma omp single
    {
        int x = 1;
        ...
        for (int i = 0; i < T; ++i) {
            #pragma omp task shared(x, ...) depend(out: x) // T1
            preprocess_some_data(...);
            #pragma omp task shared(x, ...) depend(in: x)  // T2
            do_something_with_data(...);
            #pragma omp task shared(x, ...) depend(in: x)  // T3
            do_something_independent_with_data(...);
        }
    } // end omp single, omp parallel
}

Very simple example. But you should see the new possibilities opened up. If you do not, OmpSs has encouraging examples.

T1 has to be completed before T2 and T3 can be executed.

T2 and T3 can be executed in parallel.

Degree of parallelism exploitable in this concrete example: T2 and T3 (2 tasks); T1 of the next iteration has to wait for them.
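A minimal runnable sketch of the same T1/T2/T3 pattern, reduced to a single iteration so the outcome can be checked. The function depend_demo and the value 42 are hypothetical: the two consumers read x only after the producer has written it, and they may run concurrently with each other.

```c
/* Returns 1 if both consumer tasks saw the producer's value. */
int depend_demo(void)
{
    int x = 0, seen2 = -1, seen3 = -1;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)          /* T1: producer */
        x = 42;
        #pragma omp task shared(x, seen2) depend(in: x)    /* T2: consumer */
        seen2 = x;
        #pragma omp task shared(x, seen3) depend(in: x)    /* T3: consumer */
        seen3 = x;
    }   /* all tasks are complete at the implicit barrier */
    return seen2 == 42 && seen3 == 42;
}
```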

Concurrent Execution w/ Dependencies

The following allows for more parallelism, as there is one i per thread. Hence, two tasks may be active per thread.

void process_in_parallel() {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < T; ++i) {
            #pragma omp task depend(out: i)
            preprocess_some_data(...);
            #pragma omp task depend(in: i)
            do_something_with_data(...);
            #pragma omp task depend(in: i)
            do_something_independent_with_data(...);
        }
    } // end omp parallel
}

Concurrent Execution w/ Dependencies

The following allows for even more parallelism, as now two tasks can be active per thread per iteration i.

void process_in_parallel() {
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < T; ++i) {
            #pragma omp task firstprivate(i)
            {
                #pragma omp task depend(out: i)
                preprocess_some_data(...);
                #pragma omp task depend(in: i)
                do_something_with_data(...);
                #pragma omp task depend(in: i)
                do_something_independent_with_data(...);
            } // end omp task
        }
    } // end omp single, end omp parallel
}
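The nested-task variant can be made runnable as below. This is a sketch under assumptions: the function run_iterations and its arithmetic are hypothetical, and instead of depending on i it depends on a per-iteration local v, which is why each parent task ends with a taskwait (v lives on that task's stack). Chains belonging to different iterations can overlap freely.

```c
/* out must hold n ints; out[i] = i*i + 1 after the call. */
void run_iterations(int n, int *out)
{
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; ++i) {
        #pragma omp task firstprivate(i) shared(out)
        {
            int v = 0;   /* per-iteration value passed from producer to consumer */
            #pragma omp task shared(v) firstprivate(i) depend(out: v)
            v = i * i;                      /* "preprocess" */
            #pragma omp task shared(v, out) firstprivate(i) depend(in: v)
            out[i] = v + 1;                 /* "consume" */
            #pragma omp taskwait            /* keep v alive until both finish */
        }
    }
}
```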

“Real” Task Dependencies

// A[i][j] points to one tile; spotrf, strsm, sgemm and ssyrk are tile kernels.
// Call from inside a parallel region (e.g. parallel + single).
void blocked_cholesky( int NB, float *A[NB][NB] ) {
    int i, j, k;
    for (k=0; k<NB; k++) {
        #pragma omp task depend(inout:A[k][k])
        spotrf (A[k][k]) ;
        for (i=k+1; i<NB; i++) {
            #pragma omp task depend(in:A[k][k]) depend(inout:A[k][i])
            strsm (A[k][k], A[k][i]);
        }
        // update trailing submatrix
        for (i=k+1; i<NB; i++) {
            for (j=k+1; j<i; j++) {
                #pragma omp task depend(in:A[k][i],A[k][j]) \
                                 depend(inout:A[j][i])
                sgemm( A[k][i], A[k][j], A[j][i]);
            }
            #pragma omp task depend(in:A[k][i]) depend(inout:A[i][i])
            ssyrk (A[k][i], A[i][i]);
        }
    }
}

* image from BSC

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Task Dependencies: clauses to express additional constraints on the scheduling of tasks w.r.t. their siblings

Thread Affinity: both coarse- and fine-grained control of where OpenMP threads are executed

OpenMP 4.0: Places + Binding Policies

Define OpenMP Places

a place is a set of one or more processors on which OpenMP threads run

can be defined by the user, e.g. OMP_PLACES=cores

Define a set of OpenMP Thread Affinity Policies

SPREAD: spread OpenMP threads evenly among the places, partition the place list

CLOSE: pack OpenMP threads near the master thread

MASTER: collocate OpenMP threads with the master thread

Goals

give the user a way to specify where OpenMP threads execute, for locality between OpenMP threads, less false sharing, or higher memory bandwidth

OpenMP Places

Assume the following machine:

2 sockets, 4 cores per socket, 4 hyper-threads per core

Abstract names for OMP_PLACES:

threads: Each place corresponds to a single hardware thread on the target machine.

cores: Each place corresponds to a single core (having one or more hardware threads) on the target machine.

sockets: Each place corresponds to a single socket (consisting of one or more cores) on the target machine.

[Figure: the machine's eight cores as places p0 to p7]

Get close to me

Abstract names combined with the strategies allow for ease of use.

However, if you do need full control, you can get it.

OMP_PLACES=cores

#pragma omp parallel proc_bind(spread) num_threads(4)
#pragma omp parallel proc_bind(close) num_threads(4)

[Figure: places p0 to p7 shown three times: the initial thread, the team after spread 4, and the nested teams after close 4]

OMP_PLACES="{0,1,2,3},{4,5,6,7},..." = "{0:4}:8:4" = cores
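To see where the nested spread/close teams actually land, a small reporting function can help. A sketch under assumptions: run with OMP_PLACES=cores and nested parallelism enabled (for example OMP_MAX_ACTIVE_LEVELS=2), and an OpenMP 4.5 runtime for omp_get_place_num(). The function name is hypothetical, and the printed places depend on the machine.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints the place of every thread in a spread-then-close nest and returns
   how many innermost threads ran (1 without OpenMP, up to 16 with it). */
int report_nested_binding(void)
{
    int reported = 0;
    #pragma omp parallel proc_bind(spread) num_threads(4)
    {
        #pragma omp parallel proc_bind(close) num_threads(4)
        {
#ifdef _OPENMP
            printf("outer %d inner %d -> place %d of %d\n",
                   omp_get_ancestor_thread_num(1), omp_get_thread_num(),
                   omp_get_place_num(), omp_get_num_places());
#endif
            #pragma omp atomic
            reported++;
        }
    }
    return reported;
}
```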

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Task Dependencies: clauses to express additional constraints on the scheduling of tasks w.r.t. their siblings

Thread Affinity: both coarse- and fine-grained control of where OpenMP threads are executed

User-defined reductions: finally (!) support for user-defined operators, beyond built-ins such as + and *, in the reduction clause

Reducers everywhere

This was missing for C++ codes.

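#include <vector>

#pragma omp declare reduction \
    (merge : std::vector<int> : \
     omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

bool filter(int value);  // application predicate, not shown on the slide (signature assumed)

void schedule (std::vector<int> &v, std::vector<int> &filtered)
{
    // each thread fills a private vector; the partial vectors are merged at the end,
    // so the element order in filtered is not guaranteed
    #pragma omp parallel for reduction(merge: filtered)
    for (std::vector<int>::iterator it = v.begin();
         it < v.end(); it++)
        if ( filter(*it) )
            filtered.push_back(*it);
}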
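The same mechanism works in C. A sketch of a user-defined reduction that finds the minimum value together with its index; the type minloc_t and the functions minloc_identity, minloc_combine and find_min are hypothetical names. Ties are broken toward the smaller index so that the combiner is associative and commutative and the result does not depend on how iterations are split among threads.

```c
/* Hypothetical value/index pair; idx < 0 marks "no element yet". */
typedef struct { float val; int idx; } minloc_t;

static minloc_t minloc_identity(void)
{
    minloc_t none = { 0.0f, -1 };
    return none;
}

static minloc_t minloc_combine(minloc_t a, minloc_t b)
{
    if (a.idx < 0) return b;   /* a is the identity value */
    if (b.idx < 0) return a;
    if (b.val < a.val || (b.val == a.val && b.idx < a.idx)) return b;
    return a;
}

#pragma omp declare reduction(minloc : minloc_t : \
        omp_out = minloc_combine(omp_out, omp_in)) \
        initializer(omp_priv = minloc_identity())

minloc_t find_min(const float *x, int n)
{
    minloc_t best = minloc_identity();
    #pragma omp parallel for reduction(minloc : best)
    for (int i = 0; i < n; ++i) {
        minloc_t cand = { x[i], i };
        best = minloc_combine(best, cand);
    }
    return best;   /* idx == -1 if n == 0 */
}
```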

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Task Dependencies: clauses to express additional constraints on the scheduling of tasks w.r.t. their siblings

Thread Affinity: both coarse- and fine-grained control of where OpenMP threads are executed

User-defined reductions: finally (!) support for user-defined operators, beyond built-ins such as + and *, in the reduction clause

Sequentially consistent atomics: bless you!
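For illustration, the seq_cst clause attaches to the atomic construct as shown below; it makes the atomic operation include an implicit flush. This is a minimal sketch with a hypothetical count_above function. For a plain counter like this, reduction(+:count) would normally be the faster choice, so the example only shows the syntax.

```c
/* Counts the elements of v that are greater than thr. */
int count_above(const int *v, int n, int thr)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (v[i] > thr) {
            #pragma omp atomic update seq_cst
            count++;
        }
    }
    return count;
}
```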

OpenMP 4.0 brought many new features

Device constructs: support for compute acceleration devices and heterogeneous computing

SIMD constructs: portable description of SIMD expression and their combination with parallelization directives

Cancellation: directives to request the termination of the current OpenMP region

Task Dependencies: clauses to express additional constraints on the scheduling of tasks w.r.t. their siblings

Thread Affinity: both coarse- and fine-grained control of where OpenMP threads are executed

User-defined reductions: finally (!) support for user-defined operators, beyond built-ins such as + and *, in the reduction clause

Sequentially consistent atomics: bless you!

… and even more (minor) things, e.g. improved Fortran 2003 support!

My Thoughts on Future Directions

… but, honestly, what is OpenMP now?

The OpenMP mission is to standardize directive-based multi-language high-level parallelism that is performant, productive and portable.

Future Directions of OpenMP

Already released: OpenMP Tools Interface Technical Report

The OpenMP Language Committee is active

(multiple) weekly conference calls

(at least) three face-to-face meetings per year

OpenMP 4.1 targeted for release at SC15

Aims for OpenMP 4.1

Many improvements to the accelerator support, e.g. unstructured data movement

Initial Support for Memory Affinity

Interoperability with POSIX Threads

Reductions for Tasks

Some new features, e.g. DOACROSS loops, maybe unrolling and blocking
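DOACROSS loops were later standardized in OpenMP 4.5 through the ordered(n) clause together with ordered depend(sink: ...) and depend(source) (later versions spell these as doacross clauses). A minimal sketch of a loop-carried dependence using a hypothetical prefix_sum function; it only illustrates the syntax, since this particular loop is fully serialized by its dependence.

```c
/* In-place running sum: a[i] becomes a[0] + ... + a[i]. */
void prefix_sum(float *a, int n)
{
    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; ++i) {
        #pragma omp ordered depend(sink: i - 1)   /* wait for iteration i-1 */
        a[i] += a[i - 1];
        #pragma omp ordered depend(source)        /* iteration i is done */
    }
}
```

For i = 1 the sink vector i - 1 = 0 lies outside the iteration space, so that iteration does not wait; every later iteration waits only for its direct predecessor.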

Wherever you go, OpenMP will be there

Thank you for your attention.

Some examples are taken from the Advanced OpenMP Tutorial, given regularly at the SC and ISC conferences together with Bronis R. de Supinski, Michael Klemm, Eric Stotzer and Ruud van der Pas.

The End