Programming with OpenMP*


Page 1: Programming with OpenMP*

Programming with OpenMP*

Page 2: Programming with OpenMP*

Objectives

Upon completion of this module you will be able to use OpenMP to:

implement data parallelism
implement task parallelism

Copyright © 2014, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

SOFTWARE AND SERVICES

Page 3: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics


Page 4: Programming with OpenMP*

What Is OpenMP?

Portable, shared-memory threading API
– Fortran, C, and C++
– Multi-vendor support for both Linux and Windows
Standardizes task- and loop-level parallelism
Supports coarse-grained parallelism
Combines serial and parallel code in a single source
Standardizes ~20 years of compiler-directed threading experience
http://www.openmp.org
Current spec is OpenMP 3.0 (318 pages, combined C/C++ and Fortran)


Page 5: Programming with OpenMP*

Programming Model

Fork-Join Parallelism:
• Master thread spawns a team of threads as needed
• Parallelism is added incrementally: that is, the sequential program evolves into a parallel program

Master Thread

Parallel Regions


Page 6: Programming with OpenMP*

A Few Syntax Details to Get Started

Most of the constructs in OpenMP are compiler directives or pragmas

For C and C++, the pragmas take the form:
#pragma omp construct [clause [clause]…]

For Fortran, the directives take one of the forms:
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
*$OMP construct [clause [clause]…]

Header file or Fortran 90 module:
#include “omp.h”
use omp_lib


Page 7: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics


Page 8: Programming with OpenMP*

Parallel Region & Structured Blocks (C/C++)

Most OpenMP constructs apply to structured blocks
Structured block: a block with one point of entry at the top and one point of exit at the bottom
The only “branches” allowed are STOP statements in Fortran and exit() in C/C++

A structured block:

#pragma omp parallel
{
  int id = omp_get_thread_num();
  more: res[id] = do_big_job(id);
  if (conv(res[id])) goto more;
}
printf (“All done\n”);

Not a structured block:

if (go_now()) goto more;
#pragma omp parallel
{
  int id = omp_get_thread_num();
  more: res[id] = do_big_job(id);
  if (conv(res[id])) goto done;
  goto more;
}
done: if (!really_done()) goto more;


Page 9: Programming with OpenMP*

Activity 1: Hello Worlds

Modify the “Hello, Worlds” serial code to run multithreaded using OpenMP*


Page 10: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing – Parallel For
Data environment
Synchronization
Optional Advanced topics


Page 11: Programming with OpenMP*

Worksharing

Worksharing is the general term used in OpenMP to describe distribution of work across threads. Three examples of worksharing in OpenMP are:
omp for construct
omp sections construct
omp task construct
Automatically divides work among threads


Page 12: Programming with OpenMP*

omp for construct

// assume N=12
#pragma omp parallel
#pragma omp for
for(i = 1; i < N+1; i++)
    c[i] = a[i] + b[i];

Threads are assigned an independent set of iterations (e.g., with three threads: i = 1–4, i = 5–8, i = 9–12)
Threads must wait at the end of the work-sharing construct (implicit barrier)


Page 13: Programming with OpenMP*

Combining constructs

These two code segments are equivalent:

#pragma omp parallel
{
  #pragma omp for
  for (i=0; i < MAX; i++) {
    res[i] = huge();
  }
}

#pragma omp parallel for
for (i=0; i < MAX; i++) {
  res[i] = huge();
}


Page 14: Programming with OpenMP*

The Private Clause

Reproduces the variable for each task
Variables are un-initialized; a C++ object is default-constructed
Any value external to the parallel region is undefined

void work(float* c, int N) {
  float x, y; int i;
  #pragma omp parallel for private(x,y)
  for(i=0; i<N; i++) {
    x = a[i]; y = b[i];
    c[i] = x + y;
  }
}


Page 15: Programming with OpenMP*

Activity 2 – Parallel Mandelbrot

Objective: create a parallel version of Mandelbrot. Modify the code to add OpenMP worksharing clauses to parallelize the computation of Mandelbrot. Follow the next Mandelbrot activity, called Mandelbrot, in the student lab doc.

Page 16: Programming with OpenMP*

The schedule clause

The schedule clause affects how loop iterations are mapped onto threads

schedule(static [,chunk])
  Blocks of iterations of size “chunk” assigned to threads
  Round-robin distribution
  Low overhead, may cause load imbalance

schedule(dynamic [,chunk])
  Threads grab “chunk” iterations
  When done with iterations, a thread requests the next set
  Higher threading overhead, can reduce load imbalance

schedule(guided [,chunk])
  Dynamic schedule starting with a large block
  Size of the blocks shrinks; no smaller than “chunk”


Page 17: Programming with OpenMP*

Schedule Clause Example

#pragma omp parallel for schedule (static, 8)
for( int i = start; i <= end; i += 2 ) {
  if ( TestForPrime(i) ) gPrimesFound++;
}

Iterations are divided into chunks of 8
If start = 3, then the first chunk is i = {3,5,7,9,11,13,15,17}


Page 18: Programming with OpenMP*

Activity 2b – Mandelbrot Scheduling

Objective: create a parallel version of Mandelbrot that uses OpenMP dynamic scheduling. Follow the next Mandelbrot activity, called Mandelbrot Scheduling, in the student lab doc.


Page 19: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing – Parallel Sections
Data environment
Synchronization
Optional Advanced topics


Page 20: Programming with OpenMP*

Task Decomposition

a = alice();
b = bob();
s = boss(a, b);
c = cy();
printf ("%6.2f\n", bigboss(s,c));

alice, bob, and cy can be computed in parallel
(Dependence graph: alice and bob feed boss; boss and cy feed bigboss.)


Page 21: Programming with OpenMP*

omp sections

#pragma omp sections
  Must be inside a parallel region
  Precedes a code block containing N blocks of code that may be executed concurrently by N threads
  Encompasses each omp section

#pragma omp section
  Precedes each block of code within the encompassing block described above
  May be omitted for the first parallel section after the parallel sections pragma
  Enclosed program segments are distributed for parallel execution among available threads


Page 22: Programming with OpenMP*

Functional Level Parallelism with sections

#pragma omp parallel sections
{
  #pragma omp section /* Optional */
    a = alice();
  #pragma omp section
    b = bob();
  #pragma omp section
    c = cy();
}
s = boss(a, b);
printf ("%6.2f\n", bigboss(s,c));


Page 23: Programming with OpenMP*

Advantage of Parallel Sections

Independent sections of code can execute concurrently, reducing execution time

#pragma omp parallel sections
{
  #pragma omp section
    phase1();
  #pragma omp section
    phase2();
  #pragma omp section
    phase3();
}

(Diagram: serial vs. parallel execution timelines.)

Page 24: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing – Tasks
Data environment
Synchronization
Optional Advanced topics


Page 25: Programming with OpenMP*

New Addition to OpenMP

Tasks – the main change for OpenMP 3.0
Allows parallelization of irregular problems:
unbounded loops
recursive algorithms
producer/consumer


Page 26: Programming with OpenMP*

What are tasks?

Tasks are independent units of work
Threads are assigned to perform the work of each task
Tasks may be deferred
Tasks may be executed immediately
The runtime system decides which of the above

Tasks are composed of:
code to execute
data environment
internal control variables (ICV)


Page 27: Programming with OpenMP*

Simple Task Example

#pragma omp parallel
// assume 8 threads
{
  #pragma omp single private(p)
  {
    …
    while (p) {
      #pragma omp task
      {
        processwork(p);
      }
      p = p->next;
    }
  }
}

A pool of 8 threads is created here
One thread gets to execute the while loop
The single “while loop” thread creates a task for each instance of processwork()


Page 28: Programming with OpenMP*

Task Construct – Explicit Task View

#pragma omp parallel
{
  #pragma omp single
  { // block 1
    node * p = head;
    while (p) { // block 2
      #pragma omp task private(p)
        process(p);
      p = p->next; // block 3
    }
  }
}

A team of threads is created at the omp parallel construct
A single thread is chosen to execute the while loop – call this thread “L”
Thread L operates the while loop, creates tasks, and fetches next pointers
Each time L crosses the omp task construct it generates a new task and has a thread assigned to it
Each task runs in its own thread
All tasks complete at the barrier at the end of the parallel region’s single construct


Page 29: Programming with OpenMP*

Why are tasks useful?

Have the potential to parallelize irregular patterns and recursive function calls

#pragma omp parallel
{
  #pragma omp single
  { // block 1
    node * p = head;
    while (p) { // block 2
      #pragma omp task
        process(p);
      p = p->next; // block 3
    }
  }
}

(Timeline diagram: single-threaded, block 1 and the alternating block 2/block 3 loop and tasks run one after another; with four threads, Thr1 runs block 1 and the block 2/block 3 loop while Thr2–Thr4 execute Task 1, Task 2, and Task 3 concurrently, saving time.)


Page 30: Programming with OpenMP*

Activity 3 – Linked List using Tasks

Objective: Modify the linked list pointer chasing code to implement tasks to parallelize the application. Follow the Linked List task activity called LinkedListTask in the student lab doc.

while(p != NULL) {
  do_work(p->data);
  p = p->next;
}


Page 31: Programming with OpenMP*

When are tasks guaranteed to be complete?

Tasks are guaranteed to be complete:
At thread or task barriers
At the directive: #pragma omp barrier
At the directive: #pragma omp taskwait


Page 32: Programming with OpenMP*

Task Completion Example

#pragma omp parallel
{
  #pragma omp task
    foo();                // multiple foo tasks created here – one for each thread
  #pragma omp barrier     // all foo tasks guaranteed to be completed here
  #pragma omp single
  {
    #pragma omp task
      bar();              // one bar task created here
  }                       // bar task guaranteed to be completed here
}


Page 33: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics


Page 34: Programming with OpenMP*

Data Scoping – What’s shared

OpenMP uses a shared-memory programming model
Shared variable: a variable that can be read or written by multiple threads
The shared clause can be used to make items explicitly shared

Global variables are shared by default among tasks:
file-scope variables, namespace-scope variables, static variables
Variables with const-qualified type having no mutable member are shared
Static variables declared in a scope inside the construct are shared


Page 35: Programming with OpenMP*

Data Scoping – What’s private

But, not everything is shared...
Examples of implicitly determined private variables:
Stack (local) variables in functions called from parallel regions are PRIVATE
Automatic variables within a statement block are PRIVATE
Loop iteration variables are private
Implicitly declared private variables within tasks will be treated as firstprivate

The firstprivate clause declares one or more list items to be private to a task, and initializes each of them with a value


Page 36: Programming with OpenMP*

A Data Environment Example

float A[10];
main ()
{
  integer index[10];
  #pragma omp parallel
  {
    Work (index);
  }
  printf (“%d\n”, index[1]);
}

extern float A[10];
void Work (int *index)
{
  float temp[10];
  static integer count;
  <...>
}

Which variables are shared and which variables are private?
A, index, and count are shared by all threads, but temp is local to each thread


Page 37: Programming with OpenMP*

Data Scoping Issue – fib example

int fib ( int n )
{
  int x,y;
  if ( n < 2 ) return n;
  #pragma omp task
    x = fib(n-1);
  #pragma omp task
    y = fib(n-2);
  #pragma omp taskwait
  return x+y;
}

n is private in both tasks
x is a private variable
y is a private variable

What’s wrong here?
Can’t use private variables outside of tasks


Page 38: Programming with OpenMP*

Data Scoping Example – fib example

int fib ( int n )
{
  int x,y;
  if ( n < 2 ) return n;
  #pragma omp task shared(x)
    x = fib(n-1);
  #pragma omp task shared(y)
    y = fib(n-2);
  #pragma omp taskwait
  return x+y;
}

n is private in both tasks
x & y are shared: we need both values to compute the sum
Good solution


Page 39: Programming with OpenMP*

Data Scoping Issue – List Traversal

List ml; // my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
  for(e=ml->first; e; e=e->next)
    #pragma omp task
    process(e);
}

What's wrong here?
Possible data race! Shared variable e is updated by multiple tasks.


Page 40: Programming with OpenMP*

Data Scoping Example – List Traversal

List ml; // my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
  for(e=ml->first; e; e=e->next)
    #pragma omp task firstprivate(e)
    process(e);
}

Good solution – e is firstprivate


Page 41: Programming with OpenMP*

Data Scoping Example – List Traversal

List ml; // my_list
Element *e;
#pragma omp parallel
#pragma omp single private(e)
{
  for(e=ml->first; e; e=e->next)
    #pragma omp task
    process(e);
}

Good solution – e is private


Page 42: Programming with OpenMP*

Data Scoping Example – List Traversal

List ml; // my_list
#pragma omp parallel
{
  Element *e;
  for(e=ml->first; e; e=e->next)
    #pragma omp task
    process(e);
}

Good solution – e is private (declared inside the parallel region)


Page 43: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics


Page 44: Programming with OpenMP*

Example: Dot Product

float dot_prod(float* a, float* b, int N)
{
  float sum = 0.0;
  #pragma omp parallel for shared(sum)
  for(int i=0; i<N; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

What is wrong?


Page 45: Programming with OpenMP*

Race Condition

A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable.
For example, suppose both Thread A and Thread B are executing the statement

area += 4.0 / (1.0 + x*x);


Page 46: Programming with OpenMP*

Two Timings

Timing 1 (no race):
  area = 11.667
  Thread A reads 11.667, adds 3.765, stores 15.432
  Thread B reads 15.432, adds 3.563, stores 18.995

Timing 2 (race):
  area = 11.667
  Thread A reads 11.667; Thread B also reads 11.667
  Thread A adds 3.765 and stores 15.432
  Thread B adds 3.563 and stores 15.230 – Thread A's update is lost

Order of thread execution causes nondeterminant behavior in a data race.

Page 47: Programming with OpenMP*

Protect Shared Data

Must protect access to shared, modifiable data.

float dot_prod(float* a, float* b, int N)
{
  float sum = 0.0;
  #pragma omp parallel for shared(sum)
  for(int i=0; i<N; i++) {
    #pragma omp critical
    sum += a[i] * b[i];
  }
  return sum;
}


Page 48: Programming with OpenMP*

OpenMP* Critical Construct

#pragma omp critical [(lock_name)]

Defines a critical region on a structured block.

float RES;
#pragma omp parallel
{
  float B;
  #pragma omp for
  for(int i=0; i<niters; i++) {
    B = big_job(i);
    #pragma omp critical (RES_lock)
    consum (B, RES);
  }
}

Threads wait their turn – only one at a time calls consum(), thereby protecting RES from race conditions.
Naming the critical construct RES_lock is optional.
Good practice – name all critical sections.


Page 49: Programming with OpenMP*

OpenMP* Reduction Clause

reduction (op : list)

The variables in "list" must be shared in the enclosing parallel region.
Inside a parallel or work-sharing construct:
- A PRIVATE copy of each list variable is created and initialized depending on the "op"
- These copies are updated locally by threads
- At the end of the construct, local copies are combined through "op" into a single value and combined with the value in the original SHARED variable


Page 50: Programming with OpenMP*

Reduction Example

#pragma omp parallel for reduction(+:sum)
for(i=0; i<N; i++) {
  sum += a[i] * b[i];
}

A local copy of sum is made for each thread. All local copies of sum are added together and stored in the "global" variable.


Page 51: Programming with OpenMP*

Numerical Integration Example

f(x) = 4.0/(1+x²);  ∫₀¹ 4.0/(1+x²) dx = π

static long num_steps=100000;
double step, pi;

void main()
{
  int i;
  double x, sum = 0.0;
  step = 1.0/(double) num_steps;
  for (i=0; i< num_steps; i++){
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0 + x*x);
  }
  pi = step * sum;
  printf("Pi = %f\n", pi);
}


Page 52: Programming with OpenMP*

C/C++ Reduction Operations

A range of associative operands can be used with reduction. Initial values are the ones that make sense mathematically.

Operand   Initial Value
+         0
*         1
-         0
&         ~0
|         0
^         0
&&        1
||        0


Page 53: Programming with OpenMP*

Activity 4 - Computing Pi

Parallelize the numerical integration code using OpenMP.
- What variables can be shared?
- What variables need to be private?
- What variables should be set up for reductions?

static long num_steps=100000;
double step, pi;

void main()
{
  int i;
  double x, sum = 0.0;
  step = 1.0/(double) num_steps;
  for (i=0; i< num_steps; i++){
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0 + x*x);
  }
  pi = step * sum;
  printf("Pi = %f\n", pi);
}


Page 54: Programming with OpenMP*

Single Construct

Denotes a block of code to be executed by only one thread.
- First thread to arrive is chosen
- Implicit barrier at end

#pragma omp parallel
{
  DoManyThings();
  #pragma omp single
  {
    ExchangeBoundaries();
  } // threads wait here for single
  DoManyMoreThings();
}


Page 55: Programming with OpenMP*

Master Construct

Denotes a block of code to be executed only by the master thread.
- No implicit barrier at end

#pragma omp parallel
{
  DoManyThings();
  #pragma omp master
  { // if not master skip to next stmt
    ExchangeBoundaries();
  }
  DoManyMoreThings();
}


Page 56: Programming with OpenMP*

Implicit Barriers

Several OpenMP* constructs have implicit barriers:
- parallel – necessary barrier – cannot be removed
- for
- single

Unnecessary barriers hurt performance and can be removed with the nowait clause.
The nowait clause is applicable to:
- the for construct
- the single construct


Page 57: Programming with OpenMP*

Nowait Clause

#pragma omp for nowait
for(...)
  {...};

#pragma omp single nowait
{ [...] }

Use when threads unnecessarily wait between independent computations.

#pragma omp for schedule(dynamic,1) nowait
for(int i=0; i<n; i++)
  a[i] = bigFunc1(i);

#pragma omp for schedule(dynamic,1)
for(int j=0; j<m; j++)
  b[j] = bigFunc2(j);


Page 58: Programming with OpenMP*

Barrier Construct

Explicit barrier synchronization.
Each thread waits until all threads arrive.

#pragma omp parallel shared (A, B, C)
{
  DoSomeWork(A,B); // Processed A into B
  #pragma omp barrier
  DoSomeWork(B,C); // Processed B into C
}


Page 59: Programming with OpenMP*

Atomic Construct

Special case of a critical section.
Applies only to a simple update of a memory location.

#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
  #pragma omp atomic
  x[index[i]] += work1(i);
  y[i] += work2(i);
}


Page 60: Programming with OpenMP*

Agenda

What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics


Page 61: Programming with OpenMP*

Advanced Concepts

Page 62: Programming with OpenMP*

Parallel Construct – Implicit Task View

Tasks are created in OpenMP even without an explicit task directive. Let's look at how tasks are created implicitly for the code snippet below:

#pragma omp parallel
{
  int mydata;
  code...
}

- The thread encountering the parallel construct packages up a set of implicit tasks
- A team of threads is created
- Each thread in the team is assigned to one of the tasks (and tied to it)
- A barrier holds the original master thread until all implicit tasks are finished

(Diagram: Thread 1, Thread 2, and Thread 3 each execute their own copy of { mydata; code } and then meet at a barrier.)


Page 63: Programming with OpenMP*

Task Construct

#pragma omp task [clause[[,]clause] ...]
  structured-block

where clause can be one of:
- if (expression)
- untied
- shared (list)
- private (list)
- firstprivate (list)
- default( shared | none )


Page 64: Programming with OpenMP*

Tied & Untied Tasks

Tied Tasks:
- A tied task gets a thread assigned to it at its first execution, and the same thread services the task for its lifetime
- A thread executing a tied task can be suspended and sent off to execute some other task, but eventually the same thread will return to resume execution of its original tied task
- Tasks are tied unless explicitly declared untied

Untied Tasks:
- An untied task has no long-term association with any given thread. Any thread not otherwise occupied is free to execute an untied task. The thread assigned to execute an untied task may only change at a "task scheduling point".
- An untied task is created by appending "untied" to the task clause
- Example: #pragma omp task untied


Page 65: Programming with OpenMP*

Task Switching

Task switching: the act of a thread switching from the execution of one task to another task.
- The purpose of task switching is to distribute threads among the unassigned tasks in the team to avoid piling up long queues of unassigned tasks
- Task switching, for tied tasks, can only occur at task scheduling points located within the following constructs:
  - encountered task constructs
  - encountered taskwait constructs
  - encountered barrier directives
  - implicit barrier regions
  - at the end of the tied task region
- Untied tasks have implementation-dependent scheduling points


Page 66: Programming with OpenMP*

Task Switching Example

The thread executing the "for loop" (AKA the generating task) generates many tasks in a short time, so...
- The single generating task will have to suspend for a while when the "task pool" fills up
- Task switching is invoked to start draining the "pool"
- When the "pool" is sufficiently drained, the single task can begin generating more tasks again

#pragma omp single
{
  for (i=0; i<ONEZILLION; i++)
    #pragma omp task
    process(item[i]);
}


Page 67: Programming with OpenMP*

Optional foil - OpenMP* API

Get the thread number within a team:
  int omp_get_thread_num(void);

Get the number of threads in a team:
  int omp_get_num_threads(void);

Usually not needed for OpenMP codes:
- Can lead to code not being serially consistent
- Does have specific uses (debugging)

Must include a header file:
  #include <omp.h>


Page 68: Programming with OpenMP*

Optional foil - Monte Carlo Pi

(# of darts hitting circle) / (# of darts in square) = (¼ π r²) / r² = π/4

loop 1 to MAX
  x.coor=(random#)
  y.coor=(random#)
  dist=sqrt(x^2 + y^2)
  if (dist <= 1)
    hits=hits+1
pi = 4 * hits/MAX


Page 69: Programming with OpenMP*

Optional foil - Making Monte Carlo's Parallel

hits = 0
call SEED48(1)
DO I = 1, max
  x = DRAND48()
  y = DRAND48()
  IF (SQRT(x*x + y*y) .LT. 1) THEN
    hits = hits+1
  ENDIF
END DO
pi = REAL(hits)/REAL(max) * 4.0

What is the challenge here?


Page 70: Programming with OpenMP*

Optional Activity 5: Computing Pi

Use the Intel® Math Kernel Library (Intel® MKL) VSL:
- Intel MKL's VSL (Vector Statistics Libraries)
- VSL creates an array, rather than a single random number
- VSL can have multiple seeds (one for each thread)

Objective:
- Use basic OpenMP* syntax to make Pi parallel
- Choose the best code to divide the task up
- Categorize properly all variables


Page 71: Programming with OpenMP*

Firstprivate Clause

Variables initialized from a shared variable.
C++ objects are copy-constructed.

incr=0;
#pragma omp parallel for firstprivate(incr)
for (I=0;I<=MAX;I++) {
  if ((I%2)==0) incr++;
  A(I)=incr;
}


Page 72: Programming with OpenMP*

Lastprivate Clause

Variables update the shared variable using the value from the last iteration.
C++ objects are updated as if by assignment.

void sq2(int n, double *lastterm)
{
  double x; int i;
  #pragma omp parallel
  #pragma omp for lastprivate(x)
  for (i = 0; i < n; i++) {
    x = a[i]*a[i] + b[i]*b[i];
    b[i] = sqrt(x);
  }
  *lastterm = x;
}


Page 73: Programming with OpenMP*

Threadprivate Clause

Preserves global scope for per-thread storage.
Legal for name-space-scope and file-scope.
Use copyin to initialize from the master thread.

struct A;
#pragma omp threadprivate(A)
...
#pragma omp parallel copyin(A)
  do_something_to(&A);
...
#pragma omp parallel
  do_something_else_to(&A);

Private copies of "A" persist between regions.


Page 74: Programming with OpenMP*

20+ Library Routines

Runtime environment routines:
- Modify/check the number of threads
  omp_[set|get]_num_threads()
  omp_get_thread_num()
  omp_get_max_threads()
- Are we in a parallel region?
  omp_in_parallel()
- How many processors in the system?
  omp_get_num_procs()
- Explicit locks
  omp_[set|unset]_lock()
- And many more...


Page 75: Programming with OpenMP*

Library Routines

To fix the number of threads used in a program:
- Set the number of threads
- Then save the number returned

#include <omp.h>

void main ()
{
  int num_threads;
  // Request as many threads as you have processors.
  omp_set_num_threads (omp_get_num_procs ());
  #pragma omp parallel
  {
    int id = omp_get_thread_num ();
    // Protect this operation because memory stores are not atomic.
    #pragma omp single
    num_threads = omp_get_num_threads ();
    do_lots_of_stuff (id);
  }
}


Page 76: Programming with OpenMP*
Page 77: Programming with OpenMP*

BACKUP
