tutorial presentation 8
DESCRIPTION
LIOH OOITRANSCRIPT
Introduction
I Chip manufacturers are rapidly moving to multi-coreCPUs
Figure : Quad-core processor Intel Sandy Bridge
Shared Memory ModelI All processors can access all memory in global address
space.I Threads Model: A single process can have multiple,
concurrent execution pathsI On a multi-core system, the threads run at the same
time, with each core running a particular thread or task.
Figure : Shared Memory Model [1]
What is OpenMP?
I An Application Program Interface (API)I Used to explicitly direct multi-threaded, shared memory
parallelismI Provides a portable, scalable modelI Supports C/C++ and Fortran on a wide variety of
architectures
Fork-Join Model
I OpenMP-program starts as a single threadI Additional threads (Team) are created when the master
hits a parallel regionI When all threads finished the parallel region, the new
threads are given back to the runtime or operatingsystem.
I The master continues after the parallel region
Fork-Join Model (cont.)
Figure : Fork-Join Model [1]
OpenMP API
Primary API components:I Compiler Directives:
#pragma omp p a r a l l e l
I Run-time Library Routines:
i n t omp_get_num_threads ( v o i d ) ;
I Environment Variables
e x p o r t OMP_NUM_THREADS=2
Example
Listing 1: OpenMP Hello World!#i n c l u d e <ios t r eam>#i n c l u d e <omp . h>
i n t main ( i n t argc , cha r ∗ a rgv [ ] ){
#pragma omp p a r a l l e l{
s t d : : cout << "THREAD: " << omp_get_thread_num ( ) << "\ tHe l l o , World ! \ n" ;}r e t u r n 0 ;
}
Listing 2: Compilingg++ −o h e l l o h e l l o . c −fopenmp
Classification of Variables
I private(var-list):I Variables in var-list are private
I shared(var-list):I Variables in var-list are shared.
I default(private | shared | none):I Sets the default for all variables in this region.
Example
Listing 3: OpenMP Private Variable#i n c l u d e <ios t r eam>#i n c l u d e <omp . h>i n t main ( i n t argc , cha r ∗ a rgv [ ] ){
i n t i , j ;i = 1 ;j = 2 ;
s t d : : cout << "BEFORE: i , j= "<< i << " , " << j << std : : e nd l ;
#pragma omp p a r a l l e l p r i v a t e ( i ){
i = 3 ;j = 5 ;s t d : : cout << "IN−LOOP: i , j= "<< i << " , " << j << std : : e nd l ;
}
s t d : : cout << "AFTER: i , j= "<< i << " , " << j << std : : e nd l ;r e t u r n 0 ;
}
Work-Sharing Constructs
I Work-sharing constructs distribute the specified work toall threads within the current team
I Types:I Parallel loopI Parallel sectionI Master regionI Single region
Parallel Loop
I Syntax:
#pragma omp f o r [ c l a u s e . . . ]
I The iterations of the loop are distributed to the threadsI The scheduling of loop iterations: static, dynamic,
guided, and runtime.
Scheduling Strategies
I Schedule clause:
s c h edu l e ( type [ , s i z e ] )
I static: Chunks of the specified size are assigned in around- robin fashion to the threads.
I dynamic: The iterations are broken into chunks of thespecified size. When a thread finishes the execution of achunk, the next chunk is assigned to that thread.
I guided: Similar to dynamic, but the size of the chunks isexponentially decreasing. The size parameter specifies thesmallest chunk. The initial chunk is implementationdependent.
I runtime: The scheduling type and the chunk size isdetermined via environment variables.
Example
Listing 4: OpenMP Private Variable#i n c l u d e <ios t r eam>#i n c l u d e <omp . h>
#d e f i n e CHUNKSIZE 100#d e f i n e N 1000i n t main ( ){
i n t i , chunk ;doub l e a [N] , b [N] , c [N ] ;s r and ( t ime (NULL) ) ;f o r ( i =0; i < N; i++) {
a [ i ] = generate_random_double ( 0 . 0 , 1 0 . 0 ) ;b [ i ] = generate_random_double ( 0 . 0 , 1 0 . 0 ) ;
}chunk = CHUNKSIZE ;
#pragma omp p a r a l l e l s ha r ed ( a , b , c , chunk ) p r i v a t e ( i ){
#pragma omp f o r s c h edu l e ( dynamic , chunk ) nowa i tf o r ( i =0; i < N; i++)
c [ i ] = a [ i ] + b [ i ] ;
}r e t u r n 0 ;
}
References
Blaise Barney, Lawrence Livermore National Laboratory,https://computing.llnl.gov/tutorials/openMP/