Computational Methods in Physics PHYS 3437
Computational Methods in Physics
PHYS 3437
Dr Rob Thacker
Dept of Astronomy & Physics (MM-301C)
[email protected]
Today's Lecture
- Recap from end of last lecture
- Some technical details related to parallel programming
  - Data dependencies
  - Race conditions
- Summary of other clauses you can use in setting up parallel loops
Recap

```fortran
C$OMP PARALLEL DO
C$OMP& DEFAULT(NONE)
C$OMP& PRIVATE(i),SHARED(X,Y,n,a)
      do i=1,n
         Y(i)=a*X(i)+Y(i)
      end do
```

- C$OMP PARALLEL DO denotes that this is a region of code for parallel execution
- These are comment pragmas for FORTRAN; the ampersand is necessary for continuation lines
- DEFAULT(NONE) is good programming practice: you must declare the nature of all variables
- Thread PRIVATE variables: each thread must have its own copy of the variable (here i is the only private variable)
- Thread SHARED variables: all threads can access these variables, but must not update individual memory locations simultaneously
SHARED and PRIVATE
- The most commonly used directives, necessary to ensure correct execution
- PRIVATE: any variable declared as private will be local to a given thread and inaccessible to others (it is also uninitialized)
  - This means that if you have a variable, say t, in the serial section of the code and then use it in a loop, t inside the loop will not carry over the value of t from the serial part
  - Watch out for this, but there is a way around it...
- SHARED: any variable declared as shared will be accessible by all other threads of execution
Example
The SHARED and PRIVATE specifications can be long!

```fortran
C$OMP& PRIVATE(icb,icol,izt,iyt,icell,iz_off,iy_off,ibz,
C$OMP& iby,ibx,i,rxadd,ryadd,rzadd,inx,iny,inz,nb,nebs,ibrf,
C$OMP& nbz,nby,nbx,nbrf,nbref,jnbox,jnboxnhc,idt,mdt,iboxd,
C$OMP& dedge,idir,redge,is,ie,twoh,dosph,rmind,in,ixyz,
C$OMP& redaughter,Ustmp,ngpp,hpp,vpp,apps,epp,hppi,hpp2,
C$OMP& rh2,hpp2i,hpp3i,hpp5i,dpp,divpp,dcvpp,nspp,rnspp,
C$OMP& rad2torbin,de1,dosphflag,dosphnb,nbzlow,nbzhigh,nbylow,
C$OMP& nbyhigh,nbxlow,nbxhigh,nbzadd,nbyadd,r3i,r2i,r1i,
C$OMP& dosphnbnb,dogravnb,js,je,j,rad2,rmj,grc,igrc,gfrac,
C$OMP& Gr,hppj,jlist,dx,rdv,rcv,v2,radii2,rbin,ibin,fbin,
C$OMP& wl1,dwl1,drnspp,hppa,hppji,hppj2i,hppj3i,hppj5i,
C$OMP& wl2,dwl2,w,dw,df,dppi,divppr,dcvpp2,dcvppm,divppm,csi,
C$OMP& fi,prhoi2,ispp,frcij,rdotv,hpa,rmuij,rhoij,cij,qij,
C$OMP& frc3,frc4,hcalc,rath,av,frc2,dr1,dr2,dr3,dr12,dr22,dr32,
C$OMP& appg1,appg2,appg3,gdiff,ddiff,d2diff,dv1,dv2,dv3,rpp,
C$OMP& Gro)
```
FIRSTPRIVATE
- Declaring a variable FIRSTPRIVATE will ensure that its value is copied in from any prior piece of serial code
  - However (of course) if the variable is not initialized in the serial section it will remain uninitialized
- Happens only once for a given thread set
- Try to avoid writing to variables declared FIRSTPRIVATE
FIRSTPRIVATE example

The lower bound of the values is set to the value of a; without the FIRSTPRIVATE clause, a=0.0.

```fortran
      a=5.0
C$OMP PARALLEL DO
C$OMP& SHARED(r), PRIVATE(i)
C$OMP& FIRSTPRIVATE(a)
      do i=1,n
         r(i)=max(a,r(i))
      end do
```
LASTPRIVATE
- Occasionally it may be necessary to know the last value of a variable from the end of the loop
- LASTPRIVATE variables will initialize the value of the variable in the serial section using the last (sequential) value of the variable from the parallel loop
Default behaviour
- You can actually omit the SHARED and PRIVATE statements - what is the expected behaviour?
  - Loop indices are private by default
  - Other variables, including arrays, are shared by default
- Bad practice in my opinion - specify the types for everything
DEFAULT
- I recommend using DEFAULT(NONE) at all times
  - Forces specification of all variable types
- Alternatively, can use DEFAULT(SHARED) or DEFAULT(PRIVATE) to specify that un-scoped variables will default to the particular type chosen
  - e.g. choosing DEFAULT(PRIVATE) will ensure any un-scoped variable is private
The Parallel Do Pragmas
- So far we've considered a small subset of functionality
- Before we talk more about data dependencies, let's look briefly at what other statements can be used in a parallel do loop
- Besides PRIVATE and SHARED variables there are a number of other clauses that can be applied
Loop Level Parallelism in more detail
For each parallel do (for) pragma, the following clauses are possible:

| FORTRAN | C/C++ |
|---|---|
| PRIVATE | private |
| SHARED | shared |
| FIRSTPRIVATE | firstprivate |
| LASTPRIVATE | lastprivate |
| REDUCTION | reduction |
| ORDERED | ordered |
| SCHEDULE | schedule |
| COPYIN | copyin |
| DEFAULT | |

(On the original slide, colour marked the most frequently used clauses; PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, and DEFAULT have already been covered.)
More background on data dependencies
Suppose you try to parallelize the following loop:

```fortran
      c=0.0
      do i=1,n
         c=c+1.0
         Y(i)=c
      end do
```

- This won't work as written, since iteration i depends upon iteration i-1, and thus we can't start anything in parallel
- To see this explicitly, let n=20 and start thread 1 at i=1 and thread 2 at i=11; then thread 1 sets Y(1)=1.0 and thread 2 sets Y(11)=1.0 (which is wrong!)
Simple solution
This loop can easily be re-written in a way that can be parallelized:

```fortran
      c=0.0
      do i=1,n
         Y(i)=c+float(i)
      end do
      c=c+n
```

- There is no longer any dependence on the previous operation
- Private variables: i; shared variables: Y(), c, n
Types of Data Dependencies
Suppose we have operations O1, O2:
- True dependence: O2 has a true dependence on O1 if O2 reads a value written by O1
- Anti dependence: O2 has an anti-dependence on O1 if O2 writes a value read by O1
- Output dependence: O2 has an output dependence on O1 if O2 writes a variable written by O1
Examples
True dependence:
```fortran
A1=A2+A3
B1=A1+B2
```
Anti-dependence:
```fortran
B1=A1+B2
A1=C2
```
Output dependence:
```fortran
B1=5
B1=2
```
Dealing with Data Dependencies
- Any loop where iterations depend upon the previous one has a potential problem
- Any result which depends upon the order of the iterations will be a problem
- Good first test of whether something can be parallelized: reverse the loop iteration order
- Not all data dependencies can be eliminated
- Accumulations of variables (e.g. the sum of elements in an array) can be dealt with easily
Accumulations
Consider the following loop:

```fortran
      a=0.0
      do i=1,n
         a=a+X(i)
      end do
```

- It apparently has a data dependency; however, each thread can accumulate its own partial sum of the X(i) independently
- OpenMP provides an explicit interface for this kind of operation ("REDUCTION")
REDUCTION clause
This clause deals with parallel versions of loops like the following:

```fortran
      do i=1,N
         a=max(a,b(i))
      end do

      do i=1,N
         a=min(a,b(i))
      end do

      do i=1,n
         a=a+b(i)
      end do
```

- The outcome is determined by a `reduction' over the values from each thread
- e.g. the max over a set is equivalent to the max over the maxima of its subsets: if A = A1 ∪ A2 ∪ ... ∪ An, then Max(A) = Max(Max(A1), Max(A2), ..., Max(An))
Examples
Syntax: REDUCTION(OP:variable) where OP = max, min, +, -, * (and logic ops)

```fortran
C$OMP PARALLEL DO
C$OMP& PRIVATE(i), SHARED(b)
C$OMP& REDUCTION(max:a)
      do i=1,N
         a=max(a,b(i))
      end do
```

```fortran
C$OMP PARALLEL DO
C$OMP& PRIVATE(i), SHARED(b)
C$OMP& REDUCTION(min:a)
      do i=1,N
         a=min(a,b(i))
      end do
```
What is REDUCTION actually doing?
- Saving you from writing more code
- The reduction clause generates an array of the reduction variables, and each thread is responsible for a certain element in the array
- The final reduction over all the array elements (when the loop is finished) is performed transparently to the user
Initialization
Reduction variables are initialized as follows (from the standard):

| Operator | Initialization |
|---|---|
| + | 0 |
| * | 1 |
| - | 0 |
| MAX | smallest representable number |
| MIN | largest representable number |
Race Conditions
A common operation is to resolve a spatial position into an array index; consider the following loop, where r() is an array of positions and A() is an array that is modified using information from r():

```fortran
C$OMP PARALLEL DO
C$OMP& DEFAULT(NONE)
C$OMP& PRIVATE(i,j)
C$OMP& SHARED(r,A)
      do i=1,n
         j=int(r(i))
         A(j)=A(j)+1.
      end do
```

Looks innocent enough, but suppose two particles have the same positions...
Race Conditions: A concurrency problem
Two different threads of execution can concurrently attempt to update the same memory location:

- Start: A(j)=1.
- Thread 1: gets A(j)=1., adds 1., puts A(j)=2.
- Thread 2: gets A(j)=1. before thread 1's write lands, adds 1., puts A(j)=2.
- End state: A(j)=2., which is INCORRECT (two updates should have produced 3.)
Dealing with Race Conditions
- Need a mechanism to ensure updates to single variables occur within a critical section
- Any thread entering a critical section blocks all others
- Critical sections can be established by using "lock variables"
  - Think of lock variables as preventing more than one thread from working on a particular piece of code at any one time
  - Just like a lock on a door prevents people from entering a room
Deadlocks: The pitfall of locking
- Must ensure a situation is not created where requests in possession create a deadlock
  - Nested locks are a classic example of this
- Can also create the problem with multiple processes - the `deadly embrace'

(The slide showed the classic diagram: Process 1 holds Resource 1 and requests Resource 2, while Process 2 holds Resource 2 and requests Resource 1, so neither can proceed.)
Solutions
- Need to ensure memory read/writes occur without any overlap
- If the access occurs to a single region, we can use a critical section:

```fortran
      do i=1,n
         **work**
C$OMP CRITICAL(lckx)
         a=a+1.
C$OMP END CRITICAL(lckx)
      end do
```

Only one thread will be allowed inside the critical section at a time. I have given a name (lckx) to the critical section, but you don't have to do this.
ATOMIC
If all you want to do is ensure the correct update of one variable, you can use the atomic update facility:

```fortran
C$OMP PARALLEL DO
      do i=1,n
         **work**
C$OMP ATOMIC
         a=a+1.
      end do
```

Exactly the same as a critical section around one single update point.
Can be inefficient
- If other threads are waiting to enter the critical section then the program may even degenerate to a serial code!
- Make sure there is much more work outside the locked region than inside it!

(The slide showed a parallel section where each thread spends most of its time waiting for the lock before being able to proceed - a complete disaster.)
COPYIN & ORDERED
- Suppose you have a small section of code that must always be executed in sequential order, while the remaining work can be done in any order
  - Placing an ORDERED clause around the work section will force threads to execute this section of code sequentially
- If a common block is specified as private in a parallel do, COPYIN will ensure that all threads are initialized with the same values as in the serial section of the code
  - Essentially `FIRSTPRIVATE' for common blocks/globals
Subtle point about running in parallel
- When running in parallel you are only as fast as your slowest thread
- In the example, the total work is 40 seconds and we have 4 CPUs
- The best possible time would therefore be 40/4 = 10 secs
- To achieve that maximum speed-up, though, all threads have to take 10 secs each

(The slide showed a bar chart of work per thread, with the slowest thread taking 16 seconds: an example of poor load balance, only a 40/16 = 2.5 speed-up despite using 4 processors.)
SCHEDULE
- This is the mechanism for determining how work is spread among threads
- Important for ensuring that work is spread evenly among the threads: just giving each thread the same number of iterations may not guarantee they all complete at the same time
- Four types of scheduling possible: STATIC, DYNAMIC, GUIDED, RUNTIME
STATIC scheduling
- Simplest of the four
- If SCHEDULE is unspecified, STATIC scheduling will result
- Default behaviour is to simply divide up the iterations among the threads, ~n/(# threads) contiguous iterations each
- SCHEDULE(STATIC,chunksize) creates a cyclic distribution of iterations
Comparison
For 16 iterations on 4 threads:

STATIC, no chunksize (block distribution):
- Thread 1: iterations 1-4
- Thread 2: iterations 5-8
- Thread 3: iterations 9-12
- Thread 4: iterations 13-16

STATIC, chunksize=1 (cyclic distribution):
- Thread 1: iterations 1, 5, 9, 13
- Thread 2: iterations 2, 6, 10, 14
- Thread 3: iterations 3, 7, 11, 15
- Thread 4: iterations 4, 8, 12, 16
DYNAMIC scheduling
- DYNAMIC scheduling is a personal favourite
- Specify using SCHEDULE(DYNAMIC,chunksize)
- A simple implementation of master-worker type distribution of iterations
- The master thread passes off values of iterations to the workers in pieces of size chunksize
- Not a silver bullet: if the load imbalance is too severe (i.e. one thread takes longer than the rest combined) an algorithm rewrite is necessary
- Also not good if you need a regular access pattern for data locality
Master-Worker Model
(The slide showed a diagram of a master thread handing out chunks of iterations to worker threads 1-3.)
Other ways to use OpenMP
- We've really only skimmed the surface of what you can do
- However, we have covered the important details
- OpenMP provides a different programming model to just using loops
  - It isn't that much harder, but you need to think slightly differently
- Check out www.openmp.org for more details
Applying to algorithms used in the course
What could we apply OpenMP to?
- Root finding algorithms are actually fundamentally serial!
  - Global bracket finder: subdivide the region and let each CPU search its allotted space in parallel
- LU decomposition can be parallelized
- Numerical integration can be parallelized
- ODE solvers are not usually good parallelization candidates, but it is problem dependent
- MC methods usually (but not always) parallelize well
Summary
- The main difficulty in loop level parallel programming is figuring out whether there are data dependencies or race conditions
- Remember that variables do not naturally carry into a parallel loop, or for that matter out of one
  - Use FIRSTPRIVATE and LASTPRIVATE when you need to do this
- SCHEDULE provides many options
  - Use DYNAMIC when you have an unknown amount of work per iteration in a loop
  - Use STATIC when you need a regular access pattern to an array
Next Lecture
- Introduction to visualization