Advanced Fortran
Mikko Byckling Sami Saarinen
Oct 8 – 11, 2013 @ CSC – IT Center for Science Ltd, Espoo
All material (C) 2013 by CSC – IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/
Schedule (8 – 11 October 2013)
Tuesday
9.00-9.45 Advanced Fortran intro
10.00-10.45 Useful new features
11.00-12.00 Exercises
12.00-13.00 Lunch break
13.00-13.45 Types & procedure ptrs
14.00-14.45 Exercises
15.00-16.00 Object Oriented Fortran
16.00-17.00 Exercises
Wednesday
9.00-9.45 Advanced OOF
10.00-10.45 Exercises
11.00-12.00 Interoperability with C
12.00-13.00 Lunch break
13.00-14.00 Exercises
14.00-14.45 Introduction to OpenMP
15.00-16.00 Exercises
Thursday
9.00-9.45 Thread synchronization
10.00-11.00 Exercises
11.00-12.00 Advanced OpenMP
12.00-13.00 Lunch break
13.00-13.45 Exercises
14.00-14.45 Introduction to CAF
15.00-16.00 Exercises
Friday
9.00-9.45 More CAF features
10.00-11.00 Exercises
11.00-12.00 Advanced CAF
12.00-13.00 Lunch break
13.00-15.00 Exercises
15.00-15.30 Wrap-up
Web resources
CSC’s Fortran95/2003 Guide (in Finnish) for free
http://www.csc.fi/csc/julkaisut/oppaat
Fortran wiki: a resource hub for all aspects of Fortran programming
http://fortranwiki.org
About the evolution of the Fortran standard over the past few decades
http://fortranwiki.org/fortran/show/Fortran+2008
http://fortranwiki.org/fortran/show/Fortran+2003
http://fortranwiki.org/fortran/show/Fortran+95
http://fortranwiki.org/fortran/show/Fortran+90
http://fortranwiki.org/fortran/show/FORTRAN+77
http://fortranwiki.org/fortran/show/FORTRAN+66
GNU Fortran online documents, version 4.8.1
http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gfortran
Cray documentation, also a place to look up the latest Fortran features
http://docs.cray.com
Intel Fortran compilers
http://software.intel.com/en-us/fortran-compilers
G95 Fortran compiler and its CAF support
http://www.g95.org
http://www.g95.org/coarray.shtml
Fortran code examples
http://www.nag.co.uk/nagware/examples.asp
http://www.personal.psu.edu/jhm/f90/progref.html
Mistakes in Fortran 90 Programs That Might Surprise You
http://www.cs.rpi.edu/~szymansk/OOF90/bugs.html
Useful Co-array Fortran (CAF) documentation
http://www2.hpcl.gwu.edu/pgas09/tutorials/caf_tut.pdf
http://www.co-array.org
ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1824.pdf
http://en.wikipedia.org/wiki/Coarray_Fortran
http://gcc.gnu.org/wiki/Coarray
Advanced Fortran exercises
General information
Stubs (or skeletons) of all source codes are available in the tar file AdvFTN_stubs.tgz
To untar it, issue the following Unix-command:
tar zxvf AdvFTN_stubs.tgz
All exercises are under subdirectories of Exercises/stubs/
Each lecture has its own exercise sub-directory as shown in the following exercise slide pages
For every exercise you are meant to edit and correct the stub source code, primarily by looking for two question marks (??) in the source and providing a fix
Once finished, use the Unix make command to build (and in some cases also to run) the executables
We use Cray compilers by default
Occasionally we may ask you to try out the Intel compiler, too. Then you need to "swap" to the Intel compiler environment:
module swap PrgEnv-cray PrgEnv-intel
Switch back to the Cray compiler via:
module swap PrgEnv-intel PrgEnv-cray
All executables are built for the Cray login node, except the CAF (Co-Array Fortran) examples. The CAF executables need to be run with the aprun command, for example using 4 images:
aprun -n 4 ./your_caf_executable_name [arguments]
Solutions are provided after each exercise in separate tar-files, one per lecture/exercise
Advanced Fortran exercises ( Exercises/stubs/Introduction2AdvancedFortran )
Introduction to Advanced Fortran
1. Operator overloading (intstr.F90 and overload.F90)
a) Implement the additive (i.e. plus) operator '+' so that an integer can be summed with the character representation of a number, returning an integer, e.g. 1 + '2' would give 3 and '45' + 7 would produce 52.
b) Take a look at the assignment operator ('=') implementation, too. It provides a handy way to assign integer-valued character strings to an integer upon initialization. Using the assignment operator also greatly simplifies the implementation of the aforementioned additive operator.
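The idea behind this exercise can be sketched as follows. This is only a minimal illustration, not the course solution: the module and procedure names (intstr_sketch, int_from_str, add_int_str, add_str_int) are made up here, and the string is assumed to hold a valid integer, since the internal read does no error handling.

```fortran
module intstr_sketch
  implicit none
  private
  public :: operator(+), assignment(=)

  interface assignment(=)
    module procedure int_from_str
  end interface

  interface operator(+)
    module procedure add_int_str, add_str_int
  end interface

contains

  ! Defined assignment "integer = character", e.g. k = '45'.
  subroutine int_from_str(i, s)
    integer, intent(out) :: i
    character(len=*), intent(in) :: s
    read (s, *) i            ! internal list-directed read, no error handling
  end subroutine int_from_str

  ! integer + character
  function add_int_str(i, s) result(r)
    integer, intent(in) :: i
    character(len=*), intent(in) :: s
    integer :: r
    r = s                    ! resolves to int_from_str
    r = r + i
  end function add_int_str

  ! character + integer, delegating to the routine above
  function add_str_int(s, i) result(r)
    character(len=*), intent(in) :: s
    integer, intent(in) :: i
    integer :: r
    r = i + s
  end function add_str_int

end module intstr_sketch

program overload_demo
  use intstr_sketch
  implicit none
  integer :: k
  k = '10'
  print '(i0,1x,i0,1x,i0)', 1 + '2', '45' + 7, k   ! prints: 3 52 10
end program overload_demo
```

Both argument orders need their own specific procedure because a defined operator is resolved on the types of its operands in order.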
Advanced Fortran exercises ( Exercises/stubs/Useful_features )
Useful new features
1. New ALLOCATABLE features plus trying out the new edit descriptors I0 and G0 (alloc.F90)
a) Supply source code for allocating and initializing an ALLOCATABLE subroutine argument
b) Supply source code for handling a function return value of ALLOCATABLE type
c) Do we have to explicitly ALLOCATE space for a character string of length LENCH?
d) Use the I0 and G0 edit descriptors where instructed
2. New POINTER features and contiguity testing (pointer.F90)
a) Fix and run the program to figure out how the new POINTER features work and whether the CONTIGUOUS attribute has been set correctly
3. Use standardized operating system utilities (os.F90)
a) Get all command line arguments into a string and print it
b) Get the number of arguments, get them one by one (including the command itself) and print them
c) Find out the value of the environment variable $HOME
d) Execute the Unix command 'echo $HOME' and check the exit status
4. (*BONUS*) Experimenting with asynchronous I/O (async.F90)
a) Run with the Cray compiler and figure out whether asynchronous I/O actually works
b) Repeat the same with the Intel compiler
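For the operating system utilities of exercise 3, a sketch along the following lines may help. It uses only standard intrinsics (get_command, command_argument_count, get_command_argument, get_environment_variable, execute_command_line) together with deferred-length allocatable strings and the I0/G0 edit descriptors. The program name and variable names are hypothetical, and the layout of os.F90 itself may well differ.

```fortran
program os_sketch
  implicit none
  character(len=:), allocatable :: text
  integer :: i, n, length, stat, exitstat

  ! a) Whole command line: ask for the length first, then fetch the text.
  call get_command(length=length)
  allocate(character(len=length) :: text)
  call get_command(command=text)
  print '(a,a)', 'command line: ', text
  deallocate(text)

  ! b) Arguments one by one; argument 0 is the command itself.
  n = command_argument_count()
  do i = 0, n
    call get_command_argument(i, length=length)
    allocate(character(len=length) :: text)
    call get_command_argument(i, value=text)
    print '(a,i0,a,a)', 'argument ', i, ': ', text
    deallocate(text)
  end do

  ! c) Environment variable; status is nonzero if HOME is not set.
  call get_environment_variable('HOME', length=length, status=stat)
  if (stat == 0) then
    allocate(character(len=length) :: text)
    call get_environment_variable('HOME', value=text)
    print '(a,a)', 'HOME = ', text
  end if

  ! d) Run a shell command and inspect its exit status.
  exitstat = 0
  stat = 0
  call execute_command_line('echo $HOME', exitstat=exitstat, cmdstat=stat)
  print '(a,i0,a,i0)', 'exit status ', exitstat, ', cmdstat ', stat

  ! G0 picks a suitable width for integers and reals alike.
  print '(a,g0,a,g0)', 'n = ', n, ', pi is roughly ', 4.0d0*atan(1.0d0)
end program os_sketch
```

Querying the length before allocating avoids guessing a fixed buffer size, which is one of the conveniences of deferred-length character variables.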
Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_TypesPointers )
Types and procedure pointers
1. Parameterization of a derived type (partype.F90)
a) Create a type abstraction for a vector. The implementation should allow parameterization of the real KIND and the LEN.
2. Abstract interfaces and procedure pointers (absif.F90)
a) Create an abstract interface for a function taking two vectors as input and returning a vector, as well as for a function taking two vectors as input and returning a scalar.
b) Create actual implementations for functions computing the sum, elementwise product and dot product of two vectors.
c) Initialize two vectors v1=(1,2,3,4,5) and v2=(5,4,3,2,1) and compute v3=v1+v2, v3=v1.*v2 (elementwise product) and v3=v1*v2 (dot product). Do the type constructors work as expected?
d) Create functions vecfun(fun,v1,v2) and scafun(fun,v1,v2) which take as their first argument a vector or scalar function with the interface created in “Types and procedure pointers”: 2a). Is it possible to compute the results of “Types and procedure pointers”: 2c) by using vecfun and scafun?
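The mechanics of exercise 2 can be illustrated with plain real arrays in place of the course's parameterized vector type, an assumption made here to keep the sketch short and portable. The names absif_sketch, vecfun_if, scafun_if, vsum, vprod and vdot are hypothetical; vecfun and scafun follow the exercise text.

```fortran
module absif_sketch
  implicit none
  integer, parameter :: rk = kind(1.0d0)

  abstract interface
    ! Two vectors in, one vector out.
    function vecfun_if(a, b) result(c)
      import :: rk
      real(rk), intent(in) :: a(:), b(:)
      real(rk) :: c(size(a))
    end function vecfun_if
    ! Two vectors in, one scalar out.
    function scafun_if(a, b) result(s)
      import :: rk
      real(rk), intent(in) :: a(:), b(:)
      real(rk) :: s
    end function scafun_if
  end interface

contains

  function vsum(a, b) result(c)
    real(rk), intent(in) :: a(:), b(:)
    real(rk) :: c(size(a))
    c = a + b
  end function vsum

  function vprod(a, b) result(c)
    real(rk), intent(in) :: a(:), b(:)
    real(rk) :: c(size(a))
    c = a * b
  end function vprod

  function vdot(a, b) result(s)
    real(rk), intent(in) :: a(:), b(:)
    real(rk) :: s
    s = dot_product(a, b)
  end function vdot

  ! Apply any function that matches the vector interface.
  function vecfun(fun, a, b) result(c)
    procedure(vecfun_if) :: fun
    real(rk), intent(in) :: a(:), b(:)
    real(rk) :: c(size(a))
    c = fun(a, b)
  end function vecfun

  ! Apply any function that matches the scalar interface.
  function scafun(fun, a, b) result(s)
    procedure(scafun_if) :: fun
    real(rk), intent(in) :: a(:), b(:)
    real(rk) :: s
    s = fun(a, b)
  end function scafun

end module absif_sketch

program absif_demo
  use absif_sketch
  implicit none
  real(rk) :: v1(5), v2(5)
  procedure(vecfun_if), pointer :: op => null()

  v1 = [1, 2, 3, 4, 5]
  v2 = [5, 4, 3, 2, 1]

  op => vsum                                  ! procedure pointer
  print '(5(f5.1))', op(v1, v2)               ! sum
  print '(5(f5.1))', vecfun(vprod, v1, v2)    ! elementwise product
  print '(f5.1)', scafun(vdot, v1, v2)        ! dot product
end program absif_demo
```

The abstract interface only fixes the argument and result characteristics; any procedure with a matching signature can then be passed as an argument or be the target of the pointer.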
Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_Objects )
Objects
1. Type-bound procedures (typebound.F90 and typebound_mod.F90)
a) Add the functions created in “Types and procedure pointers”: 2b) as type-bound procedures of the given vector type.
b) Add operators for the type-bound procedures of “Objects”: 1a).
2. Type extensions (extend.F90 and extend_mod.F90)
a) Extend the vector type of “Objects”: 1) to have a separate imaginary part and make operators for summing two imaginary vectors and for adding an imaginary vector to a real vector.
b) (*BONUS*) Examine the output of the program. Try to make the computation of the sum mathematically correct, i.e., v1+v2=v2+v1.
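Type-bound procedures and type-bound operators (exercise 1) might look roughly like the sketch below. The type layout (a single allocatable component x), the operator name .dot. and all module and procedure names are assumptions for illustration; the vector type given in typebound_mod.F90 is likely to be different.

```fortran
module typebound_sketch
  implicit none
  integer, parameter :: rk = kind(1.0d0)

  type :: vector
    real(rk), allocatable :: x(:)
  contains
    procedure :: add => vector_add
    procedure :: dot => vector_dot
    generic :: operator(+) => add          ! v1 + v2
    generic :: operator(.dot.) => dot      ! v1 .dot. v2
  end type vector

contains

  function vector_add(a, b) result(c)
    class(vector), intent(in) :: a, b
    type(vector) :: c
    c%x = a%x + b%x                        ! allocates c%x on assignment
  end function vector_add

  function vector_dot(a, b) result(s)
    class(vector), intent(in) :: a, b
    real(rk) :: s
    s = dot_product(a%x, b%x)
  end function vector_dot

end module typebound_sketch

program typebound_demo
  use typebound_sketch
  implicit none
  type(vector) :: v1, v2, v3

  v1 = vector([1.0_rk, 2.0_rk, 3.0_rk])
  v2 = vector([3.0_rk, 2.0_rk, 1.0_rk])
  v3 = v1 + v2
  print '(3(f5.1))', v3%x
  print '(f5.1)', v1 .dot. v2
end program typebound_demo
```

Because the operators are bound to the type, they travel with it: a program that uses the type gets the operators without importing a separate generic interface.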
Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_AdvancedObjects )
Advanced objects
1. Abstract classes (abstracttype.F90 and abstracttype_mod.F90)
a) Construct an abstract vector class which contains deferred type-bound procedures and operators for sums, elementwise products and dot products of two vectors.
b) Create a double precision vector class to implement the abstract vector class.
2. Type component visibility and type creation (accesscontrol.F90 and accesscontrol_mod.F90)
a) Make the type components of the vector class private. Create a constructor which wraps the type constructor and allows creation of vector types of given length and type.
b) Create a module generic with the same name as the vector type and map it to the type constructor created in “Advanced objects”: 2a).
c) (*BONUS*) Create a formatted output method for the vector type to output its components. Unfortunately the current compilers lack support for derived type I/O, so one must resort to more old-fashioned means of outputting the contents.
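Exercise 2 (private components plus a generic named after the type) can be sketched as below. It does not cover the abstract class of exercise 1. All names (accesscontrol_sketch, vector_new, length, total) and the constructor signature vector(n, val) are assumptions made for this illustration.

```fortran
module accesscontrol_sketch
  implicit none
  private
  public :: vector

  integer, parameter, public :: rk = kind(1.0d0)

  type :: vector
    private                                ! components hidden from users
    real(rk), allocatable :: x(:)
  contains
    procedure :: length => vector_length
    procedure :: total => vector_total
  end type vector

  ! Generic with the same name as the type: vector(n, val) calls vector_new.
  interface vector
    module procedure vector_new
  end interface vector

contains

  function vector_new(n, val) result(v)
    integer, intent(in) :: n
    real(rk), intent(in) :: val
    type(vector) :: v
    allocate(v%x(n))
    v%x = val
  end function vector_new

  function vector_length(self) result(n)
    class(vector), intent(in) :: self
    integer :: n
    n = size(self%x)
  end function vector_length

  function vector_total(self) result(s)
    class(vector), intent(in) :: self
    real(rk) :: s
    s = sum(self%x)
  end function vector_total

end module accesscontrol_sketch

program accesscontrol_demo
  use accesscontrol_sketch
  implicit none
  type(vector) :: v

  v = vector(4, 2.5_rk)          ! goes through vector_new
  print '(i0,1x,f5.1)', v%length(), v%total()
end program accesscontrol_demo
```

With the components private, code outside the module cannot use the intrinsic structure constructor, so the generic becomes the only way to create a vector, and accessor procedures are the only way to read it.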
Advanced Fortran exercises ( Exercises/stubs/Interoperability )
Language interoperability
1. Call a C function from Fortran to calculate the dot product of two vectors (callc.F90 and cfunc.c)
a) Supply the correct C-binding INTERFACE block for the Fortran program
2. Access global C data from Fortran (global.F90, globalmod.F90 and cdata.c)
a) Modify the Fortran module file (globalmod.F90) to map the C variables correctly to their Fortran representation
b) Repeat the run with the Intel compiler, too, and observe that the C struct may not be optimal from the data alignment point of view. Fix the alignment in the C code and make sure the Fortran side gets corrected, too
c) Fix the Fortran main program (global.F90) to get a correct C-to-Fortran POINTER mapping
3. (*BONUS*) Implement a mygetenv function (getenv.F90 and asgn.F90)
a) The mygetenv function should take a NUL-terminated Fortran string as input and return a Fortran string, yet it should call the getenv function from the C library directly. This can be accomplished by an “intelligent” use of the assignment operator that maps TYPE(c_ptr), i.e. char * in C, to Fortran character strings. You only need to provide the INTERFACE definition to enable hassle-free getenv calls from Fortran; the rest is already coded.
b) Repeat the same run with the Intel compiler, too
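A C-binding INTERFACE block of the kind these exercises ask for can be illustrated with two C library functions, getenv and strlen. The sketch below is not the course's mygetenv: it appends the NUL terminator itself and copies the characters in a loop instead of using a defined assignment, and the names getenv_sketch, c_getenv and c_strlen are hypothetical.

```fortran
module getenv_sketch
  use, intrinsic :: iso_c_binding, only: c_ptr, c_char, c_size_t, &
                                         c_null_char, c_associated, c_f_pointer
  implicit none
  private
  public :: mygetenv

  interface
    ! char *getenv(const char *name);
    function c_getenv(name) bind(C, name='getenv') result(p)
      import :: c_ptr, c_char
      character(kind=c_char), intent(in) :: name(*)
      type(c_ptr) :: p
    end function c_getenv

    ! size_t strlen(const char *s);
    function c_strlen(s) bind(C, name='strlen') result(n)
      import :: c_ptr, c_size_t
      type(c_ptr), value :: s
      integer(c_size_t) :: n
    end function c_strlen
  end interface

contains

  ! Returns the value of the variable, or an empty string if it is unset.
  function mygetenv(name) result(val)
    character(len=*), intent(in) :: name
    character(len=:), allocatable :: val
    type(c_ptr) :: p
    character(kind=c_char), pointer :: chars(:)
    integer :: i, n

    p = c_getenv(trim(name) // c_null_char)   ! C expects a NUL terminator
    if (.not. c_associated(p)) then
      val = ''
      return
    end if

    n = int(c_strlen(p))
    call c_f_pointer(p, chars, [n])           ! view the C string as an array
    allocate(character(len=n) :: val)
    do i = 1, n
      val(i:i) = chars(i)
    end do
  end function mygetenv

end module getenv_sketch

program getenv_demo
  use getenv_sketch
  implicit none
  print '(a,a)', 'HOME = ', mygetenv('HOME')
end program getenv_demo
```

Note the VALUE attribute on the strlen argument: without it Fortran would pass the address of the pointer rather than the pointer itself.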
Advanced Fortran exercises ( Exercises/stubs/OpenMP_Introduction )
Introduction to OpenMP
1. OpenMP parallel regions (hello.F90)
a) Create a hello world program with OpenMP where each thread prints “Hello from TID=<n>”, where <n> denotes the thread number. Make sure the program also compiles and executes correctly when used without OpenMP (make hello_noomp).
2. OpenMP work sharing (worksharing.F90)
a) Modify the routines for computing the sum and elementwise product of two vectors to use OpenMP. Can you always expect to get the same numerical results?
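Both exercises can be combined into one small sketch. Lines starting with the `!$` sentinel are compiled only when OpenMP is enabled, so the same source also builds as a serial program. The program name and the array size are arbitrary choices made here.

```fortran
program omp_intro_sketch
  !$ use omp_lib
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n)
  integer :: i, tid

  a = 1.0
  b = 2.0

  !$omp parallel private(tid)
  tid = 0                              ! serial fallback
  !$ tid = omp_get_thread_num()
  print '(a,i0)', 'Hello from TID=', tid

  ! Iterations are divided among the threads of the enclosing region.
  !$omp do
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
  !$omp end do
  !$omp end parallel

  print '(8(f4.1))', c
end program omp_intro_sketch
```

Each element of c is written by exactly one thread and involves a single addition, so an elementwise operation like this gives the same result regardless of the thread count.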
Advanced Fortran exercises ( Exercises/stubs/OpenMP_ThreadSynchronization )
OpenMP thread synchronization
1. Reduction (reduction.F90)
a) Modify the routine for computing the dot product of two vectors to use OpenMP. Can you always expect to get the same numerical results?
2. Execution control (execctrl.F90)
a) Initialize the data sets for the given computation with both MASTER (mdata) and SINGLE (sdata) constructs. Experiment by running the program multiple times with a few threads and see if all threads compute exactly the same numerical results.
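A dot product with an OpenMP reduction could be written along these lines. This is a sketch with made-up data and names, not the routine from reduction.F90.

```fortran
program reduction_sketch
  implicit none
  integer, parameter :: n = 1000
  real(kind(1.0d0)) :: a(n), b(n), s
  integer :: i

  a = 1.0d0
  b = 2.0d0
  s = 0.0d0

  ! Each thread accumulates a private partial sum; the partial sums are
  ! combined into s when the loop ends.
  !$omp parallel do reduction(+:s)
  do i = 1, n
    s = s + a(i)*b(i)
  end do
  !$omp end parallel do

  print '(g0)', s
end program reduction_sketch
```

The order in which the partial sums are combined is not fixed, and floating-point addition is not associative. With the simple data above every partial sum is exact, but for general data the last digits may vary with the number of threads and from run to run.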
Advanced Fortran exercises ( Exercises/stubs/OpenMP_Advanced )
Advanced OpenMP
1. Workshare (workshare.F90)
a) Given an n-by-n matrix A, implement a (columnwise) smoothing operation to output a matrix B with entries
B(i,j)=A(i,j)/2+A(i+1,j)/2 for i=1, j=1,…,n
B(i,j)=A(i-1,j)/4+A(i,j)/2+A(i+1,j)/4 for i=2,…,n-1, j=1,…,n
B(i,j)=A(i-1,j)/2+A(i,j)/2 for i=n, j=1,…,n
When implementing the computation, use Fortran array notation and the OpenMP workshare construct.
2. Tasks (fibonacci.F90)
a) Add tasking constructs to the given code to compute Fibonacci numbers (f_0=0, f_1=1, f_{n+1}=f_n+f_{n-1}, n>=1) recursively.
b) (*BONUS*) Reduce tasking overhead by using conditional tasks or by generating the results sequentially for small values of n.
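The two constructs can be sketched as follows. The smoothing routine transcribes the three formulas into array sections inside a workshare construct; the Fibonacci function spawns a task per recursive call and uses an if clause as one possible way to limit tasking overhead. Module and routine names and the cut-off value 15 are arbitrary assumptions, not taken from the course sources.

```fortran
module omp_adv_sketch
  implicit none
contains

  ! Columnwise smoothing; assumes size(a,1) >= 3.
  subroutine smooth(a, b)
    real, intent(in) :: a(:,:)
    real, intent(out) :: b(:,:)
    integer :: n
    n = size(a, 1)
    !$omp parallel workshare
    b(1, :) = a(1, :)/2 + a(2, :)/2
    b(2:n-1, :) = a(1:n-2, :)/4 + a(2:n-1, :)/2 + a(3:n, :)/4
    b(n, :) = a(n-1, :)/2 + a(n, :)/2
    !$omp end parallel workshare
  end subroutine smooth

  recursive function fib(n) result(f)
    integer, intent(in) :: n
    integer :: f, f1, f2
    if (n < 2) then
      f = n
    else
      ! Small subproblems run undeferred thanks to the if clause.
      !$omp task shared(f1) if(n > 15)
      f1 = fib(n - 1)
      !$omp end task
      !$omp task shared(f2) if(n > 15)
      f2 = fib(n - 2)
      !$omp end task
      !$omp taskwait
      f = f1 + f2
    end if
  end function fib

end module omp_adv_sketch

program omp_adv_demo
  use omp_adv_sketch
  implicit none
  integer, parameter :: n = 5
  real :: a(n, n), b(n, n)
  integer :: i, f

  do i = 1, n
    a(i, :) = real(i)            ! every column holds 1, 2, ..., n
  end do
  call smooth(a, b)
  print '(5(f5.2))', b(:, 1)

  ! One thread starts the recursion; the others pick up the tasks.
  !$omp parallel
  !$omp single
  f = fib(20)
  !$omp end single
  !$omp end parallel
  print '(a,i0)', 'fib(20) = ', f
end program omp_adv_demo
```

The shared clauses on f1 and f2 matter: in a task construct local variables would otherwise be firstprivate, and the results computed inside the tasks would be lost.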
Advanced Fortran exercises ( Exercises/stubs/CAF_Introduction )
Introduction to CAF
1. Hello World (hello.F90)
a) Fix the source code to print the local image number and the total number of images from each image
b) Run with different numbers of images
2. Reading standard input data (readinp.F90)
a) A parameter is read in by the master image and then distributed to the other images. Make the skeleton program work.
3. Find out the global maximum value (maxval.F90)
a) The program shows how to determine the global maximum of the given CAF-distributed vector (rvec) using a rather inefficient (“brute force”) method. Run the program with different numbers of images.
b) (*BONUS*) Modify the program to make the calculation more efficient by removing the “all-to-all” communication over the co-array “rvec” (references to rvec[jroc]). Hint: you may have to turn the local_max variable into a co-array local_max[*]
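The three exercises share a pattern that can be sketched in one short program: query the image number, let image 1 own a value that the others fetch, and gather per-image results on one image. The fixed value 42 stands in for the parameter read from standard input and the per-image maxima are made up; only local_max[*] follows the hint in the exercise text. With gfortran a coarray option such as -fcoarray=single is typically needed to compile this.

```fortran
program caf_intro_sketch
  implicit none
  integer :: param[*]            ! one copy per image
  real :: local_max[*]
  real :: global_max
  integer :: me, np, i

  me = this_image()
  np = num_images()
  print '(a,i0,a,i0)', 'Hello from image ', me, ' of ', np

  ! Image 1 owns the parameter (a fixed value here, so that the sketch
  ! runs unattended instead of reading standard input).
  if (me == 1) param = 42
  sync all                       ! the value must exist before others fetch it
  if (me /= 1) param = param[1]

  ! Each image publishes its own maximum; here just a made-up value.
  local_max = real(10*me)
  sync all

  ! Only image 1 gathers: np-1 remote reads instead of all-to-all traffic.
  if (me == 1) then
    global_max = local_max
    do i = 2, np
      global_max = max(global_max, local_max[i])
    end do
    print '(a,f0.1)', 'global maximum = ', global_max
  end if
end program caf_intro_sketch
```

Each `sync all` separates a phase in which images write their own data from a phase in which other images read it; leaving one out would allow a remote read to see a stale or undefined value.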
Advanced Fortran exercises ( Exercises/stubs/CAF_More )
More CAF features
1. Multidimensional co-arrays (multidim.F90)
a) See how the co-size, co-shape and co-rank are computed for the co-array scalar variable coSCALAR and provide similar coding for the array coARR, which is both a Fortran array (locally) and a co-array (globally)
b) Display image_index for coARR at [p,q,3] when running with 4 and 16 images. Why is the output zero (0) with 4 images?
c) For each image display this_image for the variables coSCALAR and coARR
2. Dynamic memory allocation (dynamic.F90)
a) Given the number of images, the program determines values (p,q) so that the most nearly square 2D decomposition (p times q equals the number of images) can be used. Using the optimal values of (p,q), allocate a co-scalar with co-rank 3 using p x q partitioning. This means that the 3rd co-dimension evidently has an extent of one.
b) Allocate a different amount of memory for each image, using a co[*]%data structure
3. Pass a co-array to a subroutine (callsub.F90)
a) Modify the program: given a 2x2 co-array, determine again the optimal (p,q) and pass the co-array to a subroutine, where it is declared with co-dimensions [p,q,*] and its local element (1,1) is filled with the value of this_image
b) Print out each image's local value of element (1,1) from the last image. Is any synchronization needed?
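The behaviour of this_image and image_index for a co-array with several co-dimensions can be explored with a sketch like the one below. The co-dimensions [2,2,*] are fixed constants chosen for illustration, whereas the exercise computes p and q at run time; the name coarr is likewise an assumption. With gfortran a coarray option such as -fcoarray=single is typically needed.

```fortran
program multidim_sketch
  implicit none
  integer :: coarr(2,2)[2,2,*]   ! local 2x2 array, co-rank 3
  integer :: me

  me = this_image()
  coarr = me
  sync all

  ! this_image(coarr) returns the co-subscripts of the executing image.
  print '(a,i0,a,3(1x,i0))', 'image ', me, ' has co-subscripts', &
        this_image(coarr)

  ! image_index maps co-subscripts back to an image number, or gives 0
  ! if no such image exists in the current run.
  if (me == 1) then
    print '(a,i0)', 'image_index at [1,1,1] = ', image_index(coarr, [1, 1, 1])
    print '(a,i0)', 'image_index at [2,2,3] = ', image_index(coarr, [2, 2, 3])
  end if
end program multidim_sketch
```

With co-dimensions [2,2,*] the co-subscripts [2,2,3] correspond to image 1 + 1 + 1*2 + 2*4 = 12, so image_index returns 0 whenever fewer than 12 images are running.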
Advanced Fortran exercises ( Exercises/stubs/CAF_Advanced )
Advanced CAF
1. 2D Laplace iterative equation solver (jacobi.F90)
a) A Laplace equation over a 2D rectangular area is discretized and solved with a Jacobi iterative solver. The area is split equally along the east-west (E-W) direction across the CAF images. During each iteration the local points are updated, the boundaries along the E-W direction are exchanged and the convergence norm is calculated. Provide code for co-array allocation with error checking.
b) Provide the code for the exchange_boundaries routine. Remember appropriate synchronization (there may be many options available).
c) (*BONUS*) Run with a varying number of images. Do you get the same results?
[Figure: 2D domain with X and Y axes, split across images in the E-W direction. Labels: T(3,1,this)[1], T(2,3,this)[2], E-W halo areas, fixed boundaries (T = 0)]
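A halo exchange of the kind the Jacobi exercise asks for might be organized as in the following sketch. The array layout (interior columns 1..nx with one halo column on each side, no time-level index), the sizes and all names are assumptions made here and do not match the T(i,j,this) layout of jacobi.F90. With gfortran a coarray option such as -fcoarray=single is typically needed.

```fortran
program halo_sketch
  implicit none
  integer, parameter :: nx = 4, ny = 3   ! local interior size (hypothetical)
  real, allocatable :: t(:,:)[:]
  integer :: me, np, ierr

  me = this_image()
  np = num_images()

  ! Interior columns 1..nx plus one halo column on each side.
  allocate(t(ny, 0:nx+1)[*], stat=ierr)
  if (ierr /= 0) error stop 'co-array allocation failed'

  t = 0.0
  t(:, 1:nx) = real(me)                  ! mark the interior with the image id

  sync all          ! neighbours must have finished writing their interiors
  if (me > 1)  t(:, 0)    = t(:, nx)[me-1]   ! pull from the west neighbour
  if (me < np) t(:, nx+1) = t(:, 1)[me+1]    ! pull from the east neighbour
  sync all          ! no interior is overwritten before all pulls are done

  print '(a,i0,a,f0.1,a,f0.1)', 'image ', me, ': west halo ', t(1, 0), &
        ', east halo ', t(1, nx+1)
end program halo_sketch
```

Two global barriers are the simplest choice but not the only one; synchronizing only with the two neighbours (for example with sync images) is an alternative that avoids making every image wait for the slowest one.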