

Advanced Fortran

Mikko Byckling Sami Saarinen

Oct 8 – 11, 2013 @ CSC – IT Center for Science Ltd, Espoo

All material (C) 2013 by CSC – IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/

Schedule (8 – 11 October 2013)

Tuesday

9.00-9.45 Advanced Fortran intro

10.00-10.45 Useful new features

11.00-12.00 Exercises

12.00-13.00 Lunch break

13.00-13.45 Types & procedure ptrs

14.00-14.45 Exercises

15.00-16.00 Object Oriented Fortran

16.00-17.00 Exercises

Wednesday

9.00-9.45 Advanced OOF

10.00-10.45 Exercises

11.00-12.00 Interoperability with C

12.00-13.00 Lunch break

13.00-14.00 Exercises

14.00-14.45 Introduction to OpenMP

15.00-16.00 Exercises

Thursday

9.00-9.45 Thread synchronization

10.00-11.00 Exercises

11.00-12.00 Advanced OpenMP

12.00-13.00 Lunch break

13.00-13.45 Exercises

14.00-14.45 Introduction to CAF

15.00-16.00 Exercises

Friday

9.00-9.45 More CAF features

10.00-11.00 Exercises

11.00-12.00 Advanced CAF

12.00-13.00 Lunch break

13.00-15.00 Exercises

15.00-15.30 Wrap-up

Web resources

CSC’s Fortran 95/2003 guide (in Finnish), available free of charge

http://www.csc.fi/csc/julkaisut/oppaat

Fortran wiki: a resource hub for all aspects of Fortran programming

http://fortranwiki.org

Evolution of the Fortran standard over the past few decades

http://fortranwiki.org/fortran/show/Fortran+2008

http://fortranwiki.org/fortran/show/Fortran+2003

http://fortranwiki.org/fortran/show/Fortran+95

http://fortranwiki.org/fortran/show/Fortran+90

http://fortranwiki.org/fortran/show/FORTRAN+77

http://fortranwiki.org/fortran/show/FORTRAN+66

GNU Fortran online documents, version 4.8.1

http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gfortran

Cray documentation, also a good place to look for the latest Fortran features

http://docs.cray.com

Intel Fortran compilers

http://software.intel.com/en-us/fortran-compilers

G95 Fortran compiler and its CAF support

http://www.g95.org

http://www.g95.org/coarray.shtml

Fortran code examples

http://www.nag.co.uk/nagware/examples.asp

http://www.personal.psu.edu/jhm/f90/progref.html

Mistakes in Fortran 90 Programs That Might Surprise You

http://www.cs.rpi.edu/~szymansk/OOF90/bugs.html

Useful Co-array Fortran (CAF) documentation

http://www2.hpcl.gwu.edu/pgas09/tutorials/caf_tut.pdf

http://www.co-array.org

ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1824.pdf

http://en.wikipedia.org/wiki/Coarray_Fortran

http://gcc.gnu.org/wiki/Coarray

ADVANCED FORTRAN EXERCISES


General information

Stubs (or skeletons) of all source codes are available in the tar file AdvFTN_stubs.tgz

To untar it, issue the following Unix-command:

tar zxvf AdvFTN_stubs.tgz

All exercises are under subdirectories of Exercises/stubs/

Each lecture has its own exercise sub-directory as shown in the following exercise slide pages

For every exercise you are meant to edit and correct the stub source code, primarily by looking for two question marks (??) in the source and providing a fix

Once finished, use the Unix make command to build (and in some cases also to run) the executables

We use Cray compilers by default

Occasionally we may ask you to try out the Intel compiler, too. Then you need to "swap" to the Intel compiler environment:

module swap PrgEnv-cray PrgEnv-intel

To switch back to the Cray compiler:

module swap PrgEnv-intel PrgEnv-cray

All executables are built for the Cray login node, except the CAF (Co-Array Fortran) examples. The CAF executables need to be run with the aprun command, for example using 4 images:

aprun -n 4 ./your_caf_executable_name [arguments]

Solutions are provided after each exercise in separate tar-files, one per lecture/exercise

Advanced Fortran exercises ( Exercises/stubs/Introduction2AdvancedFortran )

Introduction to Advanced Fortran

1. Operator overloading (intstr.F90 and overload.F90)

a) Implement the additive (i.e. plus) operator '+' so that an integer can be summed with the character representation of a number, returning an integer. For example, 1 + '2' would give 3 and '45' + 7 would produce 52.

b) Take a look at the implementation of the assignment operator ('='), too. It provides a handy way to assign integer-valued character strings to an integer upon initialization. Using the assignment operator also greatly simplifies the implementation of the additive operator above.
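A minimal sketch of the idea behind this exercise, not the course solution: the module name intstr_sketch and the procedure names are made up for illustration, and the string is parsed with a list-directed internal READ without any error handling.

```fortran
! Hypothetical sketch: integer + string arithmetic via defined
! assignment and a defined operator. All names are illustrative.
module intstr_sketch
  implicit none
  private
  public :: operator(+), assignment(=)

  interface assignment(=)
    module procedure int_from_str
  end interface

  interface operator(+)
    module procedure int_plus_str, str_plus_int
  end interface

contains

  ! i = '123' : convert the string with an internal READ
  subroutine int_from_str(i, s)
    integer, intent(out) :: i
    character(len=*), intent(in) :: s
    read (s, *) i
  end subroutine int_from_str

  ! 1 + '2'
  function int_plus_str(i, s) result(r)
    integer, intent(in) :: i
    character(len=*), intent(in) :: s
    integer :: r, j
    j = s            ! uses the defined assignment above
    r = i + j
  end function int_plus_str

  ! '45' + 7
  function str_plus_int(s, i) result(r)
    character(len=*), intent(in) :: s
    integer, intent(in) :: i
    integer :: r
    r = int_plus_str(i, s)
  end function str_plus_int

end module intstr_sketch
```

With this module in scope, an expression such as 1 + '2' resolves to int_plus_str, because no intrinsic '+' exists for an integer and a character operand.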

Advanced Fortran exercises ( Exercises/stubs/Useful_features )

Useful new features

1. New ALLOCATABLE features plus trying out the new edit descriptors I0 and G0 (alloc.F90)

a) Supply source code for allocating and initializing an ALLOCATABLE subroutine argument

b) Supply source code for handling a function result that is ALLOCATABLE

c) Do we have to explicitly ALLOCATE space for a character string of length LENCH?

d) Use I0 and G0 edit descriptors where instructed

2. New POINTER features and contiguity testing (pointer.F90)

a) Fix and run the program to figure out how the new POINTER features work and whether the CONTIGUOUS attribute has been set correctly

3. Use standardized operating system utilities (os.F90)

a) Get all command line arguments into a string and print it

b) Get the number of arguments, then get them one by one (including the command itself) and print them

c) Find out the value of environment variable $HOME

d) Execute Unix-command 'echo $HOME' and check the exit status.

4. (*BONUS*) Experimenting with asynchronous I/O (async.F90)

a) Run with Cray compiler and figure out whether asynchronous I/O actually works

b) Repeat the same with Intel compiler
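The standardized operating system utilities of exercise 3 can be sketched roughly as follows. The module name os_sketch and the helper names are hypothetical; only the intrinsics (GET_COMMAND, COMMAND_ARGUMENT_COUNT, GET_COMMAND_ARGUMENT, GET_ENVIRONMENT_VARIABLE, EXECUTE_COMMAND_LINE) come from the Fortran 2003/2008 standards.

```fortran
! Hypothetical sketch of the standard OS-related intrinsics.
module os_sketch
  implicit none
contains

  ! Value of environment variable NAME, or '' if it is not set.
  function env_value(name) result(val)
    character(len=*), intent(in) :: name
    character(len=:), allocatable :: val
    integer :: n, stat
    call get_environment_variable(name, length=n, status=stat)
    if (stat /= 0) n = 0
    val = repeat(' ', n)          ! allocates val with length n
    if (n > 0) call get_environment_variable(name, value=val)
  end function env_value

  ! Print the whole command line, then argument 0 (the command
  ! itself) and every following argument, one per line.
  subroutine show_arguments()
    character(len=:), allocatable :: buf
    integer :: i, n
    call get_command(length=n)
    buf = repeat(' ', n)
    call get_command(command=buf)
    print '(a)', buf
    do i = 0, command_argument_count()
      call get_command_argument(i, length=n)
      buf = repeat(' ', n)
      call get_command_argument(i, value=buf)
      print '(i0,": ",a)', i, buf
    end do
  end subroutine show_arguments

  ! Run a shell command; return its exit status, or -1 if the
  ! command could not be executed at all.
  function run(cmd) result(code)
    character(len=*), intent(in) :: cmd
    integer :: code, cstat
    code = 0                      ! EXITSTAT is an INOUT argument
    call execute_command_line(cmd, exitstat=code, cmdstat=cstat)
    if (cstat /= 0) code = -1
  end function run

end module os_sketch
```

For example, env_value('HOME') returns the home directory as a string of exactly the right length, and run('echo $HOME') returns the exit status of the shell command.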

Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_TypesPointers )

Types and procedure pointers

1. Parameterization of a derived type (partype.F90)

a) Create a type abstraction for a vector. The implementation should allow the parameterization of the real KIND and LEN.

2. Abstract interfaces and procedure pointers (absif.F90)

a) Create an abstract interface for a function taking two vectors as input and returning a vector, as well as one for a function taking two vectors as input and returning a scalar.

b) Create actual implementations for functions computing the sum, elementwise product and dot product of two vectors.

c) Initialize two vectors v1=(1,2,3,4,5) and v2=(5,4,3,2,1) and compute v3=v1+v2, v3=v1.*v2 (elementwise product) and v3=v1*v2 (dot product). Do the type constructors work as expected?

d) Create functions vecfun(fun,v1,v2) and scafun(fun,v1,v2) which take as their first argument a vector or scalar function with the interface created in “Types and procedure pointers”: 2a). Is it possible to compute the results of “Types and procedure pointers”: 2c) by using vecfun and scafun?
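As a rough illustration of abstract interfaces and procedure arguments, the sketch below uses plain double precision arrays instead of the parameterized vector type of the exercise. That simplification, the module name absif_sketch and all procedure names except vecfun and scafun are assumptions made for this example.

```fortran
! Hypothetical sketch: abstract interfaces for vector-valued and
! scalar-valued functions of two vectors (plain arrays here).
module absif_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  abstract interface
    function vecfun_if(a, b) result(c)
      import :: dp
      real(dp), intent(in) :: a(:), b(:)
      real(dp) :: c(size(a))
    end function vecfun_if
    function scafun_if(a, b) result(s)
      import :: dp
      real(dp), intent(in) :: a(:), b(:)
      real(dp) :: s
    end function scafun_if
  end interface

contains

  function vsum(a, b) result(c)
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: c(size(a))
    c = a + b
  end function vsum

  function vprod(a, b) result(c)
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: c(size(a))
    c = a * b                      ! elementwise product
  end function vprod

  function vdot(a, b) result(s)
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: s
    s = dot_product(a, b)
  end function vdot

  ! Apply any function that matches vecfun_if
  function vecfun(fun, a, b) result(c)
    procedure(vecfun_if) :: fun
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: c(size(a))
    c = fun(a, b)
  end function vecfun

  ! Apply any function that matches scafun_if
  function scafun(fun, a, b) result(s)
    procedure(scafun_if) :: fun
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: s
    s = fun(a, b)
  end function scafun

end module absif_sketch
```

A procedure pointer declared as procedure(vecfun_if), pointer :: op can then be associated with op => vsum and called like an ordinary function, while vecfun(vprod, v1, v2) and scafun(vdot, v1, v2) pass the functions as arguments.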

Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_Objects )

Objects

1. Type-bound procedures (typebound.F90 and typebound_mod.F90)

a) Add the functions created in “Types and procedure pointers”: 2b) as type-bound procedures of the given vector type.

b) Add operators for the type-bound procedures of “Objects”: 1a).

2. Type extensions (extend.F90 and extend_mod.F90)

a) Extend the vector type of “Objects”: 1) to have a separate imaginary part, and make operators for summing two imaginary vectors and for adding an imaginary vector to a real vector.

b) (*BONUS*) Examine the output of the program. Try to make the computation of the sum mathematically correct, i.e., v1+v2=v2+v1.
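Type-bound procedures and generic operator bindings might look roughly like the sketch below. The type name vector_t, its single allocatable component and the operator name .dot. are assumptions for illustration and need not match the stubs.

```fortran
! Hypothetical sketch: type-bound procedures with operator bindings.
module typebound_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: vector_t
    real(dp), allocatable :: x(:)
  contains
    procedure :: add => vec_add
    procedure :: dot => vec_dot
    generic :: operator(+) => add        ! v1 + v2
    generic :: operator(.dot.) => dot    ! v1 .dot. v2
  end type vector_t

contains

  function vec_add(a, b) result(c)
    class(vector_t), intent(in) :: a, b
    type(vector_t) :: c
    c%x = a%x + b%x        ! allocates c%x on assignment
  end function vec_add

  function vec_dot(a, b) result(s)
    class(vector_t), intent(in) :: a, b
    real(dp) :: s
    s = dot_product(a%x, b%x)
  end function vec_dot

end module typebound_sketch
```

Because the operators are bound to the type, any scope that can see vector_t can write v3 = v1 + v2 or v1 .dot. v2 without importing the module procedures by name.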

Advanced Fortran exercises ( Exercises/stubs/AdvancedFortran_AdvancedObjects )

Advanced objects

1. Abstract classes (abstracttype.F90 and abstracttype_mod.F90)

a) Construct an abstract vector class which contains deferred type-bound procedures and operators for sums, elementwise products and dot products of two vectors.

b) Create a double precision vector class to implement the abstract vector class.

2. Type component visibility and type creation (accesscontrol.F90 and accesscontrol_mod.F90)

a) Make type components of the vector class private. Create a constructor which wraps the type constructor and allows creation of vector types of given length and type.

b) Create a module generic with the same name as the vector type and map it to the type constructor created in “Advanced objects”:2a).

c) (*BONUS*) Create a formatted output method for the vector type to output its components. Unfortunately the current compilers lack support for derived type I/O, so one must resort to more old-fashioned means for outputting the contents.

Advanced Fortran exercises ( Exercises/stubs/Interoperability )

Language interoperability

1. Call a C function from Fortran to calculate the dot product of two vectors (callc.F90 and cfunc.c)

a) Supply the correct C-binding INTERFACE block for the Fortran program

2. Access global C-data from Fortran (global.F90, globalmod.F90 and cdata.c)

a) Modify Fortran module file (globalmod.F90) to map C-variables correctly to Fortran representation

b) Repeat the run with the Intel compiler, too, and notice that the C struct may not be optimal from a data alignment point of view. Fix the alignment in the C code and make sure the Fortran side is corrected, too

c) Fix the Fortran main program (global.F90) to get a correct C-to-F POINTER mapping

3. (*BONUS*) Implement mygetenv-function (getenv.F90 and asgn.F90)

a) This mygetenv-function should take a NUL-terminated Fortran string as input and return a Fortran string. Yet it should call the getenv-function from C-library directly. This can be accomplished by an “intelligent” use of assignment operator that maps TYPE(c_ptr) i.e. char * in C-language to Fortran character strings. You only need to provide INTERFACE definition to enable hassle-free getenv-calls from Fortran – the rest is already coded.

b) Repeat the same run with Intel compiler, too
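A C-binding INTERFACE block of the kind these exercises ask for can be sketched as follows. To stay self-contained, the example binds to strlen from the C standard library rather than to the course's cfunc.c or getenv. The module name cinterop_sketch and the wrapper string_length are made up for illustration.

```fortran
! Hypothetical sketch: calling a C library function via BIND(C).
module cinterop_sketch
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! C prototype: size_t strlen(const char *s);
    function c_strlen(s) bind(C, name='strlen') result(n)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: s(*)
      integer(c_size_t) :: n
    end function c_strlen
  end interface

contains

  ! Length of a Fortran string as seen by C. The NUL terminator
  ! that C expects is appended before the call.
  function string_length(str) result(n)
    character(kind=c_char, len=*), intent(in) :: str
    integer :: n
    n = int(c_strlen(str // c_null_char))
  end function string_length

end module cinterop_sketch
```

The same pattern (kinds from ISO_C_BINDING, explicit name= in BIND(C), NUL-terminated strings) carries over to an interface for a user-written C function.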

Advanced Fortran exercises ( Exercises/stubs/OpenMP_Introduction )

Introduction to OpenMP

1. OpenMP parallel regions (hello.F90)

a) Create a hello world program with OpenMP where each thread prints “Hello from TID=<n>”, where <n> denotes the thread number. Make sure the program also compiles and executes correctly when built without OpenMP (make hello_noomp).

2. OpenMP work sharing (worksharing.F90)

a) Modify the routines for computing the sum and elementwise product of two vectors to use OpenMP. Can you always expect to get the same numerical results?
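The two ideas of this lecture, a parallel region that still builds without OpenMP and a work-shared loop, can be sketched as below. The module and procedure names are hypothetical. The !$ sentinel marks lines that are compiled only when OpenMP is enabled, so the same source also builds as serial code.

```fortran
! Hypothetical sketch: conditional compilation and PARALLEL DO.
module omp_intro_sketch
  !$ use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Thread number, or 0 when compiled without OpenMP
  function my_tid() result(tid)
    integer :: tid
    tid = 0
    !$ tid = omp_get_thread_num()
  end function my_tid

  subroutine say_hello()
    !$omp parallel
    print '(a,i0)', 'Hello from TID=', my_tid()
    !$omp end parallel
  end subroutine say_hello

  ! Elementwise sum: loop iterations are divided among the threads
  subroutine vec_add(a, b, c)
    real(dp), intent(in) :: a(:), b(:)
    real(dp), intent(out) :: c(:)
    integer :: i
    !$omp parallel do
    do i = 1, size(c)
      c(i) = a(i) + b(i)
    end do
    !$omp end parallel do
  end subroutine vec_add

end module omp_intro_sketch
```

In a loop like vec_add each element is computed independently of the others, so the result does not depend on how the iterations are distributed among the threads.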

Advanced Fortran exercises ( Exercises/stubs/OpenMP_ThreadSynchronization )

OpenMP thread synchronization

1. Reduction (reduction.F90)

a) Modify the routine for computing the dot product of two vectors to use OpenMP. Can you always expect to get the same numerical results?

2. Execution control (execctrl.F90)

a) Initialize the data sets for the given computation with both MASTER (mdata) and SINGLE (sdata) constructs. Experiment by running the program multiple times with a few threads and see if all threads compute exactly the same numerical results.
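A dot product with an OpenMP reduction might be sketched as follows; the module name reduction_sketch and the function name dot_omp are made up for illustration.

```fortran
! Hypothetical sketch: dot product with a REDUCTION clause.
module reduction_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  function dot_omp(a, b) result(s)
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    ! Each thread accumulates a private partial sum; the partial
    ! sums are combined into s at the end of the loop.
    !$omp parallel do reduction(+:s)
    do i = 1, min(size(a), size(b))
      s = s + a(i) * b(i)
    end do
    !$omp end parallel do
  end function dot_omp

end module reduction_sketch
```

Since the order in which the partial sums are combined is not fixed, and floating-point addition is not associative, the last digits of the result can differ between runs or thread counts.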

Advanced Fortran exercises ( Exercises/stubs/OpenMP_Advanced )

Advanced OpenMP

1. Workshare (workshare.F90)

a) Given an n-by-n matrix A, implement a (columnwise) smoothing operation to output a matrix B with entries

B(i,j)=A(i,j)/2+A(i+1,j)/2 for i=1, j=1,…,n

B(i,j)=A(i-1,j)/4+A(i,j)/2+A(i+1,j)/4 for i=2,…,n-1, j=1,…,n

B(i,j)=A(i-1,j)/2+A(i,j)/2 for i=n, j=1,…,n

When implementing the computation, use Fortran array notation and the OpenMP workshare construct.

2. Tasks (fibonacci.F90)

a) Add tasking constructs to the given code to compute Fibonacci numbers (f_0=0, f_1=1, f_{n+1}=f_n+f_{n-1}, n>=1) recursively.

b) (*BONUS*) Reduce tasking overhead by using conditional tasks or by generating the results sequentially for small values of n.
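The columnwise smoothing of exercise 1 maps directly onto array sections inside a workshare construct. The following is a sketch under the assumptions that A and B have the same shape and n is at least 2; the module name smooth_sketch and the subroutine name smooth are hypothetical.

```fortran
! Hypothetical sketch: columnwise smoothing with array notation
! inside an OpenMP WORKSHARE construct.
module smooth_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  subroutine smooth(a, b)
    real(dp), intent(in) :: a(:, :)
    real(dp), intent(out) :: b(:, :)
    integer :: n
    n = size(a, 1)          ! assumed: n >= 2, shape(b) == shape(a)
    !$omp parallel workshare
    ! First row: average with the row below
    b(1, :) = 0.5_dp * a(1, :) + 0.5_dp * a(2, :)
    ! Interior rows: weights 1/4, 1/2, 1/4
    b(2:n-1, :) = 0.25_dp * a(1:n-2, :) + 0.5_dp * a(2:n-1, :) &
                + 0.25_dp * a(3:n, :)
    ! Last row: average with the row above
    b(n, :) = 0.5_dp * a(n-1, :) + 0.5_dp * a(n, :)
    !$omp end parallel workshare
  end subroutine smooth

end module smooth_sketch
```

Each array assignment is a separate unit of work that the compiler may divide among the threads; without OpenMP the directives are plain comments and the code runs serially.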

Advanced Fortran exercises ( Exercises/stubs/CAF_Introduction )

Introduction to CAF

1. Hello World (hello.F90)

a) Fix the source code to print the local image number and the total number of images from each image

b) Run with different numbers of images

2. Reading standard input data (readinp.F90)

a) A parameter is read in by the master image and then distributed to the other images. Make the skeleton program work.

3. Find out the global maximum value (maxval.F90)

a) The program shows how to determine the global maximum of the given CAF-distributed vector (rvec) using a rather inefficient (“brute force”) method. Run the program with different numbers of images.

b) (*BONUS*) Modify the program to make the calculation more efficient by removing the “all-to-all” communication over the co-array “rvec” (references to rvec[jroc]). Hint: you may have to turn the local_max variable into a co-array local_max[*]

Advanced Fortran exercises ( Exercises/stubs/CAF_More )

More CAF features

1. Multidimensional co-arrays (multidim.F90)

a) See how the co-size, co-shape and co-rank are computed for the co-array scalar variable coSCALAR and provide similar code for the array coARR, which is both a Fortran array (locally) and a co-array (globally)

b) Display image_index for coARR at [p,q,3] when running with 4 and 16 images. Why is the output zero (0) with 4 images?

c) For each image, display this_image for the variables coSCALAR and coARR

2. Dynamic memory allocation (dynamic.F90)

a) Given the number of images, the program determines values (p,q) so that the most nearly square 2D decomposition (p times q equals the number of images) can be used. Using the optimal values of (p,q), allocate a co-scalar with a co-rank of 3 using p x q partitioning. This means that the 3rd co-dimension would evidently be equal to one.

b) Allocate a different amount of memory for each image, using a co[*]%data structure

3. Pass a co-array to a subroutine (callsub.F90)

a) Modify the program: given a 2x2 co-array, determine again the optimal (p,q) and pass the co-array to a subroutine, where it is declared as [p,q,*] and its local (1,1) location is filled with the value of this_image

b) Print out each image's local value of (1,1) from the last image. Is any synchronization needed?

Advanced Fortran exercises ( Exercises/stubs/CAF_Advanced )

Advanced CAF

1. 2D Laplace iterative equation solver (jacobi.F90)

a) A Laplace equation over a 2D rectangular area is discretized and solved with a Jacobi iterative solver. The area is split equally along the east-west (E-W) direction across CAF images. During each iteration the local points are updated, the boundaries along the E-W direction are exchanged and the convergence norm is calculated. Provide code for co-array allocation with error checking.

b) Provide code for the exchange_boundaries routine. Remember appropriate synchronization (there may be many options available).

c) (*BONUS*) Run with a variable number of images. Do you get the same results?

[Figure: 2D domain with axes X and Y. Labels: T(3,1,this)[1], T(2,3,this)[2]; E-W halo areas; fixed boundaries (T = 0)]