

TG08 shih@UHCL.edu

Toward Parallel Space Radiation Analysis

Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni,

Ahmed Khan, Sergio J. Larrondo, Susan Strausser,

Travis Gilbert, Victor Shum, Romeo Chua

University of Houston Clear Lake


Runtime Profile of HZETRN Functions (percent of total runtime)

phi: 34.5 (interpolation function; most time is spent here, with 13,220,184 calls made to this function over the program run)

logf.J: 30.43

expf.J: 8.72

iuni_: 4.36

anu: 4.26

texp: 2.91

prpli_: 1.97

powf.J: 1.35

prpgt_: 0.93

od_: 0.73

cvtas_s_to_a: 0.73

Remaining functions: 9.03

This project continues the Space Radiation Research work performed last year by Dr. Liwen Shih's students to investigate HZETRN code optimization options.

This semester we will analyze the HZETRN code using standard static analysis tools and runtime analysis tools. In addition, we will examine code parallelization options for the most-called numerical method in the source code: the PHI function.


What is Space Radiation?

Two major sources: galactic cosmic rays (GCR) and solar energetic particles (SEP).

GCR are ever-present and more energetic, thus they are able to penetrate much thicker materials than SEP.

In order to evaluate the space radiation risk and design the spacecraft and habitat for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.


Space Radiation and the Earth

The Earth is protected from space radiation.

This image shows how the Earth's magnetic field causes electrons to drift one way about the Earth. Protons drift in the opposite direction.

Animation source: original clips provided courtesy of Professor Patricia Reiff, Rice University, Connections Program.


What about Galactic Cosmic Radiation (GCR)?

A typical high-energy particle of radiation found in the space environment is itself ionized, and as it passes through material such as human tissue it disrupts the electronic clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.


HZETRN - Space Radiation Nuclear Transport Code

The three included source code files are:

1. NUCFRAG.FOR, for generating nuclear absorption and reaction cross sections.

2. GEOMAG.FOR, for defining the GCR transmission coefficient cutoff effects within the magnetosphere.

3. HZETRN.FOR, for propagating the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2 and LH2.

HZETRN: High Charge and Energy Nuclear Transport Code

FORTRAN-77, written 1992

Environment: VAX mainframe

Code Metrics:

Files: 3

Lines: 9665

Code Lines: 6803

Comment Lines: 2859

Declarative Statements: 780

Executable Statements: 6563

Ratio Comment/Code: 0.42


HZETRN Numerical Method


HZETRN Calculates:

Radiation fluence of HZE particles: the time-integrated flux of HZE particles per unit area.

Energy absorbed per gram: determined first from the amount of energy left behind by the radiation in question and then from the amount and type of material.

Dose equivalent: the amount of any type of radiation absorbed in biological tissue, expressed as a standardized value.


HZETRN Algorithm


HZETRN used for Mars Mission

NASA has a new vision for space exploration in the 21st century, encompassing a broad range of human and robotic missions, including missions to the Moon, Mars and beyond. As a result, there is a focus on long-duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation in long-duration deep-space missions is 'the show stopper.'

Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on the reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics and tissues.


Martian Radiation Climate Modeling Using HZETRN Code

Calculations of the skin dose equivalent for astronauts on the surface of Mars near solar minimum.

The variation in the dose with respect to altitude is shown.

Higher altitudes (such as Olympus Mons) offer less shielding.

Mars Radiation Environment (Source: Wilson et al., http://marie.jsc.nasa.gov)


HZETRN Model vs. Actual Mars Radiation Climate: HZETRN underestimates!

Graph source: Alenia Spazio, European Space Agency Report 2004

Dose rate measured by MARIE in the transit period from April 2001 to August 2001, compared with HZETRN calculated doses.

Code calculations: the spike in May is due to an SPE. Differences between the observed (red) and predicted (black) doses vary by a factor of 1 to 3.

Partly because of code inefficiency, dosage data is underestimated.


Project Goal: Speedup of runtime via analysis and modification of the HZETRN code numerical algorithm

[Figure: Runtime profile of HZETRN functions. PHI (interpolation function) takes the most time, 34.5% of total runtime, with 13,220,184 calls over the program run, followed by logf.J (30.43%) and expf.J (8.72%).]

The major bottleneck in the space radiation code lies inside the PHI interpolation function.


Code Optimization Options

C **************************************************************
C
      FUNCTION PHI(R0,N,R,P,X)
C
C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
C
      DIMENSION R(N),P(N)
C
      SAVE
C
      XT=X
      PHI=P(1)
      INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
      IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
C
      DO 1 I=3,N-1
      IL=I
      IF(XT*INC.LT.R(I)*INC)GO TO 2
    1 CONTINUE
C
      IL=N-1
    2 CONTINUE
      PHI=0.

1. Fix Inefficient code

2. Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments

3. Use optimized ALOG function

4. Use Lookup Table instead

5. Investigate Parallelization Of Interpolation Statements
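As one illustration of option 1, the linear scan that the PHI fragment uses to locate the interpolation interval could in principle be replaced by a binary search. The Python sketch below is not taken from HZETRN. `find_interval_linear` mirrors the DO loop of the fragment for an ascending grid, and `find_interval_bisect` is a hypothetical replacement that assumes R is sorted in increasing order and that N is at least 4.

```python
from bisect import bisect_right


def find_interval_linear(r, x):
    """Mirror of the DO loop in the PHI fragment, for an ascending grid.

    Returns the 1-based index IL: the first I in 3..N-1 with x < R(I),
    or N-1 when no such I exists.
    """
    n = len(r)
    for i in range(3, n):          # I = 3, ..., N-1 (1-based)
        if x < r[i - 1]:           # R(I) is r[i - 1] in 0-based Python
            return i
    return n - 1


def find_interval_bisect(r, x):
    """Hypothetical O(log N) replacement; assumes r ascending, len(r) >= 4."""
    n = len(r)
    first_greater = bisect_right(r, x) + 1   # 1-based index of first R(I) > x
    return min(max(first_greater, 3), n - 1)
```

Both functions return the same index on an ascending grid. Whether the change pays off depends on N: for a short grid the linear scan may already be cheap.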



Code Optimization

Improve code structure

Use faster ALOG function (LOG)

Remove extraneous function calls


Steps toward a faster HZETRN

Step 1. Review Algorithm
Purpose: Understand the underlying numerical algorithm.
Result: The HZETRN algorithm is complex and needs further review; the overall functions of the code are understood.

Step 2. Analyze Source Code and Data Files
Purpose: Understand code structure and function.
Result: Review of the code and data files reveals that much of the code is inefficient, with redundant elements and archaic structure. The data files contain sparse matrices amenable to performance improvement.

Step 3. Portability Study
Purpose: Attempt to port the HZETRN code to various HPC platforms and compilers.
Result: The portability study revealed problems with the code and additional requirements for optimization.

Step 4. Static Analysis
Purpose: Develop an understanding of program structure; document the code for optimization and report.
Result: We generated a detailed HTML report documenting HZETRN source code functions and the structure of subroutine calls.

Step 5. Runtime Analysis
Purpose: Target runtime bottlenecks and determine the most-called functions/subroutines.
Result: Revealed that the PHI interpolation function is the major bottleneck function. The natural logarithm intrinsic function is also a performance issue.

Step 6. Serial Optimization of Code
Purpose: Starting with the PHI function.
Result: We removed extraneous function calls and cleaned up 'messy code', which resulted in a runtime performance improvement (initially a 10% overall increase).


Parallel Space Radiation Analysis

The goal of the project was to speed up the execution of the HZETRN code using parallel processing.

The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory.


Computing Resources Used

Itanium 2 cluster (Atlantis) at the Texas Learning & Computation Center (TLC2) at the University of Houston.

Atlantis is a cluster of 152 dual Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect. Atlantis is running RedHat Linux version 5.1.

The Intel Fortran compiler (version 10.0) and OpenMPI (an open-source MPI-2 implementation) are being used.

In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used.

TeraGrid has just started being used.


PHI Routine (Lagrangian Interpolation)

The figure shows the HZETRN runtime profile: most time is spent in function PHI (3rd-order Lagrangian interpolation).

The PHI function is heavily called by the propagation and integration routines, typically 229,380 times at each depth.

Early focus: optimizing the PHI routine.

The PHI routine takes the natural log of the input ordinate and abscissas prior to performing the Lagrangian interpolation and returns the exponential of the interpolated ordinate.

(Source: Shih, Larrondo, et al., High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122)

[Figure: Runtime profile of HZETRN functions. PHI (interpolation function) takes the most time, 34.5% of total runtime, with 13,220,184 calls over the program run, followed by logf.J (30.43%) and expf.J (8.72%).]

Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home PC) speedup, but had a negative impact on the numerical results (see next page), since the functions being interpolated are logarithmic.
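The effect of dropping the log and exponential calls can be reproduced with a small Python sketch. It is not the HZETRN code. `lagrange` is a generic Lagrange interpolation, `phi_loglog` follows the description of PHI (take logs, interpolate, exponentiate), and `phi_direct` skips the log/exp calls. The four-point grid (a cubic, which is one reading of "3rd order") and the power-law test function p = r^-3 are assumptions chosen for illustration.

```python
import math


def lagrange(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        weight = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                weight *= (x - xk) / (xj - xk)
        total += weight * yj
    return total


def phi_loglog(r, p, x):
    """Interpolate log(p) against log(r), then exponentiate the result."""
    log_r = [math.log(v) for v in r]
    log_p = [math.log(v) for v in p]
    return math.exp(lagrange(log_r, log_p, math.log(x)))


def phi_direct(r, p, x):
    """Same interpolation without the log/exp calls."""
    return lagrange(r, p, x)


# Illustrative data (assumed): a power law sampled on a logarithmic grid.
r = [1.0, 10.0, 100.0, 1000.0]
p = [v ** -3 for v in r]
x = 30.0
exact = x ** -3
print("exact  ", exact)
print("log-log", phi_loglog(r, p, x))
print("direct ", phi_direct(r, p, x))
```

On this data the log-log version recovers the power law to within rounding error, while the direct version lands far from the exact value and even turns negative. That is consistent with the observation that the faster variant degrades the numerical results.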


PHI Routine - Needs LOG/TEXP

Significant differences appear when comparing results with and without the calls to LOG/TEXP.


PHI Routine Optimization

Because the bottleneck PHI routine is called so heavily, the message-passing overhead to parallelize it would be prohibitive.

Simple code optimizations of the PHI routine resulted in:

– 11.4% speedup on a home PC running Linux, compiled using the Sun Studio 12 Fortran compiler.

– 3.85% speedup on an Atlantis node using the Intel Fortran compiler.

– The reduced speedup on Atlantis may be because the Intel compiler was already generating more optimized code.


PHI Routine FPGA Prototype

Implementing bottleneck routines (the PHI routine and/or the logarithm/exponential routines) in an FPGA could result in a significant speedup.

A reduced-precision floating-point FPGA prototype was developed, for an estimated ~325 times faster PHI computation in hardware.


HZETRN Main Program Flow

Basic flow of HZETRN:

– Step 1: Call MATTER to obtain the material properties (density, atomic weight and atomic number of each element) of the shield.

– Step 2: Generate the energy grid.

– Step 3: Dosimetry and propagation in the shield material.

    Call DMETRIC to compute dosimetric quantities at the current depth.

    Call PRPGT to propagate the GCRs to the next depth.

    Repeat step 3 until the target material is reached.

– Step 4: Dosimetry and propagation in the target material.

    Call DMETRIC to compute dosimetric quantities at the current depth.

    Call PRPGT to propagate the GCRs to the next depth.

    Repeat step 4 until the required depth is reached.
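The four steps above can be sketched as a schematic driver loop. This is only an outline in Python, not the Fortran main program: every callable passed in is a placeholder, the argument lists are invented for illustration, and the depth values in the usage example are hypothetical.

```python
def run_hzetrn_flow(shield_depths, target_depths, matter, make_energy_grid,
                    dmetric, prpgt):
    """Schematic driver following steps 1-4; all callables are placeholders."""
    material = matter()                      # Step 1: material properties
    grid = make_energy_grid()                # Step 2: energy grid
    for layer, depths in (("shield", shield_depths),     # Step 3
                          ("target", target_depths)):    # Step 4
        for depth in depths:
            dmetric(layer, depth, material, grid)   # dosimetric quantities here
            prpgt(layer, depth, material, grid)     # propagate to the next depth


# Usage with logging stubs (hypothetical depths).
calls = []
run_hzetrn_flow(
    shield_depths=[0.0, 1.0],
    target_depths=[0.0],
    matter=lambda: calls.append("MATTER") or {},
    make_energy_grid=lambda: calls.append("GRID") or [],
    dmetric=lambda layer, depth, m, g: calls.append(("DMETRIC", layer, depth)),
    prpgt=lambda layer, depth, m, g: calls.append(("PRPGT", layer, depth)),
)
print(calls)
```

The stubs only record the call order, which makes the alternation of DMETRIC and PRPGT at each depth visible.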


DMETRIC Routine

The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities.

There are 6 main do-loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2 and 39% of DMETRIC's processing time is spent in loop 5.

To check whether the above loop could be done in parallel, the order of the loop was reversed to test for data dependency.

The results were identical, so there was no data dependency between the dosimetric calculations for each isotope.
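The loop-reversal check described above can be illustrated with two toy loops in Python. Neither loop comes from DMETRIC; `independent_loop` and `dependent_loop` are made-up examples, one without and one with a dependency carried from iteration to iteration.

```python
def independent_loop(values, order):
    """Each iteration writes only its own slot: no loop-carried dependency."""
    out = [0.0] * len(values)
    for i in order:
        out[i] = 2.0 * values[i]
    return out


def dependent_loop(values, order):
    """Each iteration reads the slot written by a neighbouring iteration."""
    out = list(values)
    for i in order:
        if i > 0:
            out[i] = out[i] + out[i - 1]   # depends on iteration i - 1
    return out


values = [1.0, 2.0, 3.0, 4.0]
forward = range(len(values))
backward = range(len(values) - 1, -1, -1)

print(independent_loop(values, forward) == independent_loop(values, backward))
print(dependent_loop(values, forward) == dependent_loop(values, backward))
```

Reversing the order leaves the independent loop unchanged and changes the dependent one. Identical results under reversal are suggestive evidence of independence rather than a proof, since some dependencies happen to be insensitive to this particular reordering.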


DMETRIC Routine - Dependent?

To determine if loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical, so the outer loop of loop 5 should be parallelizable.

Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page).

These differences are due to floating-point rounding differences during four summations.
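Why summation order alone can change the last digit is easy to show in isolation. The Python sketch below uses three arbitrary terms, not values from DMETRIC, and accumulates them in both directions the way a DO loop adds into a single variable.

```python
def running_sum(values):
    """Accumulate left to right, like a DO loop adding into one variable."""
    total = 0.0
    for v in values:
        total += v
    return total


terms = [0.1, 0.2, 0.3]
forward = running_sum(terms)
backward = running_sum(reversed(terms))
print(forward, backward, forward == backward)
```

The two totals differ only at the level of rounding error, because floating-point addition is not associative. A difference of this size between loop orders therefore does not by itself indicate a data dependency.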


DMETRIC Routine - Not Dependent

Changing the order of the inner loop of loop 5 produces only minor differences in the results.


Parallel DMETRIC Routine

Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel.

Statements (using MPI's wall-time function, MPI_WTIME) were inserted to measure the amount of time spent in each subroutine.

Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% in subroutine PRPGT, and less than 1% in the remainder of the program.

Assuming infinite parallelization of DMETRIC, the total runtime could be reduced by at most 17%.
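The bound on the benefit of parallelizing DMETRIC follows from Amdahl's law. The Python sketch below uses the 17% figure measured for DMETRIC. The worker counts are hypothetical, with 59 standing for one worker per isotope, and communication overhead is ignored.

```python
import math


def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup factor when only `parallel_fraction` of the runtime scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


dmetric_fraction = 0.17          # share of runtime in DMETRIC, from the timing above
for workers in (2, 8, 59, math.inf):
    print(workers, round(amdahl_speedup(dmetric_fraction, workers), 3))
```

In the limiting case the DMETRIC share of the runtime disappears entirely, which corresponds to an overall speedup factor of 1/0.83, or about 1.2. Any real message-passing overhead would lower that figure.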


PRPGT Routine

PRPGT propagates the GCRs through the shielding and the target.

About 82% of HZETRN processing is spent in PRPGT or routines it calls.

At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages:

– The first stage computes the energy shift due to propagation.

– The second stage computes the attenuation and the secondary particle production due to collisions.

To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken up into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40). If the loop can be performed in parallel, then the results from these four loops should be the same as those from the single loop from 1 to 59.


PRPGT Routine - Check Dependency

The following compares the results of breaking up the main loop into four loops (on the left) with the original results.

The significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.


PRPGT Routine - Data Dependent

Reversing the inner 1st- and 2nd-stage I loops gave results identical to the original, so it is possible to parallelize the 1st or 2nd stages.

However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage).

The results changed, so the 2nd stage is dependent on the 1st stage.

A barrier is needed to prevent execution of the 2nd stage until the 1st stage completes.

24% of the HZETRN processing is spent on the 1st stage, while less than 2% of the time is spent on the 2nd stage. Therefore, parallel processing of both stages does not appear worthwhile.


Parallel PRPLI Routine

PRPLI is called by PRPGT after the 1st- and 2nd-stage propagation has been completed for each of the 59 isotopes.

PRPLI performs the propagation of the six light ions (ions with Z < 5).

About 53% of total HZETRN time is spent on light-ion propagation.

PRPLI propagates a 45 x 6 fluence matrix named PSI (45 energy points for each of the 6 light ions), where fluence is the number of particles intersecting a unit area.

Analysis of the code has shown that there is no data dependency among the energy grid points.

It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points.
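One way the 45 energy grid points might be divided among processes is a contiguous block split. The Python sketch below shows only the index arithmetic and makes no MPI calls. `block_range` is a hypothetical helper, and the worker count of 4 is an assumption. In an MPI program each process would evaluate it with its own rank and the results would then be gathered.

```python
def block_range(n_items, n_workers, rank):
    """Half-open index range [start, stop) owned by `rank` in a block split."""
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


N_ENERGY = 45      # energy grid points, from the slide
N_WORKERS = 4      # hypothetical process count
for rank in range(N_WORKERS):
    print(rank, block_range(N_ENERGY, N_WORKERS, rank))
```

The split gives every grid point to exactly one worker and keeps the block sizes within one point of each other, so the load stays balanced when the per-point cost is uniform.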


General HZETRN Recommendations

Arrays in Fortran are stored in column order, so it is more efficient to access them in column order rather than row order.

HZETRN is using an old Fortran technique of

alternate entry points.

The use of alternate entry points is discouraged.

HZETRN uses COMMON blocks for global memory.

Fortran-90 MODULES should be used instead.
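The first recommendation, on column-order access, can be made concrete by listing the memory offsets that a doubly nested loop visits. The Python sketch below models Fortran's column-major layout with 1-based indices. The function names and the 3 x 2 array shape are illustrative assumptions.

```python
def column_major_offset(i, j, n_rows):
    """Memory offset of A(i, j) for a Fortran array with n_rows rows (1-based i, j)."""
    return (i - 1) + (j - 1) * n_rows


def traversal_offsets(n_rows, n_cols, inner="row_index"):
    """Offsets visited by a doubly nested loop, in visiting order."""
    offsets = []
    if inner == "row_index":               # inner loop runs down a column
        for j in range(1, n_cols + 1):
            for i in range(1, n_rows + 1):
                offsets.append(column_major_offset(i, j, n_rows))
    else:                                  # inner loop runs along a row
        for i in range(1, n_rows + 1):
            for j in range(1, n_cols + 1):
                offsets.append(column_major_offset(i, j, n_rows))
    return offsets


print(traversal_offsets(3, 2, inner="row_index"))
print(traversal_offsets(3, 2, inner="col_index"))
```

With the row index in the inner loop the offsets are consecutive, which is the cache-friendly pattern. With the column index in the inner loop each step jumps by the number of rows.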


Conclusions & Future WorkConclusions & Future Work

The performance of HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI.

A maximum 50% speedup is expected with the current HZETRN.

Additional performance improvements could be obtained by implementing the 3rd-order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.
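
For reference, a minimal sketch of 3rd-order (four-point) Lagrange interpolation, the numerical method the slides attribute to PHI. This is a generic textbook form written in Python for illustration; it is not the HZETRN source, and the name `lagrange3` and its argument layout are assumptions.

```python
def lagrange3(xs, ys, x):
    """Evaluate the cubic Lagrange polynomial through four points at x.

    xs, ys: sequences of four distinct abscissas and their ordinates.
    """
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("exactly four points are required")
    total = 0.0
    for k in range(4):
        # Basis polynomial L_k(x): equals 1 at xs[k] and 0 at the other nodes.
        weight = 1.0
        for m in range(4):
            if m != k:
                weight *= (x - xs[m]) / (xs[k] - xs[m])
        total += ys[k] * weight
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [v ** 3 for v in xs]       # a cubic is reproduced exactly
    print(lagrange3(xs, ys, 2.5))   # close to 2.5**3 = 15.625
```

Each evaluation uses a small, fixed number of multiplications and divisions, so the cost of the routine is dominated by how often it is called rather than by any single call.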




Acknowledgements

NASA LaRC - Robert C. Singleterry Jr, PhD

NASA JSC/CARR PVA&M - Premkumar B. Saganti, PhD

TeraGrid, TACC

TLC2 - Mark Huang & Erik Engquist

Texas Space Grant Consortium ISSO