Page 1: Jacobi solver status

Jacobi solver status

Lucian Anton, Saif Mulla, Stef Salvini

CCP_ASEARCH meeting, October 8, 2013

Daresbury


Page 2: Jacobi solver status

Outline

• Code structure
  – Front end
  – Numerical kernels
  – Data collection

• Performance data
  – Intel SB
  – Xeon Phi
  – BlueGene/Q
  – GPU


Page 3: Jacobi solver status

Code structure


• Read input from command line
  – Grid sizes, length of iteration block, # of iteration blocks, …
  – Algorithm to use
  – Output format (header, test iterations, …)
• Initialize grid with an eigenvector of the Jacobi smoother
• Run several iteration blocks
• Collect min, max, average times.
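Put together, the flow above amounts to a timing driver around the smoother. The following is a minimal, self-contained C sketch of that flow (illustrative only: sizes are hard-coded instead of parsed from the command line, and timing uses CPU time via clock(); compile with -lm):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define IDX(i,j,k) (((i)*n + (j))*n + (k))

int main(void) {
    const int n = 34;                  /* points per dimension, incl. boundaries */
    const int nblocks = 5, biter = 10; /* iteration blocks, iterations per block */
    double *uo = calloc((size_t)n*n*n, sizeof *uo);
    double *un = calloc((size_t)n*n*n, sizeof *un);

    /* Initialise with the lowest eigenmode of the Jacobi smoother:
       sin(pi x) sin(pi y) sin(pi z) on the unit cube, zero on the boundary. */
    const double h = 1.0 / (n - 1);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                uo[IDX(i,j,k)] = sin(M_PI*i*h) * sin(M_PI*j*h) * sin(M_PI*k*h);

    /* Run the iteration blocks and collect min/mean/max times. */
    double tmin = 1e30, tmax = 0.0, tsum = 0.0;
    for (int b = 0; b < nblocks; ++b) {
        clock_t t0 = clock();
        for (int it = 0; it < biter; ++it) {
            for (int i = 1; i < n-1; ++i)
                for (int j = 1; j < n-1; ++j)
                    for (int k = 1; k < n-1; ++k)
                        un[IDX(i,j,k)] = (uo[IDX(i-1,j,k)] + uo[IDX(i+1,j,k)] +
                                          uo[IDX(i,j-1,k)] + uo[IDX(i,j+1,k)] +
                                          uo[IDX(i,j,k-1)] + uo[IDX(i,j,k+1)]) / 6.0;
            double *tmp = uo; uo = un; un = tmp;   /* swap old and new grids */
        }
        double dt = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (dt < tmin) tmin = dt;
        if (dt > tmax) tmax = dt;
        tsum += dt;
    }
    printf("min %.3e  mean %.3e  max %.3e seconds per block\n",
           tmin, tsum / nblocks, tmax);
    free(uo); free(un);
    return 0;
}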

Page 4: Jacobi solver status

Build model


• Uses a generic Makefile + platform/*.inc files

F90 := source /opt/intel/composerxe/bin/compilervars.sh intel64 && \
       source /opt/intel/impi/4.1.0/intel64/bin/mpivars.sh && mpiifort

CC := source /opt/intel/composerxe/bin/compilervars.sh intel64 && \
      source /opt/intel/impi/4.1.0/intel64/bin/mpivars.sh && icc

LANG = C

ifdef USE_MIC
  FMIC = -mmic
endif

ifdef USE_MPI
  FMPI = -DUSE_MPI
endif

ifdef USE_DOUBLE_PRECISION
  DOUBLE = -DUSE_DOUBLE_PRECISION
endif

ifdef USE_VEC1D
  VEC1D = -DUSE_VEC1D
endif

#FC = module add intel/comp intel/mpi && mpiifort
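Given the ifdef blocks above, the optional features are switched on by defining the corresponding make variables on the command line; for example (illustrative, the exact targets depend on the platform include file):

    make USE_MPI=1 USE_DOUBLE_PRECISION=1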

Page 5: Jacobi solver status

Command line parameters


arcmport01:~/Projects/HOMB> ./homb_c_gcc_debug_gpu.exe -help
Usage: [-ng <grid-size-x> <grid-size-y> <grid-size-z>] [-nb <block-size-x> <block-size-y> <block-size-z>] [-np <num-proc-x> <num-proc-y> <num-proc-z>] [-niter <num-iterations>] [-biter <iterations-block-size>] [-malign <memory-alignment>] [-v] [-t] [-pc] [-model <model_name> [num-waves] [threads-per-column]] [-nh] [-help]

arcmport01:~/Projects/HOMB> ./homb_c_gcc_debug_gpu.exe -model help
possible values for model parameter:
  baseline
  baseline-opt
  blocked
  wave num-waves threads-per-column
  basegpu
  optgpu

• Note for the wave model: if threads-per-column == 0 the diagonal wave kernel is used.
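For illustration (the parameter values here are chosen arbitrarily, not taken from the slides), a wave-model run on a 128³ grid with 4 waves and 2 threads per column could be launched as:

    ./homb_c_gcc_debug_gpu.exe -ng 128 128 128 -model wave 4 2 -niter 100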

Page 6: Jacobi solver status

README file


A full explanation of the command line options is provided in the README.

• The following flags can be used to set the grid sizes and other run parameters:

• -ng <nx> <ny> <nz> sets the global grid sizes.

• -nb <bx> <by> <bz> sets the computational block size, relevant only for the blocked model.

• Notes: 1) No sanity checks are done, you are on your own.
  2) For the blocked model the OpenMP parallelism is done over computational blocks. One must ensure that there is enough work for all threads by setting suitable block sizes (see the sketch below).
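To illustrate the point in note 2 (a sketch under assumed names, not the program's actual code): with OpenMP applied to blocks rather than grid points, the number of blocks caps the usable thread count, so bx, by, bz must be small enough to produce at least as many blocks as threads.

static void jacobi_blocked(int nx, int ny, int nz, int bx, int by, int bz,
                           const double *restrict uo, double *restrict un)
{
    #define IDX(i,j,k) (((i)*ny + (j))*nz + (k))
    #define MIN(a,b)   ((a) < (b) ? (a) : (b))
    /* Threads share the blocks; each block is a bx*by*bz tile of the
       interior points, clipped at the grid edges by MIN(). */
    #pragma omp parallel for collapse(3)
    for (int bi = 1; bi < nx-1; bi += bx)
        for (int bj = 1; bj < ny-1; bj += by)
            for (int bk = 1; bk < nz-1; bk += bz)
                for (int i = bi; i < MIN(bi+bx, nx-1); ++i)
                    for (int j = bj; j < MIN(bj+by, ny-1); ++j)
                        for (int k = bk; k < MIN(bk+bz, nz-1); ++k)
                            un[IDX(i,j,k)] = (uo[IDX(i-1,j,k)] + uo[IDX(i+1,j,k)] +
                                              uo[IDX(i,j-1,k)] + uo[IDX(i,j+1,k)] +
                                              uo[IDX(i,j,k-1)] + uo[IDX(i,j,k+1)]) / 6.0;
    #undef MIN
    #undef IDX
}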

Page 7: Jacobi solver status

Correctness check


• The -t flag checks whether the norm ratios are close to the Jacobi smoother eigenvalue

arcmport01:~/Projects/HOMB> ./homb_c_gcc_debug_gpu.exe -t -niter 7
Correctness check
iteration, norm ratio, deviation from eigenvalue
 0   6.36918e+01   6.26966e+01
 1   9.95185e-01   2.55054e-08
 2   9.95185e-01   1.50473e-08
 3   9.95185e-01   2.57243e-08
 4   9.95185e-01   3.27436e-08
 5   9.95185e-01   1.96427e-08
 6   9.95185e-01   3.17978e-08
# Last norm 6.187368259733268e+01
#==============================================================#
# NThs   Nx   Ny   Nz   NITER   minTime     meanTime    maxTime
#==============================================================#
  8      33   33   33   1       1.299e-04   1.487e-04   1.690e-04
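For reference, the value 9.95185e-01 matches what one expects analytically. If the grid holds the lowest eigenmode \(\sin(\pi x)\sin(\pi y)\sin(\pi z)\), each application of the six-point Jacobi smoother scales it by (assuming a unit cube with spacing \(h_x = h_y = h_z = 1/32\) for the 33-point grid shown above):

\[
\lambda \;=\; \frac{\cos(\pi h_x) + \cos(\pi h_y) + \cos(\pi h_z)}{3}
\;=\; \cos\!\left(\frac{\pi}{32}\right) \;\approx\; 0.995185 .
\]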

Page 8: Jacobi solver status

Algorithms


• Basic 3-loop iteration over the grid (sketched below)
  – OpenMP parallelism applied to the external loop
  – If-condition eliminated from the inner loop
• Blocked iterations
• Wave iterations
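A minimal sketch of the basic kernel (illustrative, not the program's actual code): the OpenMP pragma sits on the external loop, and the boundary points are excluded by the loop bounds instead of an if-test inside the inner loop.

static void jacobi_baseline(int nx, int ny, int nz,
                            const double *restrict uo, double *restrict un)
{
    #define IDX(i,j,k) (((i)*ny + (j))*nz + (k))
    #pragma omp parallel for
    for (int i = 1; i < nx-1; ++i)          /* OpenMP on the external loop  */
        for (int j = 1; j < ny-1; ++j)
            for (int k = 1; k < nz-1; ++k)  /* bounds replace the if-test   */
                un[IDX(i,j,k)] = (uo[IDX(i-1,j,k)] + uo[IDX(i+1,j,k)] +
                                  uo[IDX(i,j-1,k)] + uo[IDX(i,j+1,k)] +
                                  uo[IDX(i,j,k-1)] + uo[IDX(i,j,k+1)]) / 6.0;
    #undef IDX
}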

Page 9: Jacobi solver status

Algorithms: wave details


[Diagram: wave iteration sweeping through the grid in the Z–Y plane, with grid planes alternating between "Old" and "New" as the wavefront advances.]

Page 10: Jacobi solver status

Algorithms: helping vectorisation


The inner loop can be replaced with a function that is easier to vectorize:

/* 1D loop that helps the compiler to vectorize */
static void vec_oneD_loop(const int n, const Real uNorth[], const Real uSouth[],
                          const Real uWest[], const Real uEast[],
                          const Real uBottom[], const Real uTop[], Real w[])
{
    int i;
#ifdef __INTEL_COMPILER
#pragma ivdep
#endif
#ifdef __IBMC__
#pragma ibm independent_loop
#endif
    for (i = 0; i < n; ++i)
        w[i] = sixth * (uNorth[i] + uSouth[i] + uWest[i] + uEast[i] +
                        uBottom[i] + uTop[i]);
}
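For illustration (hypothetical indexing, assuming the flat IDX macro from the sketches above and k as the contiguous direction), the inner k-loop of the baseline kernel would then collapse to one call per (i, j) grid line:

vec_oneD_loop(nz - 2,
              &uo[IDX(i-1, j, 1)], &uo[IDX(i+1, j, 1)],   /* north / south */
              &uo[IDX(i, j-1, 1)], &uo[IDX(i, j+1, 1)],   /* west / east   */
              &uo[IDX(i, j, 0)],   &uo[IDX(i, j, 2)],     /* bottom / top  */
              &un[IDX(i, j, 1)]);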

Page 11: Jacobi solver status

Algorithms: CUDA


• Base laplace3D (from Mike's lecture notes)
• Shared memory in XY plane
• … more to come

Page 12: Jacobi solver status

Data collection


With such a large parameter space we have a big-ish data problem.
Bash script + gnuplot:

index=0
for exe in $exe_list
do
  for model in $model_list
  do
    for nth in $threads_list
    do
      export OMP_NUM_THREADS=$nth
      for ((linsize=10; linsize <= max_linsize; linsize += step))
      do
        biter=$(((10*max_linsize)/linsize))
        niter=5
        if [ "$model" = wave ]
        then
          nwave="$biter $((nth<biter?nth:biter))"
          echo "model $model $nwave"
        else
          nwave=""
        fi

        if [ "$blk_x" -eq 0 ] ; then blk_xt=$linsize ; else blk_xt=$blk_x ; fi
        if [ "$blk_y" -eq 0 ] ; then blk_yt=$linsize ; else blk_yt=$blk_y ; fi
        if [ "$blk_z" -eq 0 ] ; then blk_zt=$linsize ; else blk_zt=$blk_z ; fi

        echo "./"$exe" -ng $linsize $linsize $linsize -nb $blk_xt $blk_yt $blk_zt -model $model $nwave

Page 13: Jacobi solver status

SandyBridge baseline


Page 14: Jacobi solver status

SB: blocked and wave


Page 15: Jacobi solver status

BGQ


Page 16: Jacobi solver status

Xeon Phi vs SandyBridge


Page 17: Jacobi solver status

Fermi data


Page 18: Jacobi solver status

Conclusions & To do


• We have an integrated set of Jacobi smoother algorithms
  – OpenMP, CUDA, MPI (almost)
  – Flexible build system
  – Run parameters can be selected from the command line and preprocessor flags
  – Correctness check
  – Scripted data collection
  – README file

• Tested on several systems (iDataPlex, BGQ, Emerald, …, macOS laptop)

• GPU needs further improvements
• …