
Post on 28-Dec-2015


Integrating Trilinos Solvers to SEAM code

Dagoberto A.R. Justo – UNM

Tim Warburton – UNM

Bill Spotz – Sandia

SEAM (NCAR)

Spectral Element Atmospheric Method

Trilinos (Sandia Lab)

AztecOO, Epetra, Nox, Ifpack, PETSc, Komplex

AztecOO

Solvers
– CG, CGS, BICGStab, GMRES, Tfqmr

Preconditioners
– Diagonal Jacobi, Least Square, Neumann, Domain Decomposition, Symmetric Gauss-Seidel

Matrix Free implementation

C++ (Fortran interface)

MPI

Implementation

[Call diagram; boxes as labelled on the slide]

– SEAM CODE (F90): ... Pcg_solver ...
– Pcg_solver (F90): ... Aztec_solvers( ) ...
– Sub Aztec_solvers (C): ... AZ_Iterate( )
– AZTEC
– Matrix_vector_C (C)
– Matrix_vector (F90)
– Prec_Jacobi_C (C)
– Prec_Jacobi (F90)
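The implementation slide pairs an iterative solver with a matrix-vector routine and a Jacobi preconditioner supplied by the caller, so the solver never needs an assembled matrix. A minimal Python sketch of that matrix-free pattern follows. It is not SEAM or AztecOO code: the function `pcg`, the callbacks `matrix_vector` and `prec_jacobi`, and the small tridiagonal test operator are all assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Operator = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcg(matrix_vector: Operator, precondition: Operator, b: Vector,
        tol: float = 1e-10, max_iter: int = 200) -> Tuple[Vector, int]:
    """Preconditioned CG that sees the operator only through callbacks."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                      # residual for the zero initial guess
    z = precondition(r)
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        ap = matrix_vector(p)        # the only place the operator is applied
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = precondition(r)
        rz_next = dot(r, z)
        p = [zi + (rz_next / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_next
    return x, max_iter


# Hypothetical test operator (an assumption, not the SEAM operator):
# tridiagonal, symmetric positive definite, diagonal 2 + i, off-diagonals -1.
def diagonal(i: int) -> float:
    return 2.0 + i


def matrix_vector(v: Vector) -> Vector:
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(diagonal(i) * v[i] - left - right)
    return out


def prec_jacobi(r: Vector) -> Vector:
    # Jacobi preconditioning: divide each entry by the operator's diagonal.
    return [ri / diagonal(i) for i, ri in enumerate(r)]


solution, iterations = pcg(matrix_vector, prec_jacobi, [1.0] * 8)
print(iterations)
```

Swapping the solver only changes the body of `pcg`; the two callbacks stay the same, which is the property the wrapper routines on the slide appear to rely on.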

Machines used

Pentium III Notebook (serial)
– Linux, LAM-MPI, Intel Compilers

Los Lobos at HPC@UNM
– Linux Cluster
– 256 nodes
– IBM Pentium III 750 MHz, 256 KB L2 Cache, 1 GB RAM
– Portland Group compiler
– MPICH for Myrinet interconnections

Graphical Results from SEAM

[Figures: Energy; Mass]

Memory (in Mbytes per processor)

[Chart: memory per processor, axis 0 to 30 Mbytes, for p=2, p=4, p=8, p=16. Series: SEAM 6x6x6, SEAM+Aztec 6x6x6, SEAM 12x12x6, SEAM+Aztec 12x12x6.]

Speed Up

From 1 to 160 processors.

Time of Simulation
– 144 time iterations x 300 s = 12 h simulation

Verify results using mass, energy, …
– (Different result for 1 proc)
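The simulated time on this slide is plain arithmetic, and the speed-up plots that follow presumably use the conventional definition (time on one processor divided by time on p processors). The sketch below spells out both. The function names are assumptions, and the slides do not state which speed-up definition was used.

```python
def simulated_hours(time_steps: int, dt_seconds: float) -> float:
    """Simulated time covered by a run, in hours."""
    return time_steps * dt_seconds / 3600.0


def speedup(t_one_proc: float, t_p_procs: float) -> float:
    """Conventional speed-up: T(1) / T(p). Assumed definition."""
    return t_one_proc / t_p_procs


def efficiency(t_one_proc: float, t_p_procs: float, procs: int) -> float:
    """Parallel efficiency: speed-up divided by processor count."""
    return speedup(t_one_proc, t_p_procs) / procs


# Values from the slide: 144 time iterations of 300 s each.
print(simulated_hours(144, 300))  # 12.0
```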

Speed Up – SEAM (selecting # of elements ne=24x24x6)

Speed Up – SEAM (selecting order np=6)

Speed Up – SEAM+Aztec (best: cgs solver)

Speed Up – SEAM+Aztec (best: cgs solver + Least Square preconditioner)

Speed Up – SEAM+Aztec (increasing np -> increases speedup)

Upshot – SEAM (One CG iteration)

Upshot – SEAM (matrix times vector communication)

Upshot – SEAM+Aztec (One CG iteration)

Upshot – SEAM+Aztec (Matrix times vector communication)

Upshot – SEAM+Aztec (Vector Reduction)

Time (24x24x6 elements, 2 proc.)

Solver            Iter.      Time (loop)   Time/iter
SEAM p=6          33.0 it    7.48 s        0.22 s/it
SEAM p=12         56.9 it    81.2 s        1.42 s/it
Cg p=6            87.1 it    28.2 s        0.32 s/it
Cgs p=6           74.1 it    28.6 s        0.38 s/it
Tfqmr p=6         75.2 it    31.1 s        0.41 s/it
Bicg p=6          94.1 it    29.4 s        0.31 s/it
Cgs ls p=6        35.1 it    42.0 s        1.19 s/it
CG Jacobi p=6     45.8 it    17.2 s        0.37 s/it
Cgs Jacobi p=6    31.7 it    15.3 s        0.48 s/it
Cgs p=12          60.4 it    274. s        4.53 s/it
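The Time/iter column can be reproduced from the other two columns: loop time divided by the iteration count, apparently truncated (not rounded) to two decimals. The short check below uses only the figures in the table; the truncation rule is an inference from the numbers, not something the slides state.

```python
import math

# (solver, iterations, loop time in s, reported s/it), copied from the table.
rows = [
    ("SEAM p=6", 33.0, 7.48, 0.22),
    ("SEAM p=12", 56.9, 81.2, 1.42),
    ("Cg p=6", 87.1, 28.2, 0.32),
    ("Cgs p=6", 74.1, 28.6, 0.38),
    ("Tfqmr p=6", 75.2, 31.1, 0.41),
    ("Bicg p=6", 94.1, 29.4, 0.31),
    ("Cgs ls p=6", 35.1, 42.0, 1.19),
    ("CG Jacobi p=6", 45.8, 17.2, 0.37),
    ("Cgs Jacobi p=6", 31.7, 15.3, 0.48),
    ("Cgs p=12", 60.4, 274.0, 4.53),
]


def time_per_iteration(loop_seconds: float, iterations: float) -> float:
    """Loop time per iteration, truncated to two decimals (inferred rule)."""
    return math.floor(100.0 * loop_seconds / iterations) / 100.0


for solver, iterations, loop_seconds, reported in rows:
    print(solver, time_per_iteration(loop_seconds, iterations), reported)
```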

Conclusions & Suggested Future Efforts

SEAM+Aztec works!

SEAM+Aztec is 2x slower
– difference in CG algorithms

SEAM+Aztec time-iteration is 50% slower
– 0.1% of time lost in calls, preparation for Aztec.

More time -> better tune-up.
– Domain decomposition preconditioners
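The "2x slower" and "50% slower" figures are roughly what the timing table gives if the native SEAM solver at p=6 is compared with the fastest Aztec run (Cgs + Jacobi) for total loop time, and with plain Cg for time per iteration. Which rows the authors actually compared is an assumption here; the sketch only shows the arithmetic.

```python
# Figures copied from the timing table (24x24x6 elements, 2 proc., p=6).
seam_loop_s = 7.48            # SEAM p=6, loop time
seam_s_per_it = 0.22          # SEAM p=6, time per iteration
aztec_best_loop_s = 15.3      # Cgs Jacobi p=6 (assumed comparison row)
aztec_cg_s_per_it = 0.32      # Cg p=6 (assumed comparison row)

loop_ratio = aztec_best_loop_s / seam_loop_s
per_iteration_overhead = aztec_cg_s_per_it / seam_s_per_it - 1.0

print(f"{loop_ratio:.2f}x")             # 2.05x
print(f"{per_iteration_overhead:.0%}")  # 45%
```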

