On the Performance of PC Clusters in Solving Partial Differential Equations

Xing Cai, Åsmund Ødegård
Department of Informatics, University of Oslo, Norway


Post on 18-Jan-2018


TRANSCRIPT

Page 1: On the Performance of PC Clusters in Solving Partial Differential Equations

Xing Cai, Åsmund Ødegård
Department of Informatics, University of Oslo, Norway

Page 2: Outline of the talk

• Introduction
• Beowulf clusters: a cost-effective approach to solving PDEs
• Performance analysis of a Linux cluster
• Numerical experiments & measurements

Page 3: A generic finite element PDE solver

• Time stepping t_0, t_1, t_2, ...
• Spatial discretization on computational grid
• Solution of nonlinear problems
• Solution of linearized problems
• Iterative solution of Ax=b

Page 4: An observation

• The computation-intensive part is the iterative solution of Ax=b
• A parallel finite element PDE solver needs to run the linear algebra kernels in parallel
  – vector addition
  – inner-product of two vectors
  – matrix-vector product
• Two types of inter-processor communication
• Ratio computation/communication is high
• Relatively tolerant of slow communication
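
A minimal sketch, in Python rather than the C++ used by the talk's software, of why these three kernels carry the cost: apart from a few scalar operations, a conjugate gradient iteration for Ax=b consists of nothing else. The tridiagonal test matrix and every function name here are illustrative assumptions, not taken from the slides.

```python
def axpy(a, x, y):
    """Vector addition kernel: return a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Inner-product kernel."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(x):
    """Matrix-vector kernel for an assumed test matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [
        2.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    """Solve A x = b with plain CG; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero start vector
    p = list(r)          # search direction
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        Ap = matvec(p)                 # one matrix-vector product
        alpha = rr / dot(p, Ap)        # one inner product
        x = axpy(alpha, p, x)          # vector additions
        r = axpy(-alpha, Ap, r)
        rr_new = dot(r, r)             # one more inner product
        p = axpy(rr_new / rr, p, r)    # p = r + beta * p
        rr = rr_new
    return x, max_iter
```

Parallelizing `axpy`, `dot` and `matvec` is therefore enough to parallelize the whole iteration; the loop itself does not change.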

Page 5: A natural parallelization of PDE solvers

• The global solution domain is partitioned into many smaller sub-domains
• One sub-domain works as a "unit", with its sub-matrices and sub-vectors
• No need to create global matrices and vectors physically
• The global linear algebra operations can be realized by local operations + inter-processor communication
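
The idea of "local operations + inter-processor communication" can be sketched in a single Python process, with plain lists standing in for sub-domains. This is an illustration under stated assumptions: a 1D vector, a non-overlapping partition, the matrix tridiag(-1, 2, -1), and hypothetical function names. The two communication patterns shown (neighbour exchange and global sum) are one reading of the "two types" mentioned in the talk; a real code would use message passing where this sketch indexes into neighbouring lists.

```python
def partition(n, nparts):
    """Split indices 0..n-1 into contiguous, non-overlapping ranges.

    Assumes n >= nparts so that no sub-domain is empty.
    """
    base, extra = divmod(n, nparts)
    ranges, start = [], 0
    for p in range(nparts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def scatter(x, ranges):
    """Give each sub-domain its own sub-vector; no global vector is needed afterwards."""
    return [x[a:b] for a, b in ranges]


def exchange_ghosts(parts):
    """Stand-in for neighbour-to-neighbour messages.

    Each sub-domain receives the boundary value of its left and right
    neighbour (0.0 at the ends of the global domain).
    """
    ghosts = []
    for p in range(len(parts)):
        left = parts[p - 1][-1] if p > 0 else 0.0
        right = parts[p + 1][0] if p < len(parts) - 1 else 0.0
        ghosts.append((left, right))
    return ghosts


def local_matvec(local, left, right):
    """Apply tridiag(-1, 2, -1) to one sub-vector padded with ghost values."""
    padded = [left] + local + [right]
    return [
        2.0 * padded[i] - padded[i - 1] - padded[i + 1]
        for i in range(1, len(padded) - 1)
    ]


def distributed_matvec(parts):
    """Global matrix-vector product = ghost exchange + local products."""
    ghosts = exchange_ghosts(parts)
    return [local_matvec(loc, lg, rg) for loc, (lg, rg) in zip(parts, ghosts)]


def global_dot(xparts, yparts):
    """Global inner product = local inner products + one global sum."""
    local_sums = [sum(a * b for a, b in zip(xl, yl)) for xl, yl in zip(xparts, yparts)]
    return sum(local_sums)  # stand-in for an all-reduce
```

Concatenating the pieces returned by `distributed_matvec` gives the same vector as applying the global matrix, although that matrix is never assembled.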

Page 6: Linear-algebra level parallelization

• An SPMD model
• Reuse of existing code for local linear algebra operations
• Need new code for the parallelization-specific tasks
  – grid partition (non-overlapping, overlapping)
  – inter-processor communication routines

Page 7: Object orientation

• An add-on "toolbox" containing all the parallelization-specific code
• The "toolbox" has many high-level routines and hides the low-level MPI details
• The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communications
• A seamless coupling between the huge sequential libraries and the add-on toolbox
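
The "dummy" interface can be pictured as follows. This is a hypothetical Python sketch, not the Diffpack API: the class and method names are invented for illustration, and `PeerSumComm` only imitates in one process what an MPI-based toolbox class would do with a global reduction.

```python
import math


class SequentialComm:
    """'Dummy' communicator: a single process, so communication is a no-op."""

    def global_sum(self, local_value):
        return local_value


class PeerSumComm:
    """Single-process stand-in for a parallel communicator (illustration only).

    It is constructed with the contributions the other processes would send;
    a real toolbox class would obtain them through MPI instead.
    """

    def __init__(self, peer_contributions):
        self.peer_contributions = list(peer_contributions)

    def global_sum(self, local_value):
        return local_value + sum(self.peer_contributions)


def norm(local_vector, comm):
    """Library routine written once against the communicator interface."""
    local = sum(v * v for v in local_vector)
    return math.sqrt(comm.global_sum(local))


# Sequential use: the dummy interface leaves the result unchanged.
sequential_norm = norm([3.0, 4.0], SequentialComm())

# "Parallel" use: this process owns [3.0]; another would contribute 4.0**2.
parallel_norm = norm([3.0], PeerSumComm([16.0]))
```

The point of the design is that `norm` contains no parallel-specific branch: swapping the communicator object is what turns the sequential routine into a parallel one.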

Page 8: Diffpack

• O-O software environment for scientific computation (C++)
• Rich collection of PDE solution components: portable, flexible, extensible
• http://www.nobjects.com
• H. P. Langtangen, Computational Partial Differential Equations, Springer 1999

Page 9: Straightforward parallelization

• Develop a sequential simulator, without paying attention to parallelism
• Follow the Diffpack coding standards
• Use the add-on toolbox for parallel computing
• Add a few new statements for transformation to a parallel simulator

Page 10: A Linux cluster

• 48 Pentium III 500 MHz processors (24 nodes)
• 512 MB memory per node
• One 3Com 905B network card per node
• Fast Ethernet, 100 Mbit/s
• 26-port Cisco Catalyst 2926 switch
• Price: around $60,000

Page 11: Parallel simulation of 3D acoustic field

• 3D nonlinear model
[Equations of the 3D nonlinear model and its initial condition: not legible in the transcript]

Page 12: 3D nonlinear acoustic field simulation

CPUs   Origin 2000             Linux cluster
       CPU-time   Speedup      CPU-time   Speedup
2      8670.8     N/A          6681.5     N/A
4      4726.5     3.75         3545.9     3.77
8      2404.2     7.21         1881.1     7.10
16     1325.6     13.0         953.89     14.0
24     1043.7     16.6         681.77     19.6
32     725.23     23.9         563.54     23.7
48     557.61     31.1         673.77     19.8

Comparison between Origin 2000 and Linux cluster, 1,030,301 grid points
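
No single-CPU run is listed, so the speedup cannot be T(1)/T(P). The Linux cluster column is reproduced by S(P) = 2·T(2)/T(P), that is, by taking the 2-CPU run as baseline and assuming it has perfect speedup. This formula is inferred from the numbers, not stated on the slide. A short Python check, with illustrative names:

```python
# CPU times for the Linux cluster, copied from the table above.
linux_cpu_time = {
    2: 6681.5,
    4: 3545.9,
    8: 1881.1,
    16: 953.89,
    24: 681.77,
    32: 563.54,
    48: 673.77,
}


def speedup_from_baseline(times, base_procs=2):
    """Speedup relative to the smallest run, assumed to scale perfectly.

    S(P) = base_procs * T(base_procs) / T(P)
    """
    t_base = times[base_procs]
    return {
        procs: base_procs * t_base / t
        for procs, t in times.items()
        if procs != base_procs
    }


linux_speedup = speedup_from_baseline(linux_cpu_time)
for procs, s in sorted(linux_speedup.items()):
    print(f"{procs:3d} CPUs: speedup {s:.2f}")
```

With this definition, the 48-CPU run on the cluster is slower than the 32-CPU run (673.77 against 563.54), which is why its speedup falls back to about 19.8.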

Page 13: Incompressible Navier-Stokes

• Numerical strategy: operator splitting
• Calculation of an intermediate velocity in a predictor-corrector way
• Solution of a Poisson equation
• Correction of the intermediate velocity

$$\frac{\partial v}{\partial t} + v \cdot \nabla v = -\nabla p + \nu \nabla^2 v + b$$

$$\nabla \cdot v = 0$$

Page 14: Incompressible Navier-Stokes

Explicit schemes for predicting and correcting the velocity
Implicit solution of the pressure by CG

P      CPU-time   Speedup   Efficiency
1      665.45     N/A       N/A
2      329.57     2.02      1.01
4      166.55     4.00      1.00
8      89.98      7.40      0.92
16     48.96      13.59     0.85
24     34.85      19.09     0.80
48     34.22      19.45     0.41
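
The two derived columns agree with the usual definitions S(P) = T(1)/T(P) and E(P) = S(P)/P. A small Python sketch that recomputes them from the CPU times (the function and variable names are illustrative):

```python
# CPU times copied from the table above, keyed by number of processors P.
cpu_time = {
    1: 665.45,
    2: 329.57,
    4: 166.55,
    8: 89.98,
    16: 48.96,
    24: 34.85,
    48: 34.22,
}


def speedup_and_efficiency(times):
    """Return {P: (T(1)/T(P), T(1)/(P*T(P)))} for every P > 1."""
    t1 = times[1]
    return {
        procs: (t1 / t, t1 / (procs * t))
        for procs, t in times.items()
        if procs != 1
    }


metrics = speedup_and_efficiency(cpu_time)
for procs, (s, e) in sorted(metrics.items()):
    print(f"P={procs:2d}  speedup={s:6.2f}  efficiency={e:4.2f}")
```

Going from 24 to 48 processors barely changes the CPU time (34.85 to 34.22), so the efficiency is roughly halved. Each of the 24 nodes of the cluster holds two processors and one network card, which may be related, though the slides do not state the cause.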

Page 15: 3D nonlinear water waves

• Fully nonlinear 3D water waves
• Primary unknowns: $\phi$ (velocity potential), $\eta$ (free-surface elevation)

$$\nabla^2 \phi = 0 \quad \text{in the water volume}$$

$$\eta_t + \phi_x \eta_x + \phi_y \eta_y - \phi_z = 0 \quad \text{on the water surface}$$

$$\phi_t + (\phi_x^2 + \phi_y^2 + \phi_z^2)/2 + g\eta = 0 \quad \text{on the water surface}$$

$$\frac{\partial \phi}{\partial n} = 0 \quad \text{on solid walls}$$

Page 16: 3D nonlinear water waves

• Global 3D grid: 49 x 49 x 41
• Global solver: CG + overlapping Schwarz preconditioner
• Multigrid V-cycle as subdomain solver
• CPU measurement of a total of 32 time steps
• Parallel simulation on the Linux cluster

P      Total CPU   # iters   CPU/iter   Speedup   Efficiency
1      561.56      6.16      2.849      N/A       N/A
4      386.83      15.69     0.770      3.70      0.92
8      272.22      21.31     0.399      7.14      0.89
16     150.01      22.59     0.208      13.73     0.86
24     124.52      26.50     0.147      19.40     0.81
48     124.84      30.13     0.129      22.00     0.46
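
The speedup in this table is measured per iteration, not on the total CPU time, because the iteration count grows with P. The numbers are reproduced if "# iters" is read as the average number of CG iterations per time step, so that CPU/iter = Total CPU / (32 × # iters). That reading is an assumption inferred from the table. A Python sketch with illustrative names:

```python
TIME_STEPS = 32  # "a total of 32 time steps"

# P -> (total CPU time, assumed average CG iterations per time step)
runs = {
    1: (561.56, 6.16),
    4: (386.83, 15.69),
    8: (272.22, 21.31),
    16: (150.01, 22.59),
    24: (124.52, 26.50),
    48: (124.84, 30.13),
}


def cpu_per_iteration(total_cpu, iters_per_step, steps=TIME_STEPS):
    """CPU time of one CG iteration, averaged over the whole run."""
    return total_cpu / (steps * iters_per_step)


def per_iteration_speedup(table):
    """Speedup of the per-iteration cost relative to the P = 1 run."""
    base = cpu_per_iteration(*table[1])
    return {
        procs: base / cpu_per_iteration(*run)
        for procs, run in table.items()
        if procs != 1
    }


speedup = per_iteration_speedup(runs)
for procs, s in sorted(speedup.items()):
    print(f"P={procs:2d}  speedup={s:6.2f}  efficiency={s / procs:4.2f}")
```

Measured on total CPU time instead, the P = 4 run would show a ratio of only 561.56/386.83, about 1.45, since it needs more than twice as many iterations per time step as the single-processor run.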

Page 17: Summary

• OOP + MPI give portable parallel software
• Beowulf clusters are well suited to solving PDEs
• Applicable to a wide range of PDEs
• Performance: satisfactory speed-up
• Issues remain to be considered for further improvement