Performance analysis of GA applications 19th June 2001

Computational Science and Engineering Department Daresbury Laboratory

Performance analysis of GA-based applications using the Vampir tool

NWChem and GAMESS-UK on High-end and Commodity class machines.

H.J.J. van Dam, Martyn Guest and Paul Sherwood, Quantum Chemistry Group, CLRC Daresbury Laboratory

http://www.cse.clrc.ac.uk

Miles Deegan, Compaq (Galway)

Outline

• Background: PNNL, Daresbury and PALLAS
• Tool for Performance Analysis - VAMPIR & VAMPIR Trace
  – VAMPIR - analysis of trace files
  – VAMPIR Trace: Trace Library for MPI applications; Extensions to handle GA applications
• Case Studies
  – DFT Calculations on Zeolite Fragments (347 - 1687 GTOs) with Coulomb Fitting
  – High-end Systems - Cray T3E/1200E, Compaq AlphaServer SC (667 & 833 MHz), SGI Origin 3000/R12k-400 and IBM SP/WH2-375
  – Commodity Clusters (IA32 and Alpha Linux)
• NWChem and GAMESS-UK
  – Distributed data (NWChem) and Replicated Data (GAMESS-UK)
  – Analysis of GAs and PeIGs
• Summary

PNNL - Daresbury - Pallas Collaborations

PNNL - Daresbury Collaboration

• Long term interaction between chemistry activities
• Proposed developments around DFT derivative codes
• UK Chemistry Collaboration Forum (CCP1)
• DFT Flagship project and subsequent DL extensions
• DFT Functional Repository (http://www.dl.ac.uk/DFTlib)

Daresbury - Pallas Collaboration

• Demonstrate that clusters of IA32 and Alpha processors are competitive with HPC servers (with low to medium processor numbers) for a wide range of applications
• Evaluate the suitability of clusters for high-end computing
• Analyse kernels and full applications (May 2000 - Sep. 2001)

Vampir 2.5

Visualization and Analysis of MPI Programs

Vampir Features

• Offline trace analysis for MPI (and others ...)
• Traces generated by the Vampirtrace tool (`ld ... -lVT -lpmpi -lmpi`)
• Convenient user interface
• Scalability in time and processor space
• Excellent zooming and filtering
• High-performance graphics
• Display and analysis of MPI and application events:
  – execution of MPI routines
  – point-to-point and collective communication
  – MPI-2 I/O operations
  – execution of application subroutines (optional)
• "Easy" customization

Vampir Displays

Global displays show all selected processes

• Summary Chart: aggregated profiling information
• Activity Chart: presents per-process profiling information
• Timeline: detailed application execution over time axis
• Communication statistics: message statistics for each process pair
• Global Comm. Statistics: collective operations statistics
• I/O Statistics: MPI I/O operation statistics
• Calling Tree: draws global or local dynamic calling trees

Process displays show a single process per window

• Activity Chart
• Timeline
• Calling Tree

Timeline Display (Message Info)

• Source-code references are displayed if recorded by Vampirtrace
• Click on a message line to see the message details

[Figure: timeline display with the message send operation and the message receive operation marked]

Vampirtrace

Tracing of MPI and Application Events

Vampirtrace

Current version: Vampirtrace 2.0

Significant new features:

• records collective communication
• enhanced filter functions
• extended API
• records source-code information (selected platforms)
• support for shmem (Cray T3E)
• records MPI-2 I/O operations

Available for all major MPI platforms

• Library that records all MPI calls, point-to-point communication, and collective operations.
• Runtime filters available to limit tracefile size.
• Provides an API for user instrumentation.
• Requires MPI to gather performance data.
• Uses the profiling interface of MPI and is therefore independent of the specifics of a given MPI implementation.

Vampirtrace API

Switching tracing on/off

    SUBROUTINE VTTRACEOFF( )
    SUBROUTINE VTTRACEON( )

Specifying user-defined states

    SUBROUTINE VTSYMDEF(ICODE, STATE, ACTIVITY, IERR)

Entering/leaving user-defined states

    SUBROUTINE VTBEGIN(ICODE, IERR)
    SUBROUTINE VTEND(ICODE, IERR)

Logging message send/receive events (undocumented)

    SUBROUTINE VTLOGSENDMSG(IME, ITO, ICNT, ITAG, ICOMMID, IERR)
    SUBROUTINE VTLOGRECVMSG(IME, IFRM, ICNT, ITAG, ICOMMID, IERR)
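The call pattern behind these routines (define a state once, then bracket the code of interest with begin/end) can be illustrated with a small stand-in. The sketch below is in Python and does not call Vampirtrace: `StateRecorder` and its method names are hypothetical, and the rule that states must be closed innermost-first is an assumption of this sketch rather than something the slides state.

```python
class StateRecorder:
    """Toy stand-in that mimics the call pattern of the API; not Vampirtrace itself."""

    def __init__(self):
        self.symbols = {}    # code -> (state name, activity name)
        self.events = []     # ("begin" | "end", code) in call order
        self.stack = []      # codes of the states that are currently open
        self.enabled = True

    def trace_off(self):     # plays the role of VTTRACEOFF
        self.enabled = False

    def trace_on(self):      # plays the role of VTTRACEON
        self.enabled = True

    def symdef(self, code, state, activity):   # plays the role of VTSYMDEF
        self.symbols[code] = (state, activity)

    def begin(self, code):   # plays the role of VTBEGIN
        if code not in self.symbols:
            raise KeyError(f"state code {code} was never defined")
        self.stack.append(code)
        if self.enabled:
            self.events.append(("begin", code))

    def end(self, code):     # plays the role of VTEND
        # Assumption of this sketch: states close innermost-first.
        if not self.stack or self.stack[-1] != code:
            raise RuntimeError(f"end({code}) does not match the innermost open state")
        self.stack.pop()
        if self.enabled:
            self.events.append(("end", code))


rec = StateRecorder()
rec.symdef(1, "ga_get", "GA")               # illustrative state names
rec.symdef(2, "diag", "linear algebra")
rec.begin(2)
rec.begin(1)
rec.end(1)
rec.end(2)
print(rec.events)
```

Integer codes are defined once and reused at every begin/end pair, which keeps the per-event cost small; that is the reason a definition step separate from the bracketing calls is useful.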

Global Arrays

Single, shared data structure

Physically distributed data

• Shared-memory-like model
  – Fast local access
  – NUMA aware and easy to use
  – MIMD and data-parallel modes
  – Inter-operates with MPI, …
• BLAS and linear algebra interface
• Ported to major parallel machines
  – IBM, Cray, SGI, clusters, ...
• Originated in an HPCC project
• Used by 5 major chemistry codes, financial futures forecasting, astrophysics, computer graphics

Instrumenting single-sided memory access

Approach 1: Instrument the puts, gets and data server

• Advantage: robust and accurate
• Disadvantage: one does not always have access to the source of the data server

Approach 2: Instrument the puts and gets only, cheating on the source and destination of the messages

• Advantage: no instrumentation of the data server required
• Disadvantage: timings of the messages are inaccurate in case of non-blocking operations
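Approach 2 can be pictured with a small model: the process that issues the put or get logs both the send and the receive event, writing the remote end's event on its behalf. The Python sketch below is only an illustration under stated assumptions; `Trace`, `traced_get` and `traced_put` are hypothetical names, a list slice stands in for the real one-sided transfer, and 8-byte elements are assumed.

```python
import time


class Trace:
    """In-memory stand-in for a trace library (hypothetical, not Vampirtrace)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.events = []  # (kind, timestamp, source, destination, nbytes)

    def log_send(self, source, destination, nbytes):
        self.events.append(("send", self.clock(), source, destination, nbytes))

    def log_recv(self, source, destination, nbytes):
        self.events.append(("recv", self.clock(), source, destination, nbytes))


ELEMENT_BYTES = 8  # assumption: double-precision elements


def traced_get(trace, me, owner, remote, lo, hi):
    """Fetch remote[lo:hi]; the caller logs both ends of the message."""
    nbytes = (hi - lo) * ELEMENT_BYTES
    trace.log_send(owner, me, nbytes)   # the owner's send, logged by the caller
    data = remote[lo:hi]                # stands in for the real one-sided get
    trace.log_recv(owner, me, nbytes)
    return data


def traced_put(trace, me, owner, remote, lo, values):
    """Store values at remote[lo:]; again only the caller logs."""
    nbytes = len(values) * ELEMENT_BYTES
    trace.log_send(me, owner, nbytes)
    remote[lo:lo + len(values)] = values  # stands in for the real one-sided put
    trace.log_recv(me, owner, nbytes)     # the owner's receive, logged by the caller


trace = Trace()
remote_block = [0.0] * 8  # memory owned by process 3
traced_put(trace, me=0, owner=3, remote=remote_block, lo=2, values=[1.5, 2.5])
fetched = traced_get(trace, me=0, owner=3, remote=remote_block, lo=0, hi=4)
print(fetched)
```

Both timestamps come from the caller's clock at the moment the call is issued and returns. For a non-blocking operation the data may move later than that, which is the inaccuracy noted as the disadvantage of this approach.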

Runtime tracing options

• The tracing of activities can be modified at runtime through a configuration file.
• Tracing of messages cannot be changed.
• VTTRACEON and VTTRACEOFF should be used sparingly.

Example configuration file:

    Logfile-name /home/user/prog.bpv
    Symbol nnodes off
    Symbol nodeid off
    Symbol GA_Nnodes off
    Symbol GA_Nodeid off
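When many symbols need to be switched off, a configuration file of this shape is easy to generate. The Python sketch below uses only the two directives visible on the slide (`Logfile-name` and `Symbol ... off`); the helper name `tracing_config` is hypothetical, and no other configuration syntax is assumed.

```python
def tracing_config(logfile, symbols_off):
    """Build configuration text: the log file name, then one 'Symbol <name> off' line per symbol."""
    lines = [f"Logfile-name {logfile}"]
    lines += [f"Symbol {name} off" for name in symbols_off]
    return "\n".join(lines) + "\n"


config = tracing_config(
    "/home/user/prog.bpv",
    ["nnodes", "nodeid", "GA_Nnodes", "GA_Nodeid"],
)
print(config, end="")
```

Switching off cheap, frequently called query routines such as these is one way to keep the trace file small without touching the application source.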

Practical issues

• The vampirtrace library and evaluation licenses can be downloaded from http://www.pallas.com/
• Evaluation licenses are limited to 32 processors
• CPU cycle providers are not too keen to provide vampirtrace?

Case Studies - Zeolite Fragments

Zeolite fragments (basis AO/CD):

• Si8O7H18: 347/832
• Si8O25H18: 617/1444
• Si26O37H36: 1199/2818
• Si28O67H30: 1687/3928

• DFT Calculations with Coulomb Fitting
  – Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H
  – Fitting Basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H
• NWChem & GAMESS-UK
  – Both codes use an auxiliary fitting basis for the Coulomb energy, with 3-centre 2-electron integrals held in core.

High-End and Commodity Systems

• Cray T3E/1200E
  – 816 processor system at Manchester (CSAR service)
  – 600 MHz EV56 Alpha processor with 256 MB memory
• IBM SP (32 CPU system at DL)
  – 4-way Winterhawk2 SMP "thin nodes" with 2 GB memory
  – 375 MHz Power3-II processors with 8 MB L2 cache
• Compaq AlphaServer SC - 667 (APAC) and 833 MHz CPUs
  – 4-way ES40/667 and /833 SMP nodes with 2 GB memory
  – Alpha 21264a (EV67) CPUs with 8 MB L2 cache
  – Quadrics "fat tree" interconnect (5 usec latency, 150 MB/sec B/W)
• SGI Origin 3800
  – SARA (1000 CPUs) - Numalink with R12k/400 CPUs
• Commodity Systems (DL)
  – 32 x IA32 single processor CPUs (Pentium III/450), fast ethernet
  – Linux Alpha Cluster (16 x UP2000/667 - Quadrics Interconnect)

Measured Parallel Efficiency for NWChem - DFT on IBM-SP; Wall Times to Solution for SCF Convergence

[Figure: speed-up against number of nodes, both axes running from 32 to 256]

Zeolite Fragment   Basis AO/CD   Number of Nodes   Wall Time to Solution
Si8O7H18           347/832        64                238 s
Si8O25H18          617/1444      128                364 s
Si26O37H36         1199/2818     256               1137 s
Si28O67H30         1687/3928     256               2766 s

D.A. Dixon et al., HPC, Plenum, 1999, p. 215
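For reading the scaling charts in this and the following slides, speed-up and parallel efficiency can be derived from wall times measured at several processor counts. The Python sketch below takes the smallest processor count as the baseline, which is an assumption of the sketch; the timings in the example are illustrative placeholders, not measurements from the talk.

```python
def scaling(times_by_cpus):
    """Return (cpus, speedup, efficiency) rows relative to the smallest CPU count.

    speedup    = T(base) / T(p)
    efficiency = speedup * base / p   (1.0 means ideal scaling from the baseline)
    """
    base_cpus = min(times_by_cpus)
    base_time = times_by_cpus[base_cpus]
    rows = []
    for cpus in sorted(times_by_cpus):
        speedup = base_time / times_by_cpus[cpus]
        efficiency = speedup * base_cpus / cpus
        rows.append((cpus, speedup, efficiency))
    return rows


# Placeholder wall times in seconds, chosen only to illustrate the arithmetic.
example = {16: 800.0, 32: 400.0, 64: 250.0}
for cpus, speedup, efficiency in scaling(example):
    print(f"{cpus:4d} CPUs  speed-up {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

A relative baseline is used because runs of this size often do not fit on a single processor, so an absolute single-processor time is not always available.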

DFT Coulomb Fit - NWChem

[Figure: two charts of measured time (seconds) against number of CPUs (16, 32, 64, 128), one for Si8O7H18 (347/832) and one for Si8O25H18 (617/1444). Systems compared: Pentium Beowulf II, Cray T3E/1200E, Alpha Linux Cluster, AlphaServer SC/667.]

DFT Coulomb Fit - NWChem

[Figure: two charts of measured time (seconds) against number of CPUs (16, 32, 64, 128), one for Si26O37H36 (1199/2818) and one for Si28O67H30 (1687/3928). Systems compared: Cray T3E/1200E, Alpha Linux Cluster, AlphaServer SC/667.]

Reference timings: T(IBM-SP/P2SC-120, 256) = 1137 for Si26O37H36 and 2766 for Si28O67H30.

NWChem: Si8O7H18 and Si26O37H36

[Figure: per-component breakdown for Si8O7H18 and Si26O37H36, each shown in absolute terms and as a percentage (0-100%), with horizontal ticks 8, 16, 32, 64 and 16, 32, 64. Components: Diag, ga_demm, DIIS, GetVxc, GtVcoul, FitCD, init g, 3c-2e, S Diag, CDinv, Xcinv.]

NWChem / Si8O25H18 / Cycle

NWChem / Si8O25H18 / Diag

NWChem / Si8O25H18 / subdiag

NWChem / Si8O25H18 / subdiag

Parallel Implementations of GAMESS-UK

Extensive use of Global Array (GA) Tools and Parallel Linear Algebra from NWChem Project (EMSL)

• SCF and DFT energies and gradients
  – Replicated data, but …
  – GA Tools for caching of I/O for restart and checkpoint files
  – Storage of 3-centre 2-e integrals in DFT Jfit
  – Linear Algebra (via PeIGs, DIIS/MMOs, Inversion of 2c-2e matrix)
• SCF second derivatives
  – Distribution of <vvoo> and <vovo> integrals via GAs
• MP2 gradients
  – Distribution of <vvoo> and <vovo> integrals via GAs

GAMESS-UK: DFT S-VWN

Impact of Coulomb Fitting: Compaq AlphaServer SC/833

Basis: DZV_A2 (Dgauss)
A1_DFT Fit:

[Figure: two charts of measured time (seconds) against number of CPUs (16, 32, 64, 128) comparing J_EXPLICIT and J_FIT, one for Si26O37H36 (1199/2818) and one for Si28O67H30 (1687/3928).]

DFT Coulomb Fit - GAMESS-UK

[Figure: two charts of measured time (seconds) against number of CPUs (16, 32), one for Si26O37H36 (1199/2818) and one for Si28O67H30 (1687/3928). Systems compared: Pentium Beowulf II, Cray T3E/1200E, IBM SP/WH2-375, SGI Origin 3800/R12k-400, Alpha Linux Cluster, AlphaServer SC/667, AlphaServer SC/833.]

DFT JFit Performance: Si26O37H36

[Figure: percentage breakdown (0-100%) into SCF, XC and JFit against number of CPUs (16 to 128), in three panels: Cray T3E/1200E, AlphaServer SC/833 and SGI Origin 3000/R12k-400.]

GAMESS-UK / Si8O25H18: 8 CPUs: One DFT Cycle

GAMESS-UK / Si8O25H18: 8 CPUs: Q†HQ (GAMULT2) and PEIGS

Summary

• PNNL, Daresbury and PALLAS collaborations
• Tool for Performance Analysis - VAMPIR & VAMPIR Trace
  – Extended to handle GA Applications
  – Applied in a number of DFT Calculations on Zeolite Fragments on a variety of high-end and commodity-based platforms
• Instrumentation of both NWChem and GAMESS-UK:
  – Distributed data (NWChem)
  – Replicated Data (GAMESS-UK)
  – Analysis of GAs and PeIGs
• Findings
  – non-intrusive
  – Tracing of substantial runs possible
  – Size of trace files in distributed data applications
  – Use in quantifying scaling problems, e.g. GA_MULT2 in GAMESS-UK

Acknowledgements

• Bob Gingold, Australian National University Supercomputer Facility
• Mario Deilmann, Hans Plum, Heinrich Bockhorst, Pallas