parallelization of cumulative reaction probabilities (crp) parallel subspace projection approximate...

• Parallelization of Cumulative Reaction Probabilities (CRP)

• Parallel Subspace Projection Approximate Matrix (SPAM) method

• Iterpolative Moving Least Squares (IMLS) method (see other poster)

Advanced Software for the Calculation of Thermochemistry, Kinetics, and DynamicsStephen Gray, Ron Shepard, Al Wagner, Mike Minkoff, Argonne National Laboratory

Theoretical Chemical Dynamics Studies for Elementary Combustion ReactionsDonald Thompson, Gia Maisuradze, Akio Kawano, Yin Guo, Oklahoma State University

CRP (Stephen Gray, Mike Minkoff, and Al Wagner)• computationally intensive core of reaction rate constants• mathematical kernel (all matrices are sparse with some structure):

- method 1: - iterative eigensolve (imbedded iterative linearsolves) - clever preconditioning important

- portability based on ANL PETSc library of kernels- method 2: - Chebyschev propagation (=> matrix vector multiplies) - novel finite difference representation (helps parallelize)

• programming issues:- parallelization - exploiting data structure (i.e., preconditioning)

SPAM (Ron Shepard and Mike Minkoff)• novel iterative method to solve general matrix equations

- eigensolve - linear solve - nonlinear solve

• applications are widespread - in chemistry: CRP, electronic structure (SCF, MRSDCI,…)

• mathematical kernel:- related to Davidson, multigrid, and conjugate gradient methods- subspace reduction (requiring usual matrix vector multiplies) - projection operator decomposition of matrix vector product - substitution of user-supplied approximate matrix in computationally intensive part of decomposition- sequence of approx. matrices => multilevel method

• programming issues:- generalization of approach (only done for eigensolve)- incorporation into libraries (connected to TOPS project part at ANL)- test of efficacy in realistic applications (e.g., CRP)

Parallelization of Cumulative Reaction Probabilities (CRP) Stephen Gray, Mike Minkoff, and Al Wagner (Argonne National Laboratory)

• Cumulative Reaction Probabilities (CRP)- computational core of reaction rate constants- exact computation computational intensive- approximate computation underlies all major reaction rate theories in use=> efficient exact CRP code will

- give exact rates (if the computed forces are accurate)

- calibrate ubiquitous approximate rate methods

• Two methods:– Time Independent (Miller and Manthe, 1994, and others)– Time Dependent (Zhang and Light, 1996, and others)

• Highly parallel approaches to both methods being pursued

Parallelization of Cumulative Reaction Probabilities Time Independent Approach

Mike Minkoff and Al Wagner

• N(E) = k pk(E,J)

where pk(E,J) = eigenvalues of Probability Operator:

P(E) = 4 r1/2 (H+i-E)-1p (H-i-E)-1r

1/2

where ix = absorbing diagonal potentials = imaginary potentials

H = hamiltonian (differential operator) => for realistic problems size ~105x105 or much larger number of eigenvalues < 100

• iterative approachmacrocycle of iteration for eigenvaluemicrocycle of iteration for action of Green’s function (H-i-E)-1 => linear solve

Action of

1st Green's Fcn on

current vector

Action of

2nd Green's Fcn on

current vector

Eigenvalue Cycle

Probability Method PRO: few largest eigenvalues -> usual iterative methods CON: iterative evaluation of action of 2 Green's function

Inverse Probability Method PRO: direct calculation each iteration CON: few smallest eigenvalues Direct

Calculation of

Inverse Operator

• Code built on PETSc (http://www.mcs.anl.gov/petsc) -> TOPS

• PETSc: data structure, GMRES linear solve, preconditioners USER: Lanczos method for eigensolve

• Future: user supplied preconditioners


Computation and Communication KernelsMPI, MPI-IO, BLAS, LAPACK

Profiling Interface

PETSc PDE Application Codes

Object-OrientedMatrices, Vectors, Indices

GridManagement

Linear SolversPreconditioners + Krylov Methods

Nonlinear Solvers,Unconstrained Minimization

ODE Integrators Visualization

Interface

http://www.mcs.anl.gov/petsc


Performance:

-model problem- Optional number of dimensions • Eckhart potential along rxn coord. • parabolic potential perpendicular to reaction coord.- DVR representation of H

-Algorithm options- Diagonal preconditioner

-Computers- NERSC SP- others include SGI Power Chanllenge, Cray T3E

Performance vs. processorsfor model with increasing dimensions

102

103

104

105

100 101 102

Tim

e-t

o-s

olu

tio

n/e

ige

nv

alu

e

(se

c)

Number of Processors

5D

7D

6D

7D

E = .32eV

E = .44eV


Future Preconditioners:

• Banded

• SPAM

• Sparse optimal similarity transforms (Poirier)

IF Q = THEN find optimal Q such that QHQT =

optimal block diagonal ->

Idealized H for 3D Problem: different colors => different dimension

effective band width: all dimensioned covered

10-3

10-2

10-1

100

101

102

103

100 101 102

Preconditioning with Banded Matrix

Tim

e/e

ige

nv

alu

e

rela

tiv

e t

o d

iag

on

al

pre

co

nd

itio

ne

r

Half Band Width

#m

icro

/ma

cro

2D

2D

3D

3D4D

4D

failed to converge

Other Preconditioners: SPAM with H(1) striped?

fat bandprecond.

SPAMprecond.

can SPAM get scalable performance?

Idealized H for 3D Problem: different colors => different dimension

10-2

10-1

100

101

102

103

104

105

106

3 4 5 6 7 8 9 10

TIm

e-t

o-s

oln

No. of Dimensions

banded preconditioner

diagonal preconditioner

Good performance bad scaling

Parallelization of Cumulative Reaction Probabilities Time Dependent Approach

Dimitry Medvedev and Stephen Gray

N(E) can be found from time dependent transition state wavepackets (TSWP)(see Zhang and Light, J. Chem. Phys. 104, 6184 (1996))

N(E) = M Ni(E) where Ni(E) Im<i(E)|Fi(E)>

where F = differential flux operator i(E) exp(iEt)i(x,t) dt

where TSWP i(x,t) from i/t i(x,t) = H i(x,t)

where H is Schroedinger Eq. Operator

Work• M different TSWPs (each independent of other)• each TSWP - propagated over time - Nt time steps - each time step propagation dominated by H i multiply => CPU work = M Nt (work of H i multiply)


Solution Strategy

• Real Wavepackets (TS-RWP)

(K. M. Forsythe and S. K. Gray, J. Chem. Phys. 112, 2623 (2000))

- half the storage, twice as fast relative to complex wavepackets

- Chebyshev iteration for propagation

- Hi work -> action of second order differential operators


Solution strategy (more)

• Dispersion Fitted Finite Difference (DFFD)

(S. K. Gray and E.M. Goldfield, J. Chem. Phys. 115, 8331 (2001))

- finite difference evaluation of action of differential operators

- optimized constants to reproduce dispersion relation

(dispersion related momentum to kinetic energy)

- different optimized constants

for selected propagation error

x.error for 3D H+H2 Reaction Probability vs. order of finite difference

- parallelize via decomposition of i in x

DFFD => less edge effects

inducing processor communications

10-5

0.0001

0.001

0.01

0.1

6 8 10 12 14 16 18 20 22

2n+1

Error

FD

DFFD


Solution strategy (more)

• Trivial parallelization over M wavepackets

• Nontrivial parallelization for wavepacket propagation

- propagation requires repeated actions of Hamiltonian matrix on a vector

- DFFD allows for facile cross-node parallelization

- mixed OpenMP/MPI Parallel program for AB+CD chemical reaction dynamics• wavefunction distributed over multiprocessor nodes according to the value of one of the coordinates

• One OpenMP thread performs internode communication • Remaining OpenMD threads do local work which dominates

• Future directions: - parallelize over three radial degrees of freedom - develop distributable version(s) of code for different parallel environments - investigate more general (e.g., Cartesian) representations.


Results

• OH+CO reaction

- 6 dimensions

- zero total ang. mom.

• Wavepacket propagated

reaction probability

• RWP + DFFD propagation

technique

• grid/basis dimensions > 109

• run at NERSC

Advanced Software for the Calculation of Thermochemistry, Kinetics & Dynamics

Subspace Projected Approximate Matrix: SPAMRon Shepard and Mike Minkoff (Argonne National Laboratory)

• New iterative method to solving general matrix equations– Eigenvalue– Linear– Non linear

• Method– Based on subspace reduction, projection operators, decomposition,

sequence of one or more approximate matrices

• Extensions– Demonstrated for symmetric real eigensolves– Demonstrations on linear and non-linear equations planned

• Applications– Applicable to problems with convergent

sequences of physical or numerical approximations– Parallelizable, multi-level library implementations via TOPS

Subspace Projected Approximate Matrix: SPAM Method

• Subspace iterative solution to eigenvalue, linear, and nonlinear problems

e.g. eigenvalue problem (H – j )vj = 0

• Subspace iterative solutions have form vj = Xn cj

where: Xn = {x1,x2,…,xn}

• cj is solved in a subspace: Hn cj = nj cj

where: Hn = (Wn)T Xn

where: Wn = H Xn <---for N>>n, where all the work is

• New xn+1 vector from residual: rn+1 = (Wn-nj Xn)cj

• SPAM gives more accurate or faster converging way to get xn+1 in 3 steps

Step 1: Assemble and apply Projections Operators on current vector

Pn xn+1 =Xn[Xn]T xn+1 Qn xn+1 = xn+1 - Pn xn+1

where n trial vectors are already processed, n+1 trial vector desired

Step 2: Decomposition and Approximation of H xn+1

• Decompose: H xn+1 = (Pn+Qn) H (Pn+Qn) xn+1

where (PnHPn+ PnHQn + QnHPn) xn+1 => cheap subspace operations (QnHQn) xn+1 => expensive full space operation

• Approximate (QnHQn) xn+1 by (QnH1Qn) xn+1 that is cheap to do

Step 3: Solve approximate subspace problem- solution is xn+1



SPAM properties:

• projection operators => convergence from any approx. matrix

• multi-level SPAM with dynamic tolerancesQnHQn approximated by Q(1)nH(1)Q(1)n

Q(1)nH(1)Q(1)n approximated by Q(2)nH(2)Q(2)n

Q(2)nH(2)Q(2)n approximated by Q(3)nH(3)Q(3)n…

• tens of lines of code added to existing iterative subspace eigensolvers

• highly parallelizable

• applicable to any subspace problem


SPAM properties (broad view):

• Relation to other subspace methods (e.g., Davidson)– More flexible (sequence of approx. matrices - no sequence => SPAM= Davidson)– If iteration tolerances are correct, always no worse than Davidson

• Relation to multigrid methods– SPAM sequences of approx. matrices ~ multigrid sequences of approx. grids– Subspace method => solution vector composed of multiple vectors

Multigrid methods => single solution vector that is updated

• Relation to preconditioned conjugate gradient (PCG) methods– SPAM has multiple vectors and approximations always improved by projection

PCG has a fixed single preconditioner and single vector improved by projection

• Deep injection of physical insight into numerics– Application experts can design physical approximation sequences– SPAM maps approximation sequences onto numerics sequences– Projection operators continually improve the approximations

=> coarse approximations can still be numerically useful

Subspace Projected Approximate Matrix: SPAMExtensions

Mathematical Extensions

• Eigensolves– Symmetric real or complex-hermitian

• Done with many applications

(http://chemistry.anl.gov/chem-dyn/Section-C-RonShepard.htm)• Code available

(ftp: ftp.tcg.anl.gov/pub/spam/{README,spam.tar.Z})– Generalized symmetric planned– General complex non-hermitian planned

• Linear solves planned

• Nonlinear solves planned

• Formal connection to multigrid and conjugate gradient methods in progress

Subspace Projected Approximate Matrix: SPAMApplications

broad view:

• Any iterative problem solved in a subspacewith a user-supplied cheap approximate matrix (TOPS connection)

• What is a cheap approximate matrix?• Sparser• Lower-order expansion of matrix elements• Coarser underlying grid • Lower-order difference equation• Tensor-product approximation • Operator approximation• More highly parallelized approximation• Lower precision• Smaller underlying basis

• Terascale Optimal PDE Siumlations (TOPS) connection– Basic parallelized multi-level code with user-supplied approx. matrix– Template approximate matrices stored in library


Model Applications

• Perturbed Tensor Product Model (4x4 tensor products + tridiag. perturbation)

- tensor product approximation

- 80% to 90% savings of SPAM over Davidson

• Truncated Operator Expansion Model


Two Chemistry Applications:

• Rational-Function Direct-SCF

- tensor product approximation to hessian matrix using Fock matrix elements


• MRSDCI

- Bk approximation


<kl|mn> <kl|mc> <kl|cd> <ka|cd> <ab|cd>

Number 104/8 105/2 106/4 107/2 108/8 %H used 2% 8% 80% 8% 2%

Bk approximation:

Structure of H: • H partitions (dotted lines) => mix of integral type • area of stylized pattern => frequency of integral type

<kl|mn> <kl|mc> <kl|cd> <ka|cd> <ab|cd>

Number 104/8 105/2 106/4 107/2 108/8 %H used 2% 8% 80% 8% 2%

approx. matrix exact matrix


Another Chemistry Application (in progress):Vibrational Wave Function Computation

• Based on the TetraVib program of H-G. Yu and J. Muckerman

[J. Molec. Spec. 214, 11-20 (2002)]

• Six choices of internal coordinates (3 angles and 3 radial coordinates)

• Angular degrees of freedom expanded in parity-adapted product basis Ylm(1, 1)Yl-m(2, 2)

• Radial coordinates are represented on a 3D DVR grid

• Allows arbitrary potential functions to be used

(loose modes, multiple minima, etc.)



• Approximate matrix:- single precision version of exact double precision matrix- lowest eigen value for H2CO- 10% to 30% savings

Time/seconds Davidson SPAMN m Hv H1v [n] Time [n,n1] Time

TSPAMTDavidson

11375 5 0.01 0.01 [12] 0.33[2,12] 0.33 1.00069972 7 0.08 0.06 [15] 3.31[2,15] 2.56 0.773

280665 9 0.29 0.25 [17] 15.90[2,16] 11.94 0.751865150 11 1.02 0.89 [24] 82.83[2,24] 59.45 0.718

2229955 13 2.73 2.39 [13] 99.91[2,13] 86.90 0.870

incr

easi

ng

b

asis

set

siz

e



• Approximate matrix:- smaller basis representation of large basis exact matrix - simple interpolates (care with DVR weights)- work in progress

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Common Component Architecture (CCA) Connection

In collaboration with CCTTSS ISIC, we have examined CCA:

• CCA allows inter-operability between a collection of codes - controls communication between programs and sub-programs - has script that allows assembly of full program at execution - requires up-front effort

• POTLIB is a library of potential energy surfaces (PES) - each PES book in library obeys an interface protocol - translators connect codes requiring PES to any book in library

• We intende to CCA POTLIB and some codes that use POTLIB (e.g., VariFlex, Venus, Trajectory program of D. Thompson - How much effort is required (being trained by CCTTSS people) - How robust is CCA software - How easy is it to have a GUI to organize CCA script

Future Work

• CRP: - time-independent:

- improve performance with new preconditioners- distribute the code

- time-dependent:- improve parallelization and distribute- develop Cartesian coordinate versions

• SPAM: - test generic approximate matrix strategies on application problems - extend to linear solves - distribute via TOPS

parallelization of cumulative reaction probabilities (crp) parallel subspace projection approximate...

Documents