parallelization of cumulative reaction probabilities (crp) parallel subspace projection approximate...
• Parallelization of Cumulative Reaction Probabilities (CRP)
• Parallel Subspace Projection Approximate Matrix (SPAM) method
• Interpolative Moving Least Squares (IMLS) method (see other poster)
Advanced Software for the Calculation of Thermochemistry, Kinetics, and Dynamics
Stephen Gray, Ron Shepard, Al Wagner, Mike Minkoff, Argonne National Laboratory
Theoretical Chemical Dynamics Studies for Elementary Combustion Reactions
Donald Thompson, Gia Maisuradze, Akio Kawano, Yin Guo, Oklahoma State University
CRP (Stephen Gray, Mike Minkoff, and Al Wagner)
• computationally intensive core of reaction rate constants
• mathematical kernel (all matrices are sparse with some structure):
- method 1:
  - iterative eigensolve (embedded iterative linear solves)
  - clever preconditioning important
  - portability based on ANL PETSc library of kernels
- method 2:
  - Chebyshev propagation (=> matrix-vector multiplies)
  - novel finite difference representation (helps parallelize)
• programming issues:
- parallelization
- exploiting data structure (i.e., preconditioning)
SPAM (Ron Shepard and Mike Minkoff)
• novel iterative method to solve general matrix equations
- eigensolve
- linear solve
- nonlinear solve
• applications are widespread
- in chemistry: CRP, electronic structure (SCF, MRSDCI,…)
• mathematical kernel:
- related to Davidson, multigrid, and conjugate gradient methods
- subspace reduction (requiring usual matrix-vector multiplies)
- projection operator decomposition of matrix-vector product
- substitution of user-supplied approximate matrix in computationally intensive part of decomposition
- sequence of approx. matrices => multilevel method
• programming issues:
- generalization of approach (only done for eigensolve)
- incorporation into libraries (connected to TOPS project part at ANL)
- test of efficacy in realistic applications (e.g., CRP)
Parallelization of Cumulative Reaction Probabilities (CRP) Stephen Gray, Mike Minkoff, and Al Wagner (Argonne National Laboratory)
• Cumulative Reaction Probabilities (CRP)
- computational core of reaction rate constants
- exact computation is computationally intensive
- approximate computation underlies all major reaction rate theories in use
=> an efficient exact CRP code will
- give exact rates (if the computed forces are accurate)
- calibrate ubiquitous approximate rate methods
• Two methods:
– Time Independent (Miller and Manthe, 1994, and others)
– Time Dependent (Zhang and Light, 1996, and others)
• Highly parallel approaches to both methods being pursued
Parallelization of Cumulative Reaction Probabilities Time Independent Approach
Mike Minkoff and Al Wagner
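• $N(E) = \sum_k p_k(E,J)$
where $p_k(E,J)$ = eigenvalues of the Probability Operator:
$P(E) = 4\,\epsilon_r^{1/2}\,(H + i\epsilon - E)^{-1}\,\epsilon_p\,(H - i\epsilon - E)^{-1}\,\epsilon_r^{1/2}$
where $i\epsilon_x$ = absorbing diagonal potentials = imaginary potentials (x = r for reactants, x = p for products; $\epsilon = \epsilon_r + \epsilon_p$)
H = Hamiltonian (differential operator) => for realistic problems, size ~ $10^5 \times 10^5$ or much larger; number of eigenvalues < 100
• iterative approach
- macrocycle of iteration for the eigenvalue
- microcycle of iteration for the action of the Green's function $(H - i\epsilon - E)^{-1}$ => linear solve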
[Diagram: Eigenvalue Cycle. Each pass applies the action of the 1st Green's function on the current vector, then the action of the 2nd Green's function on the current vector. In the Inverse Probability Method, a direct calculation of the inverse operator takes the place of these two steps.]
Probability Method
PRO: few largest eigenvalues -> usual iterative methods
CON: iterative evaluation of the action of 2 Green's functions
Inverse Probability Method
PRO: direct calculation each iteration
CON: few smallest eigenvalues
• Code built on PETSc (http://www.mcs.anl.gov/petsc) -> TOPS
• PETSc: data structure, GMRES linear solve, preconditioners
• USER: Lanczos method for eigensolve
• Future: user supplied preconditioners
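The probability operator defined on this slide can be illustrated on a toy problem small enough to treat with dense linear algebra. The sketch below is only an illustration under stated assumptions, not the PETSc code described here: it uses a 1D Eckart barrier in units with hbar = mass = 1, a 3-point finite-difference kinetic energy, and quadratic absorbing ramps. The grid size, barrier height, and absorber strength (`n`, `x_max`, `x_abs`, `v0`, `eps_max`) are all hypothetical values chosen for the demonstration. It writes the Green's function as $(E + i\epsilon - H)^{-1}$, which gives the same P(E) as the form on the slide because the two sign changes cancel.

```python
import numpy as np

def crp_eigenvalues(E, n=240, x_max=12.0, x_abs=6.0, v0=1.0, eps_max=2.0):
    """Eigenvalues p_k(E) of the probability operator for a 1D Eckart barrier.

    Dense toy model (hbar = mass = 1); every parameter value is illustrative.
    """
    x = np.linspace(-x_max, x_max, n)
    dx = x[1] - x[0]
    # Hamiltonian: 3-point finite-difference kinetic energy plus Eckart barrier.
    off = -0.5 / dx**2 * np.ones(n - 1)
    H = (np.diag(np.full(n, 1.0 / dx**2) + v0 / np.cosh(x) ** 2)
         + np.diag(off, 1) + np.diag(off, -1))
    # Diagonal absorbing potentials: quadratic ramps with disjoint supports.
    ramp = eps_max * np.clip((np.abs(x) - x_abs) / (x_max - x_abs), 0.0, None) ** 2
    eps_r = np.where(x < 0, ramp, 0.0)   # reactant side
    eps_p = np.where(x > 0, ramp, 0.0)   # product side
    # Green's function G = (E + i*eps - H)^(-1); dense inverse is for the toy only.
    G = np.linalg.inv(E * np.eye(n) - H + 1j * np.diag(eps_r + eps_p))
    # P = A^dagger A with A = 2 eps_p^(1/2) G eps_r^(1/2).
    A = 2.0 * np.sqrt(eps_p)[:, None] * G * np.sqrt(eps_r)[None, :]
    P = A.conj().T @ A
    return np.linalg.eigvalsh(P)

def cumulative_reaction_probability(E, **kwargs):
    """N(E) as the sum of the probability-operator eigenvalues."""
    return float(crp_eigenvalues(E, **kwargs).sum())
```

For realistic sizes the explicit inverse is replaced by iterative linear solves (the microcycle) inside an iterative eigensolver (the macrocycle), which is where preconditioning matters.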
Parallelization of Cumulative Reaction Probabilities Time Independent Approach
[Diagram: PETSc software layers]
- PDE Application Codes
- PETSc: ODE Integrators; Visualization; Interface; Nonlinear Solvers, Unconstrained Minimization; Linear Solvers (Preconditioners + Krylov Methods); Object-Oriented Matrices, Vectors, Indices; Grid Management; Profiling Interface
- Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK
Parallelization of Cumulative Reaction Probabilities Time Independent Approach
Performance:
- Model problem
  - optional number of dimensions
  • Eckart potential along the reaction coordinate
  • parabolic potential perpendicular to the reaction coordinate
  - DVR representation of H
- Algorithm options
  - diagonal preconditioner
- Computers
  - NERSC SP
  - others include SGI Power Challenge, Cray T3E
[Figure: Performance vs. processors for model with increasing dimensions. Vertical axis: time-to-solution/eigenvalue (sec), 10^2 to 10^5. Horizontal axis: number of processors, 10^0 to 10^2. Curves: 5D, 6D, and 7D models; energies E = 0.32 eV and E = 0.44 eV.]
Parallelization of Cumulative Reaction Probabilities Time Independent Approach
Future Preconditioners:
• Banded
• SPAM
• Sparse optimal similarity transforms (Poirier)
IF Q = [matrix image not recovered] THEN find the optimal Q such that $Q H Q^T$ = [matrix image not recovered] (optimal block diagonal)
[Figure: idealized H for a 3D problem; different colors => different dimensions; effective band width: all dimensions covered]
[Figure: Preconditioning with Banded Matrix. Horizontal axis: half band width, 10^0 to 10^2. Vertical axis, 10^-3 to 10^3: time/eigenvalue relative to diagonal preconditioner, and #micro/macro. Curves for the 2D, 3D, and 4D models; some runs are marked "failed to converge".]
Other Preconditioners: SPAM with H(1) striped?
- fat band preconditioner
- SPAM preconditioner
- can SPAM get scalable performance?
[Figure: idealized H for a 3D problem; different colors => different dimensions]
[Figure: time-to-solution (10^-2 to 10^6) vs. number of dimensions (3 to 10) for the banded preconditioner and the diagonal preconditioner. Annotation: good performance, bad scaling.]
Parallelization of Cumulative Reaction Probabilities Time Dependent Approach
Dimitry Medvedev and Stephen Gray
N(E) can be found from time-dependent transition state wavepackets (TSWP) (see Zhang and Light, J. Chem. Phys. 104, 6184 (1996))
$N(E) = \sum_{i=1}^{M} N_i(E)$ where $N_i(E) \propto \mathrm{Im}\,\langle \psi_i(E) | F | \psi_i(E) \rangle$
where F = differential flux operator and $\psi_i(E) \propto \int \exp(iEt)\,\psi_i(x,t)\,dt$
where the TSWP $\psi_i(x,t)$ follows from $i\,\partial\psi_i(x,t)/\partial t = H\,\psi_i(x,t)$
where H is the Schroedinger equation operator (Hamiltonian)
Work
• M different TSWPs (each independent of the others)
• each TSWP
- propagated over time
- $N_t$ time steps
- each time step propagation dominated by the $H\psi_i$ multiply
=> CPU work = $M \times N_t \times$ (work of one $H\psi_i$ multiply)
Parallelization of Cumulative Reaction Probabilities Time Dependent Approach
Solution Strategy
• Real Wavepackets (TS-RWP)
(K. M. Forsythe and S. K. Gray, J. Chem. Phys. 112, 2623 (2000))
- half the storage, twice as fast relative to complex wavepackets
- Chebyshev iteration for propagation
- $H\psi_i$ work -> action of second-order differential operators
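The Chebyshev iteration named above can be sketched with the plain three-term recursion q_{k+1} = 2 H_s q_k - q_{k-1}, where H_s is the Hamiltonian shifted and scaled so that its spectrum lies in [-1, 1]. This is a minimal illustration only: it omits the absorbing-boundary damping and any special treatment of the first step that a production real-wavepacket code would use, and the function names and the spectral bounds `e_min`, `e_max` are hypothetical. Each step costs one real matrix-vector multiply, which is the dominant work.

```python
import numpy as np

def scale_hamiltonian(H, e_min, e_max):
    """Shift and scale H so that a spectrum inside [e_min, e_max] maps into [-1, 1]."""
    n = H.shape[0]
    return (2.0 * H - (e_max + e_min) * np.eye(n)) / (e_max - e_min)

def chebyshev_iterates(H_s, q0, n_steps):
    """Return [q_0, ..., q_n_steps] with q_k = T_k(H_s) q_0 (Chebyshev polynomial T_k)."""
    q_prev = q0.copy()
    q_curr = H_s @ q0                 # T_1(H_s) q_0
    iterates = [q_prev, q_curr]
    for _ in range(2, n_steps + 1):
        # Three-term recursion: one real matrix-vector multiply per step.
        q_next = 2.0 * (H_s @ q_curr) - q_prev
        iterates.append(q_next)
        q_prev, q_curr = q_curr, q_next
    return iterates[: n_steps + 1]
```

Because T_k(cos θ) = cos(kθ), the iterate q_k plays the role of a cosine-type propagator at step k, so a real starting vector stays real throughout.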
Parallelization of Cumulative Reaction Probabilities Time Dependent Approach
Solution strategy (more)
• Dispersion Fitted Finite Difference (DFFD)
(S. K. Gray and E. M. Goldfield, J. Chem. Phys. 115, 8331 (2001))
- finite difference evaluation of the action of differential operators
- optimized constants to reproduce the dispersion relation
(the dispersion relation relates momentum to kinetic energy)
- different optimized constants for selected propagation error
- parallelize via decomposition of $\psi_i$ in x
DFFD => fewer edge effects inducing interprocessor communication
[Figure: error for the 3D H+H2 reaction probability vs. order of finite difference. Horizontal axis: 2n+1, from 6 to 22. Vertical axis: error, 10^-5 to 0.1. Curves: FD and DFFD.]
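To make the dispersion-relation idea concrete, the sketch below builds the standard (2n+1)-point central-difference weights for the second derivative and evaluates the kinetic energy they assign to a plane wave exp(ikx), for comparison with the exact k^2/2 (hbar = mass = 1). It does not reproduce the DFFD constants themselves, which are optimized in the cited paper; it only shows the quantity that such an optimization would target. The function names are hypothetical, and the Vandermonde solve is only well conditioned for small n.

```python
import numpy as np

def central_second_derivative_weights(n):
    """Standard (2n+1)-point central weights for d^2/dx^2 on a unit-spaced grid."""
    offsets = np.arange(-n, n + 1, dtype=float)
    # Row m of A holds offsets**m; exactness on x**m for m = 0..2n means
    # sum_j c_j * j**m equals 2 for m = 2 and 0 otherwise.
    A = np.vander(offsets, 2 * n + 1, increasing=True).T
    rhs = np.zeros(2 * n + 1)
    rhs[2] = 2.0
    return np.linalg.solve(A, rhs)

def fd_kinetic_energy(weights, k, dx):
    """Kinetic energy the stencil assigns to exp(i k x); the exact value is k**2 / 2."""
    k = np.atleast_1d(np.asarray(k, dtype=float))
    n = (len(weights) - 1) // 2
    j = np.arange(-n, n + 1)
    # Symmetric weights: sum_j c_j exp(i j k dx) reduces to a cosine sum.
    symbol = (weights[:, None] * np.cos(np.outer(j, k * dx))).sum(axis=0) / dx**2
    return -0.5 * symbol
```

Comparing `fd_kinetic_energy(w, k, dx)` with `0.5 * k**2` over the momentum range of interest shows how the stencil's dispersion error falls as n grows. A fitted stencil would aim for a prescribed error with fewer points, and hence a narrower band of boundary data to communicate.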
Parallelization of Cumulative Reaction Probabilities Time Dependent Approach
Solution strategy (more)
• Trivial parallelization over M wavepackets
• Nontrivial parallelization for wavepacket propagation
- propagation requires repeated actions of Hamiltonian matrix on a vector
- DFFD allows for facile cross-node parallelization
- mixed OpenMP/MPI parallel program for AB+CD chemical reaction dynamics
• wavefunction distributed over multiprocessor nodes according to the value of one of the coordinates
• one OpenMP thread performs internode communication
• remaining OpenMP threads do the local work, which dominates
• Future directions:
- parallelize over three radial degrees of freedom
- develop distributable version(s) of the code for different parallel environments
- investigate more general (e.g., Cartesian) representations
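The slab decomposition described on this slide, with the wavefunction split along one coordinate, can be mimicked in a single process to show which data each node needs from its neighbors. The sketch below is a serial stand-in written under stated assumptions, not the OpenMP/MPI program itself: the "ranks" are array slices, the halo rows are copied from a zero-padded global array instead of arriving as messages, and the function names are hypothetical.

```python
import numpy as np

def apply_stencil_global(psi, weights):
    """Apply a (2n+1)-point stencil along axis 0, with zeros beyond the grid edges."""
    n = (len(weights) - 1) // 2
    padded = np.pad(psi, [(n, n)] + [(0, 0)] * (psi.ndim - 1))
    out = np.zeros_like(psi)
    for j, w in enumerate(weights):
        out += w * padded[j : j + psi.shape[0]]
    return out

def apply_stencil_slabs(psi, weights, n_ranks):
    """Same result, computed slab by slab as a distributed code would."""
    n = (len(weights) - 1) // 2
    padded = np.pad(psi, [(n, n)] + [(0, 0)] * (psi.ndim - 1))
    pieces = []
    lo = 0
    for slab in np.array_split(psi, n_ranks, axis=0):
        hi = lo + slab.shape[0]
        # Owned rows plus n halo rows on each side; under MPI the halo rows
        # would be received from the neighboring ranks.
        local = padded[lo : hi + 2 * n]
        res = np.zeros_like(slab)
        for j, w in enumerate(weights):
            res += w * local[j : j + slab.shape[0]]
        pieces.append(res)
        lo = hi
    return np.concatenate(pieces, axis=0)
```

Each slab needs n halo rows from each neighbor, so the communication volume per multiply scales with the stencil half-width. This is why a finite-difference representation that reaches a given accuracy with a shorter stencil reduces internode traffic.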
Parallelization of Cumulative Reaction Probabilities Time Dependent Approach
Results
• OH+CO reaction
- 6 dimensions
- zero total angular momentum
• Wavepacket-propagated reaction probability
• RWP + DFFD propagation technique
• grid/basis dimensions > 10^9
• run at NERSC
Advanced Software for the Calculation of Thermochemistry, Kinetics & Dynamics
Subspace Projected Approximate Matrix: SPAM
Ron Shepard and Mike Minkoff (Argonne National Laboratory)
• New iterative method for solving general matrix equations
– Eigenvalue
– Linear
– Nonlinear
• Method
– Based on subspace reduction, projection operators, decomposition, and a sequence of one or more approximate matrices
• Extensions
– Demonstrated for symmetric real eigensolves
– Demonstrations on linear and nonlinear equations planned
• Applications
– Applicable to problems with convergent sequences of physical or numerical approximations
– Parallelizable, multi-level library implementations via TOPS
Subspace Projected Approximate Matrix: SPAM Method
• Subspace iterative solution to eigenvalue, linear, and nonlinear problems
e.g. eigenvalue problem $(H - \lambda_j)\,v_j = 0$
• Subspace iterative solutions have the form $v_j = X_n c_j$
where: $X_n = \{x_1, x_2, \ldots, x_n\}$
• $c_j$ is solved in a subspace: $H_n c_j = \lambda^n_j c_j$
where: $H_n = (W_n)^T X_n$
where: $W_n = H X_n$ <--- for N >> n, this is where all the work is
• New $x_{n+1}$ vector from residual: $r_{n+1} = (W_n - \lambda^n_j X_n)\,c_j$
• SPAM gives more accurate or faster converging way to get xn+1 in 3 steps
Step 1: Assemble and apply projection operators on the current vector
$P_n x_{n+1} = X_n [X_n]^T x_{n+1}$, $Q_n x_{n+1} = x_{n+1} - P_n x_{n+1}$
where n trial vectors are already processed and the (n+1)-th trial vector is desired
Step 2: Decomposition and approximation of $H x_{n+1}$
• Decompose: $H x_{n+1} = (P_n + Q_n)\,H\,(P_n + Q_n)\,x_{n+1}$
where $(P_n H P_n + P_n H Q_n + Q_n H P_n)\,x_{n+1}$ => cheap subspace operations
and $(Q_n H Q_n)\,x_{n+1}$ => expensive full-space operation
• Approximate $(Q_n H Q_n)\,x_{n+1}$ by $(Q_n H^{(1)} Q_n)\,x_{n+1}$, which is cheap to do
Step 3: Solve the approximate subspace problem; the solution is $x_{n+1}$
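The three steps above can be sketched for the lowest eigenpair of a small real symmetric matrix. This is a dense toy illustration under stated assumptions, not the authors' SPAM code: the approximate matrix `H1` is supplied by the caller, the starting vector is the first unit vector, and the approximate problem in Step 3 is solved with a dense eigensolver purely for clarity (a real implementation would solve it iteratively using only cheap products with the approximate matrix). The function name is hypothetical. It relies on the identity PHP + PHQ + QHP + Q H1 Q = H1 + P D + D P - P D P with D = H - H1, which needs only the stored products W = H X and the cheap products H1 X.

```python
import numpy as np

def spam_lowest_eigenpair(H, H1, tol=1e-9):
    """Lowest eigenpair of symmetric H, expanding the subspace with SPAM-style vectors.

    Returns (eigenvalue estimate, eigenvector estimate, subspace dimension used).
    """
    N = H.shape[0]
    X = np.zeros((N, 1))
    X[0, 0] = 1.0                      # initial trial vector x_1 (illustrative choice)
    W = H @ X                          # W_n = H X_n: the expensive products
    while True:
        n = X.shape[1]
        theta, C = np.linalg.eigh(X.T @ W)        # subspace problem H_n c = lambda c
        lam, c = theta[0], C[:, 0]
        residual = W @ c - lam * (X @ c)
        if np.linalg.norm(residual) < tol or n == N:
            return lam, X @ c, n
        # Steps 1-2: projected decomposition with Q H Q replaced by Q H1 Q.
        DX = W - H1 @ X                           # (H - H1) X_n
        H_approx = H1 + X @ DX.T + DX @ X.T - X @ (X.T @ DX) @ X.T
        H_approx = 0.5 * (H_approx + H_approx.T)  # remove rounding asymmetry
        # Step 3: solve the approximate problem (dense solve is for the toy only).
        _, V = np.linalg.eigh(H_approx)
        x_new = V[:, 0]
        for _ in range(2):                        # orthogonalize twice for stability
            x_new = x_new - X @ (X.T @ x_new)
        norm = np.linalg.norm(x_new)
        if norm < 1e-12:                          # nothing new to add
            return lam, X @ c, n
        x_new = x_new / norm
        X = np.hstack([X, x_new[:, None]])
        W = np.hstack([W, (H @ x_new)[:, None]])  # one exact product per new vector
```

A possible use is `spam_lowest_eigenpair(H, np.diag(np.diag(H)))`, where the diagonal of H stands in for a cheap approximate matrix. The count of exact products with H equals the returned subspace dimension, which is the quantity a better approximate matrix is meant to reduce.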
Subspace Projected Approximate Matrix: SPAM Method
SPAM properties:
• projection operators => convergence from any approx. matrix
• multi-level SPAM with dynamic tolerances
$Q_n H Q_n$ approximated by $Q^{(1)}_n H^{(1)} Q^{(1)}_n$
$Q^{(1)}_n H^{(1)} Q^{(1)}_n$ approximated by $Q^{(2)}_n H^{(2)} Q^{(2)}_n$
$Q^{(2)}_n H^{(2)} Q^{(2)}_n$ approximated by $Q^{(3)}_n H^{(3)} Q^{(3)}_n$ …
• tens of lines of code added to existing iterative subspace eigensolvers
• highly parallelizable
• applicable to any subspace problem
Subspace Projected Approximate Matrix: SPAM Method
SPAM properties (broad view):
• Relation to other subspace methods (e.g., Davidson)
– More flexible (sequence of approx. matrices; no sequence => SPAM = Davidson)
– If iteration tolerances are correct, always no worse than Davidson
• Relation to multigrid methods
– SPAM sequences of approx. matrices ~ multigrid sequences of approx. grids
– Subspace method => solution vector composed of multiple vectors
– Multigrid methods => single solution vector that is updated
• Relation to preconditioned conjugate gradient (PCG) methods
– SPAM has multiple vectors and approximations always improved by projection
– PCG has a fixed single preconditioner and a single vector improved by projection
• Deep injection of physical insight into numerics
– Application experts can design physical approximation sequences
– SPAM maps approximation sequences onto numerics sequences
– Projection operators continually improve the approximations
=> coarse approximations can still be numerically useful
Subspace Projected Approximate Matrix: SPAM Extensions
Mathematical Extensions
• Eigensolves
– Symmetric real or complex-hermitian
  • Done with many applications (http://chemistry.anl.gov/chem-dyn/Section-C-RonShepard.htm)
  • Code available (ftp: ftp.tcg.anl.gov/pub/spam/{README,spam.tar.Z})
– Generalized symmetric planned
– General complex non-hermitian planned
• Linear solves planned
• Nonlinear solves planned
• Formal connection to multigrid and conjugate gradient methods in progress
Subspace Projected Approximate Matrix: SPAM Applications
broad view:
• Any iterative problem solved in a subspace with a user-supplied cheap approximate matrix (TOPS connection)
• What is a cheap approximate matrix?
  • Sparser
  • Lower-order expansion of matrix elements
  • Coarser underlying grid
  • Lower-order difference equation
  • Tensor-product approximation
  • Operator approximation
  • More highly parallelized approximation
  • Lower precision
  • Smaller underlying basis
• Terascale Optimal PDE Simulations (TOPS) connection
– Basic parallelized multi-level code with user-supplied approx. matrix
– Template approximate matrices stored in library
Subspace Projected Approximate Matrix: SPAM Applications
Model Applications
• Perturbed Tensor Product Model (4x4 tensor products + tridiag. perturbation)
- tensor product approximation
- 80% to 90% savings of SPAM over Davidson
• Truncated Operator Expansion Model
- 50% to 75% savings of SPAM over Davidson
Two Chemistry Applications:
• Rational-Function Direct-SCF
- tensor product approximation to hessian matrix using Fock matrix elements
- 10% to 70% savings of SPAM over Davidson
• MRSDCI
- Bk approximation
- 10% to 20% savings of SPAM over Davidson
Bk approximation:
Structure of H:
• H partitions (dotted lines) => mix of integral types
• area of stylized pattern => frequency of integral type

Integral type   <kl|mn>   <kl|mc>   <kl|cd>   <ka|cd>   <ab|cd>
Number          10^4/8    10^5/2    10^6/4    10^7/2    10^8/8
%H used         2%        8%        80%       8%        2%

[Figure: stylized structure of H, shown for the approx. matrix and the exact matrix]
Subspace Projected Approximate Matrix: SPAM Applications
Another Chemistry Application (in progress): Vibrational Wave Function Computation
• Based on the TetraVib program of H-G. Yu and J. Muckerman
[J. Molec. Spec. 214, 11-20 (2002)]
• Six choices of internal coordinates (3 angles and 3 radial coordinates)
• Angular degrees of freedom expanded in parity-adapted product basis $Y_{l m}(\theta_1, \phi_1)\,Y_{l\,-m}(\theta_2, \phi_2)$
• Radial coordinates are represented on a 3D DVR grid
• Allows arbitrary potential functions to be used
(loose modes, multiple minima, etc.)
Subspace Projected Approximate Matrix: SPAM Applications
Another Chemistry Application (in progress): Vibrational Wave Function Computation
• Approximate matrix:
- single precision version of exact double precision matrix
- lowest eigenvalue for H2CO
- 10% to 30% savings

Time/seconds (rows ordered by increasing basis set size):

      N    m     Hv    H1v   Davidson [n]  Time   SPAM [n,n1]  Time   T_SPAM/T_Davidson
  11375    5   0.01   0.01   [12]          0.33   [2,12]       0.33   1.000
  69972    7   0.08   0.06   [15]          3.31   [2,15]       2.56   0.773
 280665    9   0.29   0.25   [17]         15.90   [2,16]      11.94   0.751
 865150   11   1.02   0.89   [24]         82.83   [2,24]      59.45   0.718
2229955   13   2.73   2.39   [13]         99.91   [2,13]      86.90   0.870
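As a worked check of the timing table above, the short sketch below recomputes the ratio T_SPAM/T_Davidson and the corresponding fractional saving for each basis size. The numbers are copied from the table; the variable names are illustrative only.

```python
# (N, T_Davidson, T_SPAM) in seconds, copied from the timing table.
timings = [
    (11375, 0.33, 0.33),
    (69972, 3.31, 2.56),
    (280665, 15.90, 11.94),
    (865150, 82.83, 59.45),
    (2229955, 99.91, 86.90),
]

# Ratio of SPAM time to Davidson time, and the fractional saving 1 - ratio.
ratios = [t_spam / t_dav for _, t_dav, t_spam in timings]
savings = [1.0 - r for r in ratios]
max_saving = max(savings)

for (n_basis, _, _), r, s in zip(timings, ratios, savings):
    print(f"N = {n_basis:8d}  ratio = {r:.3f}  saving = {100 * s:.0f}%")
```

The largest saving occurs at N = 865150, and the smallest basis shows no saving at all.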
Subspace Projected Approximate Matrix: SPAM Applications
Another Chemistry Application (in progress): Vibrational Wave Function Computation
• Approximate matrix:
- smaller basis representation of large basis exact matrix
- simple interpolates (care with DVR weights)
- work in progress
Common Component Architecture (CCA) Connection
In collaboration with CCTTSS ISIC, we have examined CCA:
• CCA allows interoperability between a collection of codes
- controls communication between programs and sub-programs
- has a script that allows assembly of the full program at execution
- requires up-front effort
• POTLIB is a library of potential energy surfaces (PES)
- each PES book in the library obeys an interface protocol
- translators connect codes requiring a PES to any book in the library
• We intend to apply CCA to POTLIB and to some codes that use POTLIB (e.g., VariFlex, Venus, trajectory program of D. Thompson)
- How much effort is required? (we are being trained by CCTTSS people)
- How robust is the CCA software?
- How easy is it to have a GUI to organize the CCA script?
Future Work
• CRP:
- time-independent:
  - improve performance with new preconditioners
  - distribute the code
- time-dependent:
  - improve parallelization and distribute
  - develop Cartesian coordinate versions
• SPAM:
- test generic approximate matrix strategies on application problems
- extend to linear solves
- distribute via TOPS