
SCF methods for energies and properties of large molecular systems

Trygve Helgaker, University of Oslo, Norway

Filip Pawłowski, Simen Reine, Erik Tellgren

Stinne Høst, Branislav Jansík, Poul Jørgensen, Pekka Manninen, Jeppe Olsen,

University of Aarhus, Denmark

Sonia Coriani, University of Trieste, Italy

Paweł Sałek, Royal Institute of Technology, Sweden

Electron Correlation for the Whole Periodic Table

– a theoretical symposium in honour of Björn Roos –

International Conference of Computational Methods in Sciences and Engineering

(ICCMSE 2006)

October 27–November 1, 2006

Hotel Panorama, Chania, Crete, Greece


Self-consistent field (SCF) theory for large systems

• The development of SCF methods for large systems presents several challenges:

– expensive and difficult Fock/Kohn–Sham evaluation

– expensive diagonalization and difficult SCF convergence

– DFT functionals may perform poorly for large systems

• Consider the optimization of the SCF energy (here LDA) of molecular systems:

– small systems: dominated by KS-matrix evaluation, with linear scaling

– large systems: dominated by SCF diagonalization, with cubic scaling

[Figure: time in minutes (20–100) against number of atoms (200–1000); curves: SCF, Coulomb, LDA XC]

• We shall here consider an SCF theory that avoids MOs and diagonalization

– it works directly in terms of the density matrix for energy and properties

– it involves only additions and multiplications of (sparse) one-electron matrices

– for large (sparse) systems, the calculations scale linearly with system size

Direct optimization of the density matrix

• Consider the direct optimization of the density matrix:

E(D) = TrDh + 2-el. part

– there are constraints on the density matrix

$$D = D^{\mathrm T} \ \text{(symmetry)}, \qquad \operatorname{Tr} DS = N \ \text{(trace)}, \qquad DSD = D \ \text{(idempotency)}$$

• Any other valid density matrix D(X) can then be generated from this matrix:

$$D(X) = \exp(-XS)\, D \exp(SX) \ \text{(exponential parameterization)}, \qquad X^{\mathrm T} = -X \ \text{(antisymmetric)}$$

– Helgaker, Jørgensen and Olsen: Molecular Electronic-Structure Theory (Wiley, 2000)

– Head-Gordon and coworkers, MolPhys 101, 37 (2003), JCP 118, 6144 (2003)

• We can obtain any valid density matrix, in the AO basis, without recourse to MOs!

– in particular, we may optimize the energy by freely varying Xµν with µ > ν:

$$E_{\min} = \min_X \left[ \operatorname{Tr} D(X) h + \text{2-el. part} \right]$$

• Is this use of D(X) a practical proposition?
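The constraints and the exponential parameterization above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' code: the helper names (`expm_taylor`, `random_reference`, `rotate_density`, `constraint_errors`), the random test matrices and the plain Taylor-series exponential are all assumptions made for illustration, and `n_occ` plays the role of N in the trace condition.

```python
import numpy as np

def expm_taylor(A, terms=40):
    """Matrix exponential by a plain Taylor series (adequate for small ||A||)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def random_reference(n_ao, n_occ, seed=0):
    """Hypothetical test data: an SPD overlap S and a density D with DSD = D."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n_ao, n_ao))
    S = np.eye(n_ao) + 0.1 * (B @ B.T) / n_ao        # symmetric positive definite
    L = np.linalg.cholesky(S)                         # S = L L^T
    C = np.linalg.inv(L).T                            # S-orthonormal: C^T S C = I
    C_occ = C[:, :n_occ]
    return S, C_occ @ C_occ.T

def rotate_density(D, S, X):
    """D(X) = exp(-XS) D exp(SX) for an antisymmetric X."""
    return expm_taylor(-X @ S) @ D @ expm_taylor(S @ X)

def constraint_errors(D, S, n_occ):
    """Deviations from the symmetry, trace and idempotency conditions."""
    return (np.abs(D - D.T).max(),
            abs(np.trace(D @ S) - n_occ),
            np.abs(D @ S @ D - D).max())
```

Calling `constraint_errors(rotate_density(D, S, X), S, n_occ)` for any antisymmetric `X` should return three values at round-off level, which is the numerical counterpart of the claim that D(X) stays a valid density matrix.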


Two questions about D(X) = exp(−XS)D exp(SX)

• Can it be evaluated efficiently?

– we use a generalized Baker–Campbell–Hausdorff (BCH) expansion:

$$D(X) = D + [D, X]_S + \tfrac{1}{2}\,[[D, X]_S, X]_S + \cdots$$

– we have here introduced the S commutator $[D, X]_S = DSX - XSD$

– converges rapidly (purification may be necessary), in about 10 matrix multiplications

• Are redundancies a problem?

– the AO space consists of two parts: the occupied space and the virtual space

P = DS (onto occupied space), Q = I−DS (onto virtual space)

– only rotations between the occupied and virtual spaces are nonredundant:

$$X = \underbrace{P X P^{\mathrm T} + Q X Q^{\mathrm T}}_{\text{redundant}} + \underbrace{P X Q^{\mathrm T} + Q X P^{\mathrm T}}_{X_{\mathrm{ov}}}$$

– to avoid problems with redundancies, we use the projected parameterization

$$D(X) = \exp(-X_{\mathrm{ov}} S)\, D \exp(S X_{\mathrm{ov}}), \qquad X^{\mathrm T} = -X$$
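Both questions on this slide can be explored with a few lines of NumPy. The sketch below is an illustration under assumptions, not the authors' implementation: it evaluates the BCH series by repeated S commutators and projects X onto its occupied–virtual part. The function names and the truncation order are hypothetical choices.

```python
import numpy as np

def s_commutator(A, X, S):
    """S commutator [A, X]_S = A S X - X S A."""
    return A @ S @ X - X @ S @ A

def bch_density(D, S, X, order=10):
    """D(X) = D + [D,X]_S + 1/2 [[D,X]_S,X]_S + ..., truncated at `order`."""
    term = D.copy()
    total = D.copy()
    for k in range(1, order + 1):
        term = s_commutator(term, X, S) / k   # k-fold nested commutator / k!
        total = total + term
    return total

def project_ov(X, D, S):
    """Nonredundant part X_ov = P X Q^T + Q X P^T with P = DS and Q = I - DS."""
    P = D @ S
    Q = np.eye(D.shape[0]) - P
    return P @ X @ Q.T + Q @ X @ P.T
```

For a valid D, the redundant part `X - project_ov(X, D, S)` has a vanishing S commutator with D, so it does not change the density at any order of the expansion.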


Diagonalization-free Roothaan–Hall SCF optimization

• The SCF (Fock or Kohn–Sham) energy may, in principle, be optimized directly:

$$E_{\min} = \min_X E(X) \quad \Leftrightarrow \quad F(D) D S = S D F(D) \ \text{(stationary condition)}$$

– a difficult global minimization problem!

• In MO theory, the Roothaan–Hall SCF scheme works well, especially with DIIS:

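$$F = h + g(D) \;\underset{D}{\overset{F}{\rightleftarrows}}\; FC = SC\epsilon; \qquad D_{\mathrm{new}} = C_{\mathrm{occ}} C_{\mathrm{occ}}^{\mathrm T}$$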

– each diagonalization is equivalent to minimizing the sum of the (occ.) orbital energies

$$\varepsilon(X) = \sum_I \epsilon_I = \operatorname{Tr} D(X) F$$

• By analogy with MO theory, we set up the following Roothaan–Hall SCF scheme:

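$$F = h + g(D) \;\underset{D}{\overset{F}{\rightleftarrows}}\; \varepsilon_{\min} = \min_X \operatorname{Tr} D(X) F; \qquad D_{\mathrm{new}} = D(X^{*})$$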

– at each SCF iteration, we minimize Tr D(X)F with respect to X

– the new density is then obtained by expansion of D(X) with the minimizer X∗

• We thus avoid MOs and diagonalization but retain the SCF iterations
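The equivalence between a Roothaan–Hall diagonalization and the minimization of Tr D(X)F can be illustrated on small dense matrices. The sketch below is an assumption-laden example rather than the authors' code: `rh_density` is a hypothetical helper that solves FC = SCε by Löwdin orthogonalization and returns the density built from the lowest `n_occ` eigenvectors.

```python
import numpy as np

def rh_density(F, S, n_occ):
    """Solve FC = SC eps; return D = C_occ C_occ^T and the orbital energies eps."""
    s, U = np.linalg.eigh(S)
    S_inv_half = U @ np.diag(s ** -0.5) @ U.T           # Loewdin S^(-1/2)
    eps, C_ortho = np.linalg.eigh(S_inv_half @ F @ S_inv_half)
    C = S_inv_half @ C_ortho                             # back-transform: C^T S C = I
    C_occ = C[:, :n_occ]                                 # eigh sorts eigenvalues ascending
    return C_occ @ C_occ.T, eps
```

For the returned density, `np.trace(D @ F)` equals the sum of the `n_occ` lowest orbital energies, and any other density satisfying the three constraints gives a value of Tr DF that is at least as large.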


Newton minimization of the Roothaan–Hall energy function

• At each SCF iteration, our task is to minimize the Roothaan–Hall energy function

$$\varepsilon(X) = \operatorname{Tr} D(X) F = \operatorname{Tr} DF + \operatorname{Tr}\,[D, X]_S F + \tfrac{1}{2} \operatorname{Tr}\,[[D, X]_S, X]_S F + \cdots$$

• Truncating at second order and setting the gradient to zero, we obtain the Newton step:

$$H^{-} X^{+} S^{-} + S^{-} X^{+} H^{-} = G^{-}$$ ← the Roothaan–Hall (RH) Newton equation

– where the (negative) gradient and Hessian matrices are given by

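$$G = F_{\mathrm{vo}} - F_{\mathrm{ov}}, \qquad H = F_{\mathrm{vv}} - F_{\mathrm{oo}}, \qquad F = F_{\mathrm{oo}} + F_{\mathrm{ov}} + F_{\mathrm{vo}} + F_{\mathrm{vv}}$$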

• Because of their large dimensions, the Newton equations cannot be solved directly

– transformation to orthonormal Löwdin basis: $A^{\pm} = S^{\pm 1/2} A S^{\pm 1/2}$

– $S^{-1/2}$ precalculated: $Z_0 = I$, $Z_{n+1} = \tfrac{1}{2} Z_n (3I - \lambda Z_n S Z_n)$ for $\| I - \lambda S \|_2 < 1$

– solution by the preconditioned conjugate-gradient method (typically 10 iterations)

– elementary (sparse) matrix manipulations (typically less than 100 multiplications)

• An RH diagonalization corresponds to an exact minimization (many Newton steps)

– however, a partial minimization will do (one RH Newton step is sufficient)
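The iteration quoted above for the inverse square root of the overlap matrix uses nothing but matrix multiplications, which is what makes it compatible with sparse algebra. A minimal dense sketch follows; the function name, the choice λ = 1/‖S‖₂ and the stopping test are assumptions made for this illustration.

```python
import numpy as np

def inverse_sqrt_newton_schulz(S, tol=1e-12, max_iter=100):
    """Return S^(-1/2) using only matrix multiplications.

    Iterates Z_{n+1} = 1/2 Z_n (3I - lam Z_n S Z_n) from Z_0 = I, which
    converges to (lam S)^(-1/2) when ||I - lam S||_2 < 1.
    """
    n = S.shape[0]
    I = np.eye(n)
    lam = 1.0 / np.linalg.norm(S, 2)          # assumption: scale by the largest eigenvalue
    Z = I.copy()
    for _ in range(max_iter):
        R = I - lam * Z @ S @ Z               # residual, zero at convergence
        if np.linalg.norm(R, 2) < tol:
            break
        Z = 0.5 * Z @ (3.0 * I - lam * Z @ S @ Z)
    return np.sqrt(lam) * Z                   # undo the scaling: S^(-1/2) = lam^(1/2) Z
```

With a symmetric positive definite S, the scaled matrix λS has all eigenvalues in (0, 1], so the convergence condition is met; the result Z satisfies Z S Z = I to within the tolerance.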


Iterative solution of Roothaan–Hall Newton equations

• Logarithmic plots of the residual against the number of iterations

– H2O, LDA/t-aug-cc-pVTZ

– 99 alanine residue peptides, LDA/6-31G (5449 AOs)

[Figure: logarithm of the residual (−3 to −0.5) against iteration number (5–25); curves: H2O PCG Löwdin, H2O PCG Cholesky, H2O CG Löwdin, H2O CG Cholesky, alanine PCG Löwdin]
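The iterative solution of the Newton equations can be sketched with a plain conjugate-gradient loop. The example below rests on several assumptions: it works in an orthonormal basis (S = I), where the equation takes the form HX + XH = G, it uses no preconditioner (plain CG rather than the PCG of the slides), and the toy Fock matrix is hypothetical. It is meant only to show that the solver needs nothing beyond matrix products and traces.

```python
import numpy as np

def newton_operator(H, X):
    """Left-hand side in an orthonormal basis (S = I): H X + X H."""
    return H @ X + X @ H

def solve_newton_cg(H, G, tol=1e-10, max_iter=200):
    """Conjugate-gradient solution of H X + X H = G using matrix products only."""
    X = np.zeros_like(G)
    R = G - newton_operator(H, X)            # residual
    P = R.copy()                             # search direction
    rr = np.sum(R * R)                       # Frobenius inner product <R, R>
    history = [np.sqrt(rr)]
    for _ in range(max_iter):
        if history[-1] < tol:
            break
        HP = newton_operator(H, P)
        alpha = rr / np.sum(P * HP)
        X = X + alpha * P
        R = R - alpha * HP
        rr_new = np.sum(R * R)
        P = R + (rr_new / rr) * P
        rr = rr_new
        history.append(np.sqrt(rr))
    return X, history

def toy_problem(n_occ=3, n_virt=4, seed=0):
    """Hypothetical Fock matrix and density in an orthonormal basis."""
    rng = np.random.default_rng(seed)
    n = n_occ + n_virt
    eps = np.concatenate([np.linspace(-2.0, -1.0, n_occ),
                          np.linspace(0.5, 2.0, n_virt)])
    A = rng.standard_normal((n, n))
    F = np.diag(eps) + 0.02 * (A + A.T)
    P_occ = np.diag([1.0] * n_occ + [0.0] * n_virt)   # D for S = I
    Q_virt = np.eye(n) - P_occ
    H = Q_virt @ F @ Q_virt - P_occ @ F @ P_occ        # F_vv - F_oo
    G = Q_virt @ F @ P_occ - P_occ @ F @ Q_virt        # F_vo - F_ov
    return H, G
```

Because G has only occupied–virtual blocks and H is block diagonal, every CG iterate stays in the nonredundant space, where the operator is positive definite as long as the virtual levels lie above the occupied ones. The list `history` holds the residual norms and can be plotted on a logarithmic scale in the manner of the figure.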

CPU time spent in Roothaan–Hall Newton equations

• We have successfully avoided Fock/Kohn–Sham diagonalization

– minimization rather than the solution of a generalized eigenvalue problem

– rapidly convergent: 50–100 sparse matrix multiplications needed

• Linear scaling is obtained by employing sparse-matrix algebra

– compressed sparse-row (CSR) representation of few-atom blocks

[Figure: time in RH Newton equations (500–2500) against number of atoms (200–1200); alanine residue peptides, HF/6-31G; curves: sparse, dense]
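The idea behind compressed sparse-row storage can be shown with a short, self-contained sketch. Note the assumptions: the slides refer to CSR storage of few-atom blocks, whereas this illustration stores individual elements, and the function names and threshold are hypothetical.

```python
import numpy as np

def dense_to_csr(A, threshold=1e-6):
    """Keep elements with |a_ij| > threshold; return (values, col_index, row_ptr)."""
    values, col_index, row_ptr = [], [], [0]
    for row in A:
        cols = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[cols])
        col_index.extend(cols)
        row_ptr.append(len(values))           # start of the next row in `values`
    return (np.array(values, dtype=float),
            np.array(col_index, dtype=int),
            np.array(row_ptr, dtype=int))

def csr_matmul_dense(csr, B):
    """Multiply a CSR matrix by a dense matrix B, touching only stored elements."""
    values, col_index, row_ptr = csr
    n_rows = len(row_ptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # row i of the product is a weighted sum of the rows of B picked by col_index
        C[i] = values[start:end] @ B[col_index[start:end]]
    return C
```

For a matrix whose elements decay with distance, the number of stored elements per row is bounded, so the cost of the product grows linearly with the matrix dimension rather than quadratically.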

SCF convergence in large molecules

• The SCF convergence is typically more difficult in larger systems

– small (or negative) HOMO–LUMO gaps and small Hessian eigenvalues in DFT

– 6-31G HOMO–LUMO gap and lowest Hessian eigenvalue in alanine residue peptides

[Figure: alanine residue peptides; horizontal axis 100–350, vertical axis 0–0.4; curves: HF HOMO–LUMO gap, B3LYP HOMO–LUMO gap, lowest HF Hessian eigenvalue, B3LYP eigenvalue]

• Trust-region SCF (TRSCF): modified SCF scheme, for more robust convergence

– step-size control in trust-region RH step by level shifting $F - \mu S_{\mathrm{oo}}$

– density averaging with energy lowering by density-subspace minimization (DSM)
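The effect of the level shift on the step length can be demonstrated in a few lines. The sketch below assumes an orthonormal basis (S = I, so that the occupied block of the overlap reduces to the reference density D0), a non-negative shift μ, and a diagonalization in place of the diagonalization-free minimization; the function names are hypothetical.

```python
import numpy as np

def level_shifted_density(F, D0, mu, n_occ):
    """Occupied projector of F - mu * D0 in an orthonormal basis (S = I)."""
    _, C = np.linalg.eigh(F - mu * D0)        # eigenvalues in ascending order
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T

def step_lengths(F, D0, n_occ, shifts):
    """Frobenius norm ||D(mu) - D0|| for each level shift mu."""
    return [np.linalg.norm(level_shifted_density(F, D0, mu, n_occ) - D0)
            for mu in shifts]
```

Increasing μ lowers the occupied levels of the reference density and never increases the distance between the new density and D0, so μ acts as a step-size control in the sense of a trust region.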


Illustration: alanine residue peptides

• Features of the code

– diagonalization-free trust-region Roothaan–Hall (TRRH) energy minimization

– trust-region density-subspace minimization (TRDSM) for density averaging

– boxed density-fitting with FMM for Coulomb evaluation

– LinK for exact exchange, linear-scaling exchange-correlation evaluation

– compressed sparse-row (CSR) representation of few-atom blocks

• alanine residue peptides

– CPU time against atoms

– HF/6-31G

– 5th SCF iteration

– dominated by exchange

– RH step least expensive

– full lines: sparse algebra

– dashed lines: dense algebra

[Figure: CPU time (2500–15000) against number of atoms (100–600); curves: exchange, Coulomb, DSM, RH]

Alanine residue peptides: timings in first local SCF iteration (HF/6-31G)

[Figure: CPU time / seconds (0–45000) against atoms (0–1200); curves: solution of the Newton equations, Fock J, Fock X, DSM]

Alanine residue peptides: dense vs. sparse matrix algebra (HF/6-31G)

[Figure: Hartree–Fock, first local SCF iteration; CPU time / seconds (0–40000) against atoms (0–1200); curves: solution of the Newton equations (dense), solution of the Newton equations (sparse), DSM (dense), DSM (sparse)]

Response theory

• The expectation value of A in the presence of a perturbation Vω of frequency ω:

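$$\langle t | A | t \rangle = \langle 0 | A | 0 \rangle + \int \langle\langle A; V_\omega \rangle\rangle_\omega \exp(-i\omega t)\, d\omega + \cdots$$

– the linear-response function $\langle\langle A; V_\omega \rangle\rangle_\omega$ carries information about the first-order change in the expectation value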

• The linear-response function may be represented compactly as:

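$$\langle\langle A; V_\omega \rangle\rangle_\omega = -A^{[1]\mathrm T} \underbrace{\left( E^{[2]} - \omega S^{[2]} \right)^{-1} V^{[1]}_\omega}_{\text{linear equations}}$$

– $E^{[2]}$: electronic Hessian

– $S^{[2]}$: metric matrix

– $A^{[1]} = \operatorname{vec}(ADS - SDA)$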

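• In practice, the response functions are evaluated by solving a set of linear equations:

$$\left( E^{[2]} - \omega S^{[2]} \right) N^{[1]} = -V^{[1]}_\omega, \qquad \langle\langle A; V_\omega \rangle\rangle_\omega = A^{[1]\mathrm T} N^{[1]}$$

• Excitation energies as poles of the linear-response function (RPA):

$$\left( E^{[2]} - \omega S^{[2]} \right) X = 0$$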

• Can these tasks be accomplished efficiently in the AO basis?
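The structure of these equations can be tried out on small dense matrices before turning to the AO-basis algorithms. The sketch below is purely illustrative: the matrices with the paired block structure [[A, B], [B, A]] and metric diag(I, −I) are hypothetical stand-ins for the electronic Hessian and metric, and the function names are assumptions.

```python
import numpy as np

def toy_rpa_matrices(n=4, seed=0):
    """Hypothetical E[2] = [[A, B], [B, A]] and S[2] = diag(I, -I)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    A = np.diag(np.linspace(0.3, 1.0, n)) + 0.02 * (W @ W.T) / n
    B = 0.02 * (W + W.T)
    E2 = np.block([[A, B], [B, A]])
    S2 = np.diag(np.concatenate([np.ones(n), -np.ones(n)]))
    return E2, S2

def linear_response(E2, S2, a1, v1, omega):
    """<<A; V>>_omega = A[1]^T N[1] with (E[2] - omega S[2]) N[1] = -V[1]."""
    n1 = np.linalg.solve(E2 - omega * S2, -v1)
    return a1 @ n1

def excitation_energies(E2, S2):
    """Positive roots omega of (E[2] - omega S[2]) X = 0."""
    omegas = np.linalg.eigvals(np.linalg.solve(S2, E2)).real
    return np.sort(omegas[omegas > 0])
```

Evaluating `linear_response` at a frequency close to one of the values returned by `excitation_energies` gives a very large magnitude, which is the numerical signature of a pole of the response function.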


Solution of the response equations

• The response equations are solved in the same manner as the RH Newton equations:

$$\left( E^{[2]} - \omega S^{[2]} \right) x = V^{[1]}$$

– generation of an iterative subspace until the residual is sufficiently small

$$R = \left( E^{[2]} - \omega S^{[2]} \right) x - V^{[1]}$$

• Key step: multiplication of Hessian and metric matrices with trial vectors (matrices)

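$$E^{[2]}(X) = (F_{\mathrm{vv}} - F_{\mathrm{oo}}) X S + S X (F_{\mathrm{vv}} - F_{\mathrm{oo}}) + g_{\mathrm{vo}}([D, X]_S) - g_{\mathrm{ov}}([D, X]_S)$$

$$S^{[2]}(X) = S_{\mathrm{oo}} X S_{\mathrm{vv}} - S_{\mathrm{vv}} X S_{\mathrm{oo}}$$

– requires recalculation of Fock/Kohn–Sham matrix with modified AO density matrix

– trial matrices always added in pairs $X$ and $X^{\mathrm T}$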

• For rapid convergence, the residual vector is preconditioned

$$\tilde{R} = M^{-1} R, \qquad M = E^{[2]} - \omega S^{[2]}$$ (but without the part highlighted in red on the original slide!)

– transformation to orthogonal basis (Cholesky or Löwdin)

– nondiagonal preconditioning requires typically 7 conjugate-gradient steps

• With this preconditioner, the response equations converge in 5–10 iterations

– indeed, this is the same convergence as in the canonical MO basis
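The subspace algorithm outlined on this slide can be sketched generically as follows. This is an illustration under stated assumptions rather than the authors' solver: the matrix is a small dense stand-in for $E^{[2]} - \omega S^{[2]}$, the preconditioner is a simple diagonal one instead of the nondiagonal preconditioner described above, trial vectors are added one at a time rather than in pairs, and all names are hypothetical.

```python
import numpy as np

def solve_in_subspace(apply_M, v, precondition, tol=1e-8, max_iter=50):
    """Solve M x = v by expanding an orthonormal trial basis with preconditioned residuals."""
    b = precondition(v)
    basis = [b / np.linalg.norm(b)]
    sigma = [apply_M(basis[0])]                   # M times each trial vector
    x = np.zeros_like(v)
    for iteration in range(1, max_iter + 1):
        Bm = np.column_stack(basis)
        Sg = np.column_stack(sigma)
        coeffs = np.linalg.solve(Bm.T @ Sg, Bm.T @ v)   # reduced (projected) equations
        x = Bm @ coeffs
        residual = Sg @ coeffs - v                # R = M x - v
        if np.linalg.norm(residual) < tol:
            return x, iteration
        new = precondition(residual)
        for _ in range(2):                        # Gram-Schmidt, repeated for stability
            new = new - Bm @ (Bm.T @ new)
        new = new / np.linalg.norm(new)
        basis.append(new)
        sigma.append(apply_M(new))
    return x, max_iter

def toy_response_problem(n=20, omega=0.1, seed=0):
    """Hypothetical dense stand-ins for E[2] - omega S[2] and V[1]."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((2 * n, 2 * n))
    E2 = np.diag(np.linspace(0.5, 3.0, 2 * n)) + 0.01 * (W + W.T)
    S2 = np.diag(np.concatenate([np.ones(n), -np.ones(n)]))
    return E2 - omega * S2, rng.standard_normal(2 * n)
```

A typical call is `solve_in_subspace(lambda y: M @ y, v, lambda r: r / np.diag(M))`. The only operation that touches the full matrix is `apply_M`, which in the AO-basis scheme corresponds to the linear transformations of trial matrices by the Hessian and metric.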


Alanine residue peptides: polarizability at ω = 0.1 (one iteration HF/6-31G)

[Figure: timings for different parts of response iteration (sparse); CPU time / seconds (0–60000) against atoms (0–1200); curves: Fock J, Fock X, Lintra, Precond]

Polarizabilities of linear alkanes and alkenes

• To illustrate, we have calculated longitudinal polarizabilities in linear polymeric chains

– HF and DFT α and α/N in 6-31G basis, plotted against the number of carbons N

[Figure, four panels against the number of carbons N (100–300): alkenes α (40 K–120 K), alkenes α/N (100–300), alkanes α (2 K–6 K), alkanes α/N (5–15); curves: LDA, BLYP, B3LYP, CAM, HF, CAM+]

• The alkenes are about an order of magnitude more polarizable than the alkanes

– all models agree on alkanes (α/N -limit: HF 14.4; LDA 16.3)

– widely different results for alkenes (α/N -limit: HF 97; LDA 427)


The importance of exact exchange for longitudinal polarizabilities

• Without a good description of long-range exchange, the systems become too polarizable

[Figure: horizontal axis 50–350, vertical axis 100–600; curves labelled by fraction of exact exchange: LDA (0%), BLYP (0%), B3LYP (20%), CAM-B3LYP (19%→65%), HF (100%), CAM-B3LYP+ (19%→100%), Hartree (0%)]

– the Hartree model neglects all exchange and overestimates by a factor of eight

– pure DFT has a poor long-range exchange and overestimates by a factor of four

– hybrid functionals improve the situation, introducing some exact exchange

– compromise solution: standard DFT at short range, full exchange at long range
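The range-dependent fraction of exact exchange in a Coulomb-attenuated functional can be written as α + β erf(μr). The sketch below assumes this standard form with the usual CAM-B3LYP parameters α = 0.19, β = 0.46 and μ = 0.33 (in inverse bohr), which reproduce the 19%→65% range quoted in the figure legend. The variant reaching 100% at long range is modelled here simply by setting β = 0.81 with the same μ, which is an assumption of this example and not stated on the slides.

```python
import math

def exact_exchange_fraction(r, alpha=0.19, beta=0.46, mu=0.33):
    """Fraction of exact exchange at interelectronic distance r (bohr): alpha + beta * erf(mu * r)."""
    return alpha + beta * math.erf(mu * r)

def exchange_profile(distances, **params):
    """Exact-exchange fraction at each distance, for tabulating or plotting."""
    return [exact_exchange_fraction(r, **params) for r in distances]
```

For example, `exchange_profile([0.0, 1.0, 5.0, 50.0])` rises from 0.19 at short range towards 0.65 at long range, while `exchange_profile([0.0, 50.0], beta=0.81)` goes from 0.19 to essentially 1.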


Asymptotic behaviour of group polarizabilities

• How does the group polarizability converge towards the infinite limit?

$$\alpha_\infty - \alpha_N = e N^{-1} + O(N^{-2})$$

Kudin et al., JCP 122, 134907 (2005)

– this behaviour is universal, holding at all levels of theory

• Log–log plots of α∞ − αN for alkanes and alkenes:

[Figure, two panels: log–log plots against N (2–350); vertical axes 0.1–10 and 1–100]

– limit obtained by extrapolation $\alpha_\infty = (\alpha_N - \alpha_M)/(N - M)$

– straight lines of slope −1 superimposed through the points at N = 350

• The asymptotic region is reached with C30H62 (alkanes) and C60H62 (alkenes)

– alkane α∞ predicted to within 1% from C30H62

– alkene α∞ predicted to within 1% from C60H62 for HF and from C150H152 for LDA
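The extrapolation and the slope test can be reproduced on model data. Everything numerical in the sketch below is hypothetical: the total polarizability is modelled as aN + b + c/N with arbitrary parameters, and the extrapolation formula is read as a finite difference of total chain polarizabilities, which is an assumption about the notation on the slide.

```python
import numpy as np

def total_polarizability(N, a=10.0, b=-20.0, c=-30.0):
    """Hypothetical model alpha_tot(N) = a N + b + c / N (a is the per-unit limit)."""
    N = np.asarray(N, dtype=float)
    return a * N + b + c / N

def extrapolated_limit(alpha_N, alpha_M, N, M):
    """Finite-difference estimate (alpha_N - alpha_M) / (N - M) of the per-unit limit."""
    return (alpha_N - alpha_M) / (N - M)

def loglog_slope(N, alpha_tot, alpha_inf):
    """Least-squares slope of log(alpha_inf - alpha_tot / N) against log N."""
    N = np.asarray(N, dtype=float)
    gap = alpha_inf - np.asarray(alpha_tot) / N
    return np.polyfit(np.log(N), np.log(gap), 1)[0]
```

With chain lengths between 100 and 350, the extrapolated limit recovers the model parameter `a` closely and the fitted slope is near −1, as expected when the leading correction to the per-unit polarizability is proportional to 1/N.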


Excitation energies

• The calculation of excitation energies (RPA) is more difficult than that of polarizabilities

$$\left( E^{[2]} - \omega S^{[2]} \right) X = 0$$

– slow convergence of preconditioning equations (subspace problem nearly singular)

– good starting vectors needed (trivial in MO basis)

– calculate highest occupied and lowest virtual Fock/Kohn–Sham eigenvectors

[Figure: CPU time / seconds (0–25000) against atoms (0–1200); curves: Kohn–Sham J, Kohn–Sham X, Kohn–Sham XC, Lintra, Precond]

CAM-B3LYP/6-31G excitation energies of alanine residue peptides

[Figure: energy / Hartree (0.085–0.13) against atoms (200–1200); curves: HOMO–LUMO gap, lowest Hessian eigenvalue, lowest excitation energy]

Sparsity of linear alkanes and alkenes

• Each energy optimization was converged in 6–14 SCF iterations

– about 70 matrix multiplications for each TRRH step (diagonalization)

– about 50 matrix multiplications for each TRDSM step (DIIS)

• Each polarizability component required 3–4 response iterations

– about 20 matrix multiplications in each iteration

• Percentage of matrix elements greater than $10^{-6}$ in alkane and alkene chains

– overlap matrix:

- sparse

– density matrix:

- nonsparse for alkenes

- sparse for alkanes

– Fock/KS matrices:

- KS matrix like overlap

- Fock matrix intermediate between overlap and density matrices

[Figure, four panels: horizontal axis 100–300, vertical axis 50–100 (per cent); panels: LDA alkenes, HF alkenes, LDA alkanes, HF alkanes]
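The sparsity measure used on this slide, the percentage of elements with magnitude above $10^{-6}$, is easy to compute. The sketch below applies it to hypothetical model matrices with exponentially decaying elements; the decay rates are arbitrary and stand in for the qualitatively different behaviour of sparse and nonsparse matrices, not for any of the molecules discussed.

```python
import numpy as np

def percent_significant(A, threshold=1e-6):
    """Percentage of matrix elements with magnitude greater than `threshold`."""
    return 100.0 * np.count_nonzero(np.abs(A) > threshold) / A.size

def model_matrix(n, decay):
    """Hypothetical matrix with elements exp(-decay * |i - j|)."""
    i, j = np.indices((n, n))
    return np.exp(-decay * np.abs(i - j))
```

A rapidly decaying model such as `model_matrix(n, 1.0)` shows a percentage that falls as n grows, whereas a slowly decaying one such as `model_matrix(n, 0.01)` stays at 100% over the same range of sizes.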

Conclusions

• We have discussed the optimization of SCF energies without orbital generation

– direct optimization of density matrix in the AO basis

– in each SCF iteration, we replace diagonalization by minimization

– one Newton step is usually enough

– 50–100 matrix multiplications typically required

– linear scaling achieved with sparse matrix algebra

• Large molecules represent a more difficult minimization problem

– small Hessian eigenvalues for pure DFT

– trust-region SCF: careful step-size control

– revert to second-order if necessary

• Linear-response calculations are straightforward in the AO basis

– one Fock/Kohn–Sham matrix build and 20 matrix multiplications per iteration

– stable convergence in 5–10 iterations

– preconditioning requires typically 7 iterations

• The calculation of excitation energies is more difficult

– nearly singular subspace problem for preconditioning

– good starting vectors needed
