
Journal of Computational Physics 410 (2020) 109364


NURBS-based non-periodic finite element framework for Kohn-Sham density functional theory calculations

İ. Temizer a,∗, P. Motamarri b,1, V. Gavini b,c

a Department of Mechanical Engineering, Bilkent University, 06800 Ankara, Turkey
b Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
c Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA

Article info

Article history: Received 18 September 2019; received in revised form 30 December 2019; accepted 25 February 2020; available online 28 February 2020

Keywords: Density functional theory; Real space; Finite elements; B-splines; All-electron; Pseudopotential

Abstract

A real-space non-periodic computational framework is developed for Kohn-Sham density functional theory (DFT). The electronic structure calculation framework is based on the finite element method (FEM) where the underlying basis is chosen as non-uniform rational B-splines (NURBS) which display continuous higher-order derivatives. The framework is formulated within a unified presentation that can simultaneously address both all-electron and pseudopotential settings in radial and three-dimensional cases. The canonical Kohn-Sham equation and the Poisson equation are discretized on different meshes in order to ensure that the underlying variational structure of Kohn-Sham DFT is preserved within the weak formulation of FEM. The discrete generalized eigenvalue problem emanating from the Kohn-Sham equation is solved efficiently based on the Chebyshev-filtered subspace iteration method. Numerical investigations in the radial case demonstrate all-electron and local pseudopotential capabilities on single atoms. In the three-dimensional case, all-electron and nonlocal pseudopotential computations on single atoms and small molecules are followed by local and nonlocal pseudopotential studies on larger systems. At all stages, special care is taken to demonstrate optimal convergence rates towards the ground state energy with chemical accuracy. Comparisons with classical Lagrange basis sets indicate the significantly higher per-degree-of-freedom accuracy displayed by NURBS. Specifically, cubic NURBS discretizations can offer a faster route to a prescribed accuracy than even sixth-order Lagrange discretizations on comparable meshes, thereby indicating considerable efficiency gains which are possible with these higher-order basis sets within effective numerical implementations.

© 2020 Elsevier Inc. All rights reserved.

1. Introduction

Electronic structure calculation via ab initio methods can deliver the total energy of a material system based on its quantum mechanical description, thereby giving access to nearly all of its physical properties [1]. The Hartree-Fock theory [2], along with related more advanced methods [3], and the density functional theory (DFT) in the Kohn-Sham formulation [4], possibly enriched with Hartree-Fock ingredients such as through the exchange functional [5], have been particularly successful towards this purpose. The ground state energy, as the particular focus of this study, can be exactly predicted by both sets of approaches in principle. Based on the foundation provided by Hartree-Fock theory, exact total energy can be approached through full configuration interaction, albeit at an insurmountable computational cost. Kohn-Sham DFT, on the other hand, has an in-built exact energy expression, albeit through a hitherto unknown universal exchange-correlation functional. Both theories, however, have been shown to display predictive capabilities by delivering physically accurate energy differences in approximate but practical forms that have found widespread use in electronic structure calculation.

* Corresponding author. E-mail address: [email protected] (İ. Temizer).

1 Present address: Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, 560012 India.

https://doi.org/10.1016/j.jcp.2020.109364
0021-9991/© 2020 Elsevier Inc. All rights reserved.

Irrespective of the theory chosen, a basis set is introduced within the accompanying numerical framework and classical choices are partially dictated by whether the problem is periodic or non-periodic as well as whether an all-electron or a pseudopotential setting is pursued [6]. Gaussian basis sets can be judiciously constructed to effectively address non-periodic all-electron chemistry problems involving molecules in vacuum [2,3,5], but their application to crystal analysis is not as widespread [7]. Plane waves, on the other hand, can naturally resolve such periodic physics problems in the pseudopotential setting where smoothly varying solutions are expected [8,7]. Despite the outstanding success of these methods in their domains of competence, they have some notable disadvantages. Gaussian basis sets are localized to provide a tailored description of the orbitals in an all-electron setting, allowing all manipulations to be carried out efficiently in real space, thereby rapidly achieving remarkable accuracies with a small number of degrees of freedom [3]. However, the extended span of the basis functions leads to systems of equations that are associated with dense matrices which become increasingly costly to construct and solve, in a fashion that scales unfavorably with system size [5]. Moreover, even in the variational framework of DFT, improvability of the result through monotonic reduction in the total energy with increasing basis size is difficult to achieve due to the lack of a means for the systematic refinement of the incomplete basis set [9]. Such systematic improvability is elegantly achieved with plane waves through the energy cut-off which, therefore, preserves the underlying variational structure of DFT in the numerical setting. However, not only do they also lead to dense matrices but for efficient manipulations they additionally require transformations into the reciprocal space, a step that is considered unfavorable for parallelization [10].
Moreover, unlike Gaussian basis sets, plane waves do not allow local refinement of the basis and, as such, provide an inefficient resolution of vacuum in the treatment of isolated systems where large domains are needed to eliminate spurious effects arising from periodic boundary conditions [1,7].

Ultimately, a combination of the underlying theory, the basis set and complementary algorithmic aspects is desired such that a numerical framework that scales favorably, ideally at most linearly, with system size is obtained [11]. The choice of the basis set is central to achieving such a framework and the disadvantages of the traditional basis sets noted earlier have been reiterated in a number of works that have established the finite element method (FEM) in the context of Kohn-Sham DFT as an outstanding candidate [12–14]. FEM is based on basis functions with local support, leading to sparse matrices, and allows all calculations to be carried out in real space, thus avoiding transformations into reciprocal space. In the context of Kohn-Sham DFT, expensive treatment of the exchange operator is avoided, although FEM formulations also have been [15] and continue to be [16] developed in the context of Hartree-Fock theory where this treatment is addressed. Contrary to an alternative real-space approach such as the finite difference method, which actually does not employ a basis but also delivers sparse matrices [17,6,18], FEM additionally allows local discretization refinement. Local solution quality improvement is possible through adaptive coordinate transformations in the context of plane waves [19], finite elements [20,21] and finite differences [10] (which is also amenable to nested discretizations [6]) but improvement through local discretization refinement is naturally embedded only in FEM, a feature that is particularly important in the all-electron setting. Moreover, FEM preserves the variational structure of DFT, unlike the finite difference method, thereby enabling the systematic and monotonic reduction of the total energy through refinement [22].
Overall, FEM compares favorably not only with finite difference frameworks [17,10,23] but also with other systematically improvable real-space methods that can achieve local refinement of the discretization, notably those which are based on wavelet basis sets [18,24]. For completeness, a review of FEM applications in ab initio methods is provided next, with a main focus on Kohn-Sham DFT, which will also help clarify the basis of various formulation ingredients employed in this work.

One of the earliest FEM applications [25] was set in a periodic pseudopotential setting and already reported results with higher-order finite elements that were later found crucial towards an effective application of FEM. The canonical one-electron equation, emanating from a Hartree-Fock formulation, was solved in [26] with fifth- and sixth-order triangular elements. This work is significant for two reasons. First, despite the use of special coordinates for efficiency in view of the non-periodic diatomic problems addressed, an additional Poisson equation was solved on the same discretization for an efficient calculation of the Hartree potential, and self-consistently with the one-electron equation. Second, approximate exchange was employed in Slater form, leading to an approach that may be regarded as a Kohn-Sham DFT scheme without correlation [4] and thereby setting a precedent for subsequent Kohn-Sham DFT studies. The non-periodic Hartree study in [27], omitting exchange in Hartree-Fock theory altogether, was also based on the self-consistent solution of the one-electron and Poisson equations based on higher-order discretizations of (di)atomic problems but without special coordinates, optionally with continuous derivatives, and already with a preliminary study of the convergence rates that are important to validate the FEM framework. A similar non-periodic Hartree study was carried out in [28] for diatomic and triatomic molecules composed of hydrogen. However, with the exception of the compact report of [25], it was not until [12] that FEM was applied in the context of Kohn-Sham DFT, with the following features: self-consistent solution of the one-electron and Poisson equations with a view towards periodic systems, all-electron analysis of the hydrogen molecule with a supercell and the nonlocal pseudopotential study for the silicon crystal, and finally the a priori prescribed local refinement of the mesh with second-order serendipity brick elements.
Additionally, this study introduces the standard FEM weak form for the one-electron equation by transferring one derivative to the test function, a step that does not seem to have been carried out in earlier studies, possibly in view of their Hartree-Fock theory background wherein the formulation towards Roothaan-Hall equations retains the second-order derivative in the Hermitian kinetic energy operator [2]. Although non-periodic problems were not explicitly addressed and convergence rate studies towards chemical accuracy were not presented, this influential work, introduced at about the same time as a pioneering finite difference framework [17], incorporated the major modern ingredients of FEM in the context of Kohn-Sham DFT. This study was followed by [29] wherein large nonlocal pseudopotential studies were presented based on higher-order discretizations with continuous higher-order derivatives and, particularly important for the goals of the present work, the possibility of employing B-splines as alternative basis functions with similar features was pointed out. These steps were then followed by a series of developments in [30,31,13], both with model potentials as well as nonlocal pseudopotentials, wherein explicit achievement of optimal convergence rates with quadratic and cubic elements towards chemical accuracy was repeatedly demonstrated. A detailed review of FEM applications leading to these developments may be found in [22] and of real-space methods in [18]. See also [32] for an overview of FEM applications to various quantum mechanics problems.

Subsequent works demonstrated local refinement advantages [33–35,14] and optimal convergence rates [33,36–39], in the context of all-electron [40,35,14] and pseudopotential [33,34,14,39] studies on increasingly larger systems, and all except [14] with higher-order finite elements. The advantages of higher-order elements were decisively established in [41,42] wherein local mesh refinement, combined with efficient solvers and a parallel implementation, allowed all-electron as well as local and nonlocal pseudopotential studies on large non-periodic systems with systematic improvability towards chemical accuracy. Detailed and fairly complete references concerning convergence rates towards chemical accuracy, mesh construction and refinement as well as various choices in problem setups will be provided further within relevant sections of this work, wherein more recent FEM studies will be additionally pointed out. It is noted that, building further on any generic modern FEM framework, the challenge of achieving chemical accuracy can be effectively addressed by locally enriching the finite element space through atom-centered functions. Analytical expressions [37], even classical Gaussians [43], may be employed for this purpose. Recently, however, explicit solutions of the single-atom radial equations have been employed for enrichment, which has been shown to outperform the plane wave basis in periodic pseudopotential [44] and the Gaussian basis in non-periodic all-electron [45] problems. Basis sets which are locally adapted to the solution in order to achieve significant savings in computation time may also be generated on the fly; see [46,47] for examples and [48] for an overview. Even without enrichment, the capability to outperform state-of-the-art plane wave implementations has been very recently demonstrated [49,50], which further highlights the possibilities that are offered through a FEM framework.
It is also noted that developments towards real-space higher-order frameworks that are systematically improvable have also been pursued for orbital-free DFT — see [51] in the context of FEM and [52] for the finite difference method.

Among higher-order finite elements, those with continuous higher-order derivatives are particularly appealing in view of the smoothly varying nature of the electronic structure, not only in the pseudopotential setting but also in the all-electron setting except near the nuclei. B-splines and their rational generalizations NURBS (non-uniform rational B-splines) constitute a class of basis functions which satisfy the partition of unity, have local support and can display $C^{p-1}$-continuity for an order-$p$ discretization [53]. They are additionally non-negative, which distinguishes them from similar basis functions pursued in earlier studies [27,29]. The applications of B-splines in ab initio methods have been reviewed in [54] where FEM is observed to have no significant presence, with the exception of an all-electron radial study of the hydrogen atom in a framework that is equivalent to FEM despite the lack of the standard weak form for the one-electron equation. Indeed, earliest FEM-based studies of Kohn-Sham DFT employing B-splines go back to [55] in the all-electron radial case and to [56] for three-dimensional local pseudopotential studies of one-electron systems. Periodic nonlocal pseudopotential studies incorporating B-splines and NURBS in the modern FEM framework have been presented in [39] where optimal convergence rates towards chemical accuracy have additionally been demonstrated. The primary conclusion of this study, recently extended to the analysis of strained crystals [57] where the radial case has also been considered, was that these basis sets can provide higher accuracy per degree of freedom compared with classical higher-order Lagrange elements which have discontinuous derivatives.
In two recent studies [58,59], B-splines have been applied to Kohn-Sham DFT in a non-periodic pseudopotential setting, albeit limited to detailed results only for model or single-atom problems, and without the demonstration of optimal convergence rates towards chemical accuracy for the self-consistent solution of the one-electron and Poisson equations. The latter four studies have been triggered by the introduction of B-splines and NURBS into computational mechanics through a series of developments which have established a new branch of FEM, namely isogeometric analysis, initiated in [60] and with early developments summarized in [61]. For an overview of representative recent contributions to isogeometric analysis, too numerous to attempt in the scope of the present study except for those which will be referenced in relevant sections, the reader is referred to [62].
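The basis properties quoted above (partition of unity, local support, non-negativity) can be checked numerically in one dimension. The following is a minimal sketch and not the implementation of this work: it assumes the textbook Cox-de Boor recursion, an open cubic knot vector on $[0, 1]$ and arbitrarily chosen positive NURBS weights, and the function names `bspline_basis` and `nurbs_basis` are hypothetical.

```python
import numpy as np

def bspline_basis(i, p, knots, x):
    """Cox-de Boor recursion for the i-th B-spline of order p at the point x."""
    if p == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    # Terms with a vanishing denominator (repeated knots) are defined as zero.
    left = 0.0 if left_den == 0.0 else \
        (x - knots[i]) / left_den * bspline_basis(i, p - 1, knots, x)
    right = 0.0 if right_den == 0.0 else \
        (knots[i + p + 1] - x) / right_den * bspline_basis(i + 1, p - 1, knots, x)
    return left + right

def nurbs_basis(p, knots, weights, x):
    """All NURBS basis functions at x: weighted B-splines over their weighted sum."""
    n = len(knots) - p - 1
    b = np.array([bspline_basis(i, p, knots, x) for i in range(n)])
    return weights * b / np.dot(weights, b)

p = 3                                                  # cubic basis
knots = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1], dtype=float)
n_basis = len(knots) - p - 1
weights = np.linspace(1.0, 2.0, n_basis)               # assumed positive weights
xs = np.linspace(0.0, 0.999, 200)                      # half-open domain [0, 1)
values = np.array([nurbs_basis(p, knots, weights, x) for x in xs])

print("partition of unity:", np.allclose(values.sum(axis=1), 1.0))
print("non-negative:", bool((values >= 0.0).all()))
```

With all weights equal, `nurbs_basis` reduces to the plain B-spline basis. The recursion is written for clarity rather than speed; a production code would evaluate all non-zero functions in a knot span at once.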

The precise goal of this work can now be stated as the development of a non-periodic NURBS-based FEM framework for Kohn-Sham DFT with both all-electron and pseudopotential capabilities in radial and three-dimensional cases, its validation through the demonstration of optimal convergence rates towards the ground state energy with chemical accuracy and finally the comparison of its performance with classical higher-order Lagrange discretization at all stages. Towards this goal, the FEM framework of [41] will be taken as a basis, which offers a unified treatment of all-electron and pseudopotential formulations, including efficient solvers for the discrete generalized eigenvalue problem based on the Chebyshev-filtered subspace iteration method [23,63]. Nonlocal pseudopotentials will additionally be incorporated and special care will be taken to ensure that the numerical framework preserves the underlying variational structure of Kohn-Sham DFT in its weak formulation. Specifically, in Section 2, the background for non-periodic electronic structure calculation is provided, starting from the theoretical foundations and concluding with problem formulations and energy expressions which address radial and three-dimensional cases as well as all-electron and pseudopotential settings. Section 3 then introduces the NURBS basis set, the weak formulation of the problem in a generic FEM basis and the specific mesh structure which will be employed. Detailed numerical investigations are subsequently summarized in Section 4. These start with radial computations in all-electron and local pseudopotential settings on single atoms, and continue with three-dimensional studies where all-electron and nonlocal pseudopotential computations on single atoms and small molecules are followed by local and nonlocal pseudopotential studies on larger systems. In all cases, with the exception of a fullerene study, nearly optimal convergence rates towards chemical accuracy are demonstrated. These numerical examples clearly demonstrate the significantly higher per-degree-of-freedom accuracy displayed by NURBS and thereby indicate efficiency gains which are possible with these basis sets within effective numerical implementations.

2. Non-periodic electronic structure calculation

In this section, the theoretical background governing the electronic structure calculation on a non-periodic material system will be summarized, specifically targeted towards the context of spinless Kohn-Sham DFT. The material system is assumed to consist of $N$ electrons and $M$ nuclei. Each nucleus is assigned a position $\mathbf{R}_A$ and a charge $Z_A$ ($A \in \{1, 2, \ldots, M\}$) while each electron is associated with a coordinate $\mathbf{x}_i$ ($i \in \{1, 2, \ldots, N\}$) that is composed of a position coordinate $\mathbf{r}_i$ and a spin coordinate $s_i$ [64,65]. Additionally, the definition $\nabla_i^2 = \frac{\partial}{\partial \mathbf{r}_i} \cdot \frac{\partial}{\partial \mathbf{r}_i}$ will be employed. The subscripts will be dropped in the case of a single coordinate. All vectors and operators of the underlying Hilbert space will be expressed in the position basis. In this real-space formulation, the Hermitian operators that are associated with the considered non-periodic material system are additionally real so that the corresponding eigenvectors, assumed to be normalized to unity, may be chosen pure real without loss of generality [65]. Therefore, complex conjugation will be dropped from standard manipulations. The notation $\langle \cdot \rangle$ will consistently refer to integration over all coordinates of the relevant space and the domain of integration is the entire space. Integration will be explicitly denoted only if necessary for clarity. Throughout this work, atomic units are employed unless otherwise noted.

2.1. Schrödinger equation

Within the Born-Oppenheimer approximation [66,2], the nonrelativistic time-independent Schrödinger equation

$$\hat{H} \Psi = \mathcal{E} \Psi \tag{2.1}$$

governs the anti-symmetric electronic wave function $\Psi$ through the electronic Hamiltonian $\hat{H}$ which can be decomposed as

$$\hat{H} = \hat{T} + \hat{V}_{ee} + \hat{V}_{en} . \tag{2.2}$$

For an operator $\hat{O}$, expressed in real space, the expectation value will be indicated by $O = \langle \Psi \, \hat{O} \, \Psi \rangle$ when necessary. Thus, the electronic energy is $\mathcal{E} = H = T + V_{ee} + V_{en}$. Here, the kinetic energy operator ($\hat{T}$) as well as the electron-electron ($\hat{V}_{ee}$) and the electron-nucleus ($\hat{V}_{en}$) Coulomb interaction operators have been defined as

$$\hat{T} = \sum_{i=1}^{N} \left( -\tfrac{1}{2} \nabla_i^2 \right) , \quad \hat{V}_{ee} = \sum_{i=1}^{N} \sum_{j>i}^{N} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} , \quad \hat{V}_{en} = \sum_{i=1}^{N} v_{ext}(\mathbf{r}_i) \tag{2.3}$$

where $v_{ext}$ is the external potential due to the nuclei of the material system with a nucleus-nucleus interaction energy $E_{nn}$:

$$\nu_A(\mathbf{r}) = -\frac{Z_A}{|\mathbf{r} - \mathbf{R}_A|} , \quad v_{ext}(\mathbf{r}) = \sum_{A=1}^{M} \nu_A(\mathbf{r}) , \quad E_{nn} = \sum_{A=1}^{M} \sum_{B>A}^{M} \frac{Z_A Z_B}{|\mathbf{R}_A - \mathbf{R}_B|} . \tag{2.4}$$

The nuclear positions will be fixed in this work. The total energy $E$ of the system is $E = \mathcal{E} + E_{nn}$:

$$E[\Psi] = T + V_{ee} + V_{en} + E_{nn} . \tag{2.5}$$

The solution $\Psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$ representing the ground state, possibly degenerate, minimizes the total energy.

2.2. Kohn-Sham density functional theory

Because $\Psi$ remains elusive due to its high-dimensional structure, a simpler representation of the total energy is favorable [4,5]. Targeting a spinless formulation, the electron density $\rho$ corresponding to a wave function $\Psi$ is

$$\rho(\mathbf{r}) = N \int |\Psi|^2 \, \mathrm{d}\mathbf{x} \tag{2.6}$$

where $\mathrm{d}\mathbf{x}$ indicates integration over all coordinates except for one position coordinate, leading to the representation

$$V_{en}[\rho] = \langle \rho \, v_{ext} \rangle . \tag{2.7}$$

The first and second Hohenberg-Kohn theorems together ensure that $T$, $V_{ee}$ and, hence, $E$ can likewise be represented in terms of $\rho$ for a non-degenerate ground state $\Psi$. Moreover, the Levy constrained-search formulation expands this foundation for DFT by ensuring that the ground state density, even when the ground state is degenerate, is the one which minimizes a functional $E[\rho]$ among all admissible $\rho$.

Towards a practical implementation of DFT in an all-electron setting, the Kohn-Sham formulation recasts $E[\rho]$ as

$$E[\rho] = T_s[\rho] + E_H[\rho] + V_{en}[\rho] + E_{nn} + E_{xc}[\rho] . \tag{2.8}$$

The Hartree energy $E_H$ represents the classical electrostatic interaction associated with the electron density, including self-interaction, and may be expressed through the Hartree potential $v_H = \delta E_H / \delta \rho$:

$$v_H[\rho](\mathbf{r}) = \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}\mathbf{r}' , \quad E_H[\rho] = \tfrac{1}{2} \langle \rho \, v_H \rangle . \tag{2.9}$$

Comparing (2.8) with (2.5), the exchange-correlation energy as the central ingredient is defined as $E_{xc}[\rho] = (T[\rho] - T_s[\rho]) + (V_{ee}[\rho] - E_H[\rho])$. In the context of the local density approximation (LDA) that will be invoked in this work, the explicit form

$$\varepsilon_{xc}(\rho) = \varepsilon_x(\rho) + \varepsilon_c(\rho) , \quad E_{xc}[\rho] = \langle \rho \, \varepsilon_{xc} \rangle \tag{2.10}$$

is assigned where the exchange ($\varepsilon_x$) and correlation ($\varepsilon_c$) parts of $\varepsilon_{xc}$, the exchange-correlation energy per electron of a uniform-electron-gas, will be explicitly specified in numerical investigations. Finally, the Kohn-Sham formulation expresses $T_s$ as the exact kinetic energy of a noninteracting reference system, thereby limiting the search space for $\rho$ in the minimization of the total energy to those which are associated with the single Slater determinant $\Psi_s$ of appropriately chosen orthonormal spin orbitals. Presently assuming a closed-shell atom or molecule in the spinless setting with equal spin-up and spin-down occupancy, the spatial parts of these orbitals will be indicated by the orthonormal functions $\psi_i(\mathbf{r})$ and the sum is understood to run from 1 to $N/2$, leading to the expressions

$$\rho(\mathbf{r}) = 2 \sum_i |\psi_i(\mathbf{r})|^2 \tag{2.11}$$

and, defining $\psi$ to indicate the set of all spatial orbitals $\psi_i$,

$$T_s[\psi] = \langle \Psi_s \, \hat{T} \, \Psi_s \rangle = 2 \sum_i \langle \psi_i \, (-\tfrac{1}{2} \nabla^2) \, \psi_i \rangle . \tag{2.12}$$

Kohn-Sham formulation of the total energy (2.8) can now be expressed as

$$E[\psi] = 2 \sum_i \langle \psi_i \, (-\tfrac{1}{2} \nabla^2) \, \psi_i \rangle + E_C + \langle \rho \, \varepsilon_{xc} \rangle \tag{2.13}$$

where the total electrostatic (Coulomb) energy $E_C$ has been introduced:

$$E_C = E_H + V_{en} + E_{nn} = \tfrac{1}{2} \langle \rho \, v_H \rangle + \langle \rho \, v_{ext} \rangle + E_{nn} . \tag{2.14}$$

It is important to highlight at this stage that the functional $E$ will not strictly describe a minimization problem unless $v_H$ is exactly evaluated via (2.9)$_1$. Consequently, an update to an estimate for $\psi$ that delivers an improved (lower) energy with an exact evaluation of $v_H$ may fail to do so in the case of an inexact evaluation, and may even deliver an energy which is lower than the ground state energy [67]. Both of these instances will be interpreted as non-variational results. With this note, building on the approach of [67], a reformulation of $E_C$ is now carried out which, however, has been shown to alter the structure of the problem from one of minimization to that of a saddle-point type in the context of orbital-free [68] and Kohn-Sham [14] DFT. The ramifications of this reformulation will be addressed in the discussion of numerical discretization (Section 3).
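As a small numerical illustration of (2.10) and (2.11), the sketch below builds a closed-shell model density from a single doubly occupied orbital and evaluates an LDA exchange energy on a radial grid. Several ingredients are assumptions made only for this example: the orbital is the analytical hydrogen 1s function rather than a self-consistent solution, the exchange part is taken in the standard Slater (Dirac) form $\varepsilon_x = -\tfrac{3}{4} (3/\pi)^{1/3} \rho^{1/3}$ although the text defers the specific choice to the numerical investigations, correlation is omitted, and the helper `trapezoid` is hypothetical.

```python
import numpy as np

def trapezoid(f, r):
    """Composite trapezoidal rule for samples f on the grid r."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))

r = np.linspace(0.0, 30.0, 30001)            # radial grid in atomic units
psi = np.exp(-r) / np.sqrt(np.pi)            # normalized hydrogen 1s orbital (model choice)
rho = 2.0 * psi**2                           # Eq. (2.11) with one doubly occupied orbital

shell = 4.0 * np.pi * r**2                   # spherical volume element
n_electrons = trapezoid(shell * rho, r)      # should integrate to N = 2

c_x = 0.75 * (3.0 / np.pi)**(1.0 / 3.0)
eps_x = -c_x * rho**(1.0 / 3.0)              # assumed Slater exchange per electron
e_x = trapezoid(shell * rho * eps_x, r)      # exchange part of Eq. (2.10)

# Closed form of the same integral for rho = (2/pi) exp(-2r), used as a check.
e_x_exact = -c_x * (2.0 / np.pi)**(4.0 / 3.0) * 27.0 * np.pi / 64.0

print(n_electrons, e_x, e_x_exact)
```

The comparison against the closed-form integral only verifies the quadrature; it says nothing about the accuracy of LDA itself.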

Through a derivation which closely follows Hartree-Fock theory [2], the minimization of the total energy (2.13) over $\psi_i$ delivers the canonical Kohn-Sham equation as a nonlinear eigenvalue problem for the spatial orbitals $\psi_i$ together with the orbital energies $\varepsilon_i$ which, orthogonality being implicit, act as Lagrange multipliers that enforce the normalization of $\psi_i$ to unity:

$$\hat{h} \psi_i = \varepsilon_i \psi_i . \tag{2.15}$$

Here, the one-electron Hamiltonian $\hat{h}$ is defined through the effective local potential $v_{eff}$

$$v_{eff} = v_C + v_{xc} , \quad \hat{h} = -\tfrac{1}{2} \nabla^2 + v_{eff} \tag{2.16}$$

with $v_C[\rho](\mathbf{r}) = v_H[\rho](\mathbf{r}) + v_{ext}(\mathbf{r})$ as the total electrostatic (Coulomb) potential and $v_{xc}(\rho) = \delta E_{xc} / \delta \rho$ as the exchange-correlation potential. The set of $N/2$ spatial orbitals $\psi_i$ with the lowest orbital energies $\varepsilon_i$ are then chosen to evaluate the density (2.11) and subsequently the total energy (2.13). The solution of (2.16) already requires an iterative scheme in view of its nonlinear nature. For this purpose, a self-consistent field (SCF) solution approach will be applied (Section 2.5). In order to improve the numerical efficiency of this approach, a reformulation of the electrostatic energy is considered first.
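To make the eigenvalue structure of (2.15) concrete, a single fixed-potential solve can be sketched in the radial s-state case, where $\psi(r) = u(r) / (\sqrt{4\pi}\, r)$ turns (2.15) into $-\tfrac{1}{2} u'' + v_{eff}\, u = \varepsilon u$ with $u(0) = 0$. The sketch below is not the FEM discretization of this work: it swaps in a second-order finite difference stencil on a uniform grid, freezes the potential at the bare Coulomb value $-1/r$ of the hydrogen atom (so no SCF iteration is involved), and the function name `radial_ground_state` together with the domain size and grid resolution are assumptions.

```python
import numpy as np

def radial_ground_state(v, r_max=20.0, n=1000):
    """Lowest eigenpair of -1/2 u'' + v(r) u = eps u with u(0) = u(r_max) = 0."""
    h = r_max / (n + 1)
    r = h * np.arange(1, n + 1)                  # interior grid points
    main = 1.0 / h**2 + v(r)                     # diagonal of -1/2 d2/dr2 plus potential
    off = -0.5 / h**2 * np.ones(n - 1)           # off-diagonals of -1/2 d2/dr2
    hmat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    eps, vecs = np.linalg.eigh(hmat)             # eigenvalues in ascending order
    u = vecs[:, 0] / np.sqrt(h * np.sum(vecs[:, 0]**2))   # enforce int u^2 dr = 1
    return eps[0], r, u

eps0, r, u = radial_ground_state(lambda r: -1.0 / r)
print("lowest orbital energy:", eps0)            # analytical hydrogen value is -0.5
```

In an actual Kohn-Sham calculation the potential passed in would be the current $v_{eff}$, and this solve would be repeated inside the SCF loop of Section 2.5.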

2.3. Electrostatic energy

Representing the nuclear charge distribution via $\beta_A(\mathbf{r}) = -\delta(\mathbf{r} - \mathbf{R}_A) Z_A$ and $b(\mathbf{r}) = \sum_{A=1}^{M} \beta_A(\mathbf{r})$, the nuclear potentials (2.4)$_{1,2}$ satisfy the expressions

$$-\frac{1}{4\pi} \nabla^2 \nu_A = \beta_A , \quad -\frac{1}{4\pi} \nabla^2 v_{ext} = b , \tag{2.17}$$

based on which one may define a self-interaction energy $E_{self}$ and a term $E_{ext}$ including this self-interaction similar to $E_H$:

$$E_{self} = \sum_{A=1}^{M} \tfrac{1}{2} \langle \beta_A \, \nu_A \rangle , \quad E_{ext} = \tfrac{1}{2} \langle b \, v_{ext} \rangle , \quad E_{nn} = E_{ext} - E_{self} . \tag{2.18}$$

Although $E_{nn}$ is well-defined, both $E_{self}$ and $E_{ext}$ are divergent for point charges accompanied by the exact nuclear (Coulomb) potentials. Nevertheless, these terms will be well-defined in the numerical setting due to the mesh-induced regularization of the nuclear potentials and hence will be retained in an all-electron setting [41]. A direct evaluation of $E_{nn}$ based on a neutralizing background charge distribution, which is regularized but local around each ion, is also possible as an alternative approach [69].
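The interplay between the divergent terms in (2.18) can be illustrated with a closed-form model. The sketch below replaces each point charge by a normalized Gaussian $(\alpha/\pi)^{3/2} \exp(-\alpha r^2)$; this Gaussian smearing is an assumption standing in for the mesh-induced regularization discussed above, chosen because the interaction energy of two such unit Gaussians at separation $d$ is known analytically as $\mathrm{erf}(\sqrt{\alpha/2}\, d)/d$. The function names are hypothetical.

```python
import math

def gaussian_coulomb(alpha, dist):
    """Coulomb interaction of two unit Gaussian charges with exponent alpha."""
    if dist == 0.0:
        return math.sqrt(2.0 * alpha / math.pi)   # limit of erf(sqrt(alpha/2) d)/d as d -> 0
    return math.erf(math.sqrt(alpha / 2.0) * dist) / dist

def electrostatic_terms(charges, positions, alpha):
    """Return (E_ext, E_self, E_nn) of Eq. (2.18) for Gaussian-smeared nuclei."""
    m = len(charges)
    e_ext = 0.0
    e_self = 0.0
    for a in range(m):
        e_self += 0.5 * charges[a]**2 * gaussian_coulomb(alpha, 0.0)
        for b in range(m):                        # double sum includes a == b
            d = math.dist(positions[a], positions[b])
            e_ext += 0.5 * charges[a] * charges[b] * gaussian_coulomb(alpha, d)
    return e_ext, e_self, e_ext - e_self

# Two unit charges separated by 1.4 bohr (a hydrogen-molecule-like geometry).
charges = [1.0, 1.0]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
for alpha in (1.0, 10.0, 100.0):
    e_ext, e_self, e_nn = electrostatic_terms(charges, positions, alpha)
    print(f"alpha={alpha:6.1f}  E_ext={e_ext:.6f}  E_self={e_self:.6f}  E_nn={e_nn:.6f}")
```

As $\alpha$ grows, $E_{ext}$ and $E_{self}$ both increase without bound (proportionally to $\sqrt{\alpha}$) while their difference approaches the point-charge value $Z_A Z_B / |\mathbf{R}_A - \mathbf{R}_B|$ of (2.4).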

Because it is numerically inefficient to compute $v_H$ via (2.9)$_1$, a reformulation similar to (2.17)$_2$ is convenient. The Hartree potential, in view of its definition, and the total electrostatic potential, in view of the linearity of the Poisson problem, satisfy

$$-\frac{1}{4\pi} \nabla^2 v_H = \rho , \quad -\frac{1}{4\pi} \nabla^2 v_C = \rho + b . \tag{2.19}$$

Making use of the equality $\langle \rho \, v_{ext} \rangle = \langle b \, v_H \rangle$ induced by (2.19)$_1$ and (2.17)$_2$, an alternative expression for the total electrostatic energy is obtained:

$$E_C = \tfrac{1}{2} \langle (\rho + b) \, v_C \rangle - E_{self} . \tag{2.20}$$

Both terms on the right-hand side will remain well-defined in the presence of point charges due to the mesh-induced regularization of the nuclear potentials. It is noted that $v_H$ and $v_{ext}$, hence $E_H$ and $V_{en}$, are individually divergent for a periodic system so that only their sums remain well-defined [22,69], which highlights the advantage of employing (2.20) rather than (2.14) within (2.13) in general. In this work, over sufficiently large domains, (2.19)$_2$ will be solved for $v_C$ with homogeneous Dirichlet boundary conditions towards the evaluation of $E_C$ and (2.17)$_1$ for $\nu_A$ with the prescribed exact potential value as a Dirichlet boundary condition towards the evaluation of $E_{self}$. Invoking (2.15) to reformulate the kinetic energy (2.12) and employing (2.20), (2.13) now leads to the final energy expression [13,41]

$$E[\psi] = \Big( 2 \sum_i \varepsilon_i - \langle \rho \, v_{eff} \rangle \Big) + \Big( \tfrac{1}{2} \langle (\rho + b) \, v_C \rangle - E_{self} \Big) + \langle \rho \, \varepsilon_{xc} \rangle . \tag{2.21}$$

It is remarked that if a formulation based on $v_H$ is chosen for a non-periodic problem then one may either directly solve (2.19)$_1$ where the boundary conditions are determined via multipole expansion [27,28,35,38,70] or reformulate this equation through a neutralizing charge density in order to impose homogeneous Dirichlet boundary conditions [33,40], essentially in the same spirit as the formulation of (2.19)$_2$ [71].
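For completeness, the two steps leading to (2.20) and (2.21) can be written out. The following is a sketch that uses only the definitions (2.11), (2.14), (2.15), (2.16), (2.18) and the equality $\langle \rho \, v_{ext} \rangle = \langle b \, v_H \rangle$; it adds no assumption beyond those already stated in this section.

```latex
% Step 1: expand the regularized electrostatic term with v_C = v_H + v_ext.
\begin{align*}
\tfrac{1}{2} \langle (\rho + b) \, v_C \rangle
  &= \tfrac{1}{2} \langle \rho \, v_H \rangle
   + \tfrac{1}{2} \langle \rho \, v_{ext} \rangle
   + \tfrac{1}{2} \langle b \, v_H \rangle
   + \tfrac{1}{2} \langle b \, v_{ext} \rangle \\
  &= E_H + \langle \rho \, v_{ext} \rangle + E_{ext}
   = E_H + V_{en} + E_{nn} + E_{self}
   = E_C + E_{self} ,
\end{align*}
% which is (2.20) after moving E_self to the left-hand side.

% Step 2: take the inner product of (2.15) with psi_i, using normalization.
\begin{align*}
\langle \psi_i \, (-\tfrac{1}{2} \nabla^2) \, \psi_i \rangle
  &= \varepsilon_i - \langle \psi_i \, v_{eff} \, \psi_i \rangle , \\
T_s[\psi] = 2 \sum_i \langle \psi_i \, (-\tfrac{1}{2} \nabla^2) \, \psi_i \rangle
  &= 2 \sum_i \varepsilon_i - \langle \rho \, v_{eff} \rangle .
\end{align*}
% Substituting both results into (2.13) gives (2.21).
```

The second step relies on $v_{eff}$ being a local (multiplicative) potential, so that $2 \sum_i \langle \psi_i \, v_{eff} \, \psi_i \rangle = \langle \rho \, v_{eff} \rangle$ follows directly from (2.11).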

2.4. Pseudopotential formulation

In the local pseudopotential formulation [72,7], $Z_A$ represent the ionic charges (i.e. the nuclear charges reduced by the number of core electrons) and $\rho$ is the valence electron density. The ionic pseudopotentials $\nu_A$, available in explicit forms that are spherically symmetric, are no longer Coulombic in nature but exactly match or rapidly approach this behavior beyond a localized ball around each ion. (2.17)$_1$ then defines $\beta_A$ which no longer represent point charges but rather regularized distributions that vanish or rapidly decay to zero beyond the ball radius [13]. The total electrostatic potential corresponding to the sum of the valence electron density and the regularized ionic charge distribution may be solved from (2.19)$_2$ and the energy expression (2.21) therefore remains applicable without modification, where $E_{self}$ is now well-defined from the outset.
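The relation (2.17)$_1$ between a regularized ionic charge and its potential can be checked on a radial grid. In the sketch below, the ionic charge is assumed to be a Gaussian, $\beta(r) = -Z (\alpha/\pi)^{3/2} \exp(-\alpha r^2)$, for which the potential $\nu(r) = -Z\, \mathrm{erf}(\sqrt{\alpha}\, r)/r$ is known analytically and approaches $-Z/r$ beyond a localized ball; this is a model choice and not one of the pseudopotentials employed in this work. The Poisson problem is solved with a second-order finite difference scheme (again a stand-in for FEM) in the variable $u = r \nu$, which satisfies $u'' = -4\pi r \beta$, with the exact potential value prescribed as a Dirichlet condition at the outer boundary. The function name `solve_radial_poisson` is hypothetical.

```python
import math
import numpy as np

def solve_radial_poisson(beta, z, r_max=10.0, n=999):
    """Solve -(1/4 pi) lap(nu) = beta for spherically symmetric beta via u = r nu."""
    h = r_max / (n + 1)
    r = h * np.arange(1, n + 1)                   # interior grid points
    u_boundary = -z                               # u(r_max) = r_max * (-z / r_max)
    rhs = -4.0 * np.pi * r * beta(r) * h**2       # h^2 u'' at the interior points
    rhs[-1] -= u_boundary                         # known boundary value moved to the right
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1))         # u(0) = 0 is implied at the left end
    u = np.linalg.solve(lap, rhs)
    return r, u / r

z, alpha = 1.0, 1.0
beta = lambda r: -z * (alpha / np.pi)**1.5 * np.exp(-alpha * r**2)
r, nu = solve_radial_poisson(beta, z)

erf = np.vectorize(math.erf)
nu_exact = -z * erf(math.sqrt(alpha) * r) / r
max_error = float(np.max(np.abs(nu - nu_exact)))

# Self-interaction energy of (2.18) for a single ion, by the trapezoidal rule.
integrand = 0.5 * beta(r) * nu * 4.0 * np.pi * r**2
e_self = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))
e_self_exact = z**2 * math.sqrt(alpha / (2.0 * math.pi))

print(max_error, e_self, e_self_exact)
```

Because the charge is regularized, $E_{self}$ is finite here without any reliance on the mesh, which mirrors the remark that it is well-defined from the outset in the pseudopotential setting.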

In the nonlocal pseudopotential formulation, the preceding discussion may be largely preserved by interpreting the electron-ion interaction energy $V_{en}$ appearing within (2.13) and the corresponding ionic potentials $\nu_A$ contributing to $v_{ext}$ as arising only from the purely local parts of the ionic pseudopotentials. The total interaction energy of the valence electrons with the ions is $\langle \rho \, v_{ext} \rangle + E_{NL}$ where the nonlocal energy contribution $E_{NL}$ is likewise added to (2.13) so that minimization yields a one-electron Hamiltonian for the Kohn-Sham equation (2.15) which is now rendered nonlocal as well through augmentation by the remaining nonlocal part $v_{NL}$ of the total pseudopotential [72,7]:

$$E_{NL}[\psi] = \sum_i \langle \psi_i(\mathbf{r}) \, v_{NL}(\mathbf{r}, \mathbf{r}') \, \psi_i(\mathbf{r}') \rangle , \quad \hat{h} = -\tfrac{1}{2} \nabla^2 + v_{eff} + v_{NL} . \tag{2.22}$$

Here, the definition of $v_{eff}$ in (2.16)$_1$ as a local effective potential has been preserved. As a consequence, the energy expression (2.21) again remains unmodified in view of the canceling nonlocal energy contributions from the kinetic energy and the electron-ion interaction energy. From a practical perspective, it is unnecessary to alter the Poisson problems in the pseudopotential formulation through the incorporation of regularized charge distributions; the all-electron expression of the energy based on point charges may be retained entirely (see Appendix A), thus preserving a unified framework for both problem settings.

Similar to the local part, the nonlocal part vNL of the total pseudopotential is expressed as a sum of ionic contributions as well:

\[ v_{NL}(\mathbf{r},\mathbf{r}') = \sum_{A=1}^{M} �_A(\mathbf{r},\mathbf{r}') . \tag{2.23} \]

Each ionic contribution �A will be based on a sum of separable terms and can be expressed succinctly as [73,74]

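\[ �_A(\mathbf{r},\mathbf{r}') = \sum_{\alpha=1}^{P} \sum_{\beta=1}^{P} \lambda_\alpha(\mathbf{r} - \mathbf{R}_A)\, h_{\alpha\beta}\, \lambda_\beta(\mathbf{r}' - \mathbf{R}_A) \tag{2.24} \]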

where the sum bound P, the real parameters hαβ = hβα and the functions λα, which vanish or rapidly decay to zero beyond a localized ball around each ion, are ion-dependent. In view of the particular formulation to be employed [74], λα are available explicitly and can be expressed through the product of spherical harmonics with functions in Gaussian form.

2.5. Self-consistent field iteration

Within the iterative SCF solution approach to be applied, let $\rho^{\mathrm{in}}(\mathbf{r})$ be an estimate for the ground state density at an SCF iteration, satisfying the requirements on ρ but not necessarily related to a set of orthonormal spatial orbitals, and determine $v_C^{\mathrm{in}} = v_C[\rho^{\mathrm{in}}](\mathbf{r})$ as well as $v_{xc}^{\mathrm{in}} = v_{xc}(\rho^{\mathrm{in}})$. Employing $v_{\mathrm{eff}}^{\mathrm{in}} = v_C^{\mathrm{in}} + v_{xc}^{\mathrm{in}}$ in the solution of (2.15) for $\psi^{\mathrm{out}}$ then delivers an updated density $\rho^{\mathrm{out}}(\mathbf{r})$, defined via (2.11), for which $v_C^{\mathrm{out}} = v_C[\rho^{\mathrm{out}}](\mathbf{r})$ and $\varepsilon_{xc}^{\mathrm{out}} = \varepsilon_{xc}(\rho^{\mathrm{out}})$ are additionally determined. The total energy that is consistent with this update is therefore

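\[ E[\psi^{\mathrm{out}}] = \Big( 2\sum_i \varepsilon_i - \langle \rho^{\mathrm{out}}\, v_{\mathrm{eff}}^{\mathrm{in}} \rangle \Big) + \Big( \tfrac{1}{2} \langle (\rho^{\mathrm{out}} + b)\, v_C^{\mathrm{out}} \rangle - E_{\mathrm{self}} \Big) + \langle \rho^{\mathrm{out}}\, \varepsilon_{xc}^{\mathrm{out}} \rangle . \tag{2.25} \]

In this work, the Anderson mixing scheme will be employed in order to update $\rho^{\mathrm{in}}(\mathbf{r})$ at each SCF iteration [41], making use of the full iteration history of $\{\rho^{\mathrm{in}}, \rho^{\mathrm{out}}\}$ together with a mixing parameter value of 0.5 by default. The estimate $\rho^{\mathrm{in}}(\mathbf{r})$ at the first iteration is generated as a superposition of atomic densities which are based on radial all-electron computations (Section 2.6). In the pseudopotential formulation, the atomic densities are approximated by the densities of the valence electrons from the all-electron formulation.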
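As an illustration of the mixing step, the following minimal Python sketch applies Anderson mixing with the full iteration history and a mixing parameter of 0.5 to a generic fixed-point map. The function name `anderson_mix`, the least-squares formulation and the toy linear map standing in for the density update are assumptions made for illustration; they are not taken from the implementation of this work.

```python
import numpy as np

def anderson_mix(g, x0, beta=0.5, tol=1e-10, max_iter=100):
    """Find x = g(x) by Anderson mixing with full history (hypothetical helper).

    x plays the role of the input density and g(x) that of the output density.
    """
    x = np.asarray(x0, dtype=float)
    X, R = [], []                      # history of inputs and residuals
    for k in range(max_iter):
        r = g(x) - x                   # residual: output minus input
        if np.linalg.norm(r) < tol:
            return x, k
        X.append(x.copy())
        R.append(r.copy())
        if len(X) == 1:
            x = x + beta * r           # plain linear mixing on the first step
        else:
            dX = np.diff(np.array(X), axis=0).T   # columns x_{j+1} - x_j
            dR = np.diff(np.array(R), axis=0).T   # columns r_{j+1} - r_j
            # Least-squares combination of the history that minimizes the residual
            gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)
            x = x + beta * r - (dX + beta * dR) @ gamma
    return x, max_iter

# Toy usage: a contractive linear map in place of the SCF density update
A = np.array([[0.5, 0.1, 0.0], [0.1, 0.3, 0.2], [0.0, 0.2, 0.4]])
b = np.array([1.0, 2.0, 3.0])
x_fixed, n_steps = anderson_mix(lambda v: A @ v + b, np.zeros(3))
```

In an SCF setting, each evaluation of `g` would correspond to one solution of the Poisson and Kohn-Sham problems, so the history is cheap relative to the map itself.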

SCF iterations are continued until the total energy and the density both converge to within prescribed tolerances. In order to avoid possible charge sloshing when integer occupancy shifts among (nearly) degenerate energy levels, leading to small changes in the total energy but larger changes in the density which may prohibit convergence, the formulation is extended towards fractional occupancy by smearing integer occupancy among such levels [11]. Recalling that a closed-shell electronic structure was assumed thus far, this extension will additionally allow addressing an open-shell atom or molecule in the context of LDA. Specifically, the electron density is reformulated as

\[ \rho(\mathbf{r}) = 2\sum_i f_i\, |\psi_i(\mathbf{r})|^2 \tag{2.26} \]

where $f_i \in [0, 1]$ is an orbital occupancy factor, subject to the requirement $2\sum_i f_i = N$ via the normalization condition on the density (〈ρ〉 = N), and the sum covers the set of all occupied eigenstates. After a similar reformulation of the kinetic energy (2.12), the total energy (2.21) takes the form

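\[ E[\psi] = \Big( 2\sum_i f_i\, \varepsilon_i - \langle \rho\, v_{\mathrm{eff}} \rangle \Big) + \Big( \tfrac{1}{2} \langle (\rho + b)\, v_C \rangle - E_{\mathrm{self}} \Big) + \langle \rho\, \varepsilon_{xc} \rangle . \tag{2.27} \]

In this work, the finite-temperature Fermi-Dirac distribution is employed to evaluate $f_i$ at each SCF iteration [41], making use of a smearing temperature of 100 K by default.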
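The occupancy evaluation may be sketched as follows, under the assumption that the chemical potential is found by bisection so that the constraint 2 ∑i fi = N holds. The function names, the bisection bracket and the sample eigenvalues are hypothetical; only the Fermi-Dirac form and the 100 K default are taken from the text.

```python
import math

K_B_HARTREE_PER_K = 3.166811563e-6  # Boltzmann constant in Hartree per Kelvin

def fermi_dirac(eps, mu, kT):
    """Occupancy 1 / (1 + exp((eps - mu) / kT)), written with tanh to avoid overflow."""
    return 0.5 * (1.0 - math.tanh(0.5 * (eps - mu) / kT))

def occupancies(eigenvalues, n_electrons, temperature=100.0):
    """Return (mu, [f_i]) such that 2 * sum(f_i) = n_electrons (hypothetical helper)."""
    kT = K_B_HARTREE_PER_K * temperature

    def count(mu):
        return 2.0 * sum(fermi_dirac(e, mu, kT) for e in eigenvalues)

    # Bracket chosen so that count(lo) ~ 0 and count(hi) ~ 2 * number of states
    lo, hi = min(eigenvalues) - 1.0, max(eigenvalues) + 1.0
    for _ in range(200):               # bisection on the monotone electron count
        mu = 0.5 * (lo + hi)
        if count(mu) < n_electrons:
            lo = mu
        else:
            hi = mu
    mu = 0.5 * (lo + hi)
    return mu, [fermi_dirac(e, mu, kT) for e in eigenvalues]

# Usage: four electrons, two of which are shared by three degenerate levels
mu, f = occupancies([-0.5, -0.2, -0.2, -0.2, 0.1], n_electrons=4)
```

For the degenerate levels in this usage example, the smearing distributes the two remaining electrons equally, which is the mechanism that suppresses charge sloshing among such levels.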


2.6. Radial case

For a single atom, the Poisson and the Kohn-Sham equations may be expressed in radial form [64,7], provided that the electron density is spherically symmetric. This, in turn, is ensured by equal fractional occupancy of degenerate orbitals, requiring the use of (2.27) as the extension of (2.21) for the total energy evaluation [75]. Within (2.27), the sum index will enumerate all orbitals associated with different principal (n = 1, 2, ...), azimuthal (l = 0, 1, ..., n − 1) and magnetic (m = −l, ..., l) quantum numbers. Specifically, defining $r = |\mathbf{r}|$ and $\hat{\mathbf{r}} = \mathbf{r}/r$, each orbital may be explicitly indicated as $\psi_{nlm}(\mathbf{r}) = R_{nl}(r)\, Y_{lm}(\hat{\mathbf{r}})$ where $R_{nl}(r)$ are radial functions, which are normalized as $\int |R_{nl}|^2\, r^2\, \mathrm{d}r = 1$, and $Y_{lm}(\hat{\mathbf{r}})$ are the spherical harmonics, which satisfy $\sum_m |Y_{lm}|^2 = (2l + 1)/(4\pi)$. The sum in (2.26) therefore leads to a spherically symmetric electron density and may be explicitly stated as

\[ \rho = 2\sum_n \sum_l \sum_m f_{nlm}\, |\psi_{nlm}(\mathbf{r})|^2 = 2\sum_n \sum_l \frac{2l+1}{4\pi}\, F_{nl}\, |R_{nl}(r)|^2 \tag{2.28} \]

where $f_{nlm} \in [0, 1]$, subject to the normalization condition $2\sum_n \sum_l \sum_m f_{nlm} = N$, are equal for all m at given {n, l} and are indicated by $F_{nl}$. This form induces the radial counterparts of the Poisson equations (2.17)2, equivalent to (2.17)1 for a single atom, and (2.19)2 for the radial functions vext(r) and vC(r), respectively, where $\nabla^2(\cdot) = r^{-2}(r^2(\cdot)')'$ with $(\cdot)' = \mathrm{d}(\cdot)/\mathrm{d}r$. The accompanying radial form for the Kohn-Sham equation (2.15) is expressed through

\[ v_l(r) = \frac{l(l+1)}{2r^2} , \qquad \hat{h}_l = -\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}} + v_l , \qquad \hat{h}_l\, R_{nl} = \varepsilon_{nl}\, R_{nl} , \tag{2.29} \]

which needs to be solved for each l due to the appearance of the centrifugal term vl , and self-consistently with the Poisson equations. Overall, the framework for handling the radial case remains identical to the three-dimensional one towards the evaluation of the total energy expression (2.27), thus preserving a unified framework for both cases. In view of Section 2.4, the same expression will be employed with a local pseudopotential in the radial case as well and, following a similar argument, the nuclear potential can alternatively be employed in exact form within (2.29)2 if desired in the all-electron setting. Note that there will be only a single energy level n in the pseudopotential solutions of this work, which will be indicated with the corresponding valence value from the all-electron setting. See Appendix B for a discussion of the boundary conditions.
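The equal fractional occupancy underlying (2.28) can be checked with a short bookkeeping sketch. The helper names and the carbon configuration (1s² 2s² 2p²) are chosen here for illustration only; the sketch relies solely on the normalization of the radial functions, so that integrating (2.28) over all space gives N = 2 ∑n ∑l (2l + 1) Fnl.

```python
from fractions import Fraction

def shell_occupancy(electrons_in_shell, l):
    """F_nl: the common occupancy factor of the 2l+1 degenerate orbitals of a shell."""
    return Fraction(electrons_in_shell, 2 * (2 * l + 1))

def electron_count(shells):
    """Integrate (2.28) over all space using the normalization of R_nl.

    shells is a list of (n, l, electrons_in_shell) tuples (hypothetical format).
    """
    return 2 * sum((2 * l + 1) * shell_occupancy(e, l) for _, l, e in shells)

# Illustrative open-shell atom: carbon with configuration 1s2 2s2 2p2
carbon = [(1, 0, 2), (2, 0, 2), (2, 1, 2)]
F_2p = shell_occupancy(2, 1)     # two electrons spread over three 2p orbitals
N = electron_count(carbon)
```

Here the two 2p electrons are distributed evenly over m = −1, 0, 1, which is what renders the density spherically symmetric for an open-shell atom.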

3. Finite element method framework

In this work, the numerical solutions of the Poisson and Kohn-Sham problems will be carried out through FEM, which is outlined next for the three-dimensional case. Here, higher-order Lagrange and NURBS basis functions will be employed in the discretization of the solution variables which, within an isoparametric formulation, will match the geometry discretization. In what follows, a reference to a negligible error magnitude associated with a numerical choice indicates the insignificant influence of this choice in the context of convergence rate analysis for total energy based on mesh refinement.

3.1. NURBS discretization

NURBS construction relies on a knot vector, which can be either closed or open depending on whether a periodic or a non-periodic distribution of the basis functions is targeted [53]. The focus of this work is on the latter so that only open knot vectors will be employed. For this purpose, consider a three-dimensional domain that is spanned by the parametric directions α ∈ {1, 2, 3}. Defining $m_\alpha = n_\alpha + p_\alpha + 1$, a knot vector $\Xi^\alpha$ is associated with each direction, containing $m_\alpha + 1$ non-decreasing knots $\xi^\alpha$ that are non-uniformly distributed in general, such that the first and last $p_\alpha + 1$ of them have the same value:

\[ \Xi^\alpha = \{ \underbrace{\xi^\alpha_0 , \ldots , \xi^\alpha_{p_\alpha}}_{\text{repeated}} , \xi^\alpha_{p_\alpha+1} , \ldots , \xi^\alpha_{n_\alpha} , \underbrace{\xi^\alpha_{n_\alpha+1} , \ldots , \xi^\alpha_{m_\alpha}}_{\text{repeated}} \} . \tag{3.1} \]

Such a knot vector defines, as a function of the parametric coordinate ξα ∈ [ξα0, ξαm], nα + 1 univariate B-spline basis functions Bαa(ξα), a ∈ {0, ..., nα}, which are non-negative, satisfy the partition of unity, have local support and are piecewise polynomial of order pα. Based on this structure, the positive normalizing function

\[ W(\xi^1, \xi^2, \xi^3) = \sum_{a=0}^{n_1} \sum_{b=0}^{n_2} \sum_{c=0}^{n_3} w_{abc}\, B^1_a(\xi^1)\, B^2_b(\xi^2)\, B^3_c(\xi^3) \tag{3.2} \]

is introduced, where the weights wabc > 0 help define trivariate NURBS basis functions

\[ R_{abc}(\xi^1, \xi^2, \xi^3) = \frac{w_{abc}\, B^1_a(\xi^1)\, B^2_b(\xi^2)\, B^3_c(\xi^3)}{W(\xi^1, \xi^2, \xi^3)} , \tag{3.3} \]


Fig. 1. (a) The basis functions of a N 1 (equivalently, L1) discretization with unit weights are displayed, which will serve as the base patch. The horizontal axis corresponds to ξ and the boundaries between the elements are indicated by tics. (b) The weights of the base patch are modified to obtain a rational N 1 discretization. (c) Alternatively, the knot vector of the base patch is modified to obtain a cubic B-spline discretization. (d) A cusp is inserted if an interior knot of the cubic discretization appears three times.

which are non-negative, satisfy the partition of unity, have local support and are piecewise rational. Although this is not a tensor-product basis, such a structure does emerge in a four-dimensional space based on homogeneous coordinates. As a special case of NURBS when the weights are all unity, trivariate B-spline basis functions are obtained in piecewise polynomial tensor-product form, similar to Lagrange basis functions. Note that the tensor-product structure highly restricts mesh generation with NURBS. Local refinement is nevertheless possible, introduced in [76] and further developed in [77–79]. Extensions beyond the original tensor-product construction are also possible, for instance in the context of T-splines which were introduced in [80] and further developed in [81–83]. Such possibilities will not be pursued presently.
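A univariate analogue of (3.2) and (3.3) may be sketched as follows. The Cox-de Boor recursion used for the B-spline evaluation is the standard construction and is assumed here rather than quoted from the text; the function names, the cubic knot vector and the weights are hypothetical.

```python
def bspline_basis(knots, p, xi):
    """Evaluate all B-spline basis functions of order p at xi (Cox-de Boor recursion)."""
    m = len(knots) - 1
    # Last non-empty knot span, so that the basis is well defined at the right end
    last = max(k for k in range(m) if knots[k] < knots[k + 1])
    B = [0.0] * m
    for k in range(m):
        if knots[k] <= xi < knots[k + 1] or (xi == knots[-1] and k == last):
            B[k] = 1.0                 # order-0 indicator functions
    for q in range(1, p + 1):
        for a in range(m - q):         # in-place update: B[a + 1] is still order q - 1
            left = right = 0.0
            if knots[a + q] > knots[a]:
                left = (xi - knots[a]) / (knots[a + q] - knots[a]) * B[a]
            if knots[a + q + 1] > knots[a + 1]:
                right = ((knots[a + q + 1] - xi)
                         / (knots[a + q + 1] - knots[a + 1]) * B[a + 1])
            B[a] = left + right
    return B[: m - p]                  # n + 1 = m - p basis functions

def nurbs_basis(knots, p, weights, xi):
    """Rational basis: weighted B-splines divided by the normalizing function W."""
    B = bspline_basis(knots, p, xi)
    W = sum(w * b for w, b in zip(weights, B))
    return [w * b / W for w, b in zip(weights, B)]

# Open cubic knot vector with two elements and a single interior knot
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
values = nurbs_basis(knots, 3, [1.0, 2.0, 1.0, 3.0, 1.0], 0.3)
```

With unit weights, `nurbs_basis` reduces to `bspline_basis`, which mirrors the special case noted above.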

The knot vectors together with the weights and the degrees of freedom multiplying the basis functions, which are referred to as control points, constitute a patch. Because Bα0(ξα0) = 1 whereas Bαa>0(ξα0) = 0, and Bαn(ξαmα) = 1 whereas Bαaα


increased from two to four. Finally, if the interior knot 1/2 of this N 3 discretization appears three times, a cusp is inserted at the corresponding point.

3.2. Weak formulation

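A multi-mesh setup, to be discussed further in Section 3.3, is introduced where two finite element meshes indicated by M(1) and M(2) are generated for the Kohn-Sham equation (2.15) and the Poisson equation (2.19)2, respectively, accompanied by shape functions $N_I^{(m)}$ and test functions $\varphi^{(m)}$. The weak forms of these equations may be expressed as

\[ \tfrac{1}{2} \langle \nabla\varphi^{(1)} \cdot \nabla\psi_i \rangle + \langle \varphi^{(1)}\, v_{\mathrm{eff}}\, \psi_i \rangle = \varepsilon_i \langle \varphi^{(1)}\, \psi_i \rangle , \qquad \frac{1}{4\pi} \langle \nabla\varphi^{(2)} \cdot \nabla v_C \rangle = \langle \varphi^{(2)}\, (\rho + b) \rangle , \tag{3.4} \]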

subject to homogeneous Dirichlet boundary conditions. Note that the structure of (2.17)1 is similar to (2.19)2, except for the Dirichlet boundary conditions (Appendix C), and therefore its weak form is not explicitly noted. Moreover, (2.19)2 and (2.15) have to be solved repeatedly, whereas the set of equations (2.17)1 needs to be solved only once towards the evaluation of Eself in (2.21), also employing M(2) in order to ensure that the mesh-induced regularization (Section 2.3) delivers a convergent value for Enn with mesh refinement.

Introducing the discrete Hamiltonian [H] and the basis overlap matrix [M] with components

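\[ H_{IJ} = \tfrac{1}{2} \langle \nabla N_I^{(1)} \cdot \nabla N_J^{(1)} \rangle + \langle N_I^{(1)}\, v_{\mathrm{eff}}\, N_J^{(1)} \rangle , \qquad M_{IJ} = \langle N_I^{(1)}\, N_J^{(1)} \rangle \tag{3.5} \]

as well as the (scaled) discrete Laplacian [L] and a vector {c} emanating from the charge distributions with components

\[ L_{IJ} = \frac{1}{4\pi} \langle \nabla N_I^{(2)} \cdot \nabla N_J^{(2)} \rangle , \qquad c_I = \langle N_I^{(2)}\, (\rho + b) \rangle , \tag{3.6} \]

the following two systems of equations need to be solved self-consistently, subject to homogeneous Dirichlet boundary conditions: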

\[ [H]\{\psi_i\} = \varepsilon_i\, [M]\{\psi_i\} , \qquad [L]\{v_C\} = \{c\} . \tag{3.7} \]

All matrices in (3.7) are sparse, real, symmetric and the overlap matrix is additionally positive-definite. Consequently, the discrete generalized eigenvalue problem (3.7)1 inherits the properties of (2.15): the eigenvalues are real and the eigenvectors belonging to distinct eigenvalues are M-orthogonal. In the radial case, the problem size is sufficiently small so that the LAPACK routine dsygvd will be employed [84]. In the three-dimensional case, however, the eigenvectors are typically discretized with a large number of degrees of freedom in view of the accuracy requirements on the total energy. The Chebyshev-filtered subspace iteration method is a recently proposed efficient approach for such large eigenvalue problems when integrated into SCF iterations [23,63], originally proposed for the standard format. Following its adaptation to the generalized format [85], this method is employed in order to solve (3.7)1. To this end, an efficient computation of [M]⁻¹ is needed and approaches which deliver good scaling in parallel implementations exist [85]. Presently, because parallelization is not targeted, the Cholesky factorization of [M] is carried out once at each SCF iteration (not stored through iterations due to memory limitation in the available computational resource) and repeatedly employed in the operation of [M]⁻¹[H] on a vector through forward-backward substitution. For the upper bound to the eigenvalue spectrum that is needed in the construction of the Chebyshev filter, the estimate in [63] is adapted by employing the Lanczos algorithm in the generalized format [86]. The subspace size increases with the number of electrons, leading to higher computational cost both with respect to time and memory for larger material systems. Nevertheless, it remains more efficient in comparison to alternative methods [23,41].
To generate an initial subspace at the first SCF iteration, the atomic orbitals from radial all-electron computations (Section 2.6) are employed, and only the valence ones in the pseudopotential setting. Note that the diagonalization of the overlap matrix would deliver a standard eigenvalue problem which is computationally less expensive. However, a straightforward approach such as lumping not only leads to an unacceptable loss of accuracy but also limits the convergence rate to linear and is not readily applicable to higher-order Lagrange discretizations due to nonpositive lumped entries; see [38] for successful application to non-periodic nonlocal pseudopotential studies based on quadratic hexahedral elements. The reformulation of Lagrange basis functions towards the spectral setting, in combination with Gauss-Lobatto quadrature, does deliver a diagonal overlap matrix [9] along with optimal convergence rates [41]. Because a similar approach does not exist in the context of NURBS, comparison of NURBS and Lagrange discretizations will presently be carried out in the generalized eigenvalue problem setting.
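The structure of a Chebyshev-filtered subspace iteration for the generalized problem [H]{ψ} = ε[M]{ψ} may be sketched as below for small dense matrices. This is a simplified illustration under several assumptions that depart from the text: the upper bound of the spectrum is taken from a dense eigenvalue solve in place of the Lanczos-based estimate, a general solver stands in for forward-backward substitution, the initial subspace is random rather than built from atomic orbitals, and the unscaled three-term recurrence is used. All function names are hypothetical.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A q = theta B q (A symmetric, B positive-definite) via Cholesky reduction."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    theta, Z = np.linalg.eigh(Linv @ A @ Linv.T)
    return theta, Linv.T @ Z           # eigenvectors are B-orthonormal

def chebyshev_filter(apply_A, X, degree, a, b):
    """Apply the degree-th Chebyshev polynomial that is bounded on [a, b] to X."""
    e, c = 0.5 * (b - a), 0.5 * (b + a)
    Y_prev, Y = X, (apply_A(X) - c * X) / e
    for _ in range(2, degree + 1):     # three-term recurrence
        Y_prev, Y = Y, 2.0 * (apply_A(Y) - c * Y) / e - Y_prev
    return Y

def chefsi(H, M, n_wanted, n_sub, degree=10, n_iter=30, seed=0):
    n = H.shape[0]
    L = np.linalg.cholesky(M)          # factorized once, reused in every application

    def apply_A(X):
        # M^{-1} H X; np.linalg.solve is used for brevity and does not exploit
        # the triangular structure of the Cholesky factor
        return np.linalg.solve(L.T, np.linalg.solve(L, H @ X))

    lam_max = generalized_eigh(H, M)[0][-1]   # dense stand-in for a Lanczos bound
    b = lam_max + 0.01 * abs(lam_max)
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, n_sub)))
    theta, Q = generalized_eigh(X.T @ H @ X, X.T @ M @ X)
    X = X @ Q
    for _ in range(n_iter):
        # Damp the interval [largest Ritz value, b], amplify everything below it
        Y = chebyshev_filter(apply_A, X, degree, theta[-1], b)
        Y, _ = np.linalg.qr(Y)         # keep the filtered block well conditioned
        theta, Q = generalized_eigh(Y.T @ H @ Y, Y.T @ M @ Y)   # Rayleigh-Ritz
        X = Y @ Q
    return theta[:n_wanted], X[:, :n_wanted]

# Usage on a one-dimensional linear finite element model problem
n = 60
h = 1.0 / (n + 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
Mass = h / 6.0 * (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
eigenvalues, eigenvectors = chefsi(0.5 * K, Mass, n_wanted=4, n_sub=8)
```

In an SCF context the subspace from the previous iteration would serve as the starting block, so that only a few filtering passes are needed per iteration.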

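In the nonlocal pseudopotential formulation (Section 2.4), the weak form must be augmented by a contribution from vNL, which may be expressed as

\[ \langle \varphi^{(1)}(\mathbf{r})\, v_{NL}(\mathbf{r},\mathbf{r}')\, \psi_i(\mathbf{r}') \rangle = \sum_{A=1}^{M} \langle \varphi^{(1)}(\mathbf{r})\, �_A(\mathbf{r},\mathbf{r}')\, \psi_i(\mathbf{r}') \rangle \tag{3.8} \]

where

\[ \langle \varphi^{(1)}(\mathbf{r})\, �_A(\mathbf{r},\mathbf{r}')\, \psi_i(\mathbf{r}') \rangle = \sum_{\alpha=1}^{P} \sum_{\beta=1}^{P} \langle \varphi^{(1)}\, \lambda_\alpha \rangle\, h_{\alpha\beta}\, \langle \lambda_\beta\, \psi_i \rangle . \tag{3.9} \]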

Therefore, making use of the separable nature of the ionic pseudopotentials, the overall nonlocal contribution to the discrete Hamiltonian is a symmetric matrix [P ] with components P I J :

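\[ P_{IJ} = \sum_{A=1}^{M} \Big( \sum_{\alpha=1}^{P} \sum_{\beta=1}^{P} p_{I\alpha}\, h_{\alpha\beta}\, p_{J\beta} \Big) , \qquad p_{I\alpha} = \langle N_I^{(1)}(\mathbf{r})\, \lambda_\alpha(\mathbf{r} - \mathbf{R}_A) \rangle . \tag{3.10} \]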

This nonlocal contribution induces extended coupling among the degrees of freedom in the neighborhood of each ion, governed by the decay of λα(r − R A) towards zero, because N(1)I and N(1)J appear in separate integrals. In this work, in order to limit the loss in the sparsity of the discrete Hamiltonian resulting from this nonlocality, the magnitude of pIα is monitored and it is treated as zero below a sufficiently small value, chosen to ensure that the resulting error is negligible.
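The separable structure of (3.10) may be sketched as follows. The function names, the per-ion storage format and the truncation threshold are assumptions for illustration; in particular, the matrix-free application is shown only as one way to exploit the separability and is not claimed to be the approach of this work, which assembles [P] with truncated entries.

```python
import numpy as np

def truncate_projections(p, drop_tol):
    """Treat projection integrals p_{I alpha} below drop_tol as zero."""
    return np.where(np.abs(p) < drop_tol, 0.0, p)

def assemble_nonlocal(ions, n):
    """Dense assembly of P = sum_A p_A h_A p_A^T as in (3.10).

    ions is a list of (p_A, h_A) pairs with p_A of shape (n, P_A) and a
    symmetric h_A of shape (P_A, P_A) (hypothetical storage format).
    """
    P = np.zeros((n, n))
    for p, h in ions:
        P += p @ h @ p.T
    return P

def apply_nonlocal(ions, x):
    """Apply P to a vector without forming it, using the separable structure."""
    y = np.zeros_like(x)
    for p, h in ions:
        y += p @ (h @ (p.T @ x))
    return y

# Usage with made-up projection data for two ions on a mesh with six unknowns
rng = np.random.default_rng(1)
ions = []
for _ in range(2):
    p = truncate_projections(rng.standard_normal((6, 2)), drop_tol=0.05)
    h = np.array([[1.0, 0.2], [0.2, -0.5]])
    ions.append((p, h))
P = assemble_nonlocal(ions, 6)
```

Rows of `p` that are zeroed by the truncation do not couple to any other degree of freedom, which is how the threshold limits the fill-in of the discrete Hamiltonian.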

All integrals emanating from an order-p discretization of the weak form, NURBS or Lagrange, will be evaluated with p + 1 quadrature points (per parametric direction of each element) by default, which would allow for the exact evaluation of the overlap matrix with piecewise polynomial basis functions on elements with constant Jacobians. This choice was found to deliver negligible error in an all-electron setting. In the pseudopotential setting, however, the addition of quadrature points was found to improve total energy values, in particular on coarse meshes. For this purpose, an additional two points were employed only on Lagrange meshes in the evaluation of all integrals, except for the nonlocal contributions to the discrete Hamiltonian. For the latter, an additional six/four points on Lagrange/NURBS meshes were employed. It is highlighted that order elevation on a NURBS mesh does not reduce the number of elements, unlike Lagrange meshes, and therefore leads to an increasing integration cost for a given mesh resolution with increasing order. Efficient quadrature rules which can alleviate this cost have recently been introduced [87] and further developed [88,89], but are presently not implemented.
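The exactness claim for the overlap matrix follows from the fact that an n-point Gauss-Legendre rule integrates polynomials up to degree 2n − 1 exactly, while the product of two order-p basis functions has degree 2p. A minimal check for p = 3 on the reference interval [−1, 1] is sketched below; the helper name and the monomial integrand are chosen for illustration.

```python
import numpy as np

def gauss_integrate(f, n_points):
    """Integrate f over [-1, 1] with an n_points Gauss-Legendre rule."""
    x, w = np.polynomial.legendre.leggauss(n_points)
    return float(np.sum(w * f(x)))

p = 3
integrand = lambda x: x ** (2 * p)      # highest-degree term of an overlap integrand
exact = 2.0 / (2 * p + 1)               # integral of x^(2p) over [-1, 1]
with_p_plus_1 = gauss_integrate(integrand, p + 1)   # exact up to degree 2p + 1
with_p = gauss_integrate(integrand, p)              # exact only up to degree 2p - 1
```

Rational NURBS basis functions and non-constant Jacobians fall outside this argument, which is consistent with the need for additional points noted above.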

3.3. Mesh structure

The mesh structure in Fig. 2(a) is adopted for both NURBS and Lagrange discretizations, which will provide a common basis for comparison among results that are obtained with different discretization choices (see [9,16] for similar non-periodic analysis domain constructions). The initial domain geometry is generated from seven quadratic NURBS patches, each with a single element, which together provide an initial mesh. The core patch (i.e. the cubic domain at the center of the mesh structure) has unit weights and an edge length 2d1. The surrounding patches are assigned position control points and weights that are chosen in order to represent a nearly spherical outer surface. A sphere cannot be generated with this patch structure but the deviation from such an ideal geometry is at most one percent for the present mesh structure so that the domain size can be indicated with a radius d2. This radius is chosen sufficiently large for a proper imposition of the boundary conditions on the Poisson and Kohn-Sham problems. This quadratic NURBS mesh is subsequently modified towards the desired discretization for analysis purposes:

1. NURBS discretizations: The initial mesh is first degree-elevated to reach the desired order p and subsequently uniformly refined towards the target resolution which will be indicated by the number of elements eo per parametric direction of the core patch. The number of elements along the angular parametric directions of the surrounding patches are also eo but along the radial parametric direction it is chosen as eo/2. Mesh grading is additionally applied along the radial direction for efficiency as depicted in Fig. 2(b). Typically, the core size d1 is chosen smaller for the all-electron setting compared to the pseudopotential setting, and the rate of element coarsening slightly higher.

2. Lagrange discretizations: The described NURBS refinement and grading procedure is first followed without degree-elevation. The resulting mesh is either employed as a linear Lagrange discretization, by discarding the position control points and assigning the physical positions associated with the unique knot entries as nodal positions, or additionally order-elevated by joining neighboring linear elements into larger higher-order Lagrange elements.

In the radial case, a similar approach is retained where a core region of size d1 is meshed uniformly with eo elements, followed by a second region extending to a domain size of d2 that has a graded mesh, also with eo elements. Unit weights will be employed in the radial mesh construction, effectively leading to B-spline discretizations. When B-spline discretizations were employed in the three-dimensional case as well, by choosing unit weights for the surrounding patches in addition to the core one, no significant impact on the solution was observed. This indicates that B-spline discretizations are just as effective as NURBS in the context of this study, which is anticipated because the weights are prescribed beforehand and do not act as additional degrees of freedom. Nevertheless, the outlined mesh construction scheme will be retained and, therefore, the discretization it delivers will be referred to as a NURBS description.

In view of this mesh generation procedure, the number of degrees of freedom in a Lagrange mesh is independent of the order for a given eo value that determines the underlying quadratic NURBS mesh. Hence, eo will be employed as an indicator for mesh resolution for both NURBS and Lagrange discretizations at all orders and will help compare the accuracy provided by NURBS and Lagrange meshes at a given resolution and order. In particular, for the purposes of convergence rate analysis with both Lagrange [90,91] and NURBS discretizations [61,92], the asymptotic finite element error in total energy is expected to be of the form C(1/eo)^{2p} where the constant C depends on the order and type of discretization. Unlike Lagrange meshes, the number of degrees of freedom in a NURBS mesh actually increases with order for a given eo (see Fig. 1 for an example). Specifically, if the total number of control points which are not subject to Dirichlet boundary or patch continuity conditions is indicated by no then (no)^{1/3} could be employed as an alternative indicator for mesh resolution. However, the difference in this value between corresponding N p and Lp discretizations becomes increasingly smaller with increasing resolution (Table 1). Hence, conclusions drawn from accuracy comparisons based on eo alone remain valid. Moreover, this rise in no is due to basis functions which provide higher resolution near the patch boundaries alone, without significantly contributing to the solution quality away from these boundaries. These further highlight the suitability of eo as an indicator for mesh resolution and for comparing the per-degree-of-freedom accuracy of NURBS and Lagrange meshes at sufficiently high resolutions. It is noted that, in view of this discussion, employing (no)^{1/3} instead of eo will lead to higher pre-asymptotic convergence rates for NURBS.

Fig. 2. The mesh structure discussed in Section 3.3 is summarized.

Table 1. For the mesh generation procedure described in Section 3.3, the total number of control points which are not subject to Dirichlet boundary or patch continuity conditions (no) at a given mesh resolution (eo) is compared for different three-dimensional discretizations employed in numerical investigations. Additionally, n̄o^{1/3} indicates no^{1/3} normalized by its value for L1 and is provided as an alternative measure of mesh resolution. The rise in n̄o^{1/3} decreases with eo and is even smaller in the radial case. In these N p meshes, there are no repeated interior knots, which would lead to slightly larger no values.

Entries: no (n̄o^{1/3})

eo    L1             N 2               N 3               N 4               N 5              N 6
6     779 (1)        1,400 (1.22)      2,273 (1.43)      3,440 (1.64)      4,943 (1.85)     6,824 (2.06)
12    6,527 (1)      8,840 (1.11)      11,621 (1.21)     14,912 (1.32)     18,755 (1.42)    23,192 (1.53)
18    22,427 (1)     27,512 (1.07)     33,281 (1.14)     39,776 (1.21)     47,039 (1.28)    55,112 (1.35)
24    53,663 (1)     62,600 (1.05)     72,437 (1.10)     83,216 (1.16)     94,979 (1.21)    –
36    182,879 (1)    202,760 (1.04)    223,973 (1.07)    246,560 (1.10)    –                –
48    435,647 (1)    470,792 (1.03)    507,701 (1.05)    –                 –                –
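The entries of Table 1 can be reproduced with a simple counting argument, sketched below. The closed-form expression is inferred here from the described mesh structure (a core patch with eo elements per direction, surrounded by patches with eo/2 radial elements, with no repeated interior knots) and is not stated in the text; it assumes an even eo and treats L1 as order p = 1. The function names are hypothetical.

```python
def free_control_points(eo, p):
    """Count control points not subject to Dirichlet or patch continuity conditions.

    Inferred formula: eo + p basis functions per direction in the core patch,
    plus surface layers of control points in the surrounding patches.
    """
    n = eo + p                          # basis functions per direction in the core
    core = n ** 3
    layer = n ** 3 - (n - 2) ** 3       # control points on one closed surface layer
    # Radial layers, excluding the one shared with the core and the Dirichlet one
    free_layers = eo // 2 + p - 2
    return core + layer * free_layers

def normalized_resolution(eo, p):
    """Cube root of the count, normalized by its value for L1."""
    return (free_control_points(eo, p) / free_control_points(eo, 1)) ** (1.0 / 3.0)

row_eo_6 = [free_control_points(6, p) for p in range(1, 7)]
```

The same expression also reproduces the two no values quoted for the finest Lagrange mesh (eo = 60 on M(1) and 2eo = 120 on M(2)).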

NURBS discretizations lead to less sparse matrices than Lagrange discretizations. In the radial case as a specific example, sufficiently away from the patch boundaries, each N p basis function overlaps with 2p other basis functions. However, the number of overlaps for (nodal) Lp basis functions is 2p if they are associated with nodes which lie at the element boundaries but only p for those that are associated with the inner nodes. Additionally recalling their higher integration cost (Section 3.2), NURBS discretizations lead to a larger computational effort in the solution of the Poisson and Kohn-Sham problems at a given resolution. Consequently, sufficiently higher per-degree-of-freedom accuracy must be displayed by NURBS discretizations if they are to demonstrate a potential for being competitive with Lagrange discretizations, which will indeed be observed in the numerical investigations. It is remarked that NURBS discretizations will have a practical advantage in meshing due to the possibility of increasing the resolution in a more controllable fashion. Specifically, any N p discretization may be realized for a given eo whereas an Lp discretization may be realized only if eo/p is an integer.


Recalling the discussion in Section 2.2, an inaccurate representation of vH in comparison to its exact value (2.9)1 for a given ρ(r) may lead to non-variational results that may not only reflect as large errors in the total energy but also hamper the observation of optimal convergence rates. A numerical setup where the same mesh is employed for the solution of the Poisson and Kohn-Sham problems was observed to be particularly prone to such an issue, even in the radial case. Because vH implicitly appears within the solution vC to (3.7)2, one way to mitigate this issue is by solving the Poisson problem on a finer mesh M(2) than the mesh M(1) for the Kohn-Sham problem. As M(2) is refined for a fixed M(1), the error in the representation of vH will diminish and the error in total energy will be dominated by the resolution of M(1), which would then ensure variational results. In order to limit the computational expense associated with this multi-mesh (MM) setup, twice the resolution (2eo) will be employed for M(2) for a given resolution (eo) of M(1) but with the same order, which will be indicated with MM2. This choice guarantees neither variational results nor optimal convergence rates, in general, but was found to perform satisfactorily in the numerical investigations and was previously suggested in [12]. Alternatively, employing the same resolution but with twice the order may be chosen, which was suggested in [71] and employed in [70]. Instances where a single mesh is sufficient will be indicated with MM1. Also recalling Table 1, it is noted that the finest mesh employed in this work is a Lagrange discretization with eo = 60, corresponding to no = 853,439 on M(1) and, in the case of MM2, no = 6,869,279 on M(2).

Finally, it is noted that the all-electron solutions of the Poisson and Kohn-Sham problems should physically display a cusp at the nucleus [2,7]. Lagrange discretizations easily accommodate such a behavior if the nuclei positions coincide with the nodes. For NURBS discretizations, on the other hand, C p-continuity at these positions must be reduced to C0-continuity. The nuclei will always be placed within the core patch of the mesh structure. Following the discussion in Section 3.1, continuity reduction is realized by introducing repeated knots in the core patch along each parametric direction at all parametric coordinates corresponding to nuclei positions. This reduction is also replicated along the matching angular parametric directions of the surrounding patches in order to ensure an identical distribution of control points on both sides of a patch interface for enforcing C0-continuity. Pseudopotential solutions do not display a cusp and therefore such a mesh modification is not needed in that setting. Examples of all-electron and pseudopotential solutions, which demonstrate these qualitative features, will be provided in Section 4.1.2.

4. Numerical investigations

A series of numerical investigations will be presented in this section, with the purpose of demonstrating the numerical performance of the NURBS-based FEM framework for Kohn-Sham DFT. In all examples, comparisons with Lagrange discretization results on identically constructed meshes, in the sense of Section 3.3, will additionally be presented in order to highlight the potential advantages of a NURBS-based approach. These comparisons will be primarily based on the per-degree-of-freedom accuracy delivered at a given order and resolution. Although the run time is practically a leading efficiency concern, detailed comparisons of this aspect will be largely omitted in the discussions, apart from representative remarks, because no attempt has been made to optimize the developed code or parallelize it beyond basic OpenMP multithreading. However, it is noted that parallelization capabilities are available in the context of NURBS-based FEM as well [93]. Discretizations up to sixth order will be considered, in particular because NURBS became prohibitively expensive in three-dimensional examples beyond this order with respect to both run time and memory usage. The examples and the discretizations were limited by the available computational resource (a workstation with 196 GB memory and 24 cores).

Recalling the discussion in Section 3.3, in all examples presented in this work, the total energy E remains variational with respect to a reference value E0, to be computed on a sufficiently converged resolution and indicated explicitly, so that the error in total energy is always positive and ideally follows, for both N p and Lp discretizations, the asymptotic form

\[ E - E_0 = C\, (1/e_o)^{2k} \tag{4.1} \]

where k = p is the optimal convergence rate. In the numerical setting, k will always be estimated from the last three error calculations from a series of mesh refinements and denoted in the error plots next to the indicator for the discretization choice in parentheses. In these plots, where the results from many different discretization choices will be included, data points are shown (or omitted) so as to provide a clear comparison and, to avoid cluttering, all points belonging to the same discretization choice are connected with lines although k is estimated from a linear least squares fit. For explicit demonstrations of convergence rates in FEM-based ab initio studies, see [27,30,31,13,33,36–39,51,41,70].
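The rate estimate may be sketched as a linear least squares fit of log(E − E0) against log(1/eo) over the last three refinements, whose slope equals 2k according to (4.1). The function name and the synthetic error data below are made up for illustration.

```python
import math

def convergence_rate(eo_values, errors, n_last=3):
    """Estimate k in E - E0 = C (1/eo)^(2k) from the last n_last data points."""
    x = [math.log(1.0 / e) for e in eo_values[-n_last:]]
    y = [math.log(err) for err in errors[-n_last:]]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Slope of the least squares line through the (log(1/eo), log(error)) points
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return 0.5 * slope

# Synthetic errors that follow (4.1) exactly with C = 2 and k = 3
eo_values = [6, 12, 18, 24]
errors = [2.0 * (1.0 / e) ** 6 for e in eo_values]
k = convergence_rate(eo_values, errors)
```

The fit requires E − E0 > 0 at every data point, which is the variational property assumed in the discussion above.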

The investigations will start with radial single atom computations in all-electron and local pseudopotential settings and continue with three-dimensional computations, which range from all-electron and pseudopotential studies on single atoms as well as small molecules to local and non-local pseudopotential studies on larger systems. See [55,57] for earlier FEM-based radial studies with B-splines and [54] for a related one-electron study. The exchange-correlation choices will be noted explicitly, and implemented using Libxc [94]. The accuracy targeted in computations, shown as horizontal lines on the plots, is an absolute error in total energy of 1 kcal/mol (≈ 0.0016 Ha) per atom in the all-electron (AE) setting, which corresponds to standard chemical accuracy, and 0.0002 Ha per atom in the pseudopotential (PP) setting. It is remarked that the same mesh structure was employed for all NURBS and Lagrange discretizations in order to provide a common basis for comparison. However, no attempt has been made to adjust the mesh in order to minimize the error in total energy at a given resolution, thereby possibly displaying a need for relatively high resolutions to achieve target accuracies in some examples. For explicit demonstrations of comparable accuracies in representative FEM-based Kohn-Sham DFT studies, see [13] (periodic with nonlocal PP), [33] (periodic with local PP), [35] (non-periodic with AE for a hydrogenic atom and a diatomic molecule), [35] (periodic with nonlocal PP), [41] (non-periodic with AE for small/large molecules and local PP for large clusters) and [71] (non-periodic with AE for atoms and small molecules). Problems of comparable sizes are considered in this work. Target accuracies on larger non-periodic systems have only recently been achieved with FEM [42] (non-periodic AE and nonlocal PP for large clusters); see also [45,49] for further developments.

Fig. 3. Radial computations on the hydrogen atom are summarized based on (a) the solution of the Schrödinger equation and (b) the AE Kohn-Sham DFT formulation. The predicted convergence rate k within (4.1) is denoted in parentheses.

4.1. Radial case

4.1.1. Hydrogen atom

As the first example in the radial case, the hydrogen atom is considered, with a core size d1 = 1. The domain size is d2 = 25 in all computations, unless otherwise noted, and was verified to be sufficiently large in all relevant cases. Because the Schrödinger equation (2.1) for hydrogen is analytically tractable [64], delivering the ground state energy E0 = −0.5 as an exact reference value, a special setup is considered where (2.29)2 is solved directly on a single mesh (MM1) with veff = −1/r, without the need for Poisson problems or SCF iterations, thereby capturing the radial Schrödinger equation. Fig. 3(a) demonstrates that optimal convergence rates are achieved. Moreover, a significantly higher accuracy is delivered by NURBS at a given resolution and order, an observation which will hold in all radial and three-dimensional examples. In this case, despite the simple mesh structure employed, N 6 already delivers E = −0.499 999 993 60 at eo = 12, corresponding to only 34 degrees of freedom, and L6 delivers E = −0.499 999 999 86 at eo = 48 with 98 degrees of freedom.
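The single-mesh setup for the radial Schrödinger equation may be sketched with linear elements as follows. Only d1 = 1, d2 = 25, veff = −1/r and the generalized eigenvalue structure are taken from the text; the linear (L1-type) basis, the geometric grading of the outer region, the Cholesky reduction in place of a LAPACK generalized driver and the function name are assumptions made to keep the example short.

```python
import numpy as np

def radial_hydrogen_energy(eo=48, d1=1.0, d2=25.0):
    """Lowest eigenvalue of the l = 0 radial problem with v_eff = -1/r (linear elements)."""
    # Uniform core region followed by a graded region, each with eo elements
    nodes = np.concatenate([np.linspace(0.0, d1, eo + 1),
                            np.geomspace(d1, d2, eo + 1)[1:]])
    n = len(nodes)
    H = np.zeros((n, n))
    M = np.zeros((n, n))
    gp, gw = np.polynomial.legendre.leggauss(3)   # exact for these polynomial integrands
    for e in range(n - 1):
        a, b = nodes[e], nodes[e + 1]
        h = b - a
        r = 0.5 * (a + b) + 0.5 * h * gp
        w = 0.5 * h * gw * r ** 2                  # quadrature weights including r^2 dr
        N = np.array([(b - r) / h, (r - a) / h])   # linear shape functions
        dN = np.array([-1.0 / h, 1.0 / h])
        veff = -1.0 / r
        for i in range(2):
            for j in range(2):
                H[e + i, e + j] += np.sum(w * (0.5 * dN[i] * dN[j] + veff * N[i] * N[j]))
                M[e + i, e + j] += np.sum(w * N[i] * N[j])
    # Homogeneous Dirichlet condition at r = d2: drop the last node
    H, M = H[:-1, :-1], M[:-1, :-1]
    # Reduce the generalized problem H x = E M x to standard form via Cholesky
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ H @ Linv.T)[0]

E = radial_hydrogen_energy()
```

Because all element integrals are evaluated exactly in this sketch, the computed energy approaches the reference value −0.5 from above as eo is increased, although far more slowly than the higher-order discretizations reported here.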

Next, the hydrogen atom is addressed in an AE setting based on the Kohn-Sham DFT approach where the Poisson and Kohn-Sham problems are solved self-consistently, presently in a multi-mesh setup (MM2). The reference total energy is computed on a fine mesh in all radial cases with N 6 at eo = 48 (d1 is chosen as 0.5 for AE and 2.5 for PP computations). Presently, its value is E0 = −0.445 670 518, which differs from −0.5 due to self-interaction in the DFT formulation employed. All radial AE reference values are converged to 10^−9 accuracy or better with respect to mesh refinement, and match NIST atomic reference data for electronic structure [95] to 10^−6 accuracy and converged results obtained with dftatom [75] to higher accuracy. To ensure this match, radial AE computations are carried out based on classical Slater exchange combined with Vosko-Wilk-Nusair correlation [96]. Optimal convergence rates in Fig. 3(b) further validate the general computational framework in the radial case and strengthen the observation that NURBS deliver higher accuracy. In particular, N 3 stands out as a specific discretization choice which already delivers a faster route to chemical accuracy than all Lagrange discretizations. It is remarked that non-variational results are observed in this case for most discretization choices if a single mesh (MM1) is employed, which highlights the advantage of the multi-mesh setup. In the radial case, the piecewise polynomial nature of the density emanating from the underlying B-spline discretization actually allows for the analytical integration of the corresponding Poisson equation towards the Hartree potential, which can be used within the Kohn-Sham equation together with the exact nuclear potential, thereby circumventing the need for the MM2 setup. Presently, a common computational framework which applies simultaneously to the radial and three-dimensional cases was preferred, in order to demonstrate some of the important numerical aspects already in the former case.

The impact of the mesh on the computational accuracy as well as on the convergence rate will be emphasized at this point, targeting a set of well-known observations in the context of this work. Specifically, the influence of the Poisson problem and DFT aspects are eliminated by considering the Schrödinger equation for the hydrogen atom, as in Fig. 3(a). Therein, super-optimal rates are observed for most NURBS discretizations. In Fig. 4(a), the lines for N 3 to N 6 are extended to finer resolutions and the reduction of the rates towards the optimal asymptotic values is observed. Next, a separate mesh construction is employed where d1 = d2/2 is assigned for the core size and the mesh in the outer patch is not graded, effectively leading to a uniform mesh over the domain at resolution 2eo. The results in Fig. 4(b) for chosen discretizations demonstrate sub-optimal convergence rates at resolutions similar to Fig. 3(a) and the error is comparatively higher for each discretization choice. However, an increase towards optimal asymptotic values is again observed with mesh refinement. This example also demonstrates the importance of choosing the core size (d1) carefully in the mesh construction scheme adopted, because the mesh effectively acts as uniform if the core is too large, leading to non-optimal convergence rates. On the other hand, the degrees of freedom in the core region are not used effectively if the core size is chosen too small. Because all nuclei/ions will be chosen to fit in the core region, the molecular geometry may by itself be a limiting factor on the smallness of this region, thereby limiting the adjustment of the mesh in order to observe optimal rates. To summarize, super- or sub-optimal convergence rates can be observed before the asymptotic behavior emerges. Note that the deviations in the slopes of the lines are visually small in most cases so that it is important to quantitatively estimate the rates instead of providing slope guidelines as a means for qualitative assessment. In practice, and in some examples to follow, the optimal rates may not be achieved because the target accuracy has already been reached or further mesh refinement is limited by the computational resource.

Fig. 4. Radial setup for the hydrogen atom associated with Fig. 3(a) is revisited by (a) further refining based on chosen NURBS discretizations and (b) recomputing on chosen discretizations with a uniform mesh. The convergence rate predicted from the first three data points is indicated with a “-” and from the last three with a “+”, the latter being the default choice in all plots.

Fig. 5. Radial AE computations on lithium and aluminum atoms are summarized.
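The quantitative rate estimate referred to above can be sketched as a least-squares fit on a log-log scale. The snippet below is only an illustration under stated assumptions: it takes the error to follow a power law in the resolution, |E − E0| ≈ C eo^(−k), which is assumed here to be the form of (4.1), and it uses synthetic error data rather than values from the figures. The function name `estimate_rate` is hypothetical.

```python
import math


def estimate_rate(resolutions, errors):
    """Least-squares slope of log|error| versus log(resolution).

    Returns k under the assumed model |E - E0| ~ C * resolution**(-k).
    """
    xs = [math.log(n) for n in resolutions]
    ys = [math.log(abs(e)) for e in errors]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return -num / den


# Synthetic data (not from the paper): asymptotic rate 6 with a
# pre-asymptotic correction that decays under refinement.
resolutions = [6, 12, 24, 48, 96]
errors = [n ** -6.0 * (1.0 + 50.0 / n ** 2) for n in resolutions]

rate_first = estimate_rate(resolutions[:3], errors[:3])   # "-" estimate
rate_last = estimate_rate(resolutions[-3:], errors[-3:])  # "+" estimate
print(rate_first, rate_last)
```

In this synthetic case the estimate from the first three points is super-optimal, while the estimate from the last three points is closer to the asymptotic value, mirroring the distinction drawn in the caption of Fig. 4.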

4.1.2. Lithium and aluminum atoms

Next, two multi-electron atoms, lithium and aluminum, are considered, first in an AE setting with a core size d1 = 0.1 and using MM2. The computed reference value is E0 = −7.335 195 186 for lithium and E0 = −241.315 573 406 for aluminum. Fig. 5 demonstrates optimal convergence rates as well as an ordering among different choices of discretization that is similar to earlier figures. As noted in Section 3.3, the multi-mesh setup where double the resolution of M(1) is employed for M(2) does not guarantee variational results. As a particular example, it is noted that on finer meshes both N 5 and N 6 deliver non-variational results for aluminum (data points not shown). If triple the resolution is employed on M(2), variational results are recovered and optimal convergence rates are retained. Consequently, the choice of resolution for the two meshes of the computational framework also influences the convergence behavior, in addition to the mesh structure for a given resolution as noted in the preceding section.

Fig. 6. Radial (local) PP computations on lithium and aluminum atoms are summarized. In this example alone, LSDA was employed. L1 discretization for aluminum is marked with “m”, indicating that MM2 was needed here to avoid non-variational results.

Finally, both atoms are reconsidered in a local PP setting, lithium having a single-electron valence structure and aluminum having a multi-electron one, with a core size d1 = 3. For this purpose, the evanescent core PP [97] is employed with the individual parameters reported in [98]. In order to compare the results with the values reported in [99], classical Slater exchange is employed together with Perdew-Zunger correlation [100]. Additionally, only for this example and omitting the details, the computational framework is readily extended to incorporate the local spin density approximation (LSDA) by distinguishing between spin-up and spin-down occupancies [4]. The computed radial PP reference values in this setting (E0 = −0.219 463 558 94 for lithium and E0 = −1.975 630 118 01 for aluminum) are converged to 10^−11 accuracy or better with respect to mesh refinement, and match the values in [99] to the accuracy reported therein (−5.97 eV for lithium and −53.76 eV for aluminum). Fig. 6 displays the higher accuracy attained at lower resolutions due to the PP setting and demonstrates optimal convergence rates as well as the leading efficiency of N 3 over all Lp discretizations. It is noted that MM1 was found sufficient for both cases, except for the L1 discretization for aluminum. This choice consistently led to non-variational results with MM1, despite delivering an optimal convergence rate from below, so that MM2 was employed instead. Finally, on this same setup, reference LDA total energy values have additionally been computed for subsequent use in the convergence analysis of three-dimensional single atom computations (E0 = −0.208 599 294 90 for lithium and E0 = −1.970 704 176 28 for aluminum).
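The stated agreement between the radial PP reference values and the values quoted in eV can be checked with a simple unit conversion. The sketch below assumes that the computed energies are in Hartree atomic units and uses the CODATA 2018 Hartree-to-eV factor; the helper name `hartree_to_ev` is hypothetical.

```python
# CODATA 2018 value of the Hartree energy in eV
HARTREE_TO_EV = 27.211386245988


def hartree_to_ev(energy_hartree):
    """Convert an energy from Hartree (assumed unit of E0) to eV."""
    return energy_hartree * HARTREE_TO_EV


# Radial PP (LSDA) reference values quoted in the text, assumed to be in Hartree
reference_hartree = {"Li": -0.21946355894, "Al": -1.97563011801}
# Values quoted in the text from [99], in eV
reported_ev = {"Li": -5.97, "Al": -53.76}

for atom, e0 in reference_hartree.items():
    converted = round(hartree_to_ev(e0), 2)
    print(atom, converted, converted == reported_ev[atom])
```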

The radial distributions for the aluminum atom associated with the N 6 discretization, which delivers the lowest error in total energy with the smallest number of degrees of freedom, are additionally visualized in order to demonstrate the solution quality provided by NURBS and to highlight the main well-known features of the electronic structure. The AE radial functions from (2.29)2 are visualized in Fig. 7, which are resolved with only 70 degrees of freedom but deliver a total energy error of 2.79 × 10^−7 despite the simple mesh construction. The corresponding s-orbitals, which are associated with Rn0, display a cusp due to the Coulombic nature of the nuclear potentials. The p-orbitals, which are associated with Rn1, do not display a cusp because the orbitals consist of the spherical harmonics in addition to the radial functions (Section 2.6). Similarly, the PP radial functions are visualized in Fig. 8, which are resolved with only 34 degrees of freedom but deliver a total energy error of 1.19 × 10^−10. The PP radial functions and therefore the corresponding pseudo-orbitals do not display a cusp or an oscillation due to the nature of the ionic pseudopotential, thus allowing a coarser mesh resolution in the vicinity of the ion. Unlike the AE setting where the density monotonically increases towards the nucleus, the simplified behavior of the pseudo-orbitals causes a non-monotonic variation in the density near the ion, which will be observed in the three-dimensional examples of Section 4.2.4 as well. It is noted that the local PP chosen in this work is not norm-conserving [97], so that the PP distributions closely match the valence AE distributions away from the ion but not perfectly [72,7]. The nonlocal PP to be employed in the three-dimensional case does obey the norm-conservation condition.

4.2. Three-dimensional case

4.2.1. All-electron studies on single atoms

In order to validate the three-dimensional FEM framework, where only LDA will be employed, a series of single atom studies are first carried out, beginning with the AE setting based on Slater exchange combined with Vosko-Wilk-Nusair correlation. The single atom will always be placed at the center of the core domain. As the first example, the radial analysis summarized in Fig. 3 for the hydrogen atom is repeated. The same choices are followed, so that the reference values are the analytical result E0 = −0.5 for the Schrödinger equation, where MM1 is employed, and the radial result E0 = −0.445 670 518 for the Kohn-Sham DFT formulation, where MM2 is needed. Fig. 9 demonstrates that observations regarding optimal convergence rates as well as the relative ordering among NURBS and Lagrange discretizations again hold. Due to the high computational cost associated with NURBS discretizations beyond N 4 within the MM2 setup, only a representative point is displayed for N 5 and none for N 6.

Fig. 7. The radial AE solution for the aluminum atom is visualized, corresponding to the N 6 discretization at eo = 30 from Fig. 5(b). The orbitals have unit occupancy, except for F31 = 1/6.

Fig. 8. The radial PP solution for the aluminum atom is visualized, corresponding to the N 6 discretization at eo = 12 from Fig. 6(b). The orbital occupancy factors are F30 = 1 and F31 = 1/6. The AE density distribution is marked with “v”, indicating that it is associated with the valence orbitals alone.

Similar observations are repeated in Fig. 10 for AE computations for the multi-electron cases of lithium (d1 = 0.5) and carbon (d1 = 0.4), the former having already been considered in the radial AE setting of Fig. 5(a). In both cases, MM1 was sufficient. The radial reference values for these examples are E0 = −7.335 195 186 for lithium and E0 = −37.425 748 536 for carbon. The need for a better mesh construction increases with increasing atomic number in order to ensure that the degrees of freedom are efficiently employed towards an effective numerical resolution of the variations in the potentials and the orbitals near the nucleus. Presently, the carbon atom is near the limit of the mesh construction employed with respect to an ability to deliver results towards chemical accuracy at the optimal rate.

At this point, a limited discussion of run times is provided. The fastest route to chemical accuracy in the case of Fig. 9(a) is delivered by N 3 at eo = 12 for NURBS (E − E0 = 0.490 × 10^−3) and by L5 at eo = 20 for Lagrange discretizations (E − E0 = 1.273 × 10^−3), but the latter requires approximately twice the run time on a single processor despite delivering more than twice the error in total energy. Although N 5 delivers chemical accuracy already at eo = 8 (E − E0 = 1.429 × 10^−3), the increasing integration cost with NURBS order (Section 3.2) leads to a run time that is larger than both of the choices above, approximately 50% slower than L5. For Fig. 9(b), a similar comparison is carried out where SCF iterations are now needed together with the solution of a Poisson problem at each iteration, and on a finer mesh due to the MM2 setup. Again, the fastest route to chemical accuracy is delivered by N 3 at eo = 12 for NURBS (E − E0 = 0.324 × 10^−3) and by L5 at eo = 20 for Lagrange discretizations (E − E0 = 0.919 × 10^−3; data point not shown), both with the same number of SCF iterations. The former is associated with a smaller error but now also requires close to 50% more run time. If, however, MM1 is employed instead of MM2, the fastest choices are once again preserved (N 3 at eo = 12 with E − E0 = 0.381 × 10^−3 and L5 at eo = 20 with E − E0 = 1.219 × 10^−3) but N 3 remains faster by about 25%. Essentially, the higher integration cost per element, in particular, together with the lower sparsity of the involved matrices impact the run times of NURBS discretizations adversely. Nevertheless, one may state that carefully chosen NURBS discretizations will not only deliver higher per-degree-of-freedom accuracy but also have the potential to do so faster (or in competitive times) towards chemical accuracy when compared to Lagrange discretizations. Harnessing this potential requires, among other things, better mesh construction as well as parallelization with good scaling, both of which are outside the scope of the present study. For FEM studies of scaling in Kohn-Sham DFT, see [29,33,14,41,49]. For locally refined mesh construction in Kohn-Sham DFT, see [12,33,14,71] for a priori prescribed meshes as in the present study, optimally graded in [41,42,49], and [34,35,40,101,70,102] for a posteriori refinements carried out adaptively.

Fig. 9. Three-dimensional computations on the hydrogen atom are summarized based on (a) the solution of the Schrödinger equation and (b) the AE Kohn-Sham DFT formulation.

Fig. 10. Three-dimensional AE computations on lithium and carbon atoms are summarized.

4.2.2. Pseudopotential studies on single atoms

Next, the three-dimensional counterparts of the local PP studies in Fig. 6 on lithium and aluminum are repeated following the same choices but based on LDA, where the relevant reference values stated in Section 4.1.2 (E0 = −0.208 599 294 90 for lithium and E0 = −1.970 704 176 28 for aluminum) will be employed. The results in Fig. 11 further validate the computational framework. As in the radial case, MM1 was largely sufficient for these single-atom PP studies.

The nonlocal PP formulation is additionally demonstrated for these two atoms. The underlying formulation [74] explicitly provides the local and nonlocal contributions in analytical form, describing a single-electron valence structure for lithium and a multi-electron one for aluminum as in the preceding local case. This nonlocal formulation incorporates relativistic contributions associated with spin-orbit coupling, based on an earlier nonrelativistic version [73], which are presently omitted by only employing the scalar parts of the PP. For earlier FEM-based applications of this nonlocal pseudopotential, see [29,13,56,39,103,57]. Moreover, for consistency with this nonlocal PP formulation, the combined exchange-correlation expression presented in [73] is chosen. The reference values, computed on a fine mesh, are E0 = −0.189 548 163 for lithium (from L5 at eo = 24) and E0 = −1.944 031 342 for aluminum (from L6 at eo = 36). It is noted that these values agree, well beyond the target PP accuracy requirement, with comparison computations in ABINIT [104], which employs a plane wave basis and delivers E0 = −0.189 548 for lithium (with a cell size of 40 and an energy cut-off of 30) and E0 = −1.944 031 for aluminum (with a cell size of 40 and an energy cut-off of 60), both of which are converged to at least 10^−6 accuracy with respect to cell size and energy cut-off. The results in Fig. 12 demonstrate pre-asymptotic super-optimal convergence rates for lithium and optimal ones for aluminum. It is noted that, in order to limit the extended coupling due to the slow decay of nonlocal contributions, a relatively large core size d1 = 6 was chosen for lithium, whereas d1 = 3 was sufficient for aluminum.

Fig. 11. Three-dimensional local PP computations on lithium and aluminum atoms are summarized. L1 discretization for aluminum is marked with “m”, indicating that here MM2 was needed to avoid non-variational results.

Fig. 12. Three-dimensional nonlocal PP computations on lithium and aluminum atoms are summarized. L1 discretization for aluminum is marked with “m”, indicating that here MM2 was needed to avoid non-variational results.

4.2.3. All-electron and pseudopotential studies on small molecules

Achieving chemical accuracy with the current mesh construction scheme on small molecules composed of hydrogen and carbon atoms is viable in an AE setting, in view of the single atom investigations in Section 4.2.1. The hydrogen molecule (H2) is chosen as a classical study [27,12] together with the methane molecule (CH4) as a common example [34,41,105,45,102], both with an MM2 setup to avoid non-variational results, employing a domain size of d2 = 25 and classical Slater exchange combined with Vosko-Wilk-Nusair correlation [96]. Dalton [106], which employs Gaussian basis sets, was employed to generate reference values for convergence analysis as follows:

1. Hydrogen molecule: Geometry optimization was first carried out with the cc-pV5Z basis set to obtain a bond length of ro = 1.445 821. A reference value of E0 = −1.137 845 was then computed at this bond length with the aug-cc-pV6Z basis set, which delivers an accuracy of the order of 10^−6 with respect to basis set size. In finite element computations, the core size was selected to be d1 = ro/2 and the nuclei were placed at the center of two opposing faces of the core domain. Although not pursued presently, it is remarked that a two-dimensional framework may be formulated for problems of this type, thereby significantly lowering the overall computational cost [26].

Fig. 13. Three-dimensional AE computations on the hydrogen and methane molecules are summarized. For molecules, the error in total energy per atom is monitored.

Fig. 14. The AE analysis for the methane molecule summarized in Fig. 13(b) is repeated by scaling down the core size, along with the molecule geometry that is represented by the equilibrium bond length ro , in order to obtain a more effective resolution of the Poisson and Kohn-Sham problems.

2. Methane molecule: Geometry optimization was first carried out with the 6-31G basis set to obtain an H-C bond length of ro = 2.081 731. The computed H-C-H bond angle was modified towards the ideal value acos(−1/3) that is associated with a perfect tetrahedron geometry. This modification matches the computed value to 10^−3 accuracy in degrees and causes negligible error in the total energy, which was computed with an aug-cc-pV6Z basis set and found to be E0 = −40.121 8 with an accuracy of the order of 10^−4 with respect to basis set size. In finite element computations, the core size was selected to be d1 = ro/√3 due to the idealized geometry and the carbon atom was placed at the center of the core domain whereas the hydrogen atoms were placed at four corners.
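The nuclear placements described in the two items above can be verified with a short geometric check. The sketch below assumes that the core domain is the cube [−d1, d1]^3 centered at the origin, which is inferred from the stated core sizes rather than quoted from the mesh construction, and that the four hydrogen atoms of methane occupy alternating corners of this cube. The function names are hypothetical.

```python
import math


def h2_positions(r_o):
    """H2 nuclei at the centers of two opposing faces of the cube [-d1, d1]^3."""
    d1 = r_o / 2.0
    return d1, [(-d1, 0.0, 0.0), (d1, 0.0, 0.0)]


def methane_positions(r_o):
    """Carbon at the cube center, hydrogens at four alternating corners."""
    d1 = r_o / math.sqrt(3.0)
    carbon = (0.0, 0.0, 0.0)
    signs = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    hydrogens = [tuple(s * d1 for s in sg) for sg in signs]
    return d1, carbon, hydrogens


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_at_origin(a, b):
    """Angle between two position vectors measured from the origin (radians)."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(dot / (distance(a, (0, 0, 0)) * distance(b, (0, 0, 0))))


# Bond lengths quoted in the text
d1_h2, h2 = h2_positions(1.445821)
d1_ch4, carbon, hydrogens = methane_positions(2.081731)

print("H-H distance:", distance(h2[0], h2[1]))
print("C-H distances:", [distance(carbon, h) for h in hydrogens])
print("H-C-H angle (deg):", math.degrees(angle_at_origin(hydrogens[0], hydrogens[1])))
```

Under these assumptions, every C-H distance equals ro and every H-C-H angle equals acos(−1/3), approximately 109.47 degrees, which is why the corner placement requires d1 = ro/√3.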

Results in Fig. 13 reiterate earlier findings, except for non-optimal rates in the methane example. It is noted that chemical accuracy per atom is easily met for the hydrogen molecule (E = −1.137 844 435 from L6 at eo = 48; data point not shown) whereas it was achieved only with L6 at eo = 60 (E = −40.115 742 804) for the methane molecule due to the simple mesh construction scheme which limited the accuracy and the convergence rates. It is important to recall that a convergence rate that is independent of the discretization order, as observed in Fig. 13(b), is not surprising because it has been demonstrated early in the development of FEM that uniform meshes lead to such an observation in the presence of singularities [107] whereas appropriate grading can help recover the optimal rates [108]. In order to further examine this limitation in view of this fact, also recalling the discussion of Section 4.1.1, the core size was reduced and the methane bond length was contracted accordingly towards non-equilibrium values in order to fit the molecule geometry to the mesh construction. Again employing the aug-cc-pV6Z basis set, the total energy was computed to be −37.635 2 for a bond length of bo/2 and −31.783 4 for a bond length of bo/3. The analysis in Fig. 14 demonstrates that, for N 2 and N 3 cases as representative discretizations, not only is chemical accuracy achieved with NURBS discretizations but optimal rates are also recovered with decreasing size due to a more effective resolution of the Poisson and Kohn-Sham problems. This
