document

6
Scientia Iranica C (2013) 20 (3), 543–548 Sharif University of Technology Scientia Iranica Transactions C: Chemistry and Chemical Engineering www.sciencedirect.com Modeling intermolecular potential of He–F 2 dimer from symmetry-adapted perturbation theory using multi-gene genetic programming M. Amiri a,, M. Eftekhari b , M. Dehestani a , A. Tajaddini c a Department of Chemistry, Shahid Bahonar University of Kerman, Kerman, P.O. Box: 7616913, Iran b Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, P. Code: 7616913111, Iran c Department of Mathmatics, Shahid Bahonar University of Kerman, Kerman, P.O. Box: 76169-14111, Iran Received 16 April 2012; revised 18 August 2012; accepted 31 December 2012 KEYWORDS Potential energy; SAPT; MGGP; Lennard-Jones potential. Abstract Any molecular dynamical calculation requires a precise knowledge of interaction potential as an input. In an appropriate form, such that the potential, with respect to the coordinates, can be evaluated easily and accurately at arbitrary geometries (in our study parameters for geometry are R and θ ), a good potential energy expression can offer the exact intermolecular behavior of systems. There are many methods to create mathematical expressions for the potential energy. In this study for the first time, we utilized the Multi-gene Genetic Programming (MGGP) method to generate a potential energy model for the He–F 2 system. The MGGP method is one of the most powerful methods used for non-linear regression problems. A dataset of size 714 created by the SAPT 2008 program is used to generate models of MGGP. The results obtained show the power of MGGP for producing an efficient nonlinear regression model, in terms of accuracy and complexity. © 2013 Sharif University of Technology. Production and hosting by Elsevier B.V. All rights reserved. 1. Introduction Potential Energy (PE) is the energy stored in a molecule. This energy also is the portion of the energy of a system which is associated with its position in a force field [1]. Usually, in chemistry, when we talk about PE, it is related to the energy that was created because of the configuration of the molecule. When we have two parts, like our system (He–F 2 ), PE will change the change of geometry parameters (R,θ), or, in other words, with a change in configuration. The most stable configuration has the smallest potential energy. Our knowledge about the most stable configuration is very essential. For example, to Corresponding author. Tel.: +98 341 3202 106; fax: +98 341 3222 033. E-mail addresses: [email protected] (M. Amiri), [email protected] (M. Eftekhari), [email protected] (M. Dehestani), [email protected] (A. Tajaddini). propose a good mechanism for chemistry reactions we must have correct information about changes of potential energy in order to change configuration. An accurate model of PE is necessary in scattering calcula- tion, and calculation of the second virial coefficients, etc. For ex- ample, to calculate second virial coefficients (B 12 (T )), we must evaluate the following integral: B 12 (T ) = π N A 0 π 0 {1 exp[−V (R,θ)/kT ]} × R 2 sin θ dRdθ, (1) where N A is Avogadro’s number and k is Boltzmann’s constant. V (R,θ) is potential energy expression. In the present study, we obtained several models for the potential energy of the He–F 2 system by the MGGP method. MGGP is a promising variant of genetic programming (GP). Multi-gene Genetic Programming (MGGP) effectively combines the model structure selection ability of the standard GP with the parameter estimation power of classical regression to capture nonlinear interactions [2]. In order to achieve this purpose, 714 data, collected by the SAPT 2008 program, are used. SAPT2008 Peer review under responsibility of Sharif University of Technology. 1026-3098 © 2013 Sharif University of Technology. Production and hosting by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.scient.2012.12.040

Upload: dinhthuy

Post on 01-Jan-2017

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: document

Scientia Iranica C (2013) 20 (3), 543–548

Sharif University of Technology

Scientia IranicaTransactions C: Chemistry and Chemical Engineering

www.sciencedirect.com

Modeling intermolecular potential of He–F2 dimer fromsymmetry-adapted perturbation theory using multi-genegenetic programmingM. Amiri a,∗, M. Eftekhari b, M. Dehestani a, A. Tajaddini caDepartment of Chemistry, Shahid Bahonar University of Kerman, Kerman, P.O. Box: 7616913, IranbDepartment of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, P. Code: 7616913111, IrancDepartment of Mathmatics, Shahid Bahonar University of Kerman, Kerman, P.O. Box: 76169-14111, Iran

Received 16 April 2012; revised 18 August 2012; accepted 31 December 2012

KEYWORDSPotential energy;SAPT;MGGP;Lennard-Jones potential.

Abstract Any molecular dynamical calculation requires a precise knowledge of interaction potential asan input. In an appropriate form, such that the potential, with respect to the coordinates, can be evaluatedeasily and accurately at arbitrary geometries (in our study parameters for geometry are R and θ ), a goodpotential energy expression can offer the exact intermolecular behavior of systems. There are manymethods to create mathematical expressions for the potential energy. In this study for the first time, weutilized the Multi-gene Genetic Programming (MGGP) method to generate a potential energy model forthe He–F2 system. The MGGPmethod is one of the most powerful methods used for non-linear regressionproblems. A dataset of size 714 created by the SAPT 2008 program is used to generate models of MGGP.The results obtained show the power of MGGP for producing an efficient nonlinear regression model, interms of accuracy and complexity.

© 2013 Sharif University of Technology. Production and hosting by Elsevier B.V. All rights reserved.

1. Introduction

Potential Energy (PE) is the energy stored in a molecule.This energy also is the portion of the energy of a system whichis associated with its position in a force field [1]. Usually, inchemistry,whenwe talk about PE, it is related to the energy thatwas created because of the configuration of themolecule.Whenwe have two parts, like our system (He–F2), PE will changethe change of geometry parameters (R, θ), or, in other words,with a change in configuration. The most stable configurationhas the smallest potential energy. Our knowledge about themost stable configuration is very essential. For example, to

∗ Corresponding author. Tel.: +98 341 3202 106; fax: +98 341 3222 033.E-mail addresses:[email protected] (M. Amiri),

[email protected] (M. Eftekhari), [email protected](M. Dehestani), [email protected] (A. Tajaddini).Peer review under responsibility of Sharif University of Technology.

1026-3098© 2013 Sharif University of Technology. Production and hosting by Els

http://dx.doi.org/10.1016/j.scient.2012.12.040

propose a good mechanism for chemistry reactions we musthave correct information about changes of potential energy inorder to change configuration.

An accurate model of PE is necessary in scattering calcula-tion, and calculation of the second virial coefficients, etc. For ex-ample, to calculate second virial coefficients (B12(T )), we mustevaluate the following integral:

B12(T ) = πNA

0

π

0{1 − exp[−V (R, θ)/kT ]}

× R2 sin θdRdθ, (1)

where NA is Avogadro’s number and k is Boltzmann’s constant.V (R, θ) is potential energy expression.

In the present study, we obtained several models for thepotential energy of the He–F2 system by the MGGP method.MGGP is a promising variant of genetic programming (GP).Multi-gene Genetic Programming (MGGP) effectively combinesthemodel structure selection ability of the standardGPwith theparameter estimation power of classical regression to capturenonlinear interactions [2]. In order to achieve this purpose, 714data, collected by the SAPT 2008 program, are used. SAPT2008

evier B.V. All rights reserved.

Page 2: document

544 M. Amiri et al. / Scientia Iranica, Transactions C: Chemistry and Chemical Engineering 20 (2013) 543–548

Figure 1: The potential energy for Ar–Ar system.

is a computer program implementing the many-body versionof the symmetry-adapted perturbation theory (SAPT). SAPT isdesigned to calculate the interaction energy of a dimer, or asystem consisting of two arbitrary monomers. Each monomercan be an atom or a molecule [3]. Obtained equations aresuitable models in terms of both accuracy and simplicity.

2. Potential energy and computational details

2.1. Potential energy with one variable

Figure 1 shows the typical behavior of PE as a function ofr (the intermolecular distance for a diatom like Ar–Ar) nearbyr = re (re or requilibrium is the intermolecular distance, whosepotential energy, at this distance, has the smallest energy). Thecurve is similar to a parabola, and, therefore, it is representedby the following equation:

V = 1/2kx2, (2)

where x = r − re and k = (d2V/dr2)r=re . Eq. (2) is thefamiliar harmonic oscillator potential. The harmonic oscillatorpotential cannot be used formodeling the interactions betweeninternal atoms of a molecule and an external atom, because, asindicated in Figure 1, with an increase in r , the behavior of thepotential energy does not show the behavior of the harmonicoscillator. For this reason, we have to add cubic, quarter andother higher order terms to the Eq. (2). The resulted equationis an anharmonic oscillator potential.

The general function is as follows [4]:

V (x) = d0x21 +

∞i=1

dixi

, (3)

where x = r − re.

2.1.1. Lennard-Jones potentialA simple but practical function for estimating the interaction

energy between two neutral atoms ormolecules is the Lennard-Jones potential, as [1]:

V (R) = 4ε[(σ/R)m − (σ/R)n], (4)

where constants ε and σ are the depth of potential energycurve and the distance at which the potential energy vanishes,respectively, as shown in Figure 1. There is a small flexibility

Figure 2: The coordinate system for He–F2 .

in the functional form, except when the powers m and n are12 and 6, respectively. This model has a minimum number ofparameters to show the PE curve. Because there are not enoughparameters to adequately reproduce an exact potential, theLennard-Jones potential is not a precise model [4]. One actionundertaken in this study is to adjust the Lennard-Jones potentialfor creating new and exact models for our collected data.

2.2. Potential energy with two variables

Introducing the PE expression for a system with twovariables is much more complicated than a system with onevariable. Our system, He–F2, has two variables. An atom (He)+ diatomic (F–F) is like a linear rigid rotor. Figure 2 shows thecoordinate system for the He–F2.

The PE model, V (R, θ), is expanded in terms of Legendrepolynomials (Pλ):

V (R, θ) =

λ

Vλ(R)Pλ(cos θ). (5)

Coefficient Vλ is dependent only on R [4].

3. Computational details

The coordinate system for the He–F2 compound is shown inFigure 2, where R is the intermolecular distance between thecenter mass of F2 and He, r is the F–F bond length and θ is theangle between R and r .

We used 714 data points in the following order to makemodels for the potential energy surface:

R = 1.6(0.25)2, 2(0.1)3, 3.05, 3.1, 3.15, 3.2, 3.38, 3.4, 3.6,3.8, 4, 4.5, 5, 6 Å,

θ = 0°(5)90°, 57.5°, 87.5.

For the first time, we calculated potential energy (V (R, θ)) bythe SAPT2008 program [3] and cc-pVQZ-F12 is used as a basisset [5]. In a previous study, potential energy was calculated bythe super molecular approach [6], which has less accuracy incomparison with the symmetry-adapted perturbation theory(SAPT) approach.

The resulted energies in this research (calculated bySAPT2008) and in two previous research works, which used asuper molecular approach, are compared in Table 1.

Page 3: document

M. Amiri et al. / Scientia Iranica, Transactions C: Chemistry and Chemical Engineering 20 (2013) 543–548 545

Figure 3: Dependence of the potential energy surface of the He–F2 on the intermonomer distance R at r = 1.412 Å, θ = 0° and θ = 90°.

Figure 4: Tree structure of the expression (a − b) + (c × d).

Table 1: Potential energies He–F2 in comparison with two previous worksat r = 1.412 Å. All energies are in micro hartrees.

θ = 0° θ = 90°R (Å) Our work Chan

et al. [6]Our work Chan

et al. [6]

1.60 301688.28 – 31008.62 –2.00 72115.92 – 5693.13 –2.40 12443.73 – 737.20 –2.80 1505.57 – −37.03 –3.00 347.90 314.80 −91.15 −145.703.25 −75.47 −108.00 −89.83 −124.503.50 −137.38 −162.30 −67.65 −91.703.75 −114.23 −131.40 −47.75 −64.204.00 −82.04 −92.80 −33.20 −44.504.50 −38.53 −42.80 −17.09 −21.905.00 −18.79 −20.60 −9.03 −11.506.00 −5.48 −6.00 −2.98 −3.70

Figure 3 shows the potential energy curve in linear(θ = 0) and T-shaped configurations. It can be seen that linearconfiguration has less energy, therefore, is more stable than T-shaped configuration.

4. Genetic programming

Genetic Programming (GP), which was first offered byKoza [7], is a biologically inspired machine learning method.GP and Genetic Algorithm (GA) are very similar to each other.In GP, at first, a population of computer programs which isrepresented by a tree structure (Figure 4) is generated, thenmutation and crossover are done on the best selected treesto create a new population, this process is repeated until amaximum number of generations is obtained. The GP methodis usually called symbolic regression [8]. Figure 5 shows thisprocess as a flowchart.

Figure 4 shows the structure of a tree which exhibits themathematical expression (a − b) + (c × d). Trees are flexiblestructures through which logical expressions or mathematicalrelations could be appropriately shown. Leaves of the treeusually specify variables or constants and are chosen from apre-defined terminal set, while other nodes specify operatorsor functions and are chosen from a pre-defined function set. Inthe tree of Figure 4, a, b, c, d and +, −, × are used as variablesand operators, respectively. The division operator and variablee are not used in the tree. Pre-defined sets are as follows:function set = {+, −, ×, /}, (6)terminal set = {a, b, c, d, e}. (7)

4.1. Multi-gene GP

In this work, Multi-Gene GP (MGGP), one newly developedversion of GP, is utilized for performing efficient modelingvia symbolic regression. In Multi-gene GP, each chromosomeconsists of some ‘‘genes’’ and, thereby, eachmodel is aweightedlinear combination of these genes. Each of the genes is astandard GP tree.

In Figure 6 there are twogenes orGP trees, x1 and x2 are inputvariables. y is a linear combination of two genes, and d0, d1 andd2 are the weights that are obtained by least squares. In thepresent study, we used GPTIPS (a toolbox written for Matlab)for obtaining one efficient regression model.

5. Simulation results

5.1. Parameters setup

In the present study, we used GPTIPS to create a mathe-matical expression for the PE of the He–F2 system. Parame-ters of algorithms are as follows. The number of generations,G = 300, size of population, Popsize = 200, x-over prob-ability, Pc = 0.7, and mutation probability, Pm = 0.01.Also, function set = {+, −, /, ∧, sin, sinh, tan, tanh, cos}, andterminal set = {R, θ}.

5.2. Results and discussion

The produced expression by Multi-gene GP is given in thefollowing:V (R, θ) = −((24.13 tan(tanh(R2))) + (24.13(R + θ1/4))

× ((0.4739θ) − cos(R) + 12.57))/ sinh(R2)

− (1091|θ − 0.2231|1/2) tan(sinh(sin(θ1/2)))

× (sin(θ1/2) + 0.5119)/(5 sinh(R2)) + 0.6577+ ((188000 tanh((0.2813θ) + 1.6062)× (tanh((0.0325θ) + sinh(R))

− 0.6309)))/ sinh(R2). (8)The correlation coefficient (R2) for the obtained model is0.99992. Figure 7 depicts the real data (resulted from SAPTprogram) versus the predicted data.

5.3. Adjusting Lennard-Jones parameters

In this study, also, we adjusted and optimized Lennard-Jonespotential’s parameters for our system.

At first, we tried to obtain relations for m and n in Lennard-Jones potential equation. Parameters of algorithms were set asfollows. The number of generations, G = 320, size of popula-tion, Popsize = 220, x-over probability, Pc = 0.7, and muta-tion probability, Pm = 0.01. Also, function set = {+, ×, sin,

Page 4: document

546 M. Amiri et al. / Scientia Iranica, Transactions C: Chemistry and Chemical Engineering 20 (2013) 543–548

Figure 5: Flowchart of GA and GP algorithms.

Figure 6: A typical multiple gene model.

sinh, tan, tanh, cos}, and terminal set = {R, θ}. In the follow-ing, the derived relations form and n are given, respectively:m = 6.48, (9)n = 3.2R + 5.164, (10)and the modified Lennard-Jones equation is represented as :

V (R, θ) = 4ε(θ)[(σ (θ)/R)6.48 − (σ (θ)/R)3.2R+5.164]. (11)

Values of ε(θ) and σ(θ) for some angle are listed in Table 2.Figure 8 shows the accuracy of predictions of Eq. (11), with

regard to real values (ESAPT).As the third idea, we tried to achieve an accurate equation

by adjusting four parameters of the Lennard-Jones potentialequation, namely; m, n, ε and σ . The results are given in thefollowing:m = cos(4.997θ) + 5.452, (12)n = 2.0θ + cos(9.994R), (13)ε and σ are constant values, 4.997 and 6.431, respectively.Accordingly, the final equation is represented as:V (R, θ) = 4(4.997)[(6.431/R)cos(4.997θ)+5.452

− (6.431/R)2.0θ+cos(9.994R)]. (14)

Table 2: Values of two parameters, ε(θ) and σ(θ).

θ (°) ε (meV) σ (Å)

0 −3.59 3.1735 −2.01 3.2845 −1.75 3.2650 −1.71 3.2570 −1.88 2.9885 −2.50 2.7790 −2.58 2.72

Parameters of the algorithm are as follows. The number ofgenerations, G = 320, size of population, Popsize = 220, x-overprobability, Pc = 0.7, and mutation probability, Pm = 0.01.Also, function set = {+, ×, cos}, and terminal set = {R, θ}.

The accuracy of Eq. (14), with regard to real values (ESAPT), isshown in Figure 9.

6. Second virial coefficients

The second virial coefficients were calculated from thecalculated potential energy Eq. (8), and Gauss and Ramberg

Page 5: document

M. Amiri et al. / Scientia Iranica, Transactions C: Chemistry and Chemical Engineering 20 (2013) 543–548 547

Figure 7: Predicted potential energies using Eq. (8) versus values computed bySAPT program.

Figure 8: Predicted potential energies using Eq. (11) versus values computedby SAPT program.

Figure 9: Predicted potential energies using Eq. (14) versus values computedby SAPT program.

methods [9] are used for evaluating the integral (Eq. (1)). Theresults are shown in Figure 10. The figure for Eq. (13) is similar

Figure 10: Second virial coefficients resulted of Eq. (8).

to Eq. (8). Unfortunately, there are no second virial coefficientsin previous work to compare with our results.

7. Conclusion

This study can be divided into two parts, generally. First,chemistry computation, inwhichwe applied the SAPT approachusing the SAPT2008 program to achieve potential energies,He–F2, for the first time. The SAPT method has good accuracyin comparison with the super molecular method used byprevious researchers [6]. In the second step, to create modelsfor achieved energies in the previous part, the MGGP methodwas utilized. In order to create simple and precise models, theMGGP was employed, both for modeling PE Eq. (8) and foradjusting the Lennard-Jones potential (Eqs. (11) and (14)). Itis reasonable that a simple equation is more desirable than acomplex one because of its application; for example, calculationof the second virial coefficients.

References

[1] Lide, D.R., Handbook of Chemistry and Physics, 84th Edn., pp. 2–55, CRC, BocaRaton (2004).

[2] Gandomi, A.H. and Alavi, A.H. ‘‘A new multi-gene genetic programmingapproach to nonlinear system modeling. Part I: materials and structuralengineering problems’’, Neural Computing & Applications, 21, pp. 171–187(2012).

[3] Bukowski, R., Cencek, W., Jankowski, P., Jeziorski, B., Jeziorska, M.,Kucharski, S.A., Lotrich, V.F., Misquitta, A.J., Moszynski, R., Patkowski, K.,Podeszwa, R., Rybak, S., Szalewicz, K.,Williams,H.L.,Wheatley, R.J.,Wormer,P.E.S. and Zuchowski, P.S. ‘‘SAPT 2008: AnAb Initio Program forMany–BodySymmetry – Adapted Perturbation Theory Calculations of IntermolecularInteraction energies’’ (2008).

[4] Sathyamurthy, N. ‘‘Computational fiting of ab initio potential energysurfaces’’, Computer Physics Reports, 3, pp. 1–70 (1985).

[5] Bischoff, A.F.,Wolfsegger, S., Tew, D.P. and Klopper,W. ‘‘Assessment of basissets for F12 explicitly-correlated molecular electronic-structure methods’’,Molecular Physics, 107, pp. 963–975 (2009).

[6] Chan, K.W., Power, T.D., Jai-nhuknan, J. and Cybulski, S. ‘‘An ab initio studyof He–F2 , Ne–F2 , and Ar–F2 van der Waals complexes’’, Journal of ChemicalPhysics, 110, pp. 860–869 (1999).

[7] Koza, J.R., Genetic Programming: On the Programming of Computers by Meansof Natural Selection, 2nd Edn., MIT, Massachusetts, USA (1992).

[8] Searson, D.P., Leahy, D.E. and Willis, M.J. ‘‘GPTIPS: An open source geneticprogramming toolbox for multigene symbolic regression’’, 1st Int. Conf. onMulticonference of Engineers andComputer Scientists, 1, HongKong, China,pp. 77–80 (2010).

[9] Burden, R.L. and Faires, J.D. ‘‘Numerical analysis’’, In Introduction to Theoryand Application of Modern Numerical Approximation Techniques, 7th Edn.,pp. 207–211, Brooks Cole, California, USA (2001).

Mohammad Amiri was born in Kerman, Iran, in 1985. He received hisB.S. degree in Pure Chemistry and M.S. degree in Physical Chemistry fromShahid Bahonar University, Kerman, Iran, in 2008 and 2011, respectively.His research interests include: computational chemistry, especially modelingand optimization, fuzzy logic and it’s applications, and new methods foroptimization, such as neural networks, genetic algorithms and GPTIPS.

Page 6: document

548 M. Amiri et al. / Scientia Iranica, Transactions C: Chemistry and Chemical Engineering 20 (2013) 543–548

Mahdi Eftekhari was born in Kerman, Iran, in 1978. He received a B.S. degreein Computer Engineering in 2001 and his M.S. and Ph.D. degrees in ArtificialIntelligence from Shiraz University, Iran, in 2004 and 2008, respectively. Hehas been faculty member of the Computer Engineering Department at ShahidBahonar University of Kerman, Iran, since 2008. His research interests include:fuzzy systems and modeling, evolutionary algorithms, data mining, machinelearning and application of intelligent methods in bioinformatics. He is authorand co-author of about 30 papers in cited journals and conferences, and amember of the Iranian Society of Fuzzy Systems.

Maryam Dehestani was born in Kerman, Iran, in 1967. She received a B.S.degree in Chemistry from Shahid Bahonar University, Kerman, Iran, in 1988,and M.S. and Ph.D. degrees in Physical Chemistry from the University for

Teacher Training, Iran, in 1991 and 2001, respectively. She has been a facultymember of the Chemistry Department at Shahid Bahonar University of Kerman,Iran, since 2002. His research interests include: quantum chemistry, molecularspectroscopy, computational chemistry and nanostructures calculations. He isthe author and co-author of about 90 papers in cited journals and conferences.

Azita Tajaddiniwas born in Kerman, Iran, in 1974. She received a B.S. degree inApplied Mathematics from Vali-e-Asr University, Rafsanjan, Iran, in 1995, andM.S. and Ph.D. degrees in AppliedMathematics fromShahid BahonarUniversity,Kerman, Iran, in 1997 and 2008, respectively, where she is currently facultymember in the Mathematics Department. Her research interests include:inverse eigen value problem and numerical linear algebra. She is author andco-author of 6 papers in cited journals.