accurate quantum chemistry via machine-learning and

40
7-8 November 2005 Website: http://www.mcc.uiuc.edu ©Board of Trustees University of Ilinois Materials Computation Center University of Illinois Urbana-Champaign Funded by NSF DMR 03-25939 Multi-Timescale Modeling and Quantum Chemistry using Machine-Learning Methods via Genetic Programs MCC Internal Review University of Illinois 7 November 2005 Faculty: Duane D. Johnson (MSE), Pascal Bellon (MSE), David Goldberg (GE), Todd Martinez (Chemistry) Students: Kumara Sastry (MSE/GE), Alexis Thompson (Chemistry), Jia Ye (MSE) See also Poster

Upload: butest

Post on 12-Jul-2015

607 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Multi-Timescale Modeling and Quantum Chemistry using Machine-Learning Methods via Genetic Programs

MCC Internal Review University of Illinois

7 November 2005

Faculty:Duane D. Johnson (MSE), Pascal Bellon (MSE), David Goldberg (GE), Todd Martinez (Chemistry)

Students: Kumara Sastry (MSE/GE), Alexis Thompson (Chemistry), Jia Ye (MSE)

See also Poster

Page 2: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Background on Multi-Time-Scale Modeling

Growing interest in multi-timescale modeling Restrictive and do not yield required speed-up

Hyperdynamics, Parallel replica (Voter, 1997,1998) Focus on infrequent events Use transition state theory

MD + KMC (Jacobsen, Cooper, & Sethna, 1998) Elemental metals, tabulated activation barriers

Hybridize MD & KMC using genetic programming Calculate some activation barriers using MD Predict others using GP

Page 3: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Objective for Multi-Timescale Kinetic Modeling

Can we simulate experimental time scales for dynamics (up to seconds) for designing nanostuctured functional materials?–Time of realistic processes requires atomic-scale information, need frequent

events (pico-secs) to rare events (secs).–Infeasible to compute, e.g., barriers a priori or “on-the-fly”.– Possible configurations become potentially innumerable.– relative barrier heights control access and diffusion.

Propose a novel, effective & practical method based on Genetic Programming (GP) for the intelligent machine learning of vast number of barrier values.

Offer Proof-of-Concept for long-time atomic diffusion in alloy – a hybrid of MD: nanoseconds 10–9 secs and (Kinetic MC): seconds.

–Use MD to get some diffusion barriers.–Use KMC to span 15 orders of time!, but need all barriers.–Use GP to regress all barriers from some barrier info.–Savings compared to Table Look-up is 4-8+ orders of magnitude.

Page 4: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Time evolution of realistic processes requires atomic-scale data, but scales inaccessible via atomic simulations (only nano-seconds!).

Ex: Vacancy-assisted Migration at Cu50Co50 (001) Surface using Kinetic Monte Carlo

• Simulate seconds from KMC but need all barriers! (2nd n.n.: 8192 barriers!)

• Machine-learn barriers as regressed in-line function E(c0,x) from few barriers .

• GP needs < 3% of the barriers for < 0.1% error!

• CPU savings (compared to Database look-up) is 4-8+ orders of magnitude.

RESULT: Time Multiscaling to Seconds using Genetic Programming

n.n. jumps:1st 2nd

database

K. Sastry, DD Johnson, DE Golderg, P. Bellon, Phys. Rev. B 72, 085438 (2005).

Chosen by AIP editors as frontier research for the Virtual J. of Nanoscience (Aug, 2005)

Page 5: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Objective for Using Multi-Objective Genetic Algorithms in Quantum Chemistry

Can we utilize concepts from multi-objective optimization theory (Pareto fronts) to create ab initio accurate semiemprical quantum chemistry potentials to dramatic speed-up searches over reaction pathways?

Offer Proof-of-Concept for Benzene and Ethylene. Future: Propose using Genetic Programming (GP) to

machine-learn semiempirical potential form to improve reliability and speed.

Page 6: Accurate Quantum Chemistry via Machine-Learning and

GA-MNDO-PM3

CASPT2

RESULT: S1/S2 Conical Intersectionfrom Machine-Learned MNDO vs. ab initio CASSCF

Reaction Coordinate

0.0 0.2 0.4 0.6 0.8 1.0

Relative Energy(ev)

0

2

4

6

8

10 FC → S2/S1 Intersection

Dashed lines = target CASPT2 results

only values/gradients at x=0 included in GA fitting

Minimal Energy Intersections – Expected to play a prominent role in excited-state chemistry (nonadiabatic transitions).

Red and Blue are g/h vectors – displacements which lift electronic degeneracy.

Benzene

Page 7: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Genetic Programming Optimization forMulti-Timescale Modeling

Kinetic Diffusion.

Cu-Co Vacancy-Assisted Surface Migration

Page 8: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

IlliGAL at University of Illinois Urbana-Champaignhttp://www-illigal.ge.uiuc.edu/

Studying nature's search algorithm of choice, genetics and evolution, as a practical approach to solving difficult problems on a computer.

The mechanics of a genetic algorithm (GA) are conceptually simple: (1) maintain a population of solutions coded as chromosomes, (2) select the better solutions for recombination (crossover) of mating

chromosomes.(3) perform mutation and other variation operators on the chromosomes, and (4) use these offspring to replace poorer solutions or to create a new generation.

Theory and empirical results demonstrate that GAs lead to improved solutions in many problem domains, and well-designed GAs can be guaranteed to solve a broad class of provably hard problems, quickly, reliably, and accurately.

David Goldberg (General Engineering)

Page 9: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

If gradient is numerically precise and fast, use gradient algorithm. If there is an exact ground state, do not use genetic algorithm.

GA Advantages • no need for knowledge or gradient info about object or energy surface.• discontinuities present on surface have little effect on optimization. • resistant to becoming trapped in local minima. • work well on large-scale optimization problems. • can be used on a wide variety of problems.

GA Disadvantages • trouble finding exact global minimum. • require a large number of cost function evaluations. • Starting/setting up configurations is not straightforward.• GAs require more evaluations to move uphill.

Generally, using GAs depends on Problem

Page 10: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Recall Concept of Genetic Algorithms

Search based on principles of natural selection and genetics

Gene encode solution: e.g., binary xn =0 or 1 for a variable where possible solutions are “gene” sequence {x1,…xN}.

Fitness (Objective) function: Quality measure of sequence. Need known function to minimize (or maximize) to evaluate quality of

gene sequence, e.g., min. f(x) = | “cost” + “constraints”|, or max. f–1(x).

Population: A set of candidate gene sequences (solutions).

Genetic operators: Selection: “Survival of the fittest” Recombination: Combine parental traits to create offspring Mutation: Modify an offspring slightly (local gene sequence change)

Page 11: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Example Use of Gas for RegressionIan Walmsley and Herschel Rabitz, “Quantum Physics Under Control,” Physics Today, August 2003.

Example: Controlling laser shape pulses to increase yields in molecular reactions.

• Controller is based on a GA to shape (phase and amplitude) pulses.

• GA evolves trying to maximize mass spectrometer signal of desired molecular species.• thousands of pulse updated per second.

R.J. Levis and H. Rabitz, J. Phys. Chem. A 106, 6427 (2002).

Ψ ~ ckeikr−ωt∑

Page 12: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Same Concept for Genetic Programming Genetic Algorithms that evolve computer programs (Koza, 1992)

Representation: Programs are represented by trees Functions: Internal nodes (eg., {+, –, *, sin, cosh, log}) Terminals: Leaf nodes (eg., {x, y, 2.3, R})

R=random ephemeral constant.

Fitness function: Quality measure of the program

Population: Candidate programs (individuals)

Genetic operators: Selection: “Survival of the fittest” Recombination: Combine parental traits to create offspring Mutation: Modify an offspring slightly

Page 13: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Example GP Function and Tree

Define Functions: { x, y, z, sin, +, *, ^, ADF1, …}

ADF “automatically defined functions” can be learned.

Function Example: Add

Double f_add (arg1, arg2)

{ return arg1 + arg2;

}

Example TREE:

where x,y,z ∈ U=[a,b]

Internal node=fct

Leaf node

Page 14: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Genetic Programming (GP)

Page 15: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Efficient Coupling of MD and KMC Multi-timescale modeling of alloys Predict entire potential energy (PE) surface

using a few exact PE calculations PE surface =

Don’t know the functional from of f !

e.g., not simple basis, could be product of plane-wave & polynomial

Use symbolic functional regression via GP Search for the regression function, f

Optimize the values of coefficients, co

Form of intelligent machine learning (like Bayesian, neural nets, etc.)

Approach for Surface Diffusion

CoefficientsTotal alloy configuration

http://gold.cchem.berkeley.edu:8080/research_path.html

Page 16: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

GP for Predicting PE Surface Encode: Alloy configuration ⇒ unique number

Decision variables: Function set: {+, -, *, /, exp, sin, ^} Terminal set:

Binary A-B alloy: xi = {0,1}: xi is (is not) A for neighbors (e.g.,

1st and 2nd) occupation for vacancy/atom pair costing ∆E.

Atomistic: Compute ∆E for some subset of configurations

Object Function: Minimize absolute error in prediction

Energy barrier predicted by GP Weights (More importance to lower energy barriers)

Page 17: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Molecular Dynamics: F = ma Real dynamics but for short times (~10–9 secs). Many realistic processes are inaccessible. Must run long. But can calculate anything.

Monte Carlo: Sampling, “dumb and blind” method. Acceptance probability pi,f = min[1, exp(–∆Ei,f/kBT)]

Need to calculation each ∆Ei,f = barrier height to change states. Time-evolution: not real time (Monte Carlo step, MCS), unless MD or

experiment has provide relation of MCS to real time for all events.

Kinetic Monte Carlo: assumes Poisson process, know all ∆Ei,f. Hence frequency of events (rate) relative to most frequent (known) event

with smallest time span ∆tshortest, e.g. MD or experiment. KMC (~secs) but need all jump frequencies a priori. KMC steps in REAL TIME and ALL EVENTS accepted!

Quick Overview of the MD, MC, KMC Methods

Page 18: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Generic Time Enhancements of Table KMC

GP-predicted PES facilitates use of kinetic Monte Carlo

Real time in KMC (Fichthorn & Weinberg, 1991)

Speed-up over MD: ~109 at 300 K ~105 at 550 K ~103 at 900 K

Less CPU time over MD

Page 19: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Potential Energy Surface: Vacancy-assisted Migration for fcc Elemental and Binary Alloy

Computational Method Potentials: – Empirical: Morse – Quantum: Tight-binding System size: 5 layers, >>100 atoms/layer

Consider 1st and 2nd nearest-neighbor (n.n.) jumps

Local (active) configs.: 1st and 2nd n.n. environments

Consider rigid (fully relaxed) atoms to calculate ∆E

Energy Test: pure Cu, n.n. jumps only Morse potential: ∆E = 0.39 eV (present work) ab initio : ∆E=0.42±0.08 eV EAM : ∆E=0.47±0.05 eV TB : ∆E=0.45±0.05 eV

Complex case: Segregating fcc CuxCo1-x

xy

n.n. jumps:1st 2nd

xz

Fixed layers

Co Cu Vacancy

1st 2ndn.n. configs.:

}Boisvert & Lewis, Phys. Rev. B. 56 (1997)(Present work)

1st n.n. 2nd n.n.1st n.n. configs. 128 1282nd n.n. configs. 2048 8192Total configs. >>2100 >>2100

JumpsActive

Page 20: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

GP Optimized Regression for PE Surface: (001) Vacancy-assisted Migration

The Machine-Learned In-Line Barrier Fct.

Page 21: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

GP Optimized Regression for PE Surface: (001) Vacancy-assisted Migration

While Non-Linear Function is complicated, you do not care – give accurate barriers, otherwise it has no meaning!

Page 22: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

PES Predictions: (001) Vacancy-assisted Migration

1st n.n. active configuration (128 total) for simplicity Atoms are either rigid or fully relaxed. Simple regression fails: Quadratic (Cubic) Polynomial

Regression needs 27% (78%) of the configurations

GP needs PE calculation for only 20 configurations (or 16%)

Page 23: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Total 2nd n.n. active configurations: 8192

GP needs (∆E calculated): < 3% (256) configurations

Low energy migrations: < 0.1% prediction error

Overall events: < 1% error

PES Predictions: (001) Vacancy-assisted Migration

Page 24: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

From a few (as needed) exact calculations use symbolic functional regression via GP to predict the entire PE surface from an in-line function f(c0,x).

Search for the regression function, f , optimize the values of coefficients, co

Form of intelligent machine learning (like Bayesian, neural nets, etc.)

Symbolically-Regressed KMC (sr-KMC)

Page 25: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Time Enhancements from sr-KMC over standard Table KMC

GP needs (∆E calculated):

– < 3% (256) configurations (33 times fewer barrier).

– Using Cluster-expansion techniques 0.3% (330 times fewer)!

Low energy migrations: < 0.1% prediction error.

GP yields in-line barrier function: ~100 x faster than table look-up.

Compared to “on-the-fly” calculations, sr-KMC is 104 -107 faster!

–in-line function call ~10–3 secs per barrier.

– Empirical potential ~10 secs per barrier.

– Tight-binding potential ~1800 secs per barrier.

– first-principles potential even greater.

How does gain scale with complexity?

– For present problem, the number of barriers required decreases with complexity of configuration space. PROMISING!

Page 26: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Multi-Objective GA Optimization of

Semi-Empirical Quantum Chemistry Potentials

with ab initio accuracy.

BENZENE

Page 27: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Benzene Reparameterization

Target data: Ab Initio values via CASSCF(6/6) (Toniolo et al 2004) ground (S0) and excited-state energies (S1, optical dark, and S2, allowed)

(planar benzene, Dewar Benzene, benzvalene, prefluvene). excited-state gradients at these points.(Franck-Condon region)

Semi-empirical potential: MNDO-PM3 (Stewart,1989) 11 Carbon parameters to optimize (fix hydrogen or core-core repulsion)

Uss,Upp (Coulombic); Gss, Gsp, Gpp, Gpp` (repulsion); Hsp (exchange); βs, βp (resonance); ζs, ζp (Slater orbital exponents) -Modified Neglect of Differential Overlap

Objectives: #1: Error in excited-state energies and geometries

#2: Error in excited-state gradients

Page 28: Accurate Quantum Chemistry via Machine-Learning and

Dewar(b)

1.335[1.348](1.347)

1.481[1.523](1.530)

116.1º[116.8º](116.5º)

1.606[1.601](1.587)

Prefulvene(c)

1.371[1.401](1.404)

1.458[1.496](1.501)

1.533[1.524](1.546)

177.7º[136.0º](136.3º)

S1-S0 CI(d)

1.422[1.466](1.462)

1.367[1.395](1.397)

170.2º[131.4º][129.7º]

1.857[1.942](1.945)

(1.387)

(1.476)

Benzvalene(a)

1.340[1.335](1.350)

1.469[1.509](1.512)

1.430[1.481](1.457)

1.529[1.498](1.526)

1.407[1.475](1.441)

1.359[1.372](1.408)

S2-S1 CI(e)

2.395[2.454](2.456)

1.384[1.461](1.458)

Figure 1. Ground state optimized geometries and important minimal energy conical intersections for benzene. Bond lengths (Å) and angles (degrees) from RPM3, CASSCF(6/6)/6-31* (in parentheses), and CASSCF(6/6)*PT2/6-31* optimizations (in brackets) are shown.

Page 29: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

•Semi-empirical potential: MNDO-PM3 (Stewart,1989)

–11 Carbon parameters to optimize

•Objectives: #1: Error in excited-state energies and geometries

#2: Error in excited-state gradients

• Not obvious how to weight accuracy in energy compared to accuracy in gradients.

• Multi-objective GA with bias solves the problem!

• Use non-dominated sorting GA II (NSGA-II) (Deb et al., 1999)

–Competence and efficiency can be further enhanced by data mining important problem substructures (building blocks)

Benzene Reparameterization

Page 30: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Reparameterization of Semi-Empirical Potentials: Multiobjective Optimization Approach

*O OO

Simultaneously obtain set of non-dominated (Pareto optimal) solutions in parallel.

Avoid potentially irrelevant and unphysical pathways, arising from SE-forms.

Reparameterization of SE-forms involves multiple objectives fit of limited set of ab initio energies, geometries, and energy gradients.

Previous Approach: Sequential weighted local optimization Yields unphysical potentials, Results in local optima, Depends entirely on

the weights on different objectives

Multiobjective optimization

Page 31: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Genetic Algorithm Multiobjective Optimization

Unlike single-objective problems, multi-objective problems involve a set of Pareto-optimal solutions.

Notion of Non-Dominating Solutions

• A dominates C.• A and B are non-dominant.

Solution X dominates Y if: X is no worse than Y in all objectives X is strictly better than Y in at least

one objective

Page 32: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Why use Multiobjective Genetic Algorithms?

Robust search algorithms that yield good quality solutions quickly, reliably, and accurately Rapidly converge to the Pareto optimal front Maintain as diverse a distribution of solutions as

possible Population approach suits well to find multiple solutions Niche-preservation methods can be exploited to find

diverse solutions Implicit parallelism helps provide a parallel search

Multiple applications of classical methods do not constitute a parallel search

Page 33: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Multiobjective GA Results: Unbiased vs. Biased

Bias = Weight error in energy 2x more than in energy gradient.

(Un)Biased solutions are consistently better than the published result.

Tonilo et al (2004)

Pareto-optimal solutions are physical!

37% lower gradient error

33% lower energy error

Biasing:

- convergence 2-3 times faster

- improves solution quality

- finds physical solutions.

Page 34: Accurate Quantum Chemistry via Machine-Learning and

GA Biased

CASPT2

S1/S2 Conical Intersection

Reaction Coordinate

0.0 0.2 0.4 0.6 0.8 1.0

Relative Energy(ev)

0

2

4

6

8

10 FC → S2/S1 Intersection

Dashed lines = target CASPT2 results (only values and gradients at x=0 included in GA fitting)

Minimal Energy Intersections – Expected to play a prominent role in excited state chemistry (nonadiabatic transitions).

Red and Blue are g/h vectors – displacements which lift electronic degeneracy.

Page 35: Accurate Quantum Chemistry via Machine-Learning and

S1/S2 Branching Plane

CASPT2 GA Biased

Page 36: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Potentially creates “transferable” potentials

Benzene parameters compares to MO-GA for Ethelyne C2H4

Page 37: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Mathematical Analysis of GP Population size

Very important parameter for GP performance Currently no guidance to choose population size

Building-Block Supply Analysis What population-size should be used?

Page 38: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

For Ethelyne ~800 solutions needed for reliable

Page 39: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Summary Symbolic regression via GP holds promise for application

to numerous areas of science and engineering. GP mathematical analysis required to determine adequate population sizes, etc.

Case: Surface-vacancy assisted migration in CuxCo1-x Dramatic scaling in time over Table-Look-Up KMC Requires small subset of PE surface information.

Case: Constitutive Behavior of Aluminum AA7055 AA 7055 found strain-rate dependence without ‘a priori’ knowledge.

Case: Reparamaterization of Semi-Empirical Potentials Multiobjective GA yields accurate potentials. Can GP’s help with better forms of potential? This is POTENTIALLY the most exciting applications area.

Page 40: Accurate Quantum Chemistry via Machine-Learning and

7-8 November 2005Website: http://www.mcc.uiuc.edu©Board of Trustees University of Ilinois

Materials Computation CenterUniversity of Illinois Urbana-Champaign

Funded by NSF DMR 03-25939

Future Work

Algorithm Development: Competent operators to handle complex interactions

Mathematical analysis of GP: Population-sizing and convergence-time models

Engineering & Scientific Application: More complex systems: Adatoms, line and planar defects Application in excitation chemistry (Forms of potentials?)

Algorithm Efficiency Enhancement: Parallelization of GP Hybridize GP with cluster-expansion methods

Reduce the configurations that need PE calculation