Reaction simulation expert systems for synthetic organic chemistry

Jonathan H. Chen and Pierre Baldi

University of California, Irvine School of Information and Computer Sciences

Institute for Genomics and Bioinformatics School of Medicine

http://cdb.ics.uci.edu

Reaction Prediction

• Given a mixture of reactants and reaction conditions, predict the major products

[Figure: example problem, reactant + ? with NaOMe and heat (Δ)]

• Fundamental problem-solving skill of expert human chemists

• Critical for applications such as retrosynthesis design and reaction discovery

Need for Reproducible Expertise

[Figure: example drug structures (bupropion, atorvastatin, albuterol, fenofibrate) with candidate reagents (DCC, KMnO4, Pd(0), CO (gas), Mg)]

Automated suggestion of synthetic reactions by pattern matching is straightforward, but “expertise” is knowing which suggestions are actually feasible and reasonable

Transformation Rules

[Figure: two-step mechanism, π-bond protic acid addition followed by carbocation-halide addition]

• Chemical state machine modeling at the mechanistic level of detail

  • State information: molecular structure
  • State transition: transformation rules

SMIRKS | Description

[C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4] | Alkene, Protic Acid Addition

[C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2] | Carbocation, Halide Addition
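The state-machine idea can be illustrated without a cheminformatics toolkit. In the minimal sketch below, which is an assumption-laden stand-in and not the system's SMIRKS engine, a state is a molecular graph stored as atom and bond dictionaries, and each transition is a hand-coded function mirroring one of the two SMIRKS rules in the table. The function names and the graph layout are hypothetical; only the reacting hydrogen is written explicitly.

```python
# Hypothetical sketch: molecular graphs as states, two hand-coded rules as
# transitions mirroring the SMIRKS table (not the authors' implementation).
HALOGENS = {"Cl", "Br", "I"}


def make_state(atoms, bonds):
    """atoms: {atom_id: [element, formal_charge]}; bonds: {frozenset({i, j}): order}."""
    return {"atoms": {a: list(v) for a, v in atoms.items()}, "bonds": dict(bonds)}


def copy_state(state):
    return make_state(state["atoms"], state["bonds"])


def protic_acid_addition(state):
    """[C:1]=[C:2].[H:3][X:4] >> [H:3][C:1][C+:2].[X-:4], X in Cl, Br, I."""
    atoms, bonds = state["atoms"], state["bonds"]
    results = []
    for cc, cc_order in bonds.items():
        i, j = sorted(cc)
        if cc_order != 2 or atoms[i][0] != "C" or atoms[j][0] != "C":
            continue
        for hx, hx_order in bonds.items():
            h, x = sorted(hx, key=lambda a: atoms[a][0] != "H")  # H first, if present
            if hx_order != 1 or atoms[h][0] != "H" or atoms[x][0] not in HALOGENS:
                continue
            # H may add to either alkene carbon, giving two candidate states.
            for c1, c2 in ((i, j), (j, i)):
                new = copy_state(state)
                new["bonds"][cc] = 1                  # C=C becomes C-C
                del new["bonds"][hx]                  # break H-X
                new["bonds"][frozenset({h, c1})] = 1  # form H-C1
                new["atoms"][c2][1] += 1              # C2 becomes a carbocation
                new["atoms"][x][1] -= 1               # X leaves as a halide anion
                results.append(new)
    return results


def halide_addition(state):
    """[C+:1].[X-:2] >> [C:1][X:2], with both charges neutralized."""
    atoms = state["atoms"]
    cations = [a for a, (el, q) in atoms.items() if el == "C" and q == 1]
    anions = [a for a, (el, q) in atoms.items() if el in HALOGENS and q == -1]
    results = []
    for c in cations:
        for x in anions:
            new = copy_state(state)
            new["bonds"][frozenset({c, x})] = 1
            new["atoms"][c][1] = 0
            new["atoms"][x][1] = 0
            results.append(new)
    return results


RULES = [protic_acid_addition, halide_addition]


def successors(state):
    """All states reachable by applying any one rule once."""
    return [nxt for rule in RULES for nxt in rule(state)]


# Ethylene + HBr, with only the reacting hydrogen written explicitly.
start = make_state(
    {0: ["C", 0], 1: ["C", 0], 2: ["H", 0], 3: ["Br", 0]},
    {frozenset({0, 1}): 2, frozenset({2, 3}): 1},
)
step1 = successors(start)     # carbocation + bromide (H on either carbon)
step2 = successors(step1[0])  # bromide captures the carbocation
product = step2[0]
print(len(step1), len(step2), sorted(tuple(sorted(b)) for b in product["bonds"]))
# prints: 2 1 [(0, 1), (0, 2), (1, 3)]
```

With ethylene the two intermediates are equivalent. For an unsymmetrical alkene such as propene, the same rule would generate both the secondary and the primary carbocation, so something beyond the rule itself has to decide which transition is reasonable.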

Reaction Explorer [DEMO]

• Product prediction for different reactions using a common reagent
  • Sn2: CC1(Oc2cc(c3cc[nH]c3c2O1)CBr)C
  • Nucleophilic Acylation: c1c2c(c(cn1)Cl)CCOC2=O
  • Robinson Annulation: C[C@H]1c2c(cccn2)CCC1=O

• Mechanistic explanation of how or why the products form

• Use as a synthesis workspace
  • Tylenol #94

Results and Progress

Expert system with over:
• 80 reagent models
• 1,500 reaction rules
• 4,500 validation examples

Subject Categories Implemented
• Substitution and Elimination of Alkyl Halides
• Alcohols and Epoxides
• Alkenes, Electrophilic Addition
• Alkynes, Addition and Acetylide Ions
• Alkanes, Radical Reactions
• Dienes, Conjugation, Diels-Alder
• Electrophilic Aromatic Substitution
• Reactions of Substituted Benzenes
• Oxidation-Reduction Reactions
• Aldehydes and Ketones
• Carboxylic Acid Derivatives
• Enolate Chemistry
• Aldol Chemistry
• Amines and Arenediazonium Reactions
• Transition Metal (Palladium) Catalysis
• SnAr and Benzyne Reactions
• Naphthalene and Heteroaromatic Reactions
• Pericyclic Reactions
• Carbohydrates
• Amino Acid and Peptide Synthesis

J. Chem. Educ. 2008, 85, 1699

Principle-Driven Simulations
• Not based on transformation rules
• Driven by principles of physical chemistry

Key Components
• Core Reaction Unit Model
• Scoring Function for Reactions
• Chemical Kinetics Simulation

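[Figure: orbital energy ladder (σ*, π*, p, n, π, σ) with example interactions nN + π*C-O and nO + σ*C-Cl; energy diagram of relative energy vs. reaction coordinate marking ΔG and ΔG‡]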

Core Reaction Unit Model
• Bond-rearrangement patterns are the most typical choice.
  • These represent only the overall “symptom” of the reaction, not the underlying mechanistic steps.
  • Many such patterns must be “memorized” to get decent coverage.

Sn2 Substitution: [CX4H2:1][Br:2]>>[C:1]O

Acyl Substitution (Saponification): [O:2]=[C:1][OH0:3]>>[O:2]=[C:1][O-].[O-:3]

Robinson Annulation: [*:3][C:2]1[C:11][C:10][C:9][C:8][C:1]1=[O:20].[C:5][C:4](=[O:12])[C:6]=[C:7]>>[*:3][C:2]12[C:11][C:10][C:9][C:8][C:1]1=[C:5][C:4](=[O:12])[C:6][C:7]2


Core Reaction Unit Model

More Favorable | Less Favorable
nCl > pC | nI > pC
πC=C > σ*H-Br | πC=C > σ*H-O
σH-B > π*C=O (ketone) | σH-B > π*C=O (amide)

[Figure: orbital energy ladder, highest to lowest: σ*, π*, p, n, π, σ]

Molecular Orbital Interactions as Elementary Reaction Steps

Scoring Function for Reactions
• Purpose
  • Identify favorable reaction steps
  • Ideally, predict transition-state activation energies (ΔG‡)
• Statistical Machine Learning
  • Limited quantitative data available
  • Inspiration from the problem-solving abilities of human experts
  • Use qualitative knowledge of reactivity trends as a major training data source

[Figure: energy diagram, relative energy vs. reaction coordinate, marking ΔG and ΔG‡]

C. A. Azencott, M. A. Kayala, P. Baldi, “Learning Scoring Functions for Chemical Expert Systems”
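One way to turn qualitative trends such as the More Favorable / Less Favorable pairs into a scoring function is pairwise ranking: learn weights w so that score(better) exceeds score(worse) for every ordered pair. The sketch below uses a generic margin perceptron on two hypothetical descriptors per interaction. The feature values are invented for illustration, and the learner is not necessarily the method used by Azencott, Kayala and Baldi.

```python
def score(w, x):
    """Linear score; higher means a more favorable interaction."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, n_features, margin=1.0, lr=0.1, epochs=200):
    """Margin perceptron over (better, worse) feature-vector pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            if score(w, better) - score(w, worse) < margin:  # ordering not yet learned
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
                updated = True
        if not updated:  # every pair ranked correctly with the required margin
            break
    return w


# Hypothetical descriptors [donor term, acceptor term]; the numbers are made up.
PAIRS = [
    ([1.0, 2.0], [0.4, 2.0]),  # nCl > pC  vs  nI > pC
    ([0.8, 1.5], [0.8, 0.5]),  # πC=C > σ*H-Br  vs  πC=C > σ*H-O
    ([0.9, 1.2], [0.9, 0.6]),  # σH-B > π*C=O (ketone)  vs  (amide)
]
w = train_pairwise(PAIRS, n_features=2)
print(w, [score(w, b) - score(w, c) for b, c in PAIRS])
```

The appeal of this formulation is that it needs only orderings, which are plentiful in textbook knowledge, rather than measured activation energies, which are scarce.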

Law of Mass Action Simulation
• Results depend on reactivity scores and concentrations
• Reversible reactions driven by Le Chatelier’s principle
  • Catalytic quantities of highly reactive species
• Discrete simulation approximation to bootstrap from incomplete information
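A discrete mass-action simulation of the kind described above can be sketched in a few lines. In this sketch each elementary step is a (reactants, products, k) triple, its rate is k times the product of the reactant concentrations, and concentrations advance with explicit Euler steps of size dt. The function name, step size, and rate constants are illustrative assumptions; the source does not specify its integrator. A catalyst can be modeled by listing it on both sides of a step, so that it is consumed and regenerated within the same step.

```python
from math import prod


def simulate(reactions, conc, dt=0.01, steps=2000):
    """Explicit-Euler integration of mass-action kinetics.

    reactions: list of (reactants, products, k); a species listed twice has
    stoichiometric coefficient 2. Every species must appear in `conc`.
    """
    conc = dict(conc)
    for _ in range(steps):
        delta = {s: 0.0 for s in conc}
        for reactants, products, k in reactions:
            rate = k * prod(conc[s] for s in reactants)  # law of mass action
            for s in reactants:
                delta[s] -= rate * dt
            for s in products:
                delta[s] += rate * dt
        for s in conc:
            conc[s] += delta[s]
    return conc


# Reversible step A + B <=> C with illustrative rate constants (K = kf / kr = 2).
rxns = [(("A", "B"), ("C",), 1.0), (("C",), ("A", "B"), 0.5)]
eq = simulate(rxns, {"A": 1.0, "B": 1.0, "C": 0.0})
print(eq["C"] / (eq["A"] * eq["B"]))  # approaches K = 2

# Le Chatelier: adding more A shifts the equilibrium toward C.
perturbed = dict(eq, A=eq["A"] + 0.5)
shifted = simulate(rxns, perturbed)
print(eq["C"], shifted["C"])
```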

Eyring-Evans-Polanyi Equation

• Principled conversion of ΔG‡ to a reaction rate constant k, with temperature dependence

• The theory applies only to elementary reaction steps
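The transition-state rate expression named above is usually written k = κ (k_B T / h) exp(-ΔG‡ / RT), where k_B is Boltzmann's constant, h is Planck's constant, R is the gas constant, and κ is a transmission coefficient commonly taken as 1. A minimal sketch of the ΔG‡-to-k conversion follows. The kJ/mol unit choice and the function name are assumptions, and for bimolecular steps the prefactor would also need a standard-state concentration correction that this sketch ignores.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def eyring_rate_constant(dg_act_kj_per_mol, temperature_k=298.15, kappa=1.0):
    """Rate constant (s^-1) of an elementary step from its activation free energy."""
    dg = dg_act_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    return kappa * (K_B * temperature_k / H) * math.exp(-dg / (R * temperature_k))


k_room = eyring_rate_constant(80.0)                    # about 0.06 s^-1 at 25 °C
k_hot = eyring_rate_constant(80.0, temperature_k=350.0)
# Each additional RT*ln(10) of barrier (about 5.7 kJ/mol at 298 K) slows the step tenfold.
step = R * 298.15 * math.log(10) / 1000.0
print(k_room, k_hot, eyring_rate_constant(80.0) / eyring_rate_constant(80.0 + step))
```

Because the barrier sits in an exponent, small errors in a predicted ΔG‡ become large errors in k, which is one reason the scoring function's accuracy matters for the kinetics simulation.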

Chemical Kinetics Simulation

Reaction Simulator [DEMO]

• Simple enolate deprotonation
  • No input other than the starting materials; self-perception of reactive sites and combinations
  • Kinetic vs. thermodynamic simulator controls

• Complex example evolving over time
  • Trace the full reaction pathway to justify the prediction, including an energy diagram
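The kinetic vs. thermodynamic control mentioned above can be illustrated with a toy model. This sketch assumes a substrate A that forms a kinetic product K quickly and a more stable thermodynamic product T slowly, as with the two possible enolates of an unsymmetrical ketone; all rate constants are invented. Switching off the reverse steps mimics kinetic control, while leaving them on and running long enough lets the mixture equilibrate toward T. The `reversible` flag is a hypothetical stand-in for the simulator's controls.

```python
def run(rates, t_end, reversible, dt=0.01):
    """Euler integration of A <=> K (fast, less stable) and A <=> T (slow, more stable)."""
    a, kin, thermo = 1.0, 0.0, 0.0
    back = 1.0 if reversible else 0.0  # control: allow reverse steps or not
    for _ in range(round(t_end / dt)):
        r_k = rates["A->K"] * a - back * rates["K->A"] * kin
        r_t = rates["A->T"] * a - back * rates["T->A"] * thermo
        a -= (r_k + r_t) * dt
        kin += r_k * dt
        thermo += r_t * dt
    return {"A": a, "K": kin, "T": thermo}


# Illustrative rate constants: K forms 100x faster than T, but at equilibrium T:K = 10:1.
RATES = {"A->K": 10.0, "K->A": 1.0, "A->T": 0.1, "T->A": 0.001}
kinetic = run(RATES, t_end=1000.0, reversible=False)
thermodynamic = run(RATES, t_end=1000.0, reversible=True)
print(kinetic)        # K dominates
print(thermodynamic)  # T dominates
```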

Summary Comparison

Transformation Rules | General Principles
Immediately useful results | Development and optimization ongoing
Predictions within seconds | Longer simulation times (minutes)
Only covers what has been programmed into it | Greater potential for generality and discovery
Only provides information on major product(s) | Kinetics simulations provide information on major and minor pathways

Acknowledgements

• Prof. Pierre Baldi (ICS)
• Prof. Elizabeth Jarvo (Chem)
• Dr. Susan King (Chem)
• Prof. Greg Weiss (Chem)
• Prof. David Van Vranken (Chem)
• Prof. James Nowick (Chem)

• NIH/NLM Biomedical Informatics Training Grant
• UCI Medical Scientist Training Program
• Orange County ARCS® Foundation

Students
• Chloe Azencott
• Matt Kayala
• Paul Rigor
• UCI Students

http://cdb.ics.uci.edu

Academic Software
• OpenEye Software
• ChemAxon Software
• Peter Ertl, Novartis (JME Editor)

Course Instructors
• Prof. Suzanne Blum
• Prof. Zhibin Guan
• Prof. Larry Overman
• Prof. Ken Shea
• Dr. Mare Taagepera
• Prof. Chris Vanderwal

Extend Orbital Chaining

• Interactions between a small set of fundamental orbital types dominate organic reactivity

• Higher-order interactions can be composed by chaining fundamental units together


nO > πC=C > π*C=O

nN > σ*H-C > σ*C-Br

Need for Reactivity Prediction

• Retrosynthetic analysis usually suggests only precursors and does not account for unintended reactivity
  • Existing systems may use exclusion rules
  • It is best to reproduce the forward sequence of suggested reactions to ensure reliability

[Figure: natural-product drugs: Taxol (anti-cancer; yew tree sap), morphine (pain medication; opium poppies), penicillin G (antibiotic; fungus)]

Motivation

• Total synthesis of important drugs and chemicals

[Figure: andrimid, anti-tuberculosis lead compound]

• Chemical modification to optimize lead compounds

• Goal / Hypothesis: Can a computer expert system reproduce the core problem-solving capabilities required of human chemists?

SMILES Extensions

• Atom Mapping
  • Necessary to map reactant atoms to product atoms
  • A proper transform requires balanced stoichiometry

• Hydrogens generally must be specified explicitly

Carboxylic acid + primary amine >> amide + water:
[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]

[Figure: R1-COOH + H2N-R2 → R1-C(=O)NH-R2 + H2O, drawn with atom-map numbers 1 to 10 on the corresponding atoms]
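The balanced-stoichiometry requirement can be checked mechanically by comparing atom-map numbers on both sides of a SMIRKS string. The regex-based sketch below is a simplification: it ignores unmapped atoms and would be confused by recursive SMARTS, so a real toolkit's reaction parser would be the robust choice. The function names are hypothetical.

```python
import re
from collections import Counter

# A bracket atom ending in ":<number>]", e.g. [C:2], [Cl,Br,I;-:4], [*:9].
MAP_RE = re.compile(r"\[[^\]]*:(\d+)\]")


def atom_maps(smarts):
    """Count atom-map numbers in one side of a reaction."""
    return Counter(int(n) for n in MAP_RE.findall(smarts))


def unbalanced_maps(smirks):
    """Return (maps lost from the reactants, maps appearing only in the products)."""
    reactants, sep, products = smirks.partition(">>")
    if not sep:
        raise ValueError("expected 'reactants>>products'")
    left, right = atom_maps(reactants), atom_maps(products)
    return sorted((left - right).elements()), sorted((right - left).elements())


AMIDE = ("[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]"
         ">>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]")
SN2 = "[CX4H2:1][Br:2]>>[C:1]O"
print(unbalanced_maps(AMIDE))  # ([], [])
print(unbalanced_maps(SN2))    # ([2], []): the bromine is dropped
```

The fully mapped amide example passes, while the bond-rearrangement-style Sn2 pattern loses map 2; the Robinson annulation pattern would likewise be flagged, because its carbonyl oxygen [O:20] has no product counterpart.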

Molecular Orbital List

Filled: sp2 O, π CO, …
Unfilled: π* CO, σ* HC π* CO, …

Filled: sp3 O, σ CO, …
Unfilled: σ* HO, σ* CO, …

Filled: sp3 O, sp2 O, …
Unfilled: π* SO, σ* HO π* SO, …
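Treating each filled-to-unfilled orbital interaction as a candidate elementary step makes the search space easy to enumerate: pair every filled orbital of one species with every unfilled orbital of another. The sketch uses the first two orbital lists above, truncated where the slide writes “…”. The species labels and the tuple layout are hypothetical, and ranking the resulting candidates would be the job of the scoring function.

```python
from itertools import product

# Orbital lists from the slide (entries elided there with "…" are omitted).
# Species labels are hypothetical; the slide identifies the molecules only by images.
SPECIES = {
    "mol1": {"filled": ["sp2 O", "π CO"], "unfilled": ["π* CO", "σ* HC π* CO"]},
    "mol2": {"filled": ["sp3 O", "σ CO"], "unfilled": ["σ* HO", "σ* CO"]},
}


def candidate_steps(species):
    """All (donor species, filled orbital, acceptor species, unfilled orbital) pairings."""
    steps = []
    for donor, acceptor in product(species, repeat=2):
        if donor == acceptor:
            continue  # intramolecular steps are left out of this sketch
        for filled, unfilled in product(species[donor]["filled"], species[acceptor]["unfilled"]):
            steps.append((donor, filled, acceptor, unfilled))
    return steps


steps = candidate_steps(SPECIES)
for step in steps:
    print(step)
```

For example, ('mol2', 'sp3 O', 'mol1', 'π* CO') pairs an sp3 oxygen lone pair with a C-O π*, exactly the kind of addition step a scoring function would then rate against the other candidates.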

Outline
• Motivation
  • Need for reactivity prediction
• Rules-based Predictor Capabilities
  • Predictive general reagents (NaOH), with mechanism explanations
  • Synthesis workspace (Tylenol #94)
• Principle-based Functional Demo Intent
  • Complex example evolving over time
  • Kinetic vs. thermodynamic example to illustrate simulation controls
• Fundamental Reaction Unit Model
  • Chaining: retain a simple set of fundamental orbitals, then compose them for higher-order interactions
• Scoring Interactions
  • Qualitative knowledge vs. quantitative data
• Simulations
  • Chemical kinetics
  • Discrete model for bootstrapping from incomplete starting information
• Rules vs. Principles
• Ongoing Work
  • Parameter development for more reactivity classes