A new, automated retrosynthetic search engine: ARChem

http://www.simbiosys.ca ARChem Route Designer. Antony Williams, on behalf of A. Peter Johnson, University of Leeds / SimBioSys Inc.; Howard Y. Ando (Pfizer Inc.), Zsolt Zsoldos, Aniko Simon, James Law, Darryl Reid, Yang Liu, Sing Yoong Khew, Sarah Major

Upload: antony-williams-chemconnector-orcid-0000-0002-2668-4821

Post on 11-Jun-2015

DESCRIPTION

ARChem Route Designer is a new retrosynthetic analysis package (1) that generates complete synthetic routes for target molecules from readily available starting materials. Rule generation from reaction databases is fully automated to ensure that the system can keep abreast of the latest reaction literature and available starting materials. These rules are used to carry out an exhaustive retrosynthetic analysis of the target molecule, with special heuristics to mitigate the combinatorial explosion. Proposed routes are then prioritized by a merit ranking algorithm so that the viewer is presented with the most diverse solution profile. Users can then view the overall reaction tree and scroll through the reaction details. The program runs on a server with a web-based user interface. An overview of some of the challenges, solutions, and examples that illustrate ARChem's utility will be discussed.

TRANSCRIPT

Page 1: A new, automated retrosynthetic search engine: ARChem

http://www.simbiosys.ca

ARChem Route Designer

Antony Williams On behalf of A. Peter Johnson

University of Leeds / SimBioSys Inc.

Howard Y. Ando (Pfizer Inc.), Zsolt Zsoldos, Aniko Simon, James Law, Darryl Reid, Yang Liu, Sing Yoong Khew, Sarah Major

Page 2: A new, automated retrosynthetic search engine: ARChem

● Computer-Aided Organic Synthesis Design (CAOSD) assists chemists in finding synthetic routes to target compounds

● The first CAOSD system (LHASA) was introduced by E. J. Corey nearly 40 years ago

● Since then…

Page 3: A new, automated retrosynthetic search engine: ARChem

CAOSD systems

● LHASA (Logic and Heuristics Applied to Synthetic Analysis) (E. J. Corey et al.)

● SynChem (Gelernter et al.)

● IGOR (Ugi et al.)

● EROS and WODCA (Gasteiger et al.)

● SynGen (Hendrickson et al.)

● Arthur (commercial: Synthematix)

● Others…

Page 4: A new, automated retrosynthetic search engine: ARChem

Typical Retrosynthetic Analysis

Retrosynthetic analysis works backward from the target and generates increasingly simple precursors.

[Figure: search tree running from TARGET through Level 1, Level 2 and Level 3 precursors; branches end at precursors identified as available starting materials]

In the absence of frequent user intervention, the combinatorial nature of this approach prohibits very broad searches (many alternative reactions) as well as very deep searches (many steps between starting materials and products).
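
The combinatorial growth of such a tree can be made concrete with a little arithmetic. The sketch below assumes a uniform branching factor (the same number of applicable transforms at every intermediate), which is a simplification introduced here for illustration and not a figure from the talk; `tree_size` is a hypothetical helper.

```python
def tree_size(branching: int, depth: int) -> int:
    """Count precursor nodes below the target in a uniform search tree.

    Assumes every intermediate has `branching` alternative precursors
    (an illustrative simplification, not a measured value).
    """
    return sum(branching ** level for level in range(1, depth + 1))


# Breadth and depth both multiply the work: compare a few depths.
for depth in (1, 2, 3):
    print(f"depth {depth}: {tree_size(20, depth)} precursor nodes")
```

Under this assumption, 20 alternatives per node give 20 + 400 + 8000 = 8420 precursors after only three levels, which is why unconstrained searches cannot be both broad and deep.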

Page 5: A new, automated retrosynthetic search engine: ARChem

Retrosynthetic Analysis versus Reaction Databases

● Reaction databases are popular aids in reaction planning

● Databases are large and highly curated, and good tools exist for searching and data mining

● Compare with “general prediction technology”

– LogP

– NMR spectra etc…

Page 6: A new, automated retrosynthetic search engine: ARChem

CAOSD Systems

● There is little routine use of retrosynthetic analysis tools. Why?

● Chemists ARE the knowledge base?

● Reaction database data mining suffices?

● Who trusts computers anyway?

● Reaction databases are very valuable, but there is a case for coupling “predictions” with reference data

Page 7: A new, automated retrosynthetic search engine: ARChem

Knowledge base creation

Retrosynthetic analysis is driven by rules describing scope, limitations and structure changes associated with a reaction

Rules have to be manually encoded

Only experienced synthetic chemists have the knowledge to create good rules

Page 8: A new, automated retrosynthetic search engine: ARChem

Goals of Route Designer

• perform rule-based retrosynthetic analyses of target molecules back to readily available materials

• provide fully automated generation of retrosynthetic reaction rules by analysis of a reaction database – avoid time-consuming manual creation

• provide the user with literature examples of the transformations suggested by the retrosynthetic analysis

• provide a set of alternative routes to a given target

Page 9: A new, automated retrosynthetic search engine: ARChem

System Design Overview

[Diagram, reconstructed from its labels]

● Reaction DB (MOS, Beilstein, CASREACT etc.) → Automatic Rule Extraction → Reaction Rules

● User input: Molecule

● Starting Materials: Aldrich, Acros, Lancaster

● Reaction Rules + Molecule + Starting Materials → Route Designer search → Output: Reaction pathways

Page 10: A new, automated retrosynthetic search engine: ARChem

Starting Materials

● Automatic selection of starting materials from commercially available compounds is important for retrosynthetic analysis

● With known starting materials as a basis, analysis is directed at portions of the target molecule that cannot be made from available starting materials

[Figure: starting-material catalogs: Lancaster, Acros, Aldrich, others]
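
The role of the catalogs as search terminators can be sketched as a simple membership test. The catalog contents below are hypothetical, and the structures are written as SMILES strings that are assumed to be canonical already; a real system would need proper structure canonicalization, which is not shown.

```python
# Hypothetical catalog contents; SMILES strings are assumed canonical already.
CATALOGS = {
    "Aldrich": {"CCO", "CC(=O)O"},
    "Acros": {"CC(=O)O", "c1ccccc1"},
    "Lancaster": {"OCc1ccccc1"},
}


def suppliers(smiles: str) -> list[str]:
    """Return the catalogs that list this structure, in alphabetical order."""
    return sorted(name for name, stock in CATALOGS.items() if smiles in stock)


def unresolved(precursors: list[str]) -> list[str]:
    """Precursors found in no catalog; only these need further analysis."""
    return [p for p in precursors if not suppliers(p)]


# Ethanol is purchasable, ethyl acetate is not listed in these toy catalogs.
print(unresolved(["CCO", "CC(=O)OCC"]))
```

Only the precursors returned by `unresolved` would be passed back into the retrosynthetic search, which is how available starting materials focus the analysis.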

Page 11: A new, automated retrosynthetic search engine: ARChem

Previous works: rule extraction from reaction databases

1) Satoh et al.: A Novel Approach to Retrosynthetic Analysis Using Knowledge Bases Derived from Reaction Databases, J. Chem. Inf. Comput. Sci. 1999, 39, 316-325

2) Gelernter et al.: Building and Refining a Knowledge Base for Synthetic Organic Chemistry via the Methodology of Inductive and Deductive Machine Learning, J. Chem. Inf. Comput. Sci. 1990, 30, 492-504

3) Wang et al.: Construction of a generic reaction knowledge base by reaction data mining, Journal of Molecular Graphics and Modelling 2001, 19, 427-433

Page 12: A new, automated retrosynthetic search engine: ARChem

Route Designer Reaction Rules

[Diagram: Reaction DB (RDF format) → find the core of the reaction → group identical reaction cores → cluster reactions by chemical equivalence → Reaction Rules]

1) The extraction process converts many reactions into a few rules.

2) The combinatorial explosion of the retrosynthetic search process is controlled.

● Generic leaving groups reduce the number of rules

● Methods in Organic Synthesis (MOS): ~42k reactions

● Large reaction DBs OK (millions of reactions supported)

● 4k rules from 47k reactions

Page 13: A new, automated retrosynthetic search engine: ARChem

Identifying Reaction Cores

● The core: atoms that undergo “changes” during a reaction

● Atom mappings identify atoms attached to bonds changed, made or broken in the reaction

[Figure: extracted core]
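
One way to picture core identification is as a comparison of atom-mapped bond tables. The sketch below is an illustration under stated assumptions, not the Route Designer implementation: bonds are keyed by pairs of atom-map numbers, hydrogens are implicit, and the mapping numbers for the esterification example are invented for the demonstration.

```python
def bond(a: int, b: int) -> frozenset:
    """Undirected bond between two atom-map numbers."""
    return frozenset((a, b))


def reaction_core(reactant_bonds: dict, product_bonds: dict):
    """Return (changed bonds, core atoms) from mapped bond-order tables."""
    all_bonds = reactant_bonds.keys() | product_bonds.keys()
    # A bond is "changed" if it is made, broken, or alters its bond order.
    changed = {
        b for b in all_bonds
        if reactant_bonds.get(b, 0) != product_bonds.get(b, 0)
    }
    core_atoms = set().union(*changed)
    return changed, core_atoms


# Esterification, hypothetical mapping: 1 = carbonyl C, 2 = carbonyl O,
# 3 = acid OH oxygen, 4 = alcohol O, 5 = alcohol C, 6 = acid alpha carbon.
reactants = {bond(6, 1): 1, bond(1, 2): 2, bond(1, 3): 1, bond(4, 5): 1}
products = {bond(6, 1): 1, bond(1, 2): 2, bond(1, 4): 1, bond(4, 5): 1}

changed, core = reaction_core(reactants, products)
print(sorted(core))  # atoms attached to the broken C-O and the new C-O bond
```

Here the C1-O3 bond is broken and the C1-O4 bond is made, so the extracted core consists of atoms 1, 3 and 4; the carbonyl oxygen and the alkyl carbons are untouched and would only enter through a later core extension.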

Page 14: A new, automated retrosynthetic search engine: ARChem

Extension to “non-reacting” atoms

● Initial core is extended to include structural features essential for the reaction (difficult process)

● Empirical rules attempt to capture these features

Page 15: A new, automated retrosynthetic search engine: ARChem

Reaction Core Extension

[Figure panels: Reaction, Extracted Core, Extended Core]

Page 16: A new, automated retrosynthetic search engine: ARChem

Generic leaving groups

[Figure panels: Reaction Rule, Generic rule]

Nucleofuge (NF): a leaving group which carries away the bonding electron pair
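
How a generic leaving group reduces the rule count can be shown with a toy encoding. In the sketch below a rule core is just a tuple of atom or group labels, and the set of labels treated as nucleofuges is an assumption made for the example; the list used by the real system is not given in the talk.

```python
# Assumed set of leaving groups to generalise (illustrative only).
NUCLEOFUGES = {"Cl", "Br", "I", "OTs", "OMs"}


def generalise(core: tuple) -> tuple:
    """Replace any specific leaving-group label in a core by the generic NF."""
    return tuple("NF" if label in NUCLEOFUGES else label for label in core)


specific_rules = [
    ("C", "Cl", "N"),   # alkyl chloride + amine
    ("C", "Br", "N"),   # alkyl bromide + amine
    ("C", "OTs", "N"),  # alkyl tosylate + amine
    ("C", "O", "C"),    # unrelated core, left untouched
]

# Identical generic cores collapse into a single rule.
generic_rules = {generalise(core) for core in specific_rules}
print(len(specific_rules), "specific rules ->", len(generic_rules), "generic rules")
```

The three alkylation variants collapse into one `("C", "NF", "N")` rule, while the unrelated core is kept as it is.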

Page 17: A new, automated retrosynthetic search engine: ARChem

Clustering Cores

● Established approaches (Morgan numbers) are used to identify the reaction core and the entire extended core

● Reactions are clustered by exact matching of the extended cores, and different extended cores may be combined

● Rules specifying bond making and breaking operations are constructed
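
A minimal sketch of clustering by exact core matching follows. It uses the extended-connectivity iteration of the Morgan algorithm to build a key that does not depend on atom numbering. The key here is a simplification chosen for brevity: it ignores bond orders and is only a graph invariant, not a full canonical form, so it should be read as an illustration and not as the scheme Route Designer uses. All function names are hypothetical.

```python
from collections import defaultdict


def morgan_values(adjacency: dict) -> dict:
    """Morgan extended-connectivity values for a graph {atom: [neighbours]}."""
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        # Each atom's new value is the sum of its neighbours' current values.
        updated = {a: sum(values[b] for b in adjacency[a]) for a in adjacency}
        n_updated = len(set(updated.values()))
        if n_updated <= n_classes:  # no further discrimination: stop
            return values
        values, n_classes = updated, n_updated


def core_key(elements: dict, adjacency: dict) -> tuple:
    """Numbering-independent key: sorted (element, Morgan value) pairs."""
    values = morgan_values(adjacency)
    return tuple(sorted((elements[a], values[a]) for a in adjacency))


def cluster(cores: list) -> dict:
    """Group reaction ids whose extended cores have identical keys."""
    clusters = defaultdict(list)
    for reaction_id, elements, adjacency in cores:
        clusters[core_key(elements, adjacency)].append(reaction_id)
    return dict(clusters)


cores = [
    ("rxn1", {1: "C", 2: "O", 3: "O"}, {1: [2, 3], 2: [1], 3: [1]}),
    ("rxn2", {10: "O", 11: "C", 12: "O"}, {10: [11], 11: [10, 12], 12: [11]}),
    ("rxn3", {1: "C", 2: "N"}, {1: [2], 2: [1]}),
]
clusters = cluster(cores)
print(sorted(clusters.values()))
```

The first two cores differ only in atom numbering and land in the same cluster; the number of reactions in each cluster is the example count that later search constraints can draw on.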

Page 18: A new, automated retrosynthetic search engine: ARChem

Rule Generation Summary

[Diagram: Reaction DB in RDF file format → esterification examples clustered → Esterification Reaction Rule; other examples clustered → ... → some other rule]

Page 19: A new, automated retrosynthetic search engine: ARChem

Rule Generation from MOS DB

● The Methods in Organic Synthesis database contains ca. 42k reactions

● Rule extraction performed on this database gave ~3800 rules

Page 20: A new, automated retrosynthetic search engine: ARChem

Reaction Classification

• Automatic classification of MOS gave:

Category  Name                              # clusters  Active?
RD        Regular Disconnective             2764        Always
FGI       Functional Group Interchange      505         as needed
FGA       Functional Group Addition         261         as needed
BFGI      Bond-oriented FGI                 238         as needed
FGR       Functional Group Removal          96          as needed
UPC       Unmapped Product Carbon           41          NO
DP        Deprotection Reaction             27          NO
SAP       Simple Alkane Product             16          NO
BCR       Bare Carbonyl Reaction            13          NO
UER       Unactivated Educt Reaction        7           NO
NR        Non-disconnective Rearrangement   3           as needed

Total: 3971
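
The table can be tallied by activation policy to see how much of the rule set takes part in a search. The sketch below simply transcribes the counts listed on the slide; the data structure and the helper name are choices made for this example.

```python
# (category code, number of clusters, activation policy) as listed in the table.
CLASSIFICATION = [
    ("RD", 2764, "always"),
    ("FGI", 505, "as needed"),
    ("FGA", 261, "as needed"),
    ("BFGI", 238, "as needed"),
    ("FGR", 96, "as needed"),
    ("UPC", 41, "no"),
    ("DP", 27, "no"),
    ("SAP", 16, "no"),
    ("BCR", 13, "no"),
    ("UER", 7, "no"),
    ("NR", 3, "as needed"),
]


def clusters_by_policy(table):
    """Sum the cluster counts for each activation policy."""
    totals = {}
    for _code, count, policy in table:
        totals[policy] = totals.get(policy, 0) + count
    return totals


totals = clusters_by_policy(CLASSIFICATION)
print(totals, "total:", sum(totals.values()))
```

The counts add up to the stated total of 3971: 2764 clusters are always active, 1103 are used as needed, and 104 are switched off.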

Page 21: A new, automated retrosynthetic search engine: ARChem

FG Interchange (FGI) and FG Addition (FGA)

● Non-disconnective transformations such as FGI and FGA must either

– lead to an available starting material, or

– change functionality so that a disconnective transform can be triggered

● Uncontrolled use of FGI / FGA would give a combinatorial explosion
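
The gating condition on non-disconnective transforms can be expressed as a small predicate. In the toy model below a molecule is reduced to a set of functional-group labels, and both the stock and the list of groups that enable a disconnection are invented for illustration.

```python
# Hypothetical data: one purchasable compound and the groups assumed to
# trigger a disconnective transform.
STOCK = {frozenset({"CO2H"})}
DISCONNECTIVE_TRIGGERS = {"CO2H", "C=O"}


def apply_fgi(molecule: frozenset, old: str, new: str) -> frozenset:
    """Retrosynthetic interchange of one functional-group label."""
    return (molecule - {old}) | {new}


def fgi_allowed(precursor: frozenset) -> bool:
    """Accept an FGI/FGA result only if it is purchasable or unlocks a disconnection."""
    if precursor in STOCK:
        return True
    return any(group in DISCONNECTIVE_TRIGGERS for group in precursor)


target = frozenset({"CH2OH", "aryl"})
useful = apply_fgi(target, "CH2OH", "CO2H")      # enables a disconnection
pointless = apply_fgi(target, "CH2OH", "CH2OAc")  # leads nowhere in this model
print(fgi_allowed(useful), fgi_allowed(pointless))
```

Interchanges that fail the test are discarded immediately, so they never spawn further branches of the search tree.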

Page 22: A new, automated retrosynthetic search engine: ARChem

Top 5 Clusters (MOS example)

Page 23: A new, automated retrosynthetic search engine: ARChem

Page 24: A new, automated retrosynthetic search engine: ARChem

[Figure, repeated from Page 4: search tree running from TARGET through Level 1, Level 2 and Level 3 precursors; branches end at precursors identified as available starting materials]

In the absence of frequent user intervention, the combinatorial nature of this approach prohibits very broad searches (many alternative reactions) as well as very deep searches (many steps between starting materials and products).

Page 25: A new, automated retrosynthetic search engine: ARChem

Exhaustive Search

● Control the combinatorial explosion!

– Fundamental transformations (esterification, amide formation …) applied at first stage to break up target

– User-selectable search constraints:
  ● Depth limit for search
  ● Minimum number of examples for a rule to be used
  ● Strategic disconnections / preserved bonds

– FGI/FGA (e.g. CH2OH ==> CO2H) restricted to cases triggering subsequent disconnective transforms or finding starting materials

– Unstable functionality generated must be removed at next step (some organometallics or acyl chlorides)
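
Two of the user-selectable constraints, the depth limit and the minimum example count, can be illustrated with a small recursive search. This is a sketch over a hypothetical rule table keyed by compound name, not the actual Route Designer search; the compound names, rule names and example counts are invented.

```python
# Hypothetical rule table: target -> [(rule name, example count, precursors)].
RULES = {
    "ester": [
        ("esterification", 120, ["acid", "alcohol"]),
        ("rare coupling", 2, ["fragment"]),
    ],
    "alcohol": [("reduction", 40, ["aldehyde"])],
}
STOCK = {"acid", "aldehyde", "fragment"}


def search(target, rules, stock, max_depth, min_examples):
    """Return routes as lists of rule names; an empty route means 'in stock'."""
    if target in stock:
        return [[]]
    if max_depth == 0:
        return []  # depth limit reached without hitting a starting material
    routes = []
    for name, examples, precursors in rules.get(target, []):
        if examples < min_examples:
            continue  # rule is backed by too few literature examples
        partial = [[name]]
        for precursor in precursors:
            sub_routes = search(precursor, rules, stock, max_depth - 1, min_examples)
            # Every precursor must be solved; an empty list kills the branch.
            partial = [r + s for r in partial for s in sub_routes]
        routes.extend(partial)
    return routes


print(search("ester", RULES, STOCK, max_depth=2, min_examples=5))
```

With a depth limit of 2 and a threshold of 5 examples the only route is esterification followed by reduction. Lowering the threshold to 1 also admits the sparsely documented coupling, and lowering the depth to 1 leaves no route at all, which shows how each constraint prunes the tree.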

Page 26: A new, automated retrosynthetic search engine: ARChem

Preserve and Target Bonds

● Preserve selected bonds and target other bonds during analysis
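
One plausible reading of these two controls is a filter on candidate disconnections: a candidate is dropped if it breaks a preserved bond, and, when target bonds are given, it is kept only if it breaks at least one of them. That interpretation, the bond labels and the function name below are assumptions made for illustration.

```python
def filter_disconnections(candidates, preserve=frozenset(), target=frozenset()):
    """Keep rules that respect preserved bonds and hit targeted bonds (if any)."""
    kept = []
    for rule, broken in candidates:
        if broken & preserve:
            continue  # would break a bond the user wants to keep
        if target and not (broken & target):
            continue  # user asked for specific bonds and this rule misses them
        kept.append(rule)
    return kept


# Each candidate lists the bonds it would break (hypothetical labels).
candidates = [
    ("ester cleavage", frozenset({"C1-O2"})),
    ("amide cleavage", frozenset({"C5-N6"})),
    ("ring opening", frozenset({"C7-C8", "C1-O2"})),
]

print(filter_disconnections(candidates, preserve={"C7-C8"}))
print(filter_disconnections(candidates, preserve={"C7-C8"}, target={"C5-N6"}))
```

Preserving the ring bond removes the ring opening, and additionally targeting the amide bond narrows the list to the amide cleavage alone.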

Page 27: A new, automated retrosynthetic search engine: ARChem

Ordering of Results

● Prioritization of solutions

– Disconnective transforms before FGI/FGA

– Minimize wastage (atom efficient reactions)

– Starting material coverage

– Prefer thoroughly explored chemistry (based on example count)

– The more bonds broken in the retrosynthetic transformation the better
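
The listed criteria can be combined into a sort key. The talk does not describe how the merit ranking weighs them, so the sketch below assumes a simple lexicographic order following the order of the bullets; the `Proposal` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    disconnective: bool      # True for disconnections, False for FGI/FGA
    atom_efficiency: float   # fraction of precursor atoms kept in the product
    coverage: float          # fraction of target atoms traced to stock
    examples: int            # literature examples behind the rule
    bonds_broken: int        # bonds broken by the retrosynthetic transform


def merit_key(p: Proposal) -> tuple:
    """Lexicographic key; sorting is ascending, so better-when-larger terms are negated."""
    return (not p.disconnective, -p.atom_efficiency, -p.coverage,
            -p.examples, -p.bonds_broken)


def rank(proposals):
    """Return proposals ordered from most to least preferred."""
    return sorted(proposals, key=merit_key)


proposals = [
    Proposal("one-bond disconnection", True, 0.90, 0.8, 500, 1),
    Proposal("functional group interchange", False, 0.99, 0.9, 900, 0),
    Proposal("two-bond disconnection", True, 0.90, 0.8, 500, 2),
]
print([p.name for p in rank(proposals)])
```

Both disconnections outrank the interchange despite its higher atom efficiency, and the tie between them is settled by the number of bonds broken. A weighted score would be an equally valid design if the criteria should trade off against each other.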

Page 28: A new, automated retrosynthetic search engine: ARChem

Real World Examples

● Tested on hundreds of examples including drugs, natural products, publications

● An example:

– Zatosetron, a potent, selective and long-acting 5-HT receptor antagonist from Lilly used in the treatment of nausea and emesis associated with oncolytic drugs

– < 5 mins on standard PC

Page 29: A new, automated retrosynthetic search engine: ARChem

Page 30: A new, automated retrosynthetic search engine: ARChem

Page 31: A new, automated retrosynthetic search engine: ARChem

Zatosetron

● The route shown is virtually identical to the published route for Zatosetron (Robertson et al., J. Med. Chem. 1992, 35, 310-319)

● Minor differences include:

– using 3-bromo-2-methylpropene rather than the chloro version

– the aminotropene was not found in our starting material database but the tropinone precursor was

Page 32: A new, automated retrosynthetic search engine: ARChem

How Fast? How Complex?

● Retrosynthetic analyses take from minutes to hours depending on complexity and constraints

● System is based on construction of skeletal connections, not on stereochemistry

● Does not take into account conditions – temperature, pressure etc.

● Estimates yields based on database reactions but they are ESTIMATES!

Page 33: A new, automated retrosynthetic search engine: ARChem

How Fast? How Complex?

● Very intuitive and fast to initiate request

● Can be expanded easily with other catalogs of starting materials

● “Training” via cluster analysis is not difficult but is not an everyday task either

● Can be tested using an online platform

● Proven application at a number of companies already

Page 34: A new, automated retrosynthetic search engine: ARChem

Under Development

● Indicate preferred starting materials to bias the search

● Improve clustering to fully capture chemical constraints including better regioselectivity and stereoselectivity – target is much smaller rule set

● Deal with interfering functional groups

● Order search results to reflect chemists’ preferences

● ...

Page 35: A new, automated retrosynthetic search engine: ARChem

Interfering functionality

● Compatible functionality is detected through comparison with reaction example databases

● Possible interfering functionality is inferred

● Search result rank is marginally weighted against interfering functional groups

● ...

Page 36: A new, automated retrosynthetic search engine: ARChem

Rule set optimizations

● Promotion of heteroaromatic rules using lower example threshold than other rules

● 300k rules generated from the Beilstein database gave >50k rules with heteroaromatic relevance

● Initial results show dramatic performance improvements

● Future extension to other rule categories is under investigation

● ...

Page 37: A new, automated retrosynthetic search engine: ARChem

Route Designer summary

● “Predictions” and reference data are proven approaches – extend to retrosynthetic analysis

● Route Designer already provides good routes for a variety of targets AND they are “predictions”

● Any updates in reaction databases can be easily incorporated into the rule base – “user training”

● Starting materials database can be enhanced and extended easily – there are tens of catalog vendors which can be added

Page 38: A new, automated retrosynthetic search engine: ARChem

Acknowledgements

Pfizer: Robert Wade, Robert Dugger, James Gage, Peter Wuts, David Entwistle, Inaki Morano, Richard Nugent, Bryan Li, Julian Smith

Accelrys: Rob Brown, Eric Jamois

Symyx-MDL: Maurizio Bronzetti, Jochen Tannemann