recent improvements in marvin v6 reaction atom mapping and its application to reaction validation in...
Post on 19-Oct-2014
DESCRIPTION
Automatic atom mapping attempts to determine the correspondence between the atoms of the reactants and products of a chemical reaction. Such mappings are useful for allowing greater specificity in queries of reaction databases. Recently there has been increased interest in their use to assist in the validation and standardisation of reactions in pharmaceutical ELNs (electronic lab notebooks). Atom mappings can, for example, detect if a reactant is missing or if a reactant does not contribute atoms to the product and hence may be better stored as an agent. We have evaluated the performance of the new atom mapping algorithm introduced with Marvin v6 compared to the prior version on a publicly available dataset extracted from the patent literature and on reactions from multiple pharmaceutical ELNs. Dramatic improvements are observed in all cases, both in the percentage of reactions that can be successfully atom-mapped and in the quality of the mappings produced. Finally, we examine the difficulties that remain in validating reactions for which a complete atom mapping is not possible, such as “routine” reactions where the reactant that was added is missing.
TRANSCRIPT
ChemAxon UGM, San Diego, USA 25th September 2013
Recent improvements in Marvin v6: Reaction Atom Mapping and its Application to
Reaction Validation in Pharmaceutical ELNs
Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK
What is Atom-Mapping?
Mapping algorithm
Why Perform Atom-Mapping?
• Assigning roles to reagents
• Normalization of reactions for registration
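The role-assignment idea can be sketched in a few lines, assuming the reaction has already been atom-mapped and is available as a mapped reaction SMILES. The function names and the example reaction are illustrative only and are not taken from Marvin; each dot-separated fragment is treated as its own component, so salts would need grouping first.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. "[CH3:1]" -> "1".
MAP_NUMBER = re.compile(r"\[[^\]]*:(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers used in a SMILES fragment."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)}


def assign_roles(mapped_reaction):
    """Label each left-hand component as 'reactant' or 'agent'.

    Assumption: a component is a reactant if at least one of its mapped
    atoms also appears in the products; otherwise it is an agent.
    """
    left, _agents, right = mapped_reaction.split(">")
    product_maps = map_numbers(right)
    roles = {}
    for component in left.split("."):
        contributes = bool(map_numbers(component) & product_maps)
        roles[component] = "reactant" if contributes else "agent"
    return roles


# Hypothetical example: an esterification with pyridine drawn as a reactant.
rxn = (
    "[CH3:1][C:2](=[O:3])Cl.[OH:4][CH3:5].c1ccncc1"
    ">>[CH3:1][C:2](=[O:3])[O:4][CH3:5]"
)
roles = assign_roles(rxn)
for component, role in roles.items():
    print(role, component)
```

Here pyridine carries no mapped atoms that reach the product, so it would be a candidate for storage as an agent rather than a reactant.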
Why Perform Atom-Mapping?
• More precise database searches
– Solvents/catalysts can be distinguished from reactants
– Allows the relationship between the reactant atoms and product atoms to be made explicit
Example
• I want to find reactions converting an alkene to a cyclopropane so I search for C=C>>C1CC1
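As a rough illustration of why mapping matters for this query: the unmapped form says nothing about whether the alkene carbons end up in the ring, whereas a mapped form can. In reaction query languages that honour atom maps (Daylight reaction SMARTS, for example), matching map numbers tie a query atom on the left to one on the right. The mapped query and the helper below are assumptions for illustration, not part of the original slide.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. "[C:1]" -> "1".
MAP_NUMBER = re.compile(r"\[[^\]]*:(\d+)\]")


def constrained_atoms(query):
    """Return the map numbers that appear on both sides of a reaction query."""
    left, _agents, right = query.split(">")
    on_left = {int(n) for n in MAP_NUMBER.findall(left)}
    on_right = {int(n) for n in MAP_NUMBER.findall(right)}
    return sorted(on_left & on_right)


unmapped_query = "C=C>>C1CC1"
# Hypothetical mapped form: both alkene carbons must become ring carbons.
mapped_query = "[C:1]=[C:2]>>[C:1]1[C:2]C1"

print(constrained_atoms(unmapped_query))  # []
print(constrained_atoms(mapped_query))    # [1, 2]
```

With the unmapped query, any reaction that has an alkene somewhere on the left and a cyclopropane somewhere on the right is a hit, even if the two are unrelated.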
Why Perform Atom-Mapping?
• Identifying suspect reactions:
ChemAxon atom mapping
Atom mapping modes
• Complete
• Changing
• Matching
Methodology
Test set                                                         Reactions
Pharmaceutical ELN subset                                           18,244
ChemReact68 database                                                67,926
SPRESI database subset                                               5,230
Reactions extracted from 2008-2011 USPTO patent applications*      562,872
* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.
Metrics used
• Were all product atoms mapped?
– Measures recall
• How many C-C bonds were broken?
– Measures precision
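The two metrics can be sketched as follows. This is one possible reading, not the exact procedure used in the evaluation: atoms are identified by their map numbers (0 meaning unmapped), bonds are pairs of map numbers, and a C-C bond counts as broken when both of its carbons reach the product but are no longer bonded there. All names and the example data are hypothetical.

```python
def all_product_atoms_mapped(product_atom_maps):
    """Recall-style check: every product atom has a non-zero map number."""
    return all(m > 0 for m in product_atom_maps)


def cc_bonds_broken(elements, reactant_bonds, product_bonds, product_atoms):
    """Precision-style check: count reactant C-C bonds lost in the product.

    Assumption: a bond is 'broken' if both carbons are present in the
    product (by map number) but are not bonded to each other there.
    """
    product_pairs = {frozenset(bond) for bond in product_bonds}
    broken = 0
    for a, b in reactant_bonds:
        is_cc = elements[a] == "C" and elements[b] == "C"
        both_kept = a in product_atoms and b in product_atoms
        if is_cc and both_kept and frozenset((a, b)) not in product_pairs:
            broken += 1
    return broken


# Hypothetical example: acetic acid + ethanol -> ethyl acetate.
elements = {1: "C", 2: "C", 3: "O", 4: "O", 5: "O", 6: "C", 7: "C"}
reactant_bonds = [(1, 2), (2, 3), (2, 4), (5, 6), (6, 7)]
product_atoms = {1, 2, 3, 5, 6, 7}

# Chemically sensible mapping: both carbon skeletons stay intact.
good_product_bonds = [(1, 2), (2, 3), (2, 5), (5, 6), (6, 7)]
# Poor mapping: the two methyl carbons have been swapped.
poor_product_bonds = [(7, 2), (2, 3), (2, 5), (5, 6), (6, 1)]

print(cc_bonds_broken(elements, reactant_bonds, good_product_bonds, product_atoms))  # 0
print(cc_bonds_broken(elements, reactant_bonds, poor_product_bonds, product_atoms))  # 2
```

Both mappings cover every product atom, so the first metric cannot tell them apart; the broken C-C bond count is what flags the second one as implausible.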
Ability to map all product atoms
[Bar chart. Y-axis: percent of reactions with all product atoms mapped (0 to 80). X-axis: PharmaELN, ChemReact68, SPRESI, USPTO. Series: Marvin 5.10, Marvin 6.0, ChemDraw 12.]
C-C bonds broken
[Bar chart. Y-axis: average number of C-C bonds broken per mapping (0.0 to 1.2, lower is better). X-axis: PharmaELN, ChemReact68, SPRESI, USPTO. Series: Marvin 5.10, Marvin 6.0, ChemDraw 12.]
[Figure panels: Marvin 5.10, ChemDraw 12, Marvin 6.0]
Speed Comparison
[Bar chart. Y-axis: reactions mapped per second (0 to 350). Series: Marvin 5.12, Marvin 6.0, Marvin 6.0 (multithreaded).]
* Comparison performed on the PharmaELN dataset on an i7-2600
Difficult cases
ΔT
Areas for improvement: implicit stoichiometry
Areas for improvement: many choices for reactant atom mapping
Consensus Methods
[Bar chart. Y-axis: percent of reactions with all product atoms mapped (0 to 100). Test set: PharmaELN. Series: Marvin 6.0, ChemDraw 12, Marvin6 + ChemDraw12, Consensus Result*.]
* Marvin 6.0 + ChemDraw12 + 2 variants of GGA’s Indigo toolkit + InfoChem ICMap + Pipeline Pilot + MDL Cheshire
Beyond atom mapping
• Missing reactants (often for routine reactions)
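One simple way to flag a missing reactant, independent of any particular mapper, is an element balance: if the product contains more heavy atoms of some element than the reactants supply, something was left out. The sketch below assumes plain reaction SMILES and uses a rough regular-expression tokenizer rather than a full SMILES parser; the function names and the example (a Boc protection recorded without the Boc anhydride) are illustrative assumptions.

```python
import re
from collections import Counter

# Rough atom tokenizer (an approximation, not a full SMILES parser):
# group 1 = element of a bracket atom, group 2 = organic-subset atom.
ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br|[BCNOSPFI]|[bcnops])"
)


def heavy_atom_counts(smiles):
    """Count heavy atoms per element; aromatic symbols are capitalised."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        symbol = (match.group(1) or match.group(2)).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def unexplained_product_atoms(reaction):
    """Return product heavy atoms that the reactants cannot account for."""
    left, _agents, right = reaction.split(">")
    # Counter subtraction keeps only the positive differences.
    return heavy_atom_counts(right) - heavy_atom_counts(left)


# Hypothetical record: benzylamine -> Boc-protected amine, reagent omitted.
rxn = "NCc1ccccc1>>CC(C)(C)OC(=O)NCc1ccccc1"
surplus = unexplained_product_atoms(rxn)
print(dict(surplus))  # {'C': 5, 'O': 2}
```

The surplus of five carbons and two oxygens matches the Boc group, which hints at what kind of reactant is absent; a classification step is still needed to name it.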
Beyond atom mapping
• Change of stereoisomer or chiral resolution
(E)-3-{8-[2-(4-Isopropyl-1,3-thiazol-2-yl)ethyl]-2-methoxy-4-oxo-4H-pyrido[1,2-a]pyrimidin-3-yl}-2-propenoic acid (1 mg) was dissolved in CDCl3 (0.5 ml) and irradiated with light from a fluorescent lamp for 19 hours. The solvent was evaporated to obtain the title compound (1 mg).
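In cases like this no bonds change between reactant and product, so a mapping alone reveals nothing about the transformation. A crude check for a stereo-only change is to compare the two sides with stereo marks removed. The sketch below assumes both sides are written with the same atom order (a real implementation would canonicalise the structures first); the names and the small E/Z example are hypothetical.

```python
import re

# Characters that carry stereochemistry in SMILES: '/', '\' and '@'.
STEREO_MARKS = re.compile(r"[/\\@]")


def strip_stereo(smiles):
    """Remove double-bond and tetrahedral stereo marks from a SMILES string."""
    return STEREO_MARKS.sub("", smiles)


def is_stereo_only_change(reaction):
    """True if the two sides differ only in their stereo marks.

    Assumption: reactant and product use the same atom ordering, so a
    plain string comparison is meaningful.
    """
    left, _agents, right = reaction.split(">")
    return left != right and strip_stereo(left) == strip_stereo(right)


# Hypothetical example: (E)- to (Z)-but-2-enoic acid.
print(is_stereo_only_change(r"C/C=C/C(=O)O>>C/C=C\C(=O)O"))  # True
print(is_stereo_only_change("CC=O>>CCO"))                    # False
```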
Atom mapping + classification
[Bar chart. Y-axis: percent of reactions with all product atoms mapped (0 to 100). X-axis: atom mapping algorithms alone, combined with NameRXN. Series: Marvin 6.0, ChemDraw 12, Consensus Result, Verified / Recognised by NameRXN (71%).]
Conclusions
• Marvin v6’s atom mapping algorithm provides large improvements in recall, precision and speed over v5
• Atom mapping in some cases isn’t as simple as finding a maximum common subgraph mapping
• Classification algorithms can be useful for the validation of some reactions
Acknowledgements
• Zsolt Mohacsi and Istvan Rabel, ChemAxon
• Ed Griffen and Nick Tomkinson, AstraZeneca
• Andrew Wooster, GSK
• Hans Kraut, InfoChem
• Thank you for your time.