![Page 1: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/1.jpg)
Prediction of NMR Chemical Shifts.
A Chemometrical Approach
K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg
Advanced Chemistry Development (ACD)
![Page 2: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/2.jpg)
Structure and its spectral data
[Figure: a structure and its spectra. Panels: COSY.esp (F1 vs F2 chemical shift, ppm), HMQC.esp (F1 vs F2 chemical shift, ppm), C13.esp (normalized intensity vs chemical shift, ppm), H1.esp (normalized intensity vs chemical shift, ppm).]
13C peak labels (ppm): 26.85, 31.61, 42.46, 42.86, 48.22, 50.32, 51.94, 52.67, 60.10, 60.18, 64.59, 76.78, 77.03, 77.29, 77.60
![Page 3: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/3.jpg)
Sometimes the solution is not obvious
• In many cases several structures are consistent with the spectral data.
• In this case we need a method to rank the structures.
• The most powerful method: compare the experimental and predicted 13C NMR spectra.
![Page 4: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/4.jpg)
13C NMR spectral data
[Figure: candidate structures with experimental and predicted 13C NMR spectra; values shown: 2.00 and 9.62.]
![Page 5: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/5.jpg)
How to find the best structure?
• In most cases the predicted spectrum of the “correct structure” fits the experimental spectrum best.
• In practice, the “correct structure” has an average deviation of 2-3 ppm between the predicted and experimental spectra.
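The ranking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each candidate comes with a list of predicted 13C shifts, and it pairs predicted and experimental shifts by sorting both lists, which is a simplification. The function names and the toy numbers are hypothetical.

```python
def average_deviation(experimental, predicted):
    """Mean absolute difference (ppm) between two shift lists of equal length.

    Assumption: shifts are paired by rank after sorting, not by atom assignment.
    """
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (name, deviation) tuples sorted from best to worst fit."""
    scored = [(name, average_deviation(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy data, invented for illustration only.
experimental = [26.8, 42.5, 60.1, 77.0]
candidates = {
    "candidate_A": [27.9, 41.0, 61.5, 75.2],
    "candidate_B": [35.0, 55.0, 70.0, 90.0],
}
ranking = rank_candidates(experimental, candidates)
```

The first entry of `ranking` is the candidate with the smallest average deviation, which is the structure one would inspect first.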
![Page 6: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/6.jpg)
The role of the spectra prediction
• Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR).
• 20 min to generate all structures (> 12 000).
• 24 hours to predict the 13C NMR spectra of all the generated structures.
• The speed of spectra prediction should be increased.
![Page 7: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/7.jpg)
Methods of the prediction of NMR spectra
• Quantum mechanics – extremely slow
• Database approach – accurate but slow
  – HOSE codes
  – Maximum common substructure
• Rule-based – fast but inaccurate
  – Additive scheme
  – Neural networks
• Our choice – improve the accuracy of a fast method
![Page 8: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/8.jpg)
Additive scheme
$\delta = \sum_i a_i x_i$
[Figure: example structure (atoms C, O, CH3, C, CH2, CH, CH2, CH2, CH2) with the contribution of each atom marked.]
153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31
Main problem: finding the correct values of the atom increments.
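The arithmetic on this slide can be reproduced directly. The sketch below only adds up the numbers shown; treating the first term (153.71) as a base value and the remaining terms as per-atom increments is an assumption about how to read the slide, and the names are hypothetical.

```python
# Values copied from the slide (ppm).
BASE_SHIFT = 153.71  # assumed to be the starting (base) value
INCREMENTS = [-1.85, -4.49, -1.39, -2.79, 1.43, 0.52, 0.52, -1.35]


def additive_shift(base, increments):
    """Additive scheme: the predicted shift is the base plus all increments."""
    return base + sum(increments)


predicted = round(additive_shift(BASE_SHIFT, INCREMENTS), 2)
```

Prediction is a single pass over a short list, which is why the additive scheme is fast; the accuracy depends entirely on the increment values.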
![Page 9: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/9.jpg)
Available data
• We have a database of 1.5 million 13C chemical shifts.
• We can try to derive the correct increment values from it!
![Page 10: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/10.jpg)
How to encode atom environment
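[Figure: example structure and its encoding as input variables. Each variable is the number of atoms of a given atom type in a given sphere around the central atom.]
• 1st sphere: CH2: 1, CH: 1, …, C: 1
• 2nd sphere: CH2: 2, CH3: 1, O: 1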
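A sphere-based encoding of this kind can be sketched with a breadth-first search over the molecular graph. This is an illustration under stated assumptions, not the authors' code: atom types are plain strings with hydrogens folded into the label, the fragment is hypothetical, and `encode_environment` is an invented name.

```python
from collections import Counter, deque


def encode_environment(atom_types, bonds, center, max_sphere):
    """Count atom types at each topological distance (sphere) from `center`."""
    neighbors = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Breadth-first search gives the number of bonds to the central atom.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_sphere:
            continue  # do not expand beyond the last sphere
        for nxt in neighbors[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)

    features = Counter()
    for atom, d in distance.items():
        if d > 0:  # the central atom itself is not a substituent
            features[(d, atom_types[atom])] += 1
    return dict(features)


# Hypothetical fragment CH3-CH2-C(=O)-CH2-CH3, central atom is the carbonyl C.
atom_types = ["CH3", "CH2", "C", "O", "CH2", "CH3"]
bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5)]
features = encode_environment(atom_types, bonds, center=2, max_sphere=2)
```

Each key `(sphere, atom type)` becomes one input variable, and its count is the value of that variable for the central atom.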
![Page 11: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/11.jpg)
Data for PLS regression
[Figure: data matrices for PLS regression. X: atom environment encoding, one row per sample; Y: chemical shifts.]
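To make the X and Y roles concrete, here is a minimal PLS1 sketch (single response, NIPALS-style deflation) in plain Python. It is a textbook variant written for illustration, not the authors' implementation; the function names and the toy data are hypothetical, and the toy shifts are generated from invented increments.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns means and per-component terms."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one response by replaying the deflation steps on a new sample."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


# Toy data: two count variables; shifts follow 120 + 9*x1 - 2.5*x2 (invented).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y = [120 + 9 * a - 2.5 * b for a, b in X]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [3, 2])
```

With as many components as independent variables, PLS1 reproduces the least-squares fit, so the toy prediction matches the generating formula. In practice the number of components is chosen by validation.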
![Page 12: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/12.jpg)
Find best structure encoding
• Initially, the best scheme of structure representation is not evident.
• We should find the scheme that gives the best accuracy.
• We should optimize:
  – the substituent coding scheme
  – the number of “spheres” used
![Page 13: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/13.jpg)
Used data
• 210 K chemical shifts were used as the training set.
• 170 K chemical shifts from recent literature were used as an external validation set.
![Page 14: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/14.jpg)
How to describe atom type
• Atom type (C, O, etc.).
• Hybridization (sp3, sp2, etc).
• Valence
• Number of neighbor H.
• Charge
• Distance to “central” atom (bonds)
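[Figure: example structure (CH3, CH, CH, NH2) marking the “central” atom and a “substituent”.]
Example encoding of the NH2 substituent: atom type 7 (N), hybridization 1 (sp3), valence 3, number of neighbor H 2, charge 0, distance to the “central” atom 3.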
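The descriptor listed on this slide maps naturally onto a small record type. The sketch below is one possible reading: the field order follows the bullet list, the integer codes (7 for N, 1 for sp3) are taken from the slide, and reading the slide's "32" as valence 3 and 2 neighbor hydrogens is an assumption. `AtomDescriptor` and `variable_name` are hypothetical names.

```python
from typing import NamedTuple


class AtomDescriptor(NamedTuple):
    element: int        # atomic number, e.g. 7 for N
    hybridization: int  # code as used on the slide, 1 = sp3
    valence: int
    hydrogens: int      # number of neighbor H atoms
    charge: int
    distance: int       # bonds to the "central" atom


# The NH2 substituent from the slide, three bonds away from the central atom.
nh2 = AtomDescriptor(element=7, hybridization=1, valence=3,
                     hydrogens=2, charge=0, distance=3)


def variable_name(descriptor):
    """Build a string key so that identical descriptors share one variable."""
    return "_".join(str(value) for value in descriptor)


key = variable_name(nh2)
```

Substituents with the same key are counted in the same input variable, so a richer descriptor means more, but more specific, variables.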
![Page 15: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/15.jpg)
Result for different atom encoding
| Atom encoding | Average deviation (ppm) | Standard deviation (ppm) |
| --- | --- | --- |
| Atoms only | 7.17 | 10.96 |
| + Element type | 5.36 | 8.76 |
| + Hybridization | 4.39 | 6.57 |
| + All other | 3.52 | 5.37 |
![Page 16: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/16.jpg)
Result for number of spheres
| Number of “spheres” | Average deviation (ppm) | Standard deviation (ppm) |
| --- | --- | --- |
| 1 | 5.43 | 7.69 |
| 2 | 3.97 | 5.88 |
| 3 | 3.66 | 5.51 |
| 4 | 3.52 | 5.37 |
| 5 | 3.51 | 5.37 |
| 6 | 3.53 | 5.40 |
![Page 17: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/17.jpg)
Is it the best possible accuracy?
• The best average deviation achievable with this encoding is 3.5 ppm.
• We need less than 3 ppm (2 ppm is preferable).
• Should we use additional variables?
• We should be very careful when adding variables.
![Page 18: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/18.jpg)
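Substituent interference (cross effect)
[Figure: 13C chemical shifts (ppm) in substituted ethylenes CH2=C(X)(Y).]
• Reference and single substituents: CH2=CH2 122.90; CH2=CH(CH3) 134.16 (+11.26); CH2=CHCl 125.38 (+2.48)
• Two substituents (experimental): CH2=C(CH3)2 141.48; CH2=CCl2 125.90; CH2=C(Cl)(CH3) 138.30
• Additive estimates shown: 145.42, 127.86, 136.64
• Cross effects shown: +1.34, -1.94, -3.94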
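The cross effect for the dimethyl case can be worked through with the numbers on this slide. The sketch assumes the reading that 122.90 ppm belongs to the unsubstituted CH2=CH2 carbon and that +11.26 and +2.48 are the single-substituent increments for CH3 and Cl; the names are hypothetical.

```python
UNSUBSTITUTED = 122.90                    # CH2=CH2 carbon, ppm
INCREMENT = {"CH3": 11.26, "Cl": 2.48}    # single-substituent effects, ppm


def additive_estimate(substituents):
    """Shift predicted if substituent effects simply add up."""
    return UNSUBSTITUTED + sum(INCREMENT[s] for s in substituents)


estimate = round(additive_estimate(["CH3", "CH3"]), 2)   # two methyl groups
experimental = 141.48                                    # CH2=C(CH3)2, from the slide
cross_effect = round(experimental - estimate, 2)
```

The gap between the experimental value and the purely additive estimate is what a pair ("cross") variable is meant to absorb.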
![Page 19: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/19.jpg)
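Enhanced structure encoding
[Figure: example structure (atoms C, O, CH3, C, CH2, CH, CH2, CH2, CH2) encoded with two groups of input variables: atoms, and pairs of atoms (crosses). Each cross variable is the number of pairs of a given atom pair type, e.g. “CH2 and CH”: 1, …, “C and O”: 1.]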
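Pair variables can be generated from the list of substituents with a few lines of code. This sketch counts unordered pairs of atom types only; it ignores the distance between the two atoms of a pair and the sphere they sit in, both of which a fuller encoding would presumably include. The function name and the example list are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cross_features(substituents):
    """Count unordered pairs of substituent atom types."""
    pairs = Counter()
    for a, b in combinations(substituents, 2):
        pairs[tuple(sorted((a, b)))] += 1  # sort so (a, b) and (b, a) coincide
    return dict(pairs)


# Three hypothetical first-sphere substituents of one central atom.
crosses = cross_features(["CH2", "CH", "C"])
```

The number of possible pair types grows roughly with the square of the number of atom types, which is one reason to add such variables carefully.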
![Page 20: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/20.jpg)
Result for atom pairs (crosses)
[Figure: 3D bar chart of mean error (ppm, axis from 2 to 3.6) versus the distance between atoms within a cross (1 to 4) and the number of spheres (1 to 4).]
![Page 21: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/21.jpg)
More enhancements?
• The accuracy is now good enough (2.3 ppm).
• But it is still poor in some cases.
• Unfortunately, these cases are very important.
• These “special” cases should be taken into account.
![Page 22: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/22.jpg)
Stereo effects: double bonds
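[Figure: structure containing a double bond, with CH3, O and OH groups; labeled 13C shifts 25.7 and 17.6 ppm and through-space distances 3.9 Å and 2.9 Å.]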
• We use “topological” distance.
• Sometimes equal topological distances correspond to different “real” distances.
![Page 23: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/23.jpg)
Modified structure encoding
[Figure: the variables now comprise three groups: atoms, pairs of atoms (crosses), and “stereo” effects.]
![Page 24: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/24.jpg)
Prediction of spectra by different methods (mean error, ppm)
| Taken into account | All types of atoms | CH3 | =C |
| --- | --- | --- | --- |
| Atoms only | 3.52 | 1.55 | 8.03 |
| + pairs of atoms (crosses) | 2.32 | 1.50 | 3.22 |
| + “stereo” effects | 2.27 | 1.24 | 3.22 |
| + solvent | 2.25 | 1.24 | 3.20 |
| + to be continued? | | | |
![Page 25: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/25.jpg)
Size of training set
• We have 1.5 million chemical shifts.
• We should try to use all the available data.
• Only one problem: matrix size.
• In many cases the matrix becomes larger than 2 GB.
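A back-of-the-envelope estimate shows how quickly a dense matrix passes the 2 GB mark. The number of variables below is hypothetical (the slides do not state it), as is the choice of 8-byte floating-point values.

```python
def dense_matrix_bytes(n_rows, n_columns, bytes_per_value=8):
    """Memory needed to store a dense matrix of fixed-size values."""
    return n_rows * n_columns * bytes_per_value


N_SHIFTS = 1_500_000   # one row per chemical shift, from the slide
N_VARIABLES = 200      # hypothetical number of input variables

size = dense_matrix_bytes(N_SHIFTS, N_VARIABLES)
exceeds_2_gib = size > 2 * 1024**3
```

Even a few hundred variables are enough to exceed the limit under these assumptions, and pair variables add many more columns.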
![Page 26: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/26.jpg)
Bigger dataset – smaller mean error!
[Figure: average deviation (ppm, axis from 0 to 5) versus the number of structures in the training set (thousands): 1, 2, 4, 8, 16, 32, 64, 128, 207.]
![Page 27: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/27.jpg)
The final results
| Method | Average deviation (ppm) | Rate of calculation (shifts/sec) |
| --- | --- | --- |
| Old method: HOSE codes | 1.87 | 6 |
| New method: additive scheme | 1.83 | 5800 |

Faster by three orders of magnitude!
![Page 28: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/28.jpg)
Prediction time: the past and present
[Figure: structure of the C29H32N2O5 compound.]

| Method | Average deviation (ppm) | Time |
| --- | --- | --- |
| HOSE codes | 1.72 | > 24 hours |
| Additive scheme | 1.63 | 2 min |
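The reported times are roughly what the calculation rates imply. The sketch below uses figures from the slides (6 and 5800 shifts per second, more than 12 000 candidate structures, 29 carbons per structure). It is only an order-of-magnitude check: the structure count is a lower bound and any per-structure overhead is ignored, so it does not reproduce the reported times exactly.

```python
HOSE_RATE = 6            # shifts per second, HOSE codes
ADDITIVE_RATE = 5800     # shifts per second, additive scheme
N_STRUCTURES = 12_000    # lower bound on the number of candidates
CARBONS = 29             # carbons in C29H32N2O5

total_shifts = N_STRUCTURES * CARBONS
hose_hours = total_shifts / HOSE_RATE / 3600
additive_seconds = total_shifts / ADDITIVE_RATE
speedup = ADDITIVE_RATE / HOSE_RATE
```

The estimate gives many hours for the HOSE-code approach and about a minute for the additive scheme, in line with the "> 24 hours" and "2 min" entries.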
![Page 29: Prediction of NMR Chemical Shifts. A Chemometrical Approach К.А. Blinov, Y.D. Smurnyy, Т.S. Churanova, М.Е. Elyashberg Advanced Chemistry Development (ACD)](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649f575503460f94c7bee6/html5/thumbnails/29.jpg)
Conclusions
• Combining a “new” method with an old, well-known algorithm can produce very good (and unexpected) results.