enc07 neutral network algorithms 070420

Advanced Chemistry Development, Inc. (ACD/Labs) Advancements in NMR Predictions- Neural Network vs. HOSE Code Algorithms Brent Lefebvre NMR Product Manager ACD/Labs’ ENC User’s Meeting April 21, 2007

Upload: antony-williams-chemconnector-orcid-0000-0002-2668-4821

Post on 24-Jun-2015

DESCRIPTION

In silico prediction of small-molecule properties is widely used in industry and academia today. In particular, NMR spectra are predicted by a variety of software packages. Two main approaches are used:

Database-based. Compounds are compared against a database, and the result is calculated from data for close structural relatives found in that database.

Regression-based. An experimental database is used to fit the parameters of a non-linear regression. The chemical shift is represented as a non-linear function of variables that describe characteristic features of the molecule of interest.

The two approaches require different strategies for further improvement. Database-based results are improved by acquiring a larger database and/or including user-specific data in the calculation.

TRANSCRIPT

Page 1: Enc07 Neutral Network Algorithms 070420

Advanced Chemistry Development, Inc. (ACD/Labs)

Advancements in NMR Predictions-

Neural Network vs. HOSE Code Algorithms

Brent Lefebvre

NMR Product Manager

ACD/Labs’ ENC User’s Meeting

April 21, 2007

Why Neural Networks?

The Neural Network algorithm offers a very specific advantage

Speed of calculation is hundreds of times faster

This enables prediction on-the-fly

For Structure Elucidator, a key feature

Why Neural Networks?

Also a fresh approach for ACD/Labs to shift prediction

We are always researching new ways to improve our software

Also see our poster (#150) on our new increments scheme

Realization

The Neural Network algorithm was outperforming our version 9 HOSE code!

Steps were then taken to migrate this algorithm out of Structure Elucidator and into the ACD/CNMR Predictor

Implementation

Neural Network Algorithm

Implementation

Training the Neural Net

Entire database from version 9 used

Additional database of 187,000 shifts used for accuracy testing

Neural Network Approach

How does this neural net implementation compare to others in the industry?

What is unique about it?

Does this make it better or worse?

Neural Network Approach

Our research brought us to some new conclusions

Some implementation details differed from previous industry attempts

Neural Network Approach

We found that:

Characteristics of the Neural Net were NOT the most important factor

Structure encoding scheme was most important

Size and accuracy of training set is key

Our huge quality checked database gave us a tremendous advantage

Using the Neural Network Predictions

How are they accessed in the software?

Limitations of the Neural Network Predictions

Predictions are a black box

No calculation protocol as for HOSE code

Training of predictions could be possible

Does not outperform HOSE code training

Statistics

How do NN compare to old and new HOSE code?

When should I use NN?

What is the new performance?

Prediction Accuracy

We calculate our prediction accuracy for HOSE code the same way every year

A “Leave-one-out” analysis of our entire database (2 million chemical shifts)

This allows us to compare year-on-year improvement

A TRUE analysis of how accurate the predictors are
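A leave-one-out analysis can be illustrated with a toy example. The sketch below is not ACD/Labs' implementation: it assumes a hypothetical database of (environment code, experimental shift) records and a deliberately simple predictor that averages the shifts of the other records sharing the same code. The function name, the record layout, and the choice to leave out a single shift rather than a whole compound are all assumptions made for illustration.

```python
from collections import defaultdict


def leave_one_out_errors(records):
    """Return experimental-minus-predicted errors for a toy database.

    records: list of (environment_code, experimental_shift_ppm) tuples.
    Each shift is predicted from all OTHER records with the same
    environment code, so no record ever contributes to its own prediction.
    """
    by_code = defaultdict(list)
    for code, shift in records:
        by_code[code].append(shift)

    errors = []
    for code, shift in records:
        group = by_code[code]
        if len(group) < 2:
            continue  # nothing left to predict from once this record is removed
        # Mean of the group with the current record left out.
        predicted = (sum(group) - shift) / (len(group) - 1)
        errors.append(shift - predicted)
    return errors


# Hypothetical records; the codes and shift values are illustrative only.
toy_database = [
    ("C-aromatic", 128.0),
    ("C-aromatic", 130.0),
    ("C-methyl", 20.0),
    ("C-methyl", 22.0),
    ("C-methyl", 24.0),
    ("C-carbonyl", 170.0),
]
print(leave_one_out_errors(toy_database))
```

Because each record is excluded from its own prediction, the resulting errors estimate performance on unseen structures rather than recall of stored values.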

L-O-O Analysis

[Figure: leave-one-out scatter plots, one panel per version, with axes labeled "Chemical Shifts : Value (ppm)" and "Chemical Shifts : <version>".

Version 8.00: database CNMR8_ALL.INT, 1861611 pts

Version 10.05: database CNMR105.INT, 1982234 pts]

Prediction Accuracy

Standard Error of Prediction Formula:

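\[
\text{Standard Error of Prediction} = \sqrt{\frac{\sum_{i=1}^{n}\left(\mathrm{exp}_i - \mathrm{calc}_i\right)^2}{n-1} - \frac{\left(\sum_{i=1}^{n}\left(\mathrm{exp}_i - \mathrm{calc}_i\right)\right)^2}{n(n-1)}}
\]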
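As a rough illustration, the standard error of prediction can be computed as the sample standard deviation of the differences between experimental and calculated shifts: the sum of squared differences over n-1, minus the squared sum of differences over n(n-1), all under a square root. The summation signs and the square root are assumed here, since they are not legible in the slide text, and the function name and sample values are hypothetical.

```python
import math


def standard_error_of_prediction(exp, calc):
    """Sample standard deviation of (exp_i - calc_i), in the same units as the input.

    exp, calc: equal-length sequences of experimental and calculated shifts (ppm).
    """
    n = len(exp)
    if n != len(calc):
        raise ValueError("exp and calc must have the same length")
    if n < 2:
        raise ValueError("at least two shifts are required")

    diffs = [e - c for e, c in zip(exp, calc)]
    sum_of_squares = sum(d * d for d in diffs)
    square_of_sum = sum(diffs) ** 2

    variance = sum_of_squares / (n - 1) - square_of_sum / (n * (n - 1))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical shifts in ppm, for illustration only.
experimental = [10.0, 20.0, 30.0]
calculated = [11.0, 19.0, 32.0]
print(f"{standard_error_of_prediction(experimental, calculated):.3f} ppm")
```

Subtracting the squared-sum term removes any systematic offset between experiment and calculation, so the value reflects the scatter of the errors rather than their mean.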

Prediction Accuracy

CNMR Predictor Standard Error

Version 8: 3.11 ppm

Version 9: 2.32 ppm

Version 10.00: 2.26 ppm

Version 10.05: 1.84 ppm

A 21% increase in accuracy over version 9!

A 41% increase in accuracy over version 8!
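The quoted percentages are consistent with reading "increase in accuracy" as the relative reduction in standard error. The short check below uses only the standard errors listed on the slide; the function and variable names are illustrative.

```python
def percent_error_reduction(old_error_ppm, new_error_ppm):
    """Relative reduction in standard error, as a percentage of the old value."""
    return 100.0 * (old_error_ppm - new_error_ppm) / old_error_ppm


# Standard errors (ppm) as listed on the slide.
standard_error_ppm = {"8": 3.11, "9": 2.32, "10.00": 2.26, "10.05": 1.84}

for old_version in ("9", "8"):
    reduction = percent_error_reduction(
        standard_error_ppm[old_version], standard_error_ppm["10.05"]
    )
    print(f"Version 10.05 vs version {old_version}: {reduction:.1f}% lower standard error")
# Prints 20.7% for version 9 and 40.8% for version 8, which round to 21% and 41%.
```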

Prediction Accuracy

Comparison of HOSE and Neural Network

>187,000 chemical shifts used in test

NN algorithms: 12% accuracy increase over version 9 HOSE code

Version 10 HOSE code: 16% accuracy increase over version 9 HOSE code

HOSE Code is better for now

The Future of Neural Nets

What is planned for NMR Predictors?

How do Neural Networks fit into these plans?

The Future of Neural Nets

Version 11 will further integrate the Neural Network Algorithm

An intelligent hybrid approach

Much like the use of incremental scheme today

Stay tuned for more validation results

1H NMR validation study

Acknowledgements

Kirill Blinov

Mikhail Kvasha

Marina Solnetseva and the database team

Ryan Sasaki