Computational Chemistry Robots
ACS Sep 2005
J. A. Townsend, P. Murray-Rust,
S. M. Tyrrell, Y. Zhang
• Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
• Can protocols be automated?
• Can we believe the results?
Aspects of complete automation
• Humans must validate protocols rather than individual data
• Low rates of error must be addressed
• Users should know the rates of error and degree of conformance
Approaches to conformance
• Explore limits of job behaviour (times, convergence, etc.)
• Analyse reproducibility
• Vary and analyse effects of parameters and algorithms
• Compare output with other “measurements” of the same quantity
The overall view
molecules → computation → dissemination
Check results
Components of System
• Workflow for management of jobs (Taverna)
• Natural Language Processing based parsing of outputs (JUMBOMarker)
• Pairwise comparison of data sets (R)
• Analysis of mean and variance
• Detection and analysis of outliers
Computing the NCI database
MOPAC PM5 (a)
(a) MOPAC PM5: collaboration with J.J.P. Stewart
Protocol
Log Files
Parse
System Crashes
Science Errors
Analysis
Pathological Behaviour
Statistics
Other Science
Disseminate Results
Unsuitable Data
Program Crashes
Inform Developer
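The sorting step in the protocol above can be sketched as a small triage function. This is only an illustration under stated assumptions: the marker strings are invented (they are not real MOPAC or GAMESS messages), and the "log and exclude" step for non-program failures is an assumed default rather than something the slide specifies.

```python
from enum import Enum


class Outcome(Enum):
    SYSTEM_CRASH = "system crash"
    PROGRAM_CRASH = "program crash"
    SCIENCE_ERROR = "science error"
    UNSUITABLE_DATA = "unsuitable data"
    OK = "ok"


# Hypothetical marker strings, invented for illustration only.
MARKERS = [
    ("DISK QUOTA EXCEEDED", Outcome.SYSTEM_CRASH),
    ("SEGMENTATION FAULT", Outcome.PROGRAM_CRASH),
    ("SCF DID NOT CONVERGE", Outcome.SCIENCE_ERROR),
    ("UNSUPPORTED ELEMENT", Outcome.UNSUITABLE_DATA),
]


def triage(log_text: str) -> Outcome:
    """Return the first matching failure category, or OK if none matches."""
    upper = log_text.upper()
    for marker, outcome in MARKERS:
        if marker in upper:
            return outcome
    return Outcome.OK


def next_step(outcome: Outcome) -> str:
    """Map a category to the next box in the protocol diagram."""
    if outcome is Outcome.OK:
        return "analysis"
    if outcome is Outcome.PROGRAM_CRASH:
        return "inform developer"
    # Assumed default for the remaining failure categories.
    return "log and exclude"
```

Keeping the markers in a table means a human validates the rules once, in line with validating protocols rather than individual results.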
Taverna
• Workflow programs allow a series of small tasks to be linked together to develop more complex tasks
• Open Source
• myGRID, eScience
• European Bioinformatics Institute
• University of Manchester
An Example Taverna Workflow
Parsing Log Files to CML
Coordinates
Molecular Formula
Calculation Type
Point Group
Dipole
Total Energy
Computational Chemistry Log Files
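A minimal sketch of the log-to-CML idea, assuming a made-up log layout. The sample log text, the regular expressions and the `example:` dictRef prefix are all hypothetical, and the XML is a simplified CML-like fragment (no namespace, no units), not the output of JUMBOMarker.

```python
import re
import xml.etree.ElementTree as ET

# Invented log excerpt; real program output is laid out differently.
SAMPLE_LOG = """\
FORMULA: C4 H6 O
POINT GROUP: Cs
TOTAL ENERGY = -231.2345 HARTREE
DIPOLE = 1.30 DEBYE
"""

# One pattern per field of interest (hypothetical formats).
PATTERNS = {
    "formula": r"^FORMULA:\s*(.+)$",
    "pointGroup": r"^POINT GROUP:\s*(\S+)",
    "totalEnergy": r"^TOTAL ENERGY\s*=\s*(-?\d+\.\d+)",
    "dipole": r"^DIPOLE\s*=\s*(-?\d+\.\d+)",
}


def parse_log(text: str) -> dict:
    """Extract every field whose pattern matches somewhere in the log."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def to_cml(fields: dict) -> str:
    """Serialise the parsed fields as a simplified CML-like molecule."""
    molecule = ET.Element("molecule")
    if "formula" in fields:
        ET.SubElement(molecule, "formula", {"concise": fields["formula"]})
    for name in ("pointGroup", "totalEnergy", "dipole"):
        if name in fields:
            prop = ET.SubElement(molecule, "property", {"dictRef": f"example:{name}"})
            ET.SubElement(prop, "scalar").text = fields[name]
    return ET.tostring(molecule, encoding="unicode")
```

Calling `to_cml(parse_log(SAMPLE_LOG))` returns a single XML string. Missing fields are simply skipped, so an incomplete log still yields a usable record.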
CompChem Output: Coordinates, Energy Levels, Vibrations
CML File: Coordinates, Energy Level, Vibration
CMLCore, CMLComp, CMLSpect
Input/jobControl, General
Parsers
Dissemination of results
LOG FILE → CML FILE → HUMAN DISPLAY
JUMBOMarker: NLP-based log file parser
WWMM* Server and DSpace → Outside world
* World Wide Molecular Matrix
InChI: IUPAC International Chemical Identifier
A non-proprietary unique identifier for the representation of chemical structures.
A normalised, canonicalised and serialised form of a chemical connection table.
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
Proteus molecules*
Calculation
JUNK Cured by MOPAC
* Proteus was a shape-changing ocean deity
Proteus molecules
Calculation
Input JUNK
How do we know our results are valid?
Computational Method 1
Computational Method 2
Experiment
J.J.P. Stewart’s example
Calculated ΔHf - Expt. ΔHf
GAMESS
MOPAC results
GAMESS (a)
6-31G*, B3LYP
Log Files
(a) Project with Kim Baldridge and Wibke Sudholt
Repeat runs, different methods
Multiple runs give same final structure from same input
Changing memory allocation doesn’t make a difference
Pathological behaviour - Early detection
100 min 6-31G*, B3LYP 200 min
15 min 6-31G*, B3LYP 10080 min
divinyl ether trans-Crotonaldehyde
Z matrix
Times to run jobs
[Plot: time / s (0 to 120,000) against (n basis functions)^4 (0 to 1×10^9)]
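If run time is taken to grow roughly as the fourth power of the number of basis functions, as the plot axis suggests, a robot could flag runaway jobs early. The sketch below is an assumption-laden illustration: the least-squares fit through the origin, the tolerance factor of 5 and the function names are all hypothetical, and any timing data fed to it would have to come from real job logs.

```python
def fit_quartic_constant(history):
    """Least-squares fit of t = k * n**4 through the origin.

    history: list of (n_basis_functions, seconds) from completed jobs.
    """
    numerator = sum((n ** 4) * t for n, t in history)
    denominator = sum((n ** 4) ** 2 for n, _ in history)
    return numerator / denominator


def is_pathological(n_basis, elapsed_s, k, tolerance=5.0):
    """True when a job has run far longer than the fitted scaling predicts."""
    predicted_s = k * n_basis ** 4
    return elapsed_s > tolerance * predicted_s
```

A scheduler could call `is_pathological` periodically on running jobs and stop or report those that exceed the threshold, rather than waiting days for them to finish.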
Analysis of different computational methods
Mean - Overall difference
Normality - Distribution of values
Outliers - Unusual molecules?
Variance (standard deviation) - Spread of the data, depends on both distributions.
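The quantities listed above can be computed from paired values in a few lines. This is a sketch only: the talk used R for the comparison, whereas this uses Python's standard library, and the 3-standard-deviation outlier rule and the dictionary-of-bonds input format are assumptions made for illustration.

```python
from statistics import mean, stdev


def compare(method_a, method_b, n_sigma=3.0):
    """Pairwise differences (a - b) for keys present in both data sets.

    Returns (mean difference, standard deviation, sorted outlier keys).
    """
    shared = method_a.keys() & method_b.keys()
    diffs = {key: method_a[key] - method_b[key] for key in shared}
    values = list(diffs.values())
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation, needs >= 2 values
    outliers = sorted(
        key for key, d in diffs.items() if abs(d - centre) > n_sigma * spread
    )
    return centre, spread, outliers
```

Because outliers inflate the standard deviation they are judged against, a small data set can hide them; the rule is most useful with many paired values.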
Probability Plot (Normal QQ plot)
Mean of distribution (approx. -0.03 Å)
S.D. 0.020 Å
Range over which sample distribution is approximately normal
Outliers
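For reference, the points of a normal QQ plot can be generated as follows. This is a generic sketch rather than the code behind the slide: the plotting positions (i + 0.5) / n are one common convention among several, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted sample value with a standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (standard_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]
```

If the sample is approximately normal the points lie close to a straight line whose intercept estimates the mean and whose slope estimates the standard deviation; points that bend away from the line at either end are candidate outliers.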
All bonds* Δr (MOPAC - GAMESS) / Å
Good agreement
Nearly normal
Outliers
S.D. 0.005 Å
* Excludes bonds to hydrogen
Bad molecules and data usually cause outliers
[Structure drawings of example outlier molecules]
Mean Δr (M - G) / Å (upper line of each row), Standard Error of the Mean / Å (lower line)
  C N O F S Cl
C -0.006 0.020 -0.010 -0.014 -0.040 -0.037
  0.000 0.000 0.000 0.001 0.001 0.001
N 0.006 -0.037 -0.055
  0.001 0.001 0.009
O -0.087 -0.070
  0.004 0.014
All values given to 3 decimal places
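A table of this kind can be produced by grouping bond-length differences by element pair. The sketch below assumes a hypothetical input of (element, element, difference) records; it is not the authors' R code, and it orders each pair alphabetically rather than in the table's column order.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarise_by_bond_type(records):
    """records: iterable of (element_1, element_2, difference_in_angstrom).

    Returns {(el_a, el_b): (mean difference, standard error of the mean)}.
    """
    groups = defaultdict(list)
    for first, second, diff in records:
        # C-N and N-C are the same bond type, so sort the pair.
        groups[tuple(sorted((first, second)))].append(diff)

    summary = {}
    for pair, diffs in groups.items():
        if len(diffs) > 1:
            sem = stdev(diffs) / sqrt(len(diffs))
        else:
            sem = float("nan")  # undefined for a single observation
        summary[pair] = (mean(diffs), sem)
    return summary
```

The standard error shrinks with the square root of the number of bonds in a group, which is why the well-populated C-C and C-N entries can show errors that round to 0.000 Å.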
Δr CC bonds (M - G) / Å
Good agreement
Nearly normal
Outliers
S.D. 0.013 Å
JUNK
Selection of molecules with C-C Δr (M - G) > 0.05 Å
[Structure drawings; substituents shown include CF3, CHF2, F, OH and NH2]
Y = 0.0277 X - 0.0061
Non-aromatic C-C bonds adjacent to CFn
Δr NN bonds (M - G) / Å
Good agreement
Nearly normal
Kink
S.D. 0.022 Å
Density plot of Δr NN bonds (M - G) / Å
[Plot regions labelled LEFT and RIGHT]
Most common fragments found in Left set but not Right set
[Fragment drawings; atom labels shown include N, N(ar), C(sp2), C(sp3) and S(sp2)]
Comparison of theory and experiment
GAMESS Log Files
CIF* files → CIF 2 CML
* CIF: Crystallographic Information File
Reading Acta Crystallographica Section E
All bonds* Δr (Cryst. - GAMESS) / Å, single molecules, no disorder
Mean Δr -0.011 Å
Nearly normal
Outliers
S.D. 0.014 Å
* Excludes bonds to hydrogen
Δr CC bonds (C - G) / Å
Mean Δr -0.01 Å
Nearly normal
S.D. 0.009 Å
Δr CO bonds (C - G) / Å
Good agreement
Nearly normal
Outliers?
S.D. 0.011 Å
Chemistry can cause outliers
Δr = +0.08 Å
H movement
Conclusions
• Protocols can be automated
• Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
• Computational programs can provide high quality “experimental” molecular properties
Thanks
J.J.P. Stewart
Kim Baldridge
Wibke Sudholt
Simon Tyrrell
Yong Zhang
Peter Murray-Rust
Unilever
Questions
Homepage: http://wwmm.ch.cam.ac.uk
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq
R: http://www.r-project.org
Taverna: http://taverna.sourceforge.net/
MOPAC 2002: http://www.cachesoftware.com/mopac/
GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html