Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models
TRANSCRIPT
Traditional Drug Discovery (insert graph)
In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
Target
IVY (brute-force virtual screening of very large compound libraries)
Lead Discovery
IVY (utilize predictive models from Biogen data for more efficient virtual screening)
Lead Optimization
Candidate
(insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption
Goal: Identify the crystallographic binding mode and rank-order ligands with respect to their binding to the protein
(insert graph)
Receptor Docking
Ligand Shape
Generate plausible trial binding modes using a docking function, then re-rank the modes with a scoring function
Cell Adhesion Assay (50% Serum)
◦ (insert graph)
Biochemical Adhesion Assay
◦ (insert graph)
Scoring Functions Are Poor More Often Than Not
Receptor Site View
Library Design
FlexXScore
Consensus Score >= 3
e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds
Consensus = 5? If yes: substructure exists? If yes: Pharmacophore < 4.2 Å? If yes: Publish Hit Report
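The decision cascade on this slide can be sketched in code. This is only an illustration under stated assumptions: `DockedCompound`, its fields, and `triage` are hypothetical names, the consensus score is assumed to be an integer from 0 to 5, and "Pharmacophore < 4.2 Å" is read as a pharmacophore distance threshold. None of these details are confirmed by the slides.

```python
from dataclasses import dataclass


@dataclass
class DockedCompound:
    """Hypothetical record for one docked compound (field names are assumptions)."""
    name: str
    consensus_score: int           # assumed: integer consensus score, 0 to 5
    has_substructure: bool         # assumed: required substructure is present
    pharmacophore_distance: float  # assumed: distance in angstroms


def triage(compound, max_distance=4.2):
    """Apply the filters in the order listed on the slide.

    Returns "hit" if every check passes, otherwise the reason for rejection.
    """
    if compound.consensus_score < 3:
        return "rejected: consensus below 3"
    if compound.consensus_score != 5:
        return "rejected: consensus is not 5"
    if not compound.has_substructure:
        return "rejected: substructure missing"
    if compound.pharmacophore_distance >= max_distance:
        return "rejected: pharmacophore distance too large"
    return "hit"


if __name__ == "__main__":
    for c in [
        DockedCompound("cmpd-1", 5, True, 3.9),
        DockedCompound("cmpd-2", 4, True, 3.0),
        DockedCompound("cmpd-3", 5, True, 5.1),
    ]:
        print(c.name, triage(c))
```

Returning the rejection reason rather than a bare boolean makes it easy to count how many compounds each stage removes.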
Goal: Predict hit/miss class based on presence of features (fingerprints)
Method
◦ Given a set of $N$ samples
◦ Given that some subset $A$ of them are good ('active')
Then we estimate for a new compound: $P(\text{good}) \approx A/N$
◦ Given a set of binary features $F$
For a given feature $F$: it appears in $N_F$ samples
It appears in $A_F$ good samples
Can we estimate $P(\text{good} \mid F) \approx A_F/N_F$? (Problem: the error gets worse as $N_F$ becomes small)
◦ $P'(\text{good} \mid F) = \dfrac{A_F + P(\text{good})\,k}{N_F + k}$
$P'(\text{good} \mid F) \to P(\text{good})$ as $N_F \to 0$
$P'(\text{good} \mid F) \to A_F/N_F$ as $N_F$ becomes large
◦ (If $k = 1/P(\text{good})$ this is the Laplacian correction)
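A minimal sketch of the corrected estimate may make its limiting behaviour easier to see. The function and argument names are hypothetical: `n_feature` is the number of samples containing the feature, `a_feature` is the number of those that are good, and `p_good` is the baseline fraction of good samples. When `k` is not supplied, the sketch uses k = 1/P(good), the Laplacian correction mentioned on the slide.

```python
def corrected_probability(n_feature, a_feature, p_good, k=None):
    """Smoothed estimate of P(good | F) = (A + P(good) * k) / (N + k).

    n_feature: number of samples in which the feature appears
    a_feature: number of good ('active') samples in which it appears
    p_good:    baseline probability that a sample is good
    k:         weight of the baseline; defaults to 1 / p_good
    """
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    if k is None:
        k = 1.0 / p_good  # Laplacian correction
    return (a_feature + p_good * k) / (n_feature + k)


if __name__ == "__main__":
    baseline = 0.1
    # Feature never seen: the estimate falls back to the baseline.
    print(corrected_probability(0, 0, baseline))
    # Feature seen once, in an active: pulled only slightly above the baseline.
    print(corrected_probability(1, 1, baseline))
    # Feature seen many times: the estimate approaches the raw ratio A/N.
    print(corrected_probability(1000, 500, baseline))
```

With a baseline of 0.1, a feature seen once in a single active is scored at 2/11 rather than the raw ratio of 1, which is the damping of small-sample error that the correction is meant to provide.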
Descriptors (insert)
Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL 1024; Leadscope 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
Classification Analysis
◦ Developing Non-Linear Scoring Functions to classify actives and non-actives
◦ (insert graphs)
◦ Cost Function to Minimize: Gini impurity of node $N$: $i(N) = 1 - \sum_j P^2(\omega_j)$
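The Gini impurity cost can be illustrated with a short sketch. The code below computes the impurity of a set of class labels and then searches a single numeric descriptor for the threshold that minimizes the size-weighted impurity of the two child nodes, which is how a CART-style tree typically chooses a split. The function names and the toy data are assumptions for illustration, not the models or data behind the slides.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_j P(class_j)^2 for the labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Find the split on one descriptor with the lowest weighted child impurity.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_impurity); threshold is None if no split exists.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        weighted = (
            len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
        ) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


if __name__ == "__main__":
    # Toy data: a hypothetical descriptor value per compound, with hit/miss labels.
    descriptor = [1.0, 2.0, 3.0, 4.0]
    activity = ["miss", "miss", "hit", "hit"]
    print(gini_impurity(activity))
    print(best_threshold(descriptor, activity))
```

A pure node (all hits or all misses) has impurity 0, and an even two-class mix has impurity 0.5, so lower values mean cleaner separation of actives from non-actives.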
Training Set Prediction Success
(insert table)
10-fold cross validation
Randomly split training and test sets
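As a rough sketch of the validation scheme named above, the following generator shuffles the sample indices and produces ten train/test splits in which every sample is held out exactly once. The function name, the fixed seed, and the round-robin fold assignment are illustrative choices, not details taken from the slides.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            idx
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for idx in fold
        ]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(25, k=10)):
        print(fold_number, len(train), len(test))
```

A model would be fitted on each `train` set and scored on the matching `test` set, with the ten scores averaged to estimate predictive performance.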
Significant Improvement in Separating Actives from Non-Actives
Improved scoring functions for separating hits from non-hits in structure-based drug design developed with CART and Bayesian models
Identified key differences in molecular physical properties that led to hits
Built a reasonably predictive OBA model (given the complexity of OBA, however, the method cannot be expected to extend to other systems)