NC State lecture v2: computational toxicology
NC State lecture, January 2012
Sean Ekins, M.Sc, Ph.D., D.Sc.
Collaborations in Chemistry, Fuquay-Varina, NC.
Collaborative Drug Discovery, Burlingame, CA.
School of Pharmacy, Department of Pharmaceutical Sciences, University of Maryland.
215-687-1320
Computational Approaches to Toxicology
…mathematical learning will be the distinguishing mark of a physician from a quack…
Richard Mead A mechanical account of poisons in several essays
2nd Edition, London, 1708.
• Key enablers
• What has been modeled – a quick review
• How models can be used - applications
• What will be modeled
• Future
Outline
• Computational toxicology is a broad term. It is also known as in silico toxicology, predictive toxicology.
• ‘anything that you can do with a computer in toxicology.’
• QSAR = quantitative structure activity relationship
Definitions
Consider Absorption, Distribution, Metabolism, Excretion and Toxicology properties earlier in Drug Discovery
Combine in silico, in vitro and in vivo data. The approach is equally applicable to consumer products and to getting information on chemicals.
Ekins et al., Trends Pharm Sci 26: 202-209 (2005)
3Rs Call for Reduced Animal Testing
Cost effective
Obtain new information that is not available using traditional methods
Rapid
Identifies toxicity early on
Less time consuming than testing
Legislation
REACH
Domestic Substances List in Canada
Chemical Substances Control List in Japan
Also interest in applying models to green chemistry
Why Should I Use in silico Tools?
Why Use Computational Models For Toxicology?
Goal of a model – Alert you to potential toxicity, enable you to focus efforts on best molecules – reduce risk
Selection of model – trade off between interpretability, insights for modifying molecules, speed of calculation and coverage of chemistry space – applicability domain
Models can be built with proprietary, open and commercial tools
software (descriptors + algorithms) + data = model/s
Human operator decides whether a model is acceptable
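The relation "software (descriptors + algorithms) + data = model" can be illustrated with a minimal sketch. It assumes a single numeric descriptor and ordinary least squares as the algorithm; the data values and the names `fit_linear` and `predict` are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: one descriptor value and one measured endpoint per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
endpoint = [2.1, 3.9, 6.1, 7.9]

# Algorithm + data = model (here just two fitted coefficients).
slope, intercept = fit_linear(descriptor, endpoint)

def predict(x):
    """Apply the fitted model to a new descriptor value."""
    return slope * x + intercept
```

Real QSAR models use many descriptors and more flexible algorithms, but the pieces are the same, and the operator still has to judge whether the fit is acceptable.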
In silico tools: information retrieved or predicted
• Databases: records of toxicological information
• Calculation of physicochemical descriptors: various physicochemical properties
• Calculation of chemical structure-based properties: 2-D and molecular orbital properties
• Calculation of toxicological effects: direct prediction of endpoints
– Structure-based expert systems
– Multivariate-based QSAR systems
– Grouping or category approach
The future: crowdsourced drug discovery
Williams et al., Drug Discovery World, Winter 2009
Key enablers: Hardware is getting smaller
1930’s
1980s
1990s
Room size
Desktop size
Not to scale and not equivalent computing power – illustrates mobility
Laptop
Netbook
Phone
Watch
Key Enablers: More data available and open tools
What has been modeled
• Physicochemical properties, LogP, logD, Solubility, boiling point, melting point
• QSAR for various proteins, complex properties
• Homology models, Docking
• Expert systems
• Hybrid methods – combine different approaches
• Mutagenicity (Ames, micronucleus, clastogenicity, and DNA damage, developmental tox.. )
• Environmental Tox – Aquatic, dermatotoxicology
• Mixtures
Physicochemical properties
• Solubility data – 1000’s data in Literature
• Models median error ~0.5 log = experimental error
• LogP – tens of 1000’s data available
• Fragmental or whole molecule predictors
• Not all logP predictors are equal. Median error ~0.3 log = experimental error
• People now accept solubility and LogP predictions as if real
ACD predictions + EpiSuite predictions in www.chemspider.com
• Mobile molecular data sheet
• Links to melting point predictor from open notebook science
• Required curation of data
Simple Rules
• Rule of 5
• Lipinski, Lombardo, Dominy, Feeney Adv. Drug Deliv. Rev. 23: 3-25 (1997).
• AlogP98 vs PSA
• Egan, Merz, Baldwin, J. Med. Chem. 43: 3867-3877 (2000)
• More than ten rotatable bonds correlates with decreased rat oral bioavailability
• Veber, Johnson, Cheng, Smith, Ward, Kopple. J Med Chem 45: 2615–2623, (2002)
• Compounds with ClogP < 3 and total polar surface area > 75 Å² have fewer animal toxicity findings.
• Hughes, et al. Bioorg Med Chem Lett 18, 4872-4875 (2008).
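The simple rules above can be applied as plain threshold checks. This is a minimal sketch assuming the descriptor values have already been calculated elsewhere; the `Props` class, the function names and the example compound are hypothetical. The Rule of 5 cut-offs used are the standard ones (molecular weight > 500, logP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors).

```python
from dataclasses import dataclass

@dataclass
class Props:
    """Precomputed properties for one molecule (values supplied by the user)."""
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    tpsa: float  # total polar surface area, in square angstroms

def rule_of_5_violations(p):
    """Count how many of the four Rule of 5 thresholds are exceeded."""
    return sum([p.mol_weight > 500, p.clogp > 5, p.h_donors > 5, p.h_acceptors > 10])

def simple_rule_flags(p):
    """Evaluate the Rule of 5, the rotatable bond rule and the ClogP/TPSA rule."""
    return {
        "rule_of_5_violations": rule_of_5_violations(p),
        "rotatable_bonds_ok": p.rotatable_bonds <= 10,
        "lower_tox_space": p.clogp < 3 and p.tpsa > 75,
    }

# Hypothetical compound, for illustration only.
example = Props(mol_weight=350.0, clogp=2.1, h_donors=2, h_acceptors=6,
                rotatable_bonds=4, tpsa=90.0)
flags = simple_rule_flags(example)
```

These flags are alerts rather than verdicts; they are cheap enough to run on every molecule before any model is applied.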
L. Carlsson,et al., BMC Bioinformatics 2010, 11:362
MetaPrint 2D in Bioclipse- free metabolism site predictor
Uses fingerprint descriptors and metabolite database to learn frequencies of metabolites in various substructures
QSAR for Various Proteins
• Enzymes – predominantly Cytochrome P450s - for drug-drug interactions
• Transporters – predominantly P-gp but some others e.g. OATP, BCRP -
• Receptors – PXR, CAR, for hepatotoxicity
• Ion Channels – predominantly hERG for cardiotoxicity
• Issues – initially small training sets – public data is a fraction of what drug companies have
Pharmacophores
Ideal when we have few molecules for training
In silico database searching
Accelrys Catalyst in Discovery Studio
Geometric arrangement of functional groups necessary for a biological response
• Generate 3D conformations
• Align molecules
• Select features contributing to activity
• Regress hypothesis
• Evaluate with new molecules
• Excluded volumes – relate to inactive molecules
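The idea of a pharmacophore as a geometric arrangement of features can be sketched as a set of distance constraints that one conformer either satisfies or not. This is only an illustration and not how Catalyst builds or scores hypotheses; the feature names, distances and tolerances below are hypothetical.

```python
from math import dist

# Hypothetical pharmacophore: feature pairs with a target distance and a tolerance (angstroms).
PHARMACOPHORE = [
    ("hydrophobe", "hbond_acceptor", 5.0, 1.0),
    ("hydrophobe", "positive_ionizable", 7.0, 1.0),
]

def matches(features, pharmacophore=PHARMACOPHORE):
    """Return True if one conformer satisfies every distance constraint.

    `features` maps a feature name to its (x, y, z) position in that conformer.
    """
    for a, b, target, tol in pharmacophore:
        if a not in features or b not in features:
            return False
        if abs(dist(features[a], features[b]) - target) > tol:
            return False
    return True

# Hypothetical conformers, for illustration only.
fits = matches({"hydrophobe": (0, 0, 0),
                "hbond_acceptor": (5, 0, 0),
                "positive_ionizable": (0, 7.5, 0)})
misses = matches({"hydrophobe": (0, 0, 0),
                  "hbond_acceptor": (2, 0, 0),
                  "positive_ionizable": (0, 7.5, 0)})
```

Database searching then amounts to running such a check over the conformers of every molecule in the database; excluded volumes would add regions that no atom may occupy.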
CYP2B6, CYP2C9, CYP2D6, CYP3A4, CYP3A5, CYP3A7, hERG, P-gp, OATPs, OCT1, OCT2, BCRP, hOCTN2, ASBT, hPEPT1, hPEPT2, FXR, LXR, CAR, PXR etc
Interaction between hyperforin in St John's Wort and irinotecan = reduces efficacy
Ablating the inflammatory response mediated by exogenous toxins e.g. inflammatory diseases of the bowel
Cholesterol metabolism pathway control - a negative effect
Mediating blood-brain barrier efflux of drugs modulation of efflux transporters e.g. mdr1 and mrp2.
Decrease retention of CNS drugs e.g. anti-epileptics and pain killers, decreasing efficacy
PXR induces cell growth and is pro-carcinogenic
Growing role for PXR agonists
• DNA binding domains have high amino acid identity but LBD are divergent
• Species dependent effects on transporter and enzyme induction is due to activation of PXR and other NHRs
and mouse, rabbit, zebrafish, chicken…
Species differences in Rifampin agonism
Human, monkey, chicken, dog & Rabbit but not rat or mouse
PCN - rat but not human
Species differences in PXR
Maximum likelihood NHR phylogeny
Ekins et al., BMC Evol Biol. 8(1):103 (2008)
Pharmacophore Models for PXR Evolution
• Diversity of ligands can be useful for characterization
• 16 molecules tested in 6 species initially – HepG2 luciferase-based reporter gene assay generated EC50 data
• Murideoxycholic acid
• Chenodeoxycholic acid
• Deoxycholic acid
• Lithocholic acid
• Cholic acid
• 5β-cholestan-3α,7α,12α-triol
• 5β-scymnol sulfate
• 5α-cyprinol sulfate
• 3α,7α,12α-trihydroxy-5β-cholestan-27-oic acid taurine conjugate
• Tauro-β-muricholic acid
• 7α-hydroxycholesterol
• 5β-pregnane-3,20-dione
• benzo[a]pyrene
• N-butyl-p-aminobenzoate
• Nifedipine
• TCDD
• Up to 4 excluded volumes
Ekins et al., BMC Evol Biol 8(1):103 (2008)
Human r=0.7 Zebrafish r=0.8
Mouse r=0.8
Rabbit r=0.8 Chicken r=0.7
TCDD (green) and 5β-pregnane-3,20-dione (grey)
Ekins et al., BMC Evol Biol 8(1):103 (2008)
Pharmacophores show PXR evolution
Rat r=0.7
Ciona (Sea Squirt) VDR/PXR pharmacophore
• 6-formylindolo-[3,2-b]carbazole was aligned with carbamazepine and n-butyl-p-aminobenzoate
• Suggests planar binding site
Ligand selectivity is surprisingly species dependent
Undergone an ever expanding role in evolution from prechordates to fish to mammals and birds
Ekins et al., BMC Evol Biol. 2;8(1):103 (2008)
TCDD = 0.23M Reschly et al BMC Evol Biol 7:222 (2007)
Pharmacophores, nuclear receptors and evolution
• Statistical Methodologies
– Non Linear regression
– Genetic algorithms
– Neural networks
– Support vector machines
– Recursive partitioning (trees)
– Sammon maps
– Bayesian methods
– Kohonen maps
• A rich collection of descriptors.
• Public and proprietary data.
• Problems to date – small datasets
• Understanding applicability chemical space
Tools for big datasets
P-gp +ve P-gp -ve
Balakin et al., Curr Drug Disc Technol 2:99-113, 2005.
Ivanenkov, et al., Drug Disc Today, 14: 767-775, 2009.
Drug induced liver injury DILI
• Drug metabolism in the liver can convert some drugs into highly reactive intermediates,
• which in turn can adversely affect the structure and functions of the liver.
• DILI is the number one reason drugs are not approved, and also the reason some of them were withdrawn from the market after approval
• Estimated global annual incidence rate of DILI is 13.9-24.0 per 100,000 inhabitants, and DILI accounts for an estimated 3-9% of all adverse drug reactions reported to health authorities
• Herbal components can cause DILI too
https://dilin.dcri.duke.edu/for-researchers/info/
• Drug Induced Liver Injury Models
• 74 compounds – classification models (linear discriminant analysis, artificial neural networks, and machine learning algorithms (OneR))
– Internal cross-validation (accuracy 84%, sensitivity 78%, and specificity 90%). Testing on 6 and 13 compounds, respectively, gave > 80% accuracy.
(Cruz-Monteagudo et al., J Comput Chem 29: 533-549, 2008).
• A second study used binary QSAR (248 active and 283 inactive) with support vector machine models
– External 5-fold cross-validation procedures and 78% accuracy for a set of 18 compounds
(Fourches et al., Chem Res Toxicol 23: 171-183, 2010).
• A third study created a knowledge base with structural alerts from 1266 chemicals.
– Alerts created were used to predict results for 626 Pfizer compounds (sensitivity of 46%, specificity of 73%, and concordance of 56% for the latest version)
(Greene et al., Chem Res Toxicol 23: 1215-1222, 2010).
• DILI Model - Bayesian
• Laplacian-corrected Bayesian classifier models were generated using Discovery Studio (version 2.5.5; Accelrys).
• Training set = 295, test set = 237 compounds
• Uses two-dimensional descriptors to distinguish between compounds that are DILI-positive and those that are DILI-negative
– ALogP
– ECFC_6
– Apol
– logD
– molecular weight
– number of aromatic rings
– number of hydrogen bond acceptors
– number of hydrogen bond donors
– number of rings
– number of rotatable bonds
– molecular polar surface area
– molecular surface area
– Wiener and Zagreb indices
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
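A Laplacian-corrected Bayesian classifier can be sketched in a few lines. The weight formula below, log((A + 1) / (N × P + 1)) for a feature seen in N training compounds of which A are positive, with P the overall fraction of positives, is the form commonly described for this class of model; treating it as what Discovery Studio computes is an assumption. The feature names and the tiny training set are hypothetical.

```python
from math import log

def feature_weights(samples):
    """Laplacian-corrected weight per feature.

    `samples` is a list of (set_of_features, is_positive) pairs.
    """
    base_rate = sum(1 for _, positive in samples if positive) / len(samples)
    counts = {}  # feature -> (times seen, times seen in a positive compound)
    for feats, positive in samples:
        for f in feats:
            n, a = counts.get(f, (0, 0))
            counts[f] = (n + 1, a + (1 if positive else 0))
    return {f: log((a + 1) / (n * base_rate + 1)) for f, (n, a) in counts.items()}

def score(feats, weights):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in feats)

# Hypothetical training set: fingerprint features reduced to readable labels.
training = [
    ({"ketone", "amide"}, True),
    ({"ketone"}, True),
    ({"ether"}, False),
    ({"amide", "ether"}, False),
]
weights = feature_weights(training)
```

Features seen mostly in positives get positive weights and those seen mostly in negatives get negative weights, while the correction pulls rarely seen features toward zero. A compound's score is the sum over its features, which is why such models stay fast on large fingerprint sets.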
Extended connectivity fingerprints
• DILI Bayesian
Features in DILI -
Features in DILI +
Avoid: long aliphatic chains, phenols, ketones, diols, α-methyl styrene, conjugated structures, cyclohexenones, amides
Test set analysis
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
• Compounds of most interest – well known hepatotoxic drugs (U.S. Food and Drug Administration Guidance for Industry “Drug-Induced Liver Injury: Premarketing Clinical Evaluation,” 2009), plus their less hepatotoxic comparators, if clinically available.
Fingolimod (Gilenya) for MS (EMEA and FDA)
Paliperidone for schizophrenia
Pirfenidone for Idiopathic pulmonary fibrosis
Roflumilast for pulmonary disease
Predictions for newly approved EMEA compounds
Can we get DILI data for these?
hOCTN2 – Organic Cation Transporter Pharmacophore
• High affinity cation/carnitine transporter - expressed in kidney, skeletal muscle, heart, placenta and small intestine
• Inhibition correlates with muscle weakness - rhabdomyolysis
• A common features pharmacophore developed with 7 inhibitors
• Searched a database of over 600 FDA approved drugs - selected drugs for in vitro testing.
• Of 33 tested drugs predicted to map to the pharmacophore, 27 inhibited hOCTN2 in vitro
• Compounds were more likely to cause rhabdomyolysis if the Cmax/Ki ratio was higher than 0.0025
Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)
hOCTN2 – Organic Cation Transporter Pharmacophore
Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)
+ve
-ve
hOCTN2 quantitative pharmacophore and Bayesian model
Diao et al., Mol Pharm, 7: 2120-2131, 2010 r = 0.89
vinblastine
cetirizine
emetine
hOCTN2 quantitative pharmacophore and Bayesian model
Bayesian model – leaving 50% out 97 times: external ROC 0.90, internal ROC 0.79, concordance 73.4%, specificity 88.2%, sensitivity 64.2%.
Lab test set (N = 27) Bayesian model has better correct predictions (> 80%) and lower false positives and negatives than pharmacophore (> 70%)
Predictions for literature test set (N=32) not as good as in house – mean max Tanimoto similarity were ~ 0.6
Diao et al., Mol Pharm, 7: 2120-2131, 2010
PCA used to assess training and test set overlap
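The "mean max Tanimoto similarity" used to compare a test set with a training set can be sketched as follows. It assumes each molecule is represented by the set of its fingerprint bits; the function names and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def mean_max_similarity(test_set, training_set):
    """For each test molecule take its closest training molecule, then average."""
    best = [max(tanimoto(t, tr) for tr in training_set) for t in test_set]
    return sum(best) / len(best)

# Hypothetical fingerprints, for illustration only.
training = [{1, 2, 3, 4}, {5, 6}]
test = [{1, 2, 3}, {5, 6, 7, 8}]
coverage = mean_max_similarity(test, training)
```

A low value signals that the test molecules sit far from the training set, which is one plausible reason for weaker predictions on a literature test set than on in-house compounds.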
Among the 21 drugs associated with rhabdomyolysis or carnitine deficiency, 14 (66.7%) provided a Cmax/Ki ratio higher than 0.0025.
Among 25 drugs that were not associated with rhabdomyolysis or carnitine deficiency, only 9 (36.0%) showed a Cmax/Ki ratio higher than 0.0025.
Rhabdomyolysis or carnitine deficiency was associated with a Cmax/Ki value above 0.0025 (Pearson’s chi-square test p = 0.0382).
Limitations of Cmax/Ki serving as a predictor for rhabdomyolysis: Cmax/Ki does not consider the effects of drug tissue distribution or plasma protein binding.
hOCTN2 association with rhabdomyolysis
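The chi-square test quoted for the Cmax/Ki association can be retraced from the counts on the slide: 14 of 21 associated drugs and 9 of 25 non-associated drugs had a ratio above 0.0025. The sketch below assumes a plain Pearson chi-square on the 2×2 table without continuity correction; the function name is hypothetical.

```python
from math import erfc, sqrt

def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Rows: associated / not associated with rhabdomyolysis or carnitine deficiency.
# Columns: Cmax/Ki above 0.0025 / not above 0.0025.
chi2, p_value = pearson_chi_square_2x2(14, 7, 9, 16)
```

Under these assumptions the p-value should land close to the reported 0.0382, which suggests the published test was run without a continuity correction.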
Could all pharmas share their data as models with each other?
Increasing Data & Model Access
Ekins and Williams, Lab On A Chip, 10: 13-22, 2010.
The big idea
Challenge..There is limited access to ADME/Tox data and models needed for R&D
How could a company share data but keep the structures proprietary?
Sharing models means both parties use costly software
What about open source tools? Pfizer had never considered this - So we proposed a
study and Rishi Gupta generated models
Pfizer Open models and descriptors
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
• What can be developed with very large training and test sets?
• HLM training 50,000 testing 25,000 molecules
• training 194,000 and testing 39,000
• MDCK training 25,000 testing 25,000
• MDR training 25,000 testing 18,400
• Open molecular descriptors / models vs commercial descriptors
• Examples – Metabolic Stability
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
HLM Model with CDK and SMARTS Keys:
HLM Model with MOE2D and SMARTS Keys
# Descriptors: 578
# Training set compounds: 193,650
Cross validation results: 38,730 compounds
Training R2: 0.79
20% test set R2: 0.69
Blind data set (2310 compounds): R2 = 0.53, RMSE = 0.367
Continuous Categorical: κ = 0.40, Sensitivity = 0.16, Specificity = 0.99, PPV = 0.80
Time (sec/compound): 0.252

# Descriptors: 818
# Training set compounds: 193,930
Cross validation results: 38,786 compounds
Training R2: 0.77
20% test set R2: 0.69
Blind data set (2310 compounds): R2 = 0.53, RMSE = 0.367
Continuous Categorical: κ = 0.42, Sensitivity = 0.24, Specificity = 0.987, PPV = 0.823
Time (sec/compound): 0.303
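The R2 and RMSE figures reported for the continuous models can be computed as below. This sketch assumes R2 is the coefficient of determination (1 minus residual over total sum of squares); the original study may instead report a squared correlation coefficient. The observed and predicted values are hypothetical.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
error = rmse(observed, predicted)
```

Comparing training, held-out and blind values of these statistics, as the slide does, shows how much performance drops as compounds move away from the training data.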
PCA of training (red) and test (blue) compounds
Overlap in Chemistry space
• Examples – P-gp
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
Open source descriptors CDK and C5.0 algorithm
~60,000 molecules with P-gp efflux data from Pfizer
MDR <2.5 (low risk) (N = 14,175) MDR > 2.5 (high risk) (N = 10,820)
Test set MDR <2.5 (N = 10,441) > 2.5 (N = 7972)
Could facilitate model sharing?
              CDK + fragment descriptors    MOE 2D + fragment descriptors
Kappa         0.65                          0.67
Sensitivity   0.86                          0.86
Specificity   0.78                          0.8
PPV           0.84                          0.84
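The classification statistics in this comparison (kappa, sensitivity, specificity, PPV) all follow from a 2×2 confusion matrix. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and Cohen's kappa from confusion matrix counts."""
    n = tp + fp + tn + fn
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected_agreement = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "kappa": (observed_agreement - expected_agreement) / (1 - expected_agreement),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
```

Kappa corrects raw agreement for chance, which matters when the two classes are unbalanced, so it is a fairer basis for comparing descriptor sets than accuracy alone.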
Combining models may give greater coverage of ADME/Tox chemistry space and improve predictions?
Lundbeck, Pfizer, Merck, GSK, Novartis, Lilly, BMS, Allergan, Bayer, AZ, Roche, BI, Merck KGaA
Model coverage of chemistry space
Ekins et al., Trends Pharm Sci 26: 202-209 (2005)
Converging Technologies
Ekins et al., Trends Pharm Sci 26: 202-209 (2005)
PathwayStudio
Pathway / Network/ Database Software Available
Ekins et al., in High Content Screening, Eds. Giuliano, Taylor & Haskin (2006)
Network of genes from rat liver slices incubated with 2.5 mM Acetaminophen for 3 hours
Olinga et al., Drug Metab Rev 39, S1, 1-388, 2007.
Fibrotic response seen at 3h
Mimics in vivo
Legend: Transcription Regulator, Enzyme, Group or Complex, Kinase
Red = up regulated, Green = down regulated
Human PXR – direct downstream interactions
• PXR increases transcription of CYP3A4 and >37 other genes Transporters, drug metabolizing enzymes
Systems Biology
Measure
Manipulate
Model
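\begin{align}
v_1 &= k_{f}\,[\mathrm{MKK6}]\,[\mathrm{p38}]_{\mathrm{cytosol}} - k_{r}\,[\mathrm{MKK6{\cdot}p38}] \\
v_2 &= k_{\mathrm{cat}}\,[\mathrm{MKK6{\cdot}p38}] \\
v_3 &= k_{\mathrm{import}}\,[\text{phospho-p38}]_{\mathrm{cytosol}} - k_{\mathrm{export}}\,[\text{phospho-p38}]_{\mathrm{nucleus}}
\end{align}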
Mine
Xu JJ, Ekins S, McGlashen M and Lauffenburger D, in Ekins S and Xu JJ, Drug Efficacy, Safety, and Biologics Discovery: Emerging Technologies and Tools, P351-379, 2009.
4M
•Make science more accessible => communication
•Mobile – take a phone into field /lab and do science more readily than on a laptop
•GREEN – energy efficient computing
•MolSync + DropBox + MMDS = Share molecules as SDF files on the cloud = collaborate
Mobile Apps for Drug Discovery
Williams et al DDT 16:928-939, 2011
Green Solvents App
Bad Good
www.scimobileapps.com
Mobile Apps for Drug Discovery
Clark et al., submitted 2011
Future: What will be modeled
• Mitochondrial toxicity, hepatotoxicity
• More transporters – MATE, OATPs, BSEP.. bigger datasets – driven by academia
• Screening centers – more data – more models
• Understanding differences between ligands for Nuclear Receptors – CAR vs PXR
• Models will become replacements for data as datasets expand (e.g. like logP)
• Toxicity Models used for Green Chemistry
Chem Rev. 2010 Oct 13;110(10):5845-82
How Could Green Chemistry Benefit From These Models?
Chem Rev. 2010 Oct 13;110(10):5845-82
…
Nature 469, 6 Jan 2011
Acknowledgments
• Sneha Bhatia (RIFM)
• Lei Diao & James E. Polli (University of Maryland)
• Rishi Gupta, Eric Gifford, Ted Liston, Chris Waller (Pfizer)
• Jim Xu (Merck)
• Matthew D. Krasowski, Erica J. Reschly, Manisha Iyer (University of Iowa)
• Seth Kullman et al. (NC State)
• Andrew Fidler (NZ)
• Sandhya Kortagere (Drexel University)
• Peter Olinga (Groningen University)
• Dana Abramowitz (Ingenuity)
• Antony J. Williams (RSC)
• Alex Clark
• Accelrys
• CDD
• Ingenuity
• Email: [email protected]
Slideshare: http://www.slideshare.net/ekinssean
Twitter: collabchem
Blog: http://www.collabchem.com/
Website: http://www.collaborations.com/CHEMISTRY.HTM