nc state lecture v2 computational toxicology

Sean Ekins, M.Sc, Ph.D., D.Sc.

Collaborations in Chemistry, Fuquay-Varina, NC.

Collaborative Drug Discovery, Burlingame, CA.

School of Pharmacy, Department of Pharmaceutical Sciences, University of Maryland.

215-687-1320

[email protected]

Computational Approaches

to Toxicology

…mathematical learning will be the distinguishing mark of a physician from a quack…

Richard Mead A mechanical account of poisons in several essays

2nd Edition, London, 1708.

• Key enablers

• What has been modeled – a quick review

• How models can be used - applications

• What will be modeled

• Future

Outline

• Computational toxicology is a broad term. It is also known as in silico toxicology, predictive toxicology.

• ‘anything that you can do with a computer in toxicology.’

• QSAR = quantitative structure activity relationship

Definitions

Consider Absorption, Distribution, Metabolism, Excretion and Toxicology properties earlier in Drug Discovery

Combine in silico, in vitro and in vivo data- Approach equally applicable to consumer products and getting information on chemicals.

Ekins et al., Trends Pharm Sci 26: 202-209 (2005)

3Rs Call for Reduced Animal Testing

Cost effective

Obtain new information that is not available using traditional methods

Rapid

Identifies toxicity early on

Less time consuming than testing

Legislation

REACH

Domestic Substances List in Canada

Chemical Substances Control List in Japan

Also interest in applying models to green chemistry

Why Should I Use In Silico Tools?

Why Use Computational Models For Toxicology?

Goal of a model – Alert you to potential toxicity, enable you to focus efforts on best molecules – reduce risk

Selection of model – trade off between interpretability, insights for modifying molecules, speed of calculation and coverage of chemistry space – applicability domain

Models can be built with proprietary, open and commercial tools

software (descriptors + algorithms) + data = model/s

Human operator decides whether a model is acceptable

In silico tools Information retrieved or predicted

Databases Records of toxicological information

Calculation of physicochemical descriptors

Various physicochemical properties

Calculation of chemical structure-based properties

2-D and molecular orbital properties

Calculation of toxicological effects – direct prediction of endpoints

• Structure-based expert systems

• Multivariate-based QSAR systems

• Grouping or category approach

The future: crowdsourced drug discovery

Williams et al., Drug Discovery World, Winter 2009

Key enablers: Hardware is getting smaller

1930s

1980s

1990s

Room size

Desktop size

Not to scale and not equivalent computing power – illustrates mobility

Laptop

Netbook

Phone

Watch

Key Enablers: More data available and open tools

• Details

• Details

What has been modeled

• Physicochemical properties, LogP, logD, Solubility, boiling point, melting point

• QSAR for various proteins, complex properties

• Homology models, Docking

• Expert systems

• Hybrid methods – combine different approaches

• Mutagenicity (Ames, micronucleus, clastogenicity, and DNA damage, developmental tox...)

• Environmental Tox – Aquatic, dermatotoxicology

• Mixtures

Physicochemical properties

• Solubility data – 1000s of data points in the literature

• Models: median error ~0.5 log = experimental error

• LogP – tens of 1000s of data points available

• Fragmental or whole-molecule predictors

• Not all logP predictors are equal. Median error ~0.3 log = experimental error

• People now accept solubility and LogP predictions as if real

ACD predictions + EpiSuite predictions in www.chemspider.com

• Mobile molecular data sheet

• Links to melting point predictor from open notebook science

• Required curation of data

Simple Rules

• Rule of 5

• Lipinski, Lombardo, Dominy, Feeney Adv. Drug Deliv. Rev. 23: 3-25 (1997).

• AlogP98 vs PSA

• Egan, Merz, Baldwin, J. Med. Chem. 43: 3867-3877 (2000)

• Greater than ten rotatable bonds correlates with decreased rat oral bioavailability

• Veber, Johnson, Cheng, Smith, Ward, Kopple. J Med Chem 45: 2615-2623 (2002)

• Compounds with ClogP < 3 and total polar surface area > 75 Å² have fewer animal toxicity findings.

• Hughes, et al. Bioorg Med Chem Lett 18, 4872-4875 (2008).
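The simple rules above can be sketched as a small filter. This is a minimal illustration, assuming the descriptors have already been calculated by some other tool; the `Descriptors` class, the function names, and the convention of raising a Rule of 5 alert at two or more violations are assumptions made for this sketch, and the example compound is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (supplied by the caller)."""
    mol_weight: float        # molecular weight, Da
    logp: float              # calculated logP (e.g. ClogP)
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float              # total polar surface area, Å^2


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def simple_rule_flags(d: Descriptors) -> dict:
    """Apply the three rule sets listed on the slide."""
    return {
        # Assumption: alert at two or more violations.
        "rule_of_5_alert": rule_of_five_violations(d) >= 2,
        # Veber et al.: more than ten rotatable bonds.
        "rotatable_bond_alert": d.rotatable_bonds > 10,
        # Hughes et al.: ClogP < 3 and TPSA > 75 Å^2 had fewer tox findings.
        "lower_tox_space": d.logp < 3 and d.tpsa > 75,
    }


# Hypothetical compound, not a real drug.
example = Descriptors(620.0, 5.6, 2, 8, 12, 60.0)
flags = simple_rule_flags(example)
print(flags)
```

Rules like these are fast and interpretable, which is the trade-off against coverage of chemistry space mentioned earlier in the lecture.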

L. Carlsson,et al., BMC Bioinformatics 2010, 11:362

MetaPrint 2D in Bioclipse- free metabolism site predictor

Uses fingerprint descriptors and metabolite database to learn frequencies of metabolites in various substructures

QSAR for Various Proteins

• Enzymes – predominantly Cytochrome P450s - for drug-drug interactions

• Transporters – predominantly P-gp but some others e.g. OATP, BCRP

• Receptors – PXR, CAR, for hepatotoxicity

• Ion Channels – predominantly hERG for cardiotoxicity

• Issues – initially small training sets – public data is a fraction of what drug companies have

Pharmacophores

Ideal when we have few molecules for training

In silico database searching

Accelrys Catalyst in Discovery Studio

Geometric arrangement of functional groups necessary for a biological response

• Generate 3D conformations

• Align molecules

• Select features contributing to activity

• Regress hypothesis

• Evaluate with new molecules

•Excluded volumes – relate to inactive molecules

CYP2B6, CYP2C9, CYP2D6, CYP3A4, CYP3A5, CYP3A7, hERG, P-gp, OATPs, OCT1, OCT2, BCRP, hOCTN2, ASBT, hPEPT1, hPEPT2, FXR, LXR, CAR, PXR etc.

Interaction between hyperforin in St John's Wort and irinotecan = reduces efficacy

Ablating the inflammatory response mediated by exogenous toxins e.g. inflammatory diseases of the bowel

Cholesterol metabolism pathway control - a negative effect

Mediating blood-brain barrier efflux of drugs: modulation of efflux transporters, e.g. mdr1 and mrp2.

Decrease retention of CNS drugs e.g. anti-epileptics and pain killers, decreasing efficacy

PXR induces cell growth and is pro-carcinogenic

Growing role for PXR agonists

• DNA binding domains have high amino acid identity but LBD are divergent

• Species-dependent effects on transporter and enzyme induction are due to activation of PXR and other NHRs

and mouse, rabbit, zebrafish, chicken…

Species differences in Rifampin agonism

Human, monkey, chicken, dog & Rabbit but not rat or mouse

PCN - rat but not human

Species differences in PXR

Maximum likelihood NHR phylogeny

Ekins et al., BMC Evol Biol 8(1):103 (2008)

Pharmacophore Models for PXR Evolution

• Diversity of ligands can be useful for characterization

• 16 molecules tested in 6 species initially – HepG2 luciferase-based reporter gene assay generated EC50 data

• Murideoxycholic acid
• Chenodeoxycholic acid
• Deoxycholic acid
• Lithocholic acid
• Cholic acid
• 5β-cholestan-3α,7α,12α-triol
• 5β-scymnol sulfate
• 5α-cyprinol sulfate
• 3α,7α,12α-trihydroxy-5β-cholestan-27-oic acid taurine conjugate
• Tauro-β-muricholic acid
• 7α-hydroxycholesterol
• 5β-pregnane-3,20-dione
• benzo[a]pyrene
• N-butyl-p-aminobenzoate
• Nifedipine
• TCDD

• Up to 4 excluded volumes

Ekins et al., BMC Evol Biol 8(1):103 (2008)

Human r=0.7 Zebrafish r=0.8

Mouse r=0.8

Rabbit r=0.8 Chicken r=0.7

TCDD (green) and 5β-pregnane-3,20-dione (grey)

Ekins et al., BMC Evol Biol 8(1):103 (2008)

Pharmacophores show PXR evolution

Rat r=0.7

Ciona (Sea Squirt) VDR/PXR pharmacophore

• 6-formylindolo-[3,2-b]carbazole was aligned with carbamazepine and n-butyl-p-aminobenzoate

• Suggests planar binding site

Ligand selectivity is surprisingly species dependent

Undergone an ever expanding role in evolution from prechordates to fish to mammals and birds

Ekins et al., BMC Evol Biol 8(1):103 (2008)

TCDD = 0.23M Reschly et al BMC Evol Biol 7:222 (2007)

Pharmacophores, nuclear receptors and evolution

• Statistical Methodologies

– Non Linear regression
– Genetic algorithms
– Neural networks
– Support vector machines
– Recursive partitioning (trees)
– Sammon maps
– Bayesian methods
– Kohonen maps

• A rich collection of descriptors.

• Public and proprietary data.

• Problems to date – small datasets

• Understanding applicability chemical space

Tools for big datasets

P-gp +ve P-gp -ve

Balakin et al., Curr Drug Disc Technol 2: 99-113, 2005.

Ivanenkov et al., Drug Disc Today 14: 767-775, 2009.

Drug induced liver injury DILI

• Drug metabolism in the liver can convert some drugs into highly reactive intermediates

• These in turn can adversely affect the structure and function of the liver

• DILI is the number one reason drugs are not approved, and also the reason some were withdrawn from the market after approval

• Estimated global annual incidence rate of DILI is 13.9-24.0 per 100,000 inhabitants, and DILI accounts for an estimated 3-9% of all adverse drug reactions reported to health authorities

• Herbal components can cause DILI too

https://dilin.dcri.duke.edu/for-researchers/info/

• Drug Induced Liver Injury Models

• 74 compounds - classification models (linear discriminant analysis, artificial neural networks, and machine learning algorithms (OneR))

– Internal cross-validation (accuracy 84%, sensitivity 78%, and specificity 90%). Testing on 6 and 13 compounds, respectively: > 80% accuracy.

(Cruz-Monteagudo et al., J Comput Chem 29: 533-549, 2008).

• A second study used binary QSAR (248 active and 283 inactive) support vector machine models

– External 5-fold cross-validation procedures and 78% accuracy for a set of 18 compounds

(Fourches et al., Chem Res Toxicol 23: 171-183, 2010).

• A third study created a knowledge base with structural alerts from 1266 chemicals.

– Alerts created were used to predict results for 626 Pfizer compounds (sensitivity of 46%, specificity of 73%, and concordance of 56% for the latest version)

(Greene et al., Chem Res Toxicol 23: 1215-1222, 2010).

• DILI Model - Bayesian

• Laplacian-corrected Bayesian classifier models were generated using Discovery Studio (version 2.5.5; Accelrys).

• Training set = 295, test set = 237 compounds

• Uses two-dimensional descriptors to distinguish between compounds that are DILI-positive and those that are DILI-negative

– ALogP
– ECFC_6
– Apol
– logD
– molecular weight
– number of aromatic rings
– number of hydrogen bond acceptors
– number of hydrogen bond donors
– number of rings
– number of rotatable bonds
– molecular polar surface area
– molecular surface area
– Wiener and Zagreb indices

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
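To make the idea of a Laplacian-corrected Bayesian classifier concrete, here is a minimal sketch over binary fingerprint features. It is not the Discovery Studio implementation: it assumes one commonly described formulation in which each feature gets the weight log((A_F + 1) / (N_F × P + 1)), where N_F is the number of training compounds containing the feature, A_F the number of those that are positive, and P the overall positive rate. The function names and the toy feature labels (F1, F2, F3) are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_bayes(samples):
    """Learn one weight per feature.

    samples: list of (set_of_features, is_positive) pairs.
    Returns a dict mapping feature -> log Laplacian-corrected enrichment.
    """
    n_total = len(samples)
    n_positive = sum(1 for _, positive in samples if positive)
    base_rate = n_positive / n_total

    seen = Counter()           # N_F: compounds containing the feature
    seen_positive = Counter()  # A_F: positive compounds containing it
    for features, positive in samples:
        for feature in features:
            seen[feature] += 1
            if positive:
                seen_positive[feature] += 1

    # Assumed weight formula: log((A_F + 1) / (N_F * P + 1)).
    return {
        f: math.log((seen_positive[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training data with made-up feature labels, not real DILI data.
toy = [
    ({"F1", "F2"}, True),
    ({"F1"}, True),
    ({"F2", "F3"}, False),
    ({"F3"}, False),
]
weights = train_laplacian_bayes(toy)
print(score(weights, {"F1"}), score(weights, {"F3"}))
```

Features seen only rarely are pulled toward a weight of zero by the correction, so a single positive compound cannot make a fragment look strongly predictive.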

Extended connectivity fingerprints

• DILI Bayesian

Features in DILI -

Features in DILI +

Avoid: Long aliphatic chains, Phenols, Ketones, Diols, α-methyl styrene, Conjugated structures, Cyclohexenones, Amides

Test set analysis

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

• Compounds of most interest – well-known hepatotoxic drugs (U.S. Food and Drug Administration Guidance for Industry “Drug-Induced Liver Injury: Premarketing Clinical Evaluation,” 2009), plus their less hepatotoxic comparators, if clinically available.

Fingolimod (Gilenya) for MS (EMEA and FDA)

Paliperidone for schizophrenia

Pirfenidone for Idiopathic pulmonary fibrosis

Roflumilast for pulmonary disease

Predictions for newly approved EMEA compounds

Can we get DILI data for these?

hOCTN2 – Organic Cation Transporter Pharmacophore

• High affinity cation/carnitine transporter - expressed in kidney, skeletal muscle, heart, placenta and small intestine

• Inhibition correlates with muscle weakness - rhabdomyolysis

• A common features pharmacophore developed with 7 inhibitors

• Searched a database of over 600 FDA approved drugs - selected drugs for in vitro testing.

• Of 33 tested drugs predicted to map to the pharmacophore, 27 inhibited hOCTN2 in vitro

• Compounds were more likely to cause rhabdomyolysis if the Cmax/Ki ratio was higher than 0.0025

Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)

hOCTN2 – Organic Cation Transporter Pharmacophore



+ve

-ve

hOCTN2 quantitative pharmacophore and Bayesian model

Diao et al., Mol Pharm, 7: 2120-2131, 2010 r = 0.89

vinblastine

cetirizine

emetine

hOCTN2 quantitative pharmacophore and Bayesian model

Bayesian Model - Leaving 50% out 97 times: external ROC 0.90, internal ROC 0.79, concordance 73.4%, specificity 88.2%, sensitivity 64.2%.

Lab test set (N = 27) Bayesian model has better correct predictions (> 80%) and lower false positives and negatives than pharmacophore (> 70%)

Predictions for literature test set (N = 32) not as good as in house – mean max Tanimoto similarity was ~0.6

Diao et al., Mol Pharm, 7: 2120-2131, 2010
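The "mean max Tanimoto similarity" used above to compare a test set with the training set can be sketched as follows. This is an illustration only, assuming each fingerprint is represented as a set of "on" bit indices; the function names and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_max_similarity(test_fps, train_fps):
    """For each test compound take its closest training compound,
    then average those best similarities over the test set."""
    best = [max(tanimoto(t, r) for r in train_fps) for t in test_fps]
    return sum(best) / len(best)


# Toy fingerprints (bit indices), not real molecules.
train = [{1, 2, 3, 4}, {5, 6}]
test = [{1, 2, 3}, {5, 6, 7, 8}]
print(mean_max_similarity(test, train))
```

A low value suggests the test compounds sit outside the chemistry space covered by the training set, which is one way to reason about a model's applicability domain.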

PCA used to assess training and test set overlap

Among the 21 drugs associated with rhabdomyolysis or carnitine deficiency, 14 (66.7%) provided a Cmax/Ki ratio higher than 0.0025.

Among 25 drugs that were not associated with rhabdomyolysis or carnitine deficiency, only 9 (36.0%) showed a Cmax/Ki ratio higher than 0.0025.

Rhabdomyolysis or carnitine deficiency was associated with a Cmax/Ki value above 0.0025 (Pearson’s chi-square test p = 0.0382).
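The counts quoted above form a 2×2 table (14 of 21 associated drugs above the threshold, 9 of 25 non-associated drugs above it), so the chi-square test can be reproduced by hand. The sketch below assumes an uncorrected Pearson chi-square with one degree of freedom; the function name is hypothetical. It gives a p-value of roughly 0.038, in line with the reported 0.0382.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns (chi2, p) with p from the chi-square distribution, 1 d.f.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [
        row1 * col1 / n, row1 * col2 / n,
        row2 * col1 / n, row2 * col2 / n,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: associated / not associated with rhabdomyolysis or carnitine deficiency.
# Columns: Cmax/Ki above 0.0025 / not above.
chi2, p = pearson_chi_square_2x2(14, 7, 9, 16)
print(round(chi2, 2), p)
```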

Limitations of Cmax/Ki as a predictor for rhabdomyolysis: Cmax/Ki does not consider the effects of drug tissue distribution or plasma protein binding.

hOCTN2 association with rhabdomyolysis

Could all pharmas share their data as models with each other?

Increasing Data & Model Access

Ekins and Williams, Lab On A Chip, 10: 13-22, 2010.

The big idea

Challenge: there is limited access to ADME/Tox data and models needed for R&D

How could a company share data but keep the structures proprietary?

Sharing models means both parties use costly software

What about open source tools? Pfizer had never considered this, so we proposed a study and Rishi Gupta generated models

Pfizer Open models and descriptors

Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010

• What can be developed with very large training and test sets?

• HLM training 50,000 testing 25,000 molecules

• training 194,000 and testing 39,000

• MDCK training 25,000 testing 25,000

• MDR training 25,000 testing 18,400

• Open molecular descriptors / models vs commercial descriptors

• Examples – Metabolic Stability


HLM Model with CDK and SMARTS Keys:

HLM Model with MOE2D and SMARTS Keys

# Descriptors: 578

# Training Set compounds: 193,650

Cross Validation Results: 38,730 compounds

Training R2: 0.79

20% Test Set R2: 0.69

Blind Data Set (2310 compounds): R2 = 0.53, RMSE = 0.367

Continuous Categorical: κ = 0.40, Sensitivity = 0.16, Specificity = 0.99, PPV = 0.80

Time (sec/compound): 0.252

# Descriptors: 818

# Training Set compounds: 193,930

Cross Validation Results: 38,786 compounds

Training R2: 0.77

20% Test Set R2: 0.69

Blind Data Set (2310 compounds): R2 = 0.53, RMSE = 0.367

Continuous Categorical: κ = 0.42, Sensitivity = 0.24, Specificity = 0.987, PPV = 0.823

Time (sec/compound): 0.303
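The categorical statistics quoted for these models (κ, sensitivity, specificity, PPV) all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions, with Cohen's kappa as the chance-corrected agreement; the function name and the example counts are hypothetical and are not the Pfizer results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and Cohen's kappa from confusion counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # fraction of true positives found
    specificity = tn / (tn + fp)   # fraction of true negatives found
    ppv = tp / (tp + fp)           # fraction of positive calls that are right

    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
print(metrics)
```

Reporting kappa alongside sensitivity and specificity helps when classes are imbalanced, where a model can reach very high specificity while still missing most positives.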

PCA of training (red) and test (blue) compounds

Overlap in Chemistry space

• Examples – P-gp


Open source descriptors CDK and C5.0 algorithm

~60,000 molecules with P-gp efflux data from Pfizer

MDR <2.5 (low risk) (N = 14,175) MDR > 2.5 (high risk) (N = 10,820)

Test set MDR <2.5 (N = 10,441) > 2.5 (N = 7972)

Could facilitate model sharing?

              CDK + fragment descriptors    MOE 2D + fragment descriptors
Kappa         0.65                          0.67
Sensitivity   0.86                          0.86
Specificity   0.78                          0.8
PPV           0.84                          0.84

Combining models may give greater coverage of ADME/Tox chemistry space and improve predictions?

Lundbeck, Pfizer, Merck, GSK, Novartis, Lilly, BMS, Allergan, Bayer, AZ, Roche, BI, Merck KGaA

Model coverage of chemistry space


Converging Technologies


PathwayStudio

Pathway / Network/ Database Software Available

Ekins et al., in High Content Screening, Eds. Giuliano, Taylor & Haskin (2006)

Network of genes from rat liver slices incubated with 2.5 mM Acetaminophen for 3 hours

Olinga et al., Drug Metab Rev 39, S1, 1-388, 2007.

Fibrotic response seen at 3h

Mimics in vivo

Transcription Regulator

Enzyme

Group or Complex

Kinase


Red= up regulated, Green = down regulated

Human PXR – direct downstream interactions

• PXR increases transcription of CYP3A4 and >37 other genes Transporters, drug metabolizing enzymes


Systems Biology

Measure

Manipulate

Model

[Figure: example rate equations v1, v2, v3 for a kinase signaling model, with terms for phosphorylation and for nucleus/cytosol import and export]

Mine

Xu JJ, Ekins S, McGlashen M and Lauffenburger D, in Ekins S and Xu JJ, Drug Efficacy, Safety, and Biologics Discovery: Emerging Technologies and Tools, P351-379, 2009.

4M

• Make science more accessible => communication

• Mobile – take a phone into field/lab and do science more readily than on a laptop

• GREEN – energy efficient computing

• MolSync + DropBox + MMDS = Share molecules as SDF files on the cloud = collaborate

Mobile Apps for Drug Discovery

Williams et al DDT 16:928-939, 2011

Green solvents App

Green Solvents App

Bad Good

www.scimobileapps.com

Mobile Apps for Drug Discovery

Clark et al., submitted 2011

Future: What will be modeled

• Mitochondrial toxicity, hepatotoxicity

• More transporters – MATE, OATPs, BSEP... bigger datasets – driven by academia

• Screening centers – more data – more models

• Understanding differences between ligands for Nuclear Receptors – CAR vs PXR

• Models will become replacements for data as datasets expand (e.g. like logP)

• Toxicity Models used for Green Chemistry

Chem Rev. 2010 Oct 13;110(10):5845-82

How Could Green Chemistry Benefit From These Models?

Chem Rev. 2010 Oct 13;110(10):5845-82

…

Nature 469, 6 Jan 2011

Acknowledgments

• Sneha Bhatia (RIFM)
• Lei Diao & James E. Polli (University of Maryland)
• Rishi Gupta, Eric Gifford, Ted Liston, Chris Waller (Pfizer)
• Jim Xu (Merck)
• Matthew D. Krasowski, Erica J. Reschly, Manisha Iyer (University of Iowa)
• Seth Kullman et al. (NC State)
• Andrew Fidler (NZ)
• Sandhya Kortagere (Drexel University)
• Peter Olinga (Groningen University)
• Dana Abramowitz (Ingenuity)
• Antony J. Williams (RSC)
• Alex Clark

• Accelrys
• CDD
• Ingenuity

• Email: [email protected]

Slideshare: http://www.slideshare.net/ekinssean

Twitter: collabchem

Blog: http://www.collabchem.com/

Website: http://www.collaborations.com/CHEMISTRY.HTM
