quantum chemical and machine learning approaches to property prediction for druglike molecules dr...

57
Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Upload: adam-hall

Post on 18-Dec-2015

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Quantum Chemical and Machine Learning Approaches to Property Prediction for

Druglike Molecules

Dr John MitchellUniversity of St Andrews

Page 2: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

1. Solubility is an important issue in drug discovery and a major source of attrition

This is expensive for the pharma industry

A good model for predicting the solubility of druglike molecules would be very valuable.

Page 3: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

How should we approach the

prediction/estimation/calculation

of the aqueous solubility of

druglike molecules?

Two (apparently) fundamentally different approaches: theoretical chemistry & informatics.

Page 4: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Theoretical Chemistry

• Calculations and simulations based on real physics.

• Calculations are either quantum mechanical or use parameters derived from quantum mechanics.

• Attempt to model or simulate reality.

• Usually Low Throughput.

Page 5: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews
Page 6: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Dataset

•The thermodynamically most stable polymorph was selected where possible. •All have experimental crystal structures.

•All have experimental logS.

•10 have experimental ΔGsub and ΔGhydr (circled in red).

Page 7: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

CheqSol MethodCheqSol Method

NH

O

Cl Cl

ONa NH

OCl Cl

O

Na+

NH

OCl Cl

OH

NH

OCl Cl

OH

In SolutionPowder

● We continue “Chasing equilibrium” until a specified number of crossing points have been reached ● A crossing point represents the moment when the solution switches from a saturated solution to a subsaturated solution; no change in pH, gradient zero, no re-dissolving nor precipitating….

SOLUTION IS IN EQUILIBRIUM

Repeatability better than 0.05 log units

dpH/dt Versus Time

dpH

/dt

Time (minutes)

-0.008

-0.004

0.000

0.004

0.008

20 25 30 35 40 45

Supersaturated Solution

Subsaturated Solution8 Intrinsic solubility values8 Intrinsic solubility values

* A. Llinàs, J. C. Burley, K. J. Box, R. C. Glen and J. M. Goodman. Diclofenac solubility: independent determination of the intrinsic solubility of three crystal forms. J. Med. Chem. 2007, 50(5), 979-983

● First precipitation – Kinetic Solubility (Not in Equilibrium)● Thermodynamic Solubility through “Chasing Equilibrium”- Intrinsic Solubility (In Equilibrium)

Supersaturation Factor SSF = Skin – S0

“CheqSol”

Page 8: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Thermodynamic Cycle

Page 9: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Thermodynamic Cycle

Crystal

Gas

Solution

Page 10: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Sublimation Free Energy

Crystal

Gas

Page 11: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Sublimation Free Energy

Crystal

Gas

(rigid molecule approximation)

Page 12: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Sublimation Free Energy

Crystal

Gas

Calculating ΔGsub is a standard procedure in crystal structure prediction

Page 13: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Gsub from lattice energy & a phonon entropy term;DMACRYS using B3LYP/6-31G** multipoles and FIT repulsion-dispersion potential.

Theoretical method for crystal

Page 14: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Lattice energies from DMACRYS with FIT atom-atom model potential and B3LYP/6-31G(d,p) distributed multipoles.

Results for ΔGsub

Page 15: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

A 46 compound set has a larger error, mostly due to some large outliers. Error statistics vary with dataset.

Page 16: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews
Page 17: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews
Page 18: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Thermodynamic Cycle

Crystal

Gas

Solution

Page 19: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Hydration Free Energy

We expected that hydration would be harder to model than sublimation, because the solution has an inexactly known and dynamic structure, both solute and solvent are important etc.

Page 20: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Ghydr from Reference Interaction Site Model with Universal Correction (3DRISM-KH/UC).

Theoretical method for aqueous solution

Page 21: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Reference Interaction Site Model (RISM)

• Combines features of explicit and implicit solvent models.

• Solvent density is modelled, but no explicit molecular coordinates or dynamics.

~45 CPU minsper compound

Page 22: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Reference Interaction Site Model (RISM)

Palmer, D.S., et al., Accurate calculations of the hydration free energies of druglike molecules using the Reference Interaction Site Model. Journal of Chemical Physics, 2010. 133(4): p. 044104-11.

Page 23: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Perhaps surprisingly, error in Ghyd is smaller than in Gsub.

Results for ΔGhyd

Page 24: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

logS from Thermodynamic Cycle

Crystal

Gas

Solution

Add the two terms to get ΔGsol and hence logS.

Page 25: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Results for ΔGsol

Page 26: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Conclusions: Solubility from Theory

• Must calculate Gsub & Ghyd separately;

• RISM is efficient & fairly accurate for Ghyd;

• Experimental data for Gsub & Ghyd sparse and errors may be large;

• Dataset size and composition make comparisons of methods hard;

• Not yet matched accuracy of informatics.

Page 27: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Informatics and Empirical Models

• In general, informatics methods represent phenomena mathematically, but not in a physics-based way.

• Inputs and output model are based on an empirically parameterised equation or more elaborate mathematical model.

• Do not attempt to simulate reality. • Usually High Throughput.

Page 28: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

What Error is Acceptable?

• For typically diverse sets of druglike molecules, a “good” QSPR will have an RMSE ≈ 0.7 logS units.

• A RMSE > 1.0 logS unit is probably unacceptable.

• This corresponds to an error range of 4.0 to 5.7 kJ/mol in Gsol.

Page 29: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

What Error is Acceptable?

• A useless model would have an RMSE close to the SD of the test set logS values: ~ 1.4 logS units;

• The best possible model would have an RMSE close to the SD resulting from the experimental error in the underlying data: ~ 0.5 logS units?

Page 30: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Machine Learning Method

Random Forest

Page 31: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Random Forest: Solubility Results

RMSE(te)=0.69r2(te)=0.89Bias(te)=-0.04

RMSE(oob)=0.68r2(oob)=0.90Bias(oob)=0.01

DS Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007) Ntrain = 658; Ntest = 300

Page 32: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Support Vector Machine

Page 33: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

SVM: Solubility Results

et al.,

Ntrain = 150 + 50; Ntest = 87RMSE(te)=0.94r2(te)=0.79

Page 34: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

100 Compound Cross-Validation

Theoretical energies don’t seem to improve descriptor models.

Page 35: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Replicating Solubility Challenge (post hoc)

RMSE(te)=1.09; 1.00; 0.89; 1.08r2(te)=0.39; 0.49; 0.58; 0.4110; 12; 12; 13/28 correct within 0.5 logS units Ntrain 94; Ntest 28

CDK descriptors: RF, RF, PLS, SVM

Page 36: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Replicating Solubility Challenge (post hoc)

Ntrain 94; Ntest 28

CDK descriptors: RF, RF, PLS, SVM

Although the test dataset is small, it is a standard set.

Page 37: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Conclusions: Solubility from Informatics

• Experimental data: errors unknown, but limit possible accuracy of models;

• CheqSol - step in right direction; • Dataset size and composition hinder comparisons

of methods; • Solubility Challenge – step in right direction.

Page 38: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

2. Protein Target Prediction

• Which protein does a given molecule bind to?• Virtual Screening• Multiple endpoint drugs - polypharmacology• New targets for existing drugs• Prediction of adverse drug reactions (ADR)

– Computational toxicology

Page 39: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Predicted Protein Targets

• Selection of 233 classes from the MDL Drug Data Report

• ~90,000 molecules• 15 independent

50%/50% splits into training/test set

Actually we are predicting closely target-related MDDR classes

Page 40: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Predicted Protein Targets

Cumulative probability of correct prediction within the three top-ranking predictions: 82.1% (±0.5%)

Page 41: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Protein Target Prediction• Given a specific compound, is it possible to predict

computationally its biological interactions with protein targets?• Very important for

• In silico screening (time and money efficient)• off-target prediction (side effects)

• Can be used for identifying substances with performance-enhancing potential

Drug discovery: Predicting promiscuity, Andrew L. Hopkins, Nature 462, 167-168(12 November 2009),doi:10.1038/462167a

James
The template of the slides changes here and it looks odd to have such a sudden transition.
Page 42: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Substances Prohibited in Sports

• WADA publishes and maintains a prohibited list of banned compounds, updated every 6 months

• Substances are split into three main categories:

Substances prohibited at all times(in and out of competition)

S0. Non-Approved substancesS1. Anabolic AgentsS2. Peptide hormones, GrowthFactors and Related SubstancesS3. Beta-2 AgonistsS4. Hormone Antagonists andModulatorsS5. Diuretics and Other MaskingAgents

Substances prohibited in competitionS6. StimulantsS7. NarcoticsS8. CannabinoidsS9. Glucocorticosteroids

Substances prohibited in particularsports

P1. Alcohol with a violation thresholdof 0.10 g/L. (Archery, Karate etc)P2. Beta-Blockers prohibited In-Competition only (Bridge, Curling,Darts, Wrestling, Archery etc.)

Page 43: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Database

Rank

1

2

3

Class

Anabolic Agents

VitaminD

Glucocorticoids

MethodologyStanozolol

Page 44: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

ChEMBL-Activities

• Each compound hasexperimental data for anumber of targets

• Activity data based on IC50,EC50, Ki, Kd etc.

• Some activities just labelled“inactive” or “active”

• Each compound can havemore than one record for agiven target

Page 45: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

• Each of the 8,845 targets has anumber of compounds assigned

• Not all compounds have actualdata on the target or are active

• We filtered each of the familiesaccording to rules defining“active” and “inactive”

• The rules were decided byvisual inspection of distributions

• Rules• IC50

• ≤50000nM active & >50000nM inactive

• Ki

• <20000nM active & ≥20000nM inactive

• Kd

• ≤ 10000nM active & >10000nM inactive

• EC50• ≤ 40000nM active & >40000nM inactive

• ED50• ≤ 10000nM active & >10000nM inactive

• Potency• ≤ 10000nM active & >10000nM inactive

• Activity• ≥40% active & <40% inactive

• Inhibition• ≥45% active & <45% inactive

Filtering the CheMBLFamilies

Page 46: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Index Index Index

Example: Distributions of Ki

Page 47: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

???

Refined Families

• Filtered families consist of compoundswith significant experimental activitiesagainst the relevant targets.

• Many targets have distinct groupsof ligands with different scaffolds.

• May be because there is more thanone binding site, or becausedifferent scaffolds can fit the same site.

• Splitting such a family into smallergroups based on ligand structure willallow us to identify the different setsof ligands.

Page 48: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Refined Families - PFClust

We selected the PFClust algorithm because it is aparameter free clustering algorithm and does notrequire any kind of parameter tuning.

PFClust : A novel parameter free clustering algorithm. Mavridis L, Nath N, Mitchell JBO. BMC Bioinformatics 2013, 14:213.

Page 49: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

• 3563• 783690

RuleFiltering

• 19639• 616600

ClusteringDatabase Database

RefinedFamilies

Compounds Compounds5443Families1366460Compounds

Predicting the protein targets for athletic performance-enhancing substances. Mavridis L, Mitchell JBO. J Cheminformatics2013, 5:31.

Database Refinement

Original

Families

Page 50: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Database Refinement - Validation• Monte Carlo Cross-Validation

• The three versions of the databasewere examined (Original, Filtered andRefined)

• 10% of each family were randomlyremoved and used as queries

• If the top prediction was the familythat the query was a member of, a TPwould be counted; if not, a FP

• Average Matthews CorrelationCoefficient (MCC)

• Original : 0.02• Filtered : 0.03• Refined : 0.66

2.58% (6.61%)

66.98% (87.25%)

3.18% (7.21%)

Top Hit (Top four )

Page 51: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

P2–BetaBlockers

20 explicitly prohibitedcompounds

Every compound, excepttimolol and levobunolol,gave a strong prediction(PR-Score) for at least onefamily

Good experimentalvalidation

We see that the majority of

the families are Beta-1,2 &3 adrenergic receptorligands, as expected.

Other families alsogenerate some interestingresults, such as theserotonin 1a receptor,indicated to make off-targetinteractions with pindolol

Compound Target PR-Score E-Value

P2-Beta Blockers

Alprenolol (266195) Cavia Porceullus (369) 0.039 LogB/F = −0.158

Carvedilol (723) β-1 adrenergic receptor (3252) 0.032 Ki = 0.81 nM

β-2 adrenergic receptor (210) 0.044 Ki = 0.166 nM

Pindolol (500)

β-2 adrenergic receptor (3754) 0.047

β-3 adrenergic receptor (4031) 0.036

β-1 adrenergic receptor (3252) 0.017

Prediction

Prediction

Ki = 1 nM

β-2 adrenergic receptor (210) 0.015 Ki = 0.4 nM

β-2 adrenergic receptor (3754) 0.026

β-3 adrenergic receptor (4031) 0.018

Inhibition = 84%

Ki = 1 nM

Serotonin 1a (5-HT1a (214) 0.026 Ki = 24 nM

Propranolol (27)

Sotalol (471)

β-2 adrenergic receptor (210)

β-3 adrenergic receptor (246)

0.003

0.009

IC50 = 12 nM

IC50 = 7200 nM

Page 52: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Carvedilol

WADA– P2 Beta Blockers

Metoprolol

Page 53: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Compound Target PR-Score E-ValueS8-Cannabinoids

Cannabidivarin (−)Cannabinoid CB1 receptor (218) 0.037 Prediction

Cannabigerol (497318)

Cannabinoid CB2 receptor (253)

HL-60 (383)

0.037

0.047

Prediction

Prediction

HU-210 (70625)

JWH-018 (561013)

Cannabinoid CB1 receptor (3571)

Cannabinoid CB2 receptor (5373)

Cannabinoid CB1 receptor (218)

Cannabinoid CB1 receptor (3571)

Cannabinoid CB2 receptor (253)

0.035

0.029

0.002

0.015

0.009

Ki = 0.82 nMa

Prediction

pKi = 8.7

pKi = 8.045

pKi = 8.2

JWH-073 (−)

Isoprenylcysteine carboxylmethyltransferase (4699)

MDA-MB-231 (400)

Cannabinoid CB1 receptor (218)

0.031

0.030

0.002

Prediction

Prediction

Prediction

Cannabinoid CB1 receptor (3571) 0.025 Prediction

Tetrahydrocannabinol(465)

Cannabinoid CB1 receptor (218) 0.037 Ki = 2.9 nM

Cannabinoid CB1 receptor (3571)

Cannabinoid CB2 receptor (2470)

Cannabinoid CB2 receptor (253)

Cannabinoid CB2 receptor (5373)

0.037

0.034

0.033

0.049

Ki = 37 nM

Ki = 20 nM

Ki = 3.3 nM

Ki = 9.2 nM

S8-Cannabinoids

10 explicitly prohibitedcompounds

17 refined families ofwhich 13 arecannabinoid CB1/2receptors

All compounds showstrong predicted affinityto at least onecannabinoid receptor,except cannabivarol

Excellent agreementbetween PR-scores andexperimental results

Page 54: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Tetrahydrocannabinol

HU-210

WADA– S8 Cannabinoids

JWH-018/073

Page 55: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Discussion• As for any method, the success of our approach depends on

the quality of the underlying data.

• Our methodology addresses the problem that only a smallfraction of possible activities of different molecules against

different targets have been assayed.

• For ChEMBL families that are not well populated, or for proteintargets which too few compounds are assayed against, wecannot make predictions.

Page 56: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Conclusions – Target Prediction• Automated data curation of the ChEMBL families greatly

increases the precision of our protein target predictions.

• Our validations show good correspondence withexperiment, 87% having the correct refined familyamong the top four hits.

• Across the seven WADA classes considered, we find acombination of expected and unexpected protein targets.

• Many of the non-obvious predicted targets have biochemically or clinically validated connections with the expected bioactivities.

Page 57: Quantum Chemical and Machine Learning Approaches to Property Prediction for Druglike Molecules Dr John Mitchell University of St Andrews

Thanks• SULSA, WADA, BBSRC, SFC• Dr Lazaros Mavridis, James McDonagh, Dr Tanja van

Mourik, Dr Luna De Ferrari, Neetika Nath (St Andrews)• Prof. Maxim Fedorov, Dr David Palmer (Strathclyde) • Laura Hughes, Dr Toni Llinas (ex-Cambridge)• James Taylor, Simon Hogan, Gregor McInnes, Callum Kirk,

William Walton (U/G project)