
DOHA NAGA & OLIVIER BÉQUIGNON

Predicting poses and relative affinities of agonists of FXR

THROUGH PARTICIPATION IN D3R GRAND CHALLENGE 2016

16/12/2016 1


What is FXR?
➢ FXR: Farnesoid X Receptor
➢ Nuclear hormone receptor (liver/intestine)
➢ Homeostasis regulator:
❖ Lipids
❖ Bile acids
❖ Glucose

➢ Therapeutic target of interest


[Figure: FXR signalling. Compartments: extracellular space, cytosol, nucleus. Steps: complex stabilisation with an SRC coactivator; translocation and heterodimerisation with RXR*; DNA-binding domains (DBD); DNA transactivation. Regulated processes: cholesterol biliary excretion, bile acid synthesis, hepatoprotection, glycolysis, glucogenogenesis. Associated diseases: dyslipidemia, colorectal cancer, diabetes.]

*RXR: Retinoid X Receptor

Ding, L., Yang, L., Wang, Z. & Huang, W. Bile acid nuclear receptor FXR and digestive system diseases. Acta Pharmaceutica Sinica B 5, 135–144 (2015).


Problem statement

➢ Ligand-based approach: construction of dataset → statistical analysis → QSAR → affinity prediction

➢ Structure-based approach: protein structure selection → molecular dynamics → docking → scoring

FXR project (crystallography campaign): predict binding modes and relative affinities of FXR agonists

➢ Set of 36 compounds

➢ Ligand-based approach → affinities

➢ Structure-based approach → binding modes & affinities


LIGAND-BASED APPROACH

➢ Only EC50 data available

➢ Two datasets:
❖ FRET-only dataset (FOD): 149 compounds
❖ General dataset (GD): 857 compounds

➢ Normalisation difficult or impossible

➢ Compound selection was difficult (about half of the project time)

➢ EC50 measures potency, which is not necessarily correlated with affinity

➢ Variability of measurements affects dataset quality

➢ Compounds clustered by FragFp similarity of their Murcko scaffolds
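The clustering step can be illustrated as greedy leader clustering on Tanimoto similarity. This is a minimal sketch under two assumptions: each Murcko scaffold has already been reduced to the set of FragFp fragment bits it contains (for instance exported from DataWarrior), and the scaffold names, bit indices and 0.7 threshold are hypothetical. Leader clustering is one possible choice, not necessarily the algorithm the authors used.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Set[int]], threshold: float = 0.7) -> List[List[str]]:
    """Greedy leader clustering: each scaffold joins the first cluster whose
    leader (first member) is at least `threshold` similar; otherwise it
    becomes the leader of a new cluster. The result depends on input order."""
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical FragFp bits for three Murcko scaffolds.
fps = {
    "scaffold_A": {1, 5, 9, 42},
    "scaffold_B": {1, 5, 9, 42, 77},
    "scaffold_C": {200, 301, 402},
}
print(leader_cluster(fps))  # [['scaffold_A', 'scaffold_B'], ['scaffold_C']]
```

Because leader clustering depends on input order, shuffling the compounds and re-clustering is a cheap way to check that the clusters are stable.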





➢ Dataset comprised 1,103 descriptors:
❖ 0D (physico-chemical)
❖ 1D (list of fragments)
❖ 2D (molecular paths)

[Figure panels: General dataset; FRET-only dataset]

➢ EC50 from 2.5 nM to 25.5 µM:
❖ Normalisation by log
❖ pEC50 from -12.45 to -0.92
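By convention, pEC50 is the negative decimal logarithm of the EC50 expressed in mol/L, so one unit corresponds to a tenfold change in potency. The sketch below implements that conventional transform. Under it, the quoted endpoints of 2.5 nM and 25.5 µM map to roughly 8.60 and 4.59. The negative range quoted above therefore seems to come from a different log transform. The function name and unit table are illustrative.

```python
import math

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pec50(ec50: float, unit: str = "nM") -> float:
    """Conventional pEC50 = -log10(EC50 in mol/L); higher means more potent."""
    if ec50 <= 0:
        raise ValueError("EC50 must be positive")
    return -math.log10(ec50 * UNIT_TO_MOLAR[unit])


print(round(pec50(2.5, "nM"), 2))   # 8.6
print(round(pec50(25.5, "uM"), 2))  # 4.59
```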





➢ Datasets (FRET-only, General)

➢ Removal of null descriptors and of zero-variance descriptors

➢ Removal of correlated descriptors

➢ Descriptor selection with multiple linear regression (no overfitting, dimension reduction)

➢ Model validation & selection: multiple linear regression, partial least squares regression, random forest

➢ Prediction of the affinity of the 36 compounds

➢ Selected descriptors:
❖ 64 physico-chemical
❖ 14 walk counts
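The three filtering steps before descriptor selection can be sketched in plain Python. This is an illustration under assumptions: descriptors are stored as columns in a dict, the 0.95 absolute Pearson correlation cut-off is hypothetical (the slides do not give one), and the greedy rule keeps the first descriptor of each highly correlated pair. The descriptor names are taken from the selected list, but their values are made up.

```python
from statistics import pvariance
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def filter_descriptors(table: Dict[str, List[float]], max_abs_r: float = 0.95) -> List[str]:
    """Return the names of descriptors kept after the three filtering steps."""
    # Steps 1-2: drop null descriptors (all zeros) and constant (zero-variance) ones.
    kept = [name for name, col in table.items()
            if any(v != 0 for v in col) and pvariance(col) > 0]
    # Step 3: keep a descriptor only if it is not highly correlated
    # with any descriptor already kept.
    selected: List[str] = []
    for name in kept:
        if all(abs(pearson(table[name], table[other])) < max_abs_r for other in selected):
            selected.append(name)
    return selected


# Hypothetical values for four molecules.
table = {
    "nSK": [20, 25, 30, 35],
    "nBT": [22, 27, 32, 37],  # perfectly correlated with nSK -> dropped
    "nul": [0, 0, 0, 0],      # null descriptor -> dropped
    "cst": [3, 3, 3, 3],      # zero variance -> dropped
    "Pol": [5, 1, 4, 2],
}
print(filter_descriptors(table))  # ['nSK', 'Pol']
```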




R² values   General dataset (GD)       FRET-only dataset (FOD)
Model       Training set   Test set    Training set   Test set
MLR         0.97           0.19        0.75           0.16
RF          0.93           0.62        0.97           0.48
PLSR        0.93           0.73        0.73           0.68

➢ Affinity prediction:
❖ MLR: weak predictability (low test R²)
❖ RF: overfitting (large gap between training and test R²)
❖ PLSR: better prediction

➢ Most relevant MLR descriptors:
❖ Atom types
❖ Molecular weight
❖ Molecular paths
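The train/test R² comparison in the table is what separates weak predictivity (low test R²) from overfitting (high training R² but much lower test R²). The sketch below fits ordinary least squares through the normal equations and computes R². That approach is fine for small, well-conditioned problems; a QR or SVD solver would be more robust. The toy data are hypothetical.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0, *r] for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; it can be negative on a test set."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical training data generated from y = 1 + 2*x1 - x2.
X_train = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_train = [1, 3, 0, 2, 4]
beta = fit_mlr(X_train, y_train)
print([round(b, 6) for b in beta], r2(y_train, predict(beta, X_train)))
```

Applied to real descriptors, the same `r2` call on a held-out test set gives the "Test set" column of the table.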





➢ Determination of pEC50 values for 36 compounds

➢ Model might not be reliable

✗ Only 60 % accuracy on test set

✗ Lack of benzoimidazole compounds in learning set

✗ High similarity between series (4/7 series)

✗ Inaccuracy of EC50 measurements

✗ Some series may bind to another binding site

✓ May benefit from affinity prediction based on docking

Multidimensional scaling on FOD. Black filled circles represent the learning and test sets, red filled circles the D3R set.


STRUCTURE-BASED APPROACH




Chemotype                  PDB IDs
Steroids                   1OSV, 1OT7, 3BEJ
GW4064 and derivatives     3DCU, 3DCT, 3FXV, ..
Benzoimidazoles            3OKH, 3OKI, 3OLF, ..

[Chart: number of compounds per chemotype: 24, 1, 5]

➢ 27 structures in the PDB

➢ Structure selection criteria:
❖ Best resolution
❖ No missing residues
❖ Most studied
❖ Representative of the dataset

[Figure: candidate structures by compound family and resolution: 3DCU (2.95 Å), 3OKH (2.5 Å), 3DCT (2.5 Å), 1OSV (2.5 Å); 3FXV, 4OIV, 3L1B, 1OSH, 4WVD (1.7 to 2.9 Å)]

➢ Missing residues or mutations in the binding site

Structure preparation (DockPrep, Chimera)
• Addition of missing side chains
• Protonation of amino acids (pH = 7.4)

System preparation (GROMACS)
• Force field: CHARMM27 all-atom force field with CMAP for proteins
• Solvation: TIP3P water
• Triclinic box, 10 Å between solute and box edge
• Neutralisation of the system with 20 Na+ ions

Minimization
• 2 minimizations (before and after adding ions)
• Integrator: steepest descent
• Number of steps: 1,000
• Convergence criterion: Fmax < 1000 kJ/mol/nm

Equilibration phase
• Simulation time: 500 ps
• Isothermal-isobaric (NPT) ensemble
• Temperature and pressure coupling (Berendsen thermostat/barostat), 300 K, 1 bar

Production phase
• Same parameters as NPT equilibration
• Simulation time: 20 ns
• Atom positions written every 1 ps
• Electrostatics: PME (Particle Mesh Ewald)
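The protocol above maps onto GROMACS `.mdp` run parameters. The sketch below assembles them in Python and converts the stated durations into step counts. It assumes values the slides do not state: a 2 fs time step, the coupling constants `tau_t` and `tau_p`, and water compressibility. The force field and water model would be chosen when building the topology (e.g. `gmx pdb2gmx -ff charmm27 -water tip3p`), and the 10 Å box distance corresponds to `gmx editconf -bt triclinic -d 1.0`, since GROMACS works in nm. Treat this as an illustration, not the authors' input files.

```python
from typing import Dict, Union

Value = Union[int, float, str]

DT_PS = 0.002  # ASSUMPTION: 2 fs time step (not stated in the protocol)


def ps_to_steps(time_ps: float, dt_ps: float = DT_PS) -> int:
    """Number of integration steps needed to cover `time_ps` picoseconds."""
    return round(time_ps / dt_ps)


def to_mdp(params: Dict[str, Value]) -> str:
    """Render parameters in GROMACS .mdp 'key = value' syntax."""
    return "\n".join(f"{key:<20} = {value}" for key, value in params.items())


minimization = {
    "integrator": "steep",  # steepest descent
    "nsteps": 1000,
    "emtol": 1000.0,        # stop once Fmax < 1000 kJ/mol/nm
}

npt_common = {
    "integrator": "md",
    "dt": DT_PS,
    "coulombtype": "PME",
    "tcoupl": "berendsen",
    "tc-grps": "System",
    "tau_t": 0.1,               # ASSUMPTION
    "ref_t": 300,
    "pcoupl": "berendsen",
    "pcoupltype": "isotropic",
    "tau_p": 1.0,               # ASSUMPTION
    "ref_p": 1.0,
    "compressibility": 4.5e-5,  # ASSUMPTION: water, 1/bar
}

equilibration = {**npt_common, "nsteps": ps_to_steps(500)}  # 500 ps
production = {
    **npt_common,
    "nsteps": ps_to_steps(20_000),         # 20 ns
    "nstxout-compressed": ps_to_steps(1),  # coordinates every 1 ps
}

print(to_mdp(production))
```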

➢ Why MD?

❖ Assess stability

❖ Investigate binding site movements

➢ Structure

❖ Conformation with nuclear coactivator peptide



[Figure: percentage of native contacts (50 to 100 %) vs. time (0 to 20 ns)]

➢ Study of the percentage of native contacts:
❖ Around 90% of contacts preserved

➢ 12 helices with little variation

➢ Same number of amino acids forming secondary structures

[Figure: C-α backbone analysis over 20 ns; snapshots at 0 ns and 20 ns]

➢ Some changes in the C-α backbone

➢ Binding site not affected by the C-α changes
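Per-residue C-α fluctuations (RMSF) are a common way to locate which backbone regions move during a trajectory, while the radius of gyration tracks global compactness. This is a minimal sketch. It assumes the frames have already been superposed onto a reference structure (no fitting is done here) and that coordinates are in Å. The toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root mean square fluctuation about the time-averaged position.
    Frames must already be superposed on a common reference."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration about the centre of mass."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    s = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords))
    return math.sqrt(s / total)


# Hypothetical two-atom trajectory: atom 0 is fixed, atom 1 oscillates along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))                                 # [0.0, 1.0]
print(radius_of_gyration(frames[0], [12.0, 12.0]))  # 0.5
```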



[Figure: snapshot at 0 ns (gate open); snapshot at 20 ns (gate closed)]

➢ Globally stable (no major changes)

➢ Binding site: side-chain movements

➢ Open conformation kept for docking



Binding mode prediction
➢ Conformer/tautomer generation
➢ Screening with AutoDock Vina (score histograms of all poses)
➢ Docking with AutoDock 4.2
➢ Pose clustering
➢ Identification of best poses

Structure validation
➢ Experimental poses
➢ Redocking with AutoDock 4.2
➢ Pose clustering
➢ Identification of experimental poses
➢ Result evaluation

Receptors: 3OKH, 1OSV, 3DCU
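Pose clustering can be illustrated with an energy-ordered, RMSD-tolerance scheme in the spirit of AutoDock's cluster analysis. The sketch assumes a precomputed pose-to-pose RMSD matrix and a 2.0 Å tolerance, a commonly used value. The energies and RMSD values are hypothetical.

```python
from typing import List, Sequence


def cluster_poses(energies: Sequence[float], rmsd: Sequence[Sequence[float]],
                  rmstol: float = 2.0) -> List[List[int]]:
    """Group pose indices. Poses are visited from lowest to highest energy;
    each joins the first cluster whose seed (its lowest-energy member) lies
    within `rmstol` Å, otherwise it seeds a new cluster."""
    clusters: List[List[int]] = []
    for i in sorted(range(len(energies)), key=lambda k: energies[k]):
        for cluster in clusters:
            if rmsd[i][cluster[0]] <= rmstol:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


energies = [-11.0, -12.5, -12.0, -9.5]  # hypothetical scores (kcal/mol)
rmsd = [                                # hypothetical pairwise RMSD (Å)
    [0.0, 1.1, 5.0, 6.0],
    [1.1, 0.0, 4.8, 5.5],
    [5.0, 4.8, 0.0, 1.5],
    [6.0, 5.5, 1.5, 0.0],
]
print(cluster_poses(energies, rmsd))  # [[1, 0], [2, 3]]
```

The first member of each cluster is its best-scoring pose. Cluster size then gives a rough indication of how often the search converged on that binding mode.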

Receptor preparation (DockPrep, Chimera)
• Removal of water molecules
• Polar hydrogen addition (pH = 7.4)

Compound preparation (DockPrep, Chimera)
• Polar hydrogen addition (pH = 7.4)

Compound preparation (AutoDockTools 1.5.6)
• Addition of Gasteiger charges

Screening (AutoDock Vina)
• Rigid receptor
• Box spacing = 1.0 Å
• Generation of 10 poses

Docking (AutoDock 4.2)
• Rigid receptor
• Box spacing = 0.375 Å
• Lamarckian Genetic Algorithm (250 runs)

Redocking (AutoDock 4.2)
• Rigid receptor
• Box spacing = 0.210 Å
• Lamarckian Genetic Algorithm (500 runs, initial population of 500 per run)
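Grid spacing drives the cost of the AutoDock 4.2 grid maps: for a fixed box, the number of grid points grows with the inverse cube of the spacing. The sketch below assumes a hypothetical 22.5 Å cubic box (the slides do not give the box size). It also assumes AutoGrid's requirement of an even `npts` per axis, which yields npts + 1 points along each axis.

```python
import math


def autogrid_npts(edge: float, spacing: float) -> int:
    """Grid intervals along one axis of a cubic box, rounded up to an even
    number because AutoGrid expects even `npts` values."""
    n = math.ceil(edge / spacing)
    return n + n % 2


EDGE = 22.5  # HYPOTHETICAL box edge in Å; the slides do not give the box size

for spacing in (0.375, 0.210):
    npts = autogrid_npts(EDGE, spacing)
    # AutoGrid evaluates npts + 1 points along each axis.
    print(f"spacing {spacing} Å: npts = {npts}, grid points = {(npts + 1) ** 3:,}")
```

Under these assumptions, going from 0.375 Å to 0.210 Å grows the grid from 61³ to 109³ points, about 5.7 times more. That is the price of the finer redocking grid.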

36 compounds → 36 poses

➢ Energy scores between -14 and -9 kcal/mol

➢ High-affinity binding predicted
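Docking scores in kcal/mol can be read as predicted inhibition constants through ΔG = RT ln Ki. To our understanding, this is how AutoDock 4 reports its estimated Ki. The sketch below works at an assumed 298.15 K. Under that assumption, -9 and -14 kcal/mol correspond to roughly 250 nM and 55 pM, and -10 kcal/mol to about 47 nM. The function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ki_from_dg(dg_kcal: float, temperature: float = 298.15) -> float:
    """Predicted inhibition constant (mol/L) from a binding free energy: Ki = exp(dG/RT)."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


def dg_from_ki(ki_molar: float, temperature: float = 298.15) -> float:
    """Inverse relation: dG = RT ln(Ki)."""
    return R_KCAL * temperature * math.log(ki_molar)


for dg in (-9.0, -10.0, -14.0):
    print(f"{dg:6.1f} kcal/mol -> Ki = {ki_from_dg(dg) * 1e9:.3g} nM")
```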


➢ Top pose interactions:
❖ 3 hydrogen bonds (SER 336, TYR 373, ARG 335)
❖ π-stacking (PHE 333)
❖ Proper orientation of hydrophobic groups (ILE 361 & 356, LEU 291 & 290)

[Figure: top three poses, E = -14.3, -13.9 and -12.2 kcal/mol]

➢ Gives an idea of the essential ligand features needed to fit the receptor

➢ Development of a pharmacophore from the best docking poses


RESULTS VALIDATION & CONCLUSION



Results Validation


Compound   RMSD      Predicted Ki
FXR_27     0.99 Å    8.59 pM
FXR_18     2.82 Å    33.67 pM
FXR_13     4.95 Å    11.63 pM


Results Validation

➢ FXR_27 (RMSD = 0.99 Å): low RMSD

➢ FXR_18 (RMSD = 2.82 Å): slightly higher RMSD, interactions preserved

➢ FXR_13 (RMSD = 4.95 Å): changes in the positions of ring interactions
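Pose validation against the crystal structure uses an in-place RMSD, because the docked and experimental ligands already share the receptor frame. This minimal sketch assumes identical heavy-atom ordering and ignores molecular symmetry. The 2.0 Å success cut-off is a common convention, not one stated on the slides. Applied to the three values above, only FXR_27 (0.99 Å) falls below it.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], crystal: Sequence[Coord]) -> float:
    """In-place heavy-atom RMSD without superposition, since docked and
    crystallographic poses already share the receptor frame. Assumes the
    same atom ordering and ignores symmetry-equivalent atoms."""
    if len(docked) != len(crystal):
        raise ValueError("both poses must list the same atoms in the same order")
    sq = sum((d[k] - c[k]) ** 2 for d, c in zip(docked, crystal) for k in range(3))
    return math.sqrt(sq / len(crystal))


def is_success(rmsd: float, cutoff: float = 2.0) -> bool:
    """Common convention: a pose within 2 Å of the crystal pose counts as correct."""
    return rmsd <= cutoff


# Hypothetical 2-atom ligand shifted by 1 Å along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(pose_rmsd(docked, crystal))  # 1.0

for name, value in (("FXR_27", 0.99), ("FXR_18", 2.82), ("FXR_13", 4.95)):
    print(name, is_success(value))
```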


Conclusion

➢ QSAR model not applicable:
❖ Variability in EC50 measurements
❖ Chemical space not representative of 2/3 of the compounds

➢ MD allowed identification of the open conformation state

➢ Docking gives a better ranking:
❖ Identification of very high-affinity compounds


Conclusion


➢ Hydrophobicity/Hydrophilicity

➢ Aromaticity

➢ Number of hetero atoms



Improvements

➢ QSAR:
❖ Random sampling of descriptors
❖ Different models for a consensus approach
❖ 3D QSAR (CoMFA*, CoMSIA)

➢ MD simulation of the receptor-docked ligand complex*

➢ Docking:
❖ Smaller grid spacing, larger LGA population
❖ Minimization of the protein-docked ligand complex
❖ Use of more precise scoring functions
❖ Study of the water molecule network

*Martínez et al., Molecular dynamics simulations reveal multiple pathways of ligand dissociation from thyroid hormone receptors. Biophysical Journal (2005), 89(3), 2011–23.


REFERENCES

Wang, Y. D., Chen, W. D. & Huang, W. FXR, a target for different diseases. Histol. Histopathol. 23, 621–627 (2008).

Stanton, D. T. QSAR and QSPR model interpretation using partial least squares (PLS) analysis. Current Computer-Aided Drug Design 8, 107–127 (2012). https://doi.org/10.2174/157340912800492357

Kubinyi, H. QSAR and 3D QSAR in drug design. Part 1: Methodology. Drug Discovery Today (1997).

Costantino, G., Entrena-Guadix, A., Macchiarulo, A., Gioiello, A. & Pellicciari, R. Molecular dynamics simulation of the ligand binding domain of farnesoid X receptor. Insights into helix-12 stability and coactivator peptide stabilization in response to agonist binding. J. Med. Chem. 48, 3251–3259 (2005).

Martínez, L., Sonoda, M. T., Webb, P., Baxter, J. D., Skaf, M. S. & Polikarpov, I. Molecular dynamics simulations reveal multiple pathways of ligand dissociation from thyroid hormone receptors. Biophysical Journal 89, 2011–2023 (2005).


THANK YOU

ACKNOWLEDGEMENTS

Gautier MOROY & Manon REAU, for their guidance, support and precious help!

Dhoha TRIKI & Natacha CERISIER, for their continuous and wise counselling.


APPENDICES



Why this name?
➢ Identified in 1995
➢ Initially an 'orphan' receptor (no known ligand)
➢ Found to interact with farnesol and its derivatives

[Figure: farnesol derivatives in the synthesis of cholesterol, bile acids, steroids, retinoids, and farnesylated proteins]

Forman, B. M. et al. Identification of a nuclear receptor that is activated by farnesol metabolites. Cell 81, 687–693 (1995).

[Figure structures: cholic acid, cholesterol, lithocholic acid, hydrocortisone, isotretinoin]


FragFp fingerprint
➢ Similar to MDL keys
➢ Based on a predefined dictionary of 512 structural fragments
➢ 1 bit set for the presence of each fragment (0 otherwise)
➢ Heteroatoms replaced with wildcards
❖ Replacing a single atom causes only a small drop in similarity

✓ All fragments occur within typical organic molecule structures
✓ Little overlap between fragment types
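A FragFp-style fingerprint is a 512-bit vector, so it fits in a single Python integer. The sketch below assumes fragment matching has already produced the indices (0 to 511) of the dictionary fragments present in a molecule. The indices are hypothetical, and the fragment dictionary itself is not reproduced.

```python
from typing import Iterable

N_FRAGMENTS = 512  # size of the predefined FragFp fragment dictionary


def encode(fragment_ids: Iterable[int]) -> int:
    """Pack the indices of fragments present (0-511) into one 512-bit integer."""
    fp = 0
    for idx in fragment_ids:
        if not 0 <= idx < N_FRAGMENTS:
            raise ValueError(f"fragment index {idx} outside 0..{N_FRAGMENTS - 1}")
        fp |= 1 << idx  # set the bit for this fragment
    return fp


def popcount(x: int) -> int:
    """Number of bits set."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Shared 'on' bits divided by bits set in either fingerprint."""
    union = popcount(a | b)
    return 1.0 if union == 0 else popcount(a & b) / union


# Hypothetical fragment indices for two molecules.
fp1 = encode([0, 3, 511])
fp2 = encode([0, 3, 7])
print(popcount(fp1), tanimoto(fp1, fp2))  # 3 0.5
```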


Selected descriptors (1/2)

Descriptor: description (block: physico-chemical)
AECC: average eccentricity
AMW: average molecular weight
ARR: aromatic ratio
CENT: centralization
CSI: eccentric connectivity index
D/D: distance/detour index
D/Dr03: distance/detour ring index of order 3
Dz: Pogliani index
ECC: eccentricity
GMTI: Gutman Molecular Topological Index
GMTIV: Gutman MTI by valence vertex degrees
Har: Harary H index
Har2: square reciprocal distance sum index
HyDp: hyper-distance-path index
ICR: radial centric information index
Jhetm: Balaban-type index from mass weighted distance matrix
nBM: number of multiple bonds
nBT: number of bonds
nSK: number of non-H atoms
PHI: Kier flexibility index
Pol: polarity number
QW: quasi-Wiener index (Kirchhoff number)
RBF: rotatable bond fraction
RHyDp: reciprocal hyper-distance-path index
Rww: reciprocal hyper-detour index
S0K: Kier symmetry index
S1K: 1-path Kier alpha-modified shape index
S2K: 2-path Kier alpha-modified shape index
S3K: 3-path Kier alpha-modified shape index
SCBO: sum of conventional bond orders (H-depleted)
SMTI: Schultz Molecular Topological Index (MTI)
SMTIV: Schultz MTI by valence vertex degrees
T(O..O): sum of topological distances between O..O
T(O..S): sum of topological distances between O..S
TI1: first Mohar index TI1
UNIP: unipolarity
VAR: variation
VDA: average vertex distance degree
W: Wiener W index
WA: mean Wiener index
Whete: Wiener-type index from electronegativity weighted distance matrix
Whetm: Wiener-type index from mass weighted distance matrix
Whetp: Wiener-type index from polarizability weighted distance matrix
Whetv: Wiener-type index from van der Waals weighted distance matrix
WhetZ: Wiener-type index from Z weighted distance matrix (Barysz matrix)
ww: hyper-detour index
Xt: total structure connectivity index


Selected descriptors (2/2)

Descriptor: description (block: walk and path counts)
MPC04: molecular path count of order 04
MWC02: molecular walk count of order 02
SRW08: self-returning walk count of order 08
TWC: total walk count


MLR

Figure 5: Plots of predicted against experimental pEC50 values obtained by multiple linear regression (MLR) on (A) GD and (B) FOD. Each dataset was split into 2/3 for training (blue filled triangles) and 1/3 for testing (red filled circles). R² values were computed for both training and testing with GD (0.75 and 0.16, respectively) and FOD (0.97 and 0.19, respectively).


PLSR

Figure 7: RMSEP and R² values as a function of the number of components used to build the PLS model on FOD. A threshold of 30 components was used in the PLS model, since the associated RMSEP value (~1.85) is low and the model's determination coefficient R² was much better than for MLR and logistic regression (~0.83).
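RMSEP is the root mean squared error of prediction. Choosing the number of PLS components amounts to reading off where this curve levels out. This is a minimal sketch. The 5 % parsimony tolerance and the RMSEP curve in the example are hypothetical, not the values behind Figure 7.

```python
import math
from typing import Dict, Sequence


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pick_components(curve: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest number of components whose RMSEP lies within `tolerance`
    (relative) of the lowest RMSEP on the curve (a parsimony heuristic)."""
    best = min(curve.values())
    return min(n for n, err in curve.items() if err <= best * (1 + tolerance))


# Hypothetical RMSEP curve: number of components -> RMSEP.
curve = {5: 2.9, 10: 2.4, 20: 2.0, 30: 1.9, 40: 1.88, 50: 1.89}
print(pick_components(curve))  # 30
```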


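Equations MD

[Equations for RMSD, radius of gyration and RMSF shown as figures]

Native contacts:

$$Q(X) = \frac{1}{|S|} \sum_{(i,j) \in S} \frac{1}{1 + \exp\left[\beta \left(r_{ij}(X) - \lambda\, r^{0}_{ij}\right)\right]}$$

where:
• X is a conformation,
• r_{ij}(X) is the distance between atoms i and j in conformation X,
• r^0_{ij} is the distance from heavy atom i to j in the native state conformation,
• S is the set of all pairs of heavy atoms (i, j) belonging to residues θ_i and θ_j such that |θ_i - θ_j| > 3 and r^0_{ij} < 4.5 Å,
• β = 5 Å^{-1},
• λ = 1.8 for all-atom simulations.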
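The fraction of native contacts on this slide can be computed with a smooth switching function: Q(X) = (1/|S|) Σ 1/(1 + exp[β(r_ij(X) - λ r0_ij)]), summed over native heavy-atom pairs S. This is a minimal sketch. It assumes heavy-atom coordinates in Å, one residue index per atom, β = 5 Å⁻¹ and λ = 1.8 as used for all-atom simulations. The four-atom "structure" in the example is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def native_pairs(native: Sequence[Coord], residue: Sequence[int],
                 cutoff: float = 4.5, min_sep: int = 3) -> List[Tuple[int, int]]:
    """Set S: heavy-atom pairs closer than `cutoff` Å in the native structure
    whose residues are more than `min_sep` apart in sequence."""
    pairs = []
    for i in range(len(native)):
        for j in range(i + 1, len(native)):
            if abs(residue[i] - residue[j]) > min_sep and math.dist(native[i], native[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def fraction_native(conf: Sequence[Coord], native: Sequence[Coord],
                    pairs: Sequence[Tuple[int, int]],
                    beta: float = 5.0, lam: float = 1.8) -> float:
    """Q(X): smooth fraction of native contacts kept in conformation `conf`."""
    total = 0.0
    for i, j in pairs:
        r = math.dist(conf[i], conf[j])
        r0 = math.dist(native[i], native[j])
        x = beta * (r - lam * r0)
        # Guard against math.exp overflow for contacts that are far apart.
        total += 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))
    return total / len(pairs)


# Hypothetical 4-atom native structure and residue numbering.
native = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
residue = [1, 5, 2, 9]
pairs = native_pairs(native, residue)
moved = list(native)
moved[1] = (12.0, 0.0, 0.0)  # break the only native contact
print(pairs, fraction_native(native, native, pairs), fraction_native(moved, native, pairs))
```

Multiplying Q by 100 gives the "percentage of native contacts" plotted over the trajectory.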


3DCU MD (1/2)

➢ 3DCU apo & holo forms

[Figure: RMSD of the 3DCU apo and holo forms; RMSD 1.2 Å, 1.01 Å, 1.2 Å]

3DCU MD (2/2)

➢ 3DCU apo and holo forms

➢ 90% of native contacts preserved for both forms


Virtual screening histograms

[Figure: score histograms for 1OSV, 3DCU and 3OKH]


Lamarckian Genetic Algorithm
➢ Combines two mapping functions:
❖ Genotype to phenotype
❖ Phenotype to genotype

➢ Genotype:
❖ Ligand translation
❖ Ligand orientation (quaternion)
❖ Ligand torsion values

➢ Phenotype:
❖ Ligand atomic coordinates
❖ Goodness of fit (energy evaluated from the coordinates)
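The Lamarckian idea is that locally optimised solutions are written back into the genotype passed on to offspring. It can be shown on a toy problem. This sketch is not AutoDock. The genotype is a 2D translation only, the phenotype is the placed coordinates of a hypothetical two-atom ligand, and fitness is a squared-distance "energy". The local search is a simple pattern search standing in for AutoDock's Solis-Wets method.

```python
import random
from typing import List, Tuple

Genotype = List[float]                 # toy state variables: translation (tx, ty)
Phenotype = List[Tuple[float, float]]  # toy atomic coordinates

LIGAND = [(0.0, 0.0), (1.0, 0.0)]   # HYPOTHETICAL 2-atom ligand, local frame
SITES = [(2.0, -1.0), (3.0, -1.0)]  # HYPOTHETICAL ideal atom positions in the site


def develop(g: Genotype) -> Phenotype:
    """Genotype -> phenotype: place the ligand atoms using the state variables."""
    return [(x + g[0], y + g[1]) for x, y in LIGAND]


def energy(g: Genotype) -> float:
    """Toy fitness evaluated on the phenotype (sum of squared atom-site distances)."""
    return sum((x - sx) ** 2 + (y - sy) ** 2 for (x, y), (sx, sy) in zip(develop(g), SITES))


def local_search(g: Genotype, step: float = 0.5, iters: int = 30) -> Genotype:
    """Pattern search: try +/- step on each variable, keep improvements,
    halve the step when nothing improves."""
    best, e_best = list(g), energy(g)
    for _ in range(iters):
        improved = False
        for k in range(len(best)):
            for d in (step, -step):
                cand = list(best)
                cand[k] += d
                e = energy(cand)
                if e < e_best:
                    best, e_best, improved = cand, e, True
        if not improved:
            step /= 2
    return best


def lga(pop_size: int = 10, generations: int = 20, seed: int = 0) -> Tuple[Genotype, float]:
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimised genotype replaces the original,
        # so the improvement is inherited by the next generation.
        pop = sorted((local_search(g) for g in pop), key=energy)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Arithmetic crossover followed by Gaussian mutation.
            children.append([(u + v) / 2 + rng.gauss(0.0, 0.3) for u, v in zip(a, b)])
        pop = elite + children
    best = min(pop, key=energy)
    return best, energy(best)


best, e = lga()
print([round(v, 3) for v in best], e)
```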


Comparison between Vina and AutoDock

➢ Overestimation of low values by Vina (-10 kcal/mol ~ 2-1 nM)

➢ For D3R:
❖ Use of AutoDock poses rather than Vina poses
❖ Use of predicted scores to weight the affinity ranking


PCA (GD)

Appendix 32: Principal component analysis performed on the whole learning dataset (GD), comprising 857 compounds and the whole set of 1079 descriptors, shows that the first two dimensions explain 46.60 % of the variance (31.84 % and 14.78 %, respectively). (A) The individual map of the compounds shows the self-organisation of the dataset into 3 distinct clusters (blue, purple and green circles) and identifies one outlier (yellow circle). (B) The variable map on the correlation circle shows a dense distribution along the positive part of the first dimension and a much less dense region of less correlated descriptors along the negative part of the first dimension.

Appendix 4: Principal component analysis performed on the whole learning dataset (GD), comprising 857 compounds and the 290 selected descriptors, shows that the first two dimensions explain 68.78 % of the variance (49.71 % and 19.07 %, respectively). (A) The individual map of the compounds shows the self-organisation of the dataset into 3 distinct clusters (blue, purple and green circles) and identifies one outlier (yellow circle). (B) The variable map on the correlation circle shows a dense distribution along the positive parts of the first and second dimensions and a much less dense region of less correlated descriptors along the negative part of the first dimension.


PCA (FOD)

Appendix 1: Principal component analysis performed on the FRET-only dataset (FOD), comprising 149 compounds and the whole set of 853 descriptors, shows that the first two dimensions explain 60.96 % of the variance (45.57 % and 15.39 %, respectively). (A) The individual map of the compounds shows the self-organisation of the dataset into 5 distinct clusters (coloured circles). (B) The variable map on the correlation circle shows a dense distribution along the positive part of the second dimension and a much less dense region of less correlated descriptors along the negative part of the second dimension.

Figure 6: Principal component analysis performed on the FRET-only dataset (FOD), comprising 149 compounds and the 78 selected descriptors, shows that the first two dimensions explain 82.86 % of the variance (65.07 % and 17.79 %, respectively). (A) The individual map of the compounds shows the self-organisation of the dataset into 5 distinct clusters (coloured circles). (B) The variable map on the correlation circle shows the importance of descriptors encoding molecular paths (MPC04-07, MWC07-10), polarity (Pol) and molecular weight (AMW).
