DOHA NAGA & OLIVIER BÉQUIGNON
Predicting poses and relative affinities of agonists of FXR
THROUGH PARTICIPATION IN D3R GRAND CHALLENGE 2016
16/12/2016 1
What is FXR?
➢ FXR: Farnesoid X Receptor
➢ Nuclear hormone receptor (liver/intestine)
➢ Homeostasis regulator:
❖ Lipid ❖ Bile Acids ❖ Glucose
➢ Therapeutic target of interest
Dyslipidemia, colorectal cancer, diabetes
Ding, L., Yang, L., Wang, Z. & Huang, W. Bile acid nuclear receptor FXR and digestive system diseases. Acta Pharmaceutica Sinica B 5, 135–144 (2015).
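[Figure: FXR activation pathway. Compartments: extracellular space, cytosol, nucleus. Steps: complex stabilisation (SRC coactivator), translocation & heterodimerisation with RXR*, DNA transactivation through the DNA-binding domains (DBD). Regulated processes: cholesterol biliary excretion, bile acid synthesis, hepatoprotection, glycolysis, glucogenogenesis. The pathway is linked to the diseases listed above.]
*RXR: Retinoid X Receptor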
Problem statement
Ligand-based approach: construction of dataset, statistical analysis, QSAR, affinity prediction → affinities
Structure-based approach: protein structure selection, molecular dynamics, docking, scoring → binding modes & affinities
FXR project (crystallography campaign): predict binding modes and relative affinities of agonists of FXR
➢ Set of 36 compounds
LIGAND-BASED APPROACH
➢ Only available datasets: EC50
➢ Two datasets:
❖ FRET-only dataset (FOD): 149 compounds
❖ General dataset (GD): 857 compounds
➢ Normalisation difficult/impossible
➢ Selecting compounds was difficult (half of the time)
➢ EC50 potency not correlated to affinity
➢ Variability of measurements impacts dataset quality
➢ Compounds clustered according to FragFp fingerprints on Murcko scaffolds
➢ Dataset comprised 1,103 descriptors ❖ 0D (physico-chemical) ❖ 1D (list of fragments) ❖ 2D (molecular paths)
[Figure panels: General dataset; FRET-only dataset]
➢ EC50 from 2.5 nM to 25.5 µM ❖ Normalisation by log ❖ pEC50 from -12.45 to -0.92
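As a point of comparison for the log normalisation, here is a minimal sketch of the conventional definition pEC50 = -log10(EC50 in mol/L), applied to the two bounds of the EC50 range quoted above. The function name is hypothetical, and the slide does not state which transform or unit produced its own reported range, so this is an illustration of the standard convention rather than the authors' exact procedure.

```python
import math


def pec50_from_nanomolar(ec50_nm: float) -> float:
    """Return pEC50 = -log10(EC50 in mol/L) for an EC50 given in nM."""
    if ec50_nm <= 0:
        raise ValueError("EC50 must be strictly positive")
    return -math.log10(ec50_nm * 1e-9)


# Bounds of the EC50 range quoted on the slide.
most_potent = pec50_from_nanomolar(2.5)         # 2.5 nM
least_potent = pec50_from_nanomolar(25_500.0)   # 25.5 µM expressed in nM

print(f"pEC50 (2.5 nM)  = {most_potent:.2f}")
print(f"pEC50 (25.5 µM) = {least_potent:.2f}")
```

With this convention, more potent compounds get larger pEC50 values, which is the usual target variable for a QSAR regression.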
➢ MLR most relevant descriptors ❖ Atom types ❖ Molecular Weight ❖ Molecular paths
QSAR workflow:
➢ Datasets (FRET, General)
➢ Descriptor selection: null descriptors removal, null-variance descriptors removal, correlated descriptors removal
➢ Multiple Linear Regression (no overfitting, dimension reduction)
➢ Models: Multiple Linear Regression, Partial Least Squares Regression, Random Forest
➢ Model validation & selection
➢ Prediction of affinity of 36 compounds
➢ Selected descriptors ❖ 64 physico-chemical ❖ 14 walk counts
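The descriptor filtering steps of the workflow (null descriptors, null-variance descriptors, correlated descriptors) can be sketched as follows. This is a minimal standard-library illustration: the 0.95 correlation threshold, the greedy keep-first strategy and the toy descriptor values are assumptions, not parameters taken from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(table, max_abs_r=0.95):
    """Drop null, constant and highly correlated descriptors.

    table maps a descriptor name to its list of values (one per compound).
    max_abs_r is an assumed threshold; the slides do not give one.
    """
    kept = {}
    for name, values in table.items():
        if all(v == 0 for v in values):       # null descriptor
            continue
        if max(values) == min(values):        # null variance
            continue
        if any(abs(pearson(values, other)) > max_abs_r
               for other in kept.values()):   # redundant with a kept one
            continue
        kept[name] = values
    return kept


# Hypothetical values for four compounds, for illustration only.
toy_table = {
    "nBT": [10, 12, 15, 20],
    "nSK": [20, 24, 30, 40],      # perfectly correlated with nBT
    "zero": [0, 0, 0, 0],
    "const": [1, 1, 1, 1],
    "ARR": [0.5, 0.1, 0.4, 0.2],
}
selected = filter_descriptors(toy_table)
print(list(selected))
```

The greedy pass keeps the first descriptor of each correlated group, so the result depends on column order; a real pipeline might instead keep the descriptor best correlated with pEC50.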
R² values (GD: general dataset, FOD: FRET-only dataset)
Model   GD training   GD test   FOD training   FOD test
MLR     0.97          0.19      0.75           0.16
RF      0.93          0.62      0.97           0.48
PLSR    0.93          0.73      0.73           0.68
➢ Affinity prediction ❖ Weak predictability with MLR ❖ Overfitting with RF ❖ Better prediction with PLSR
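For reference, a minimal sketch of how a coefficient of determination can be computed on a training or test set. It uses the definition R² = 1 - SS_res / SS_tot; whether the slides use this definition or the squared correlation coefficient is not stated, so treat the choice as an assumption. The pEC50 values below are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental and predicted pEC50 values.
experimental = [5.1, 6.3, 7.0, 8.2]
good_model = [5.0, 6.5, 6.9, 8.0]
mean_only = [6.65, 6.65, 6.65, 6.65]   # always predicts the mean

print(f"good model: {r_squared(experimental, good_model):.2f}")
print(f"mean-only model: {r_squared(experimental, mean_only):.2f}")
```

A large gap between the training and test values of this statistic is the usual sign of overfitting.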
➢ Determination of pEC50 values for 36 compounds
➢ Model might not be reliable
✗ Only 60 % accuracy on test set
✗ Lack of benzimidazole compounds in learning set
✗ High similarity between series (4 of 7 series)
✗ Inaccuracy of EC50 measurements
✗ Some series may bind to another binding site
✓ May benefit from affinity prediction based on docking
Multidimensional scaling on FOD. Black filled circles represent the learning and test sets, red filled circles the D3R set.
STRUCTURE-BASED APPROACH
Chemotype                 PDB IDs
Steroid                   1OSV, 1OT7, 3BEJ
GW4064 and derivatives    3DCU, 3DCT, 3FXV, ..
Benzimidazoles            3OKH, 3OKI, 3OLF, ..
Other columns on the slide: scaffold, chemical structure, number of compounds (24, 1, 5)
➢ 27 structures in the PDB
➢ Selection of structure: ❖ Best resolution ❖ No missing residues ❖ Most studied ❖ Representative of dataset compound families
Structure   Resolution
3DCU        2.95 Å
3OKH        2.5 Å
3DCT        2.5 Å
1OSV        2.5 Å
➢ Missing residues or mutations in binding site: 3FXV, 4OIV, 3L1B, 1OSH, 4WVD (resolutions listed: 2.26 Å, 1.7 Å, 1.8 Å, 1.9 Å, 2.9 Å)
Structure Preparation (DockPrep, Chimera)
• Addition of missing side chains
• Protonation of amino acids (pH = 7.4)
System Preparation (Gromacs)
• Force field: CHARMM27 all-atom force field with CMAP for proteins
• Solvation: TIP3P water
• Triclinic box, distance = 10 Å
• Neutralization of system > 20 Na+
Minimization
• 2 minimizations (before and after ions)
• Integrator: steepest descent
• Number of steps: 1,000
• Fmax = 1000 kJ/mol/nm
Equilibration Phase
• Simulation time: 500 ps
• Isothermal-isobaric (NPT) ensemble
• Temperature & pressure coupling (Berendsen thermostat/barostat), 300 K, 1 bar
Production Phase
• Same parameters as the NPT equilibration
• Simulation time: 20 ns
• Positions of atoms written every 1 ps
• Electrostatics: PME (Particle Mesh Ewald)
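The protocol above can be written down as GROMACS `.mdp` options. The sketch below only renders parameter files as text. The 2 fs time step, the coupling time constants, the compressibility and the equilibration output interval are assumptions not given on the slide, and the helper names are hypothetical.

```python
TIMESTEP_PS = 0.002  # assumption: 2 fs time step, not stated on the slide


def mdp_text(options: dict) -> str:
    """Render a dict of options as 'key = value' lines of a .mdp file."""
    return "\n".join(f"{key:<16} = {value}" for key, value in options.items()) + "\n"


# Energy minimization: steepest descent, 1,000 steps, Fmax = 1000 kJ/mol/nm.
minimization = {
    "integrator": "steep",
    "nsteps": 1000,
    "emtol": 1000.0,
}


def npt_options(sim_time_ps: float, write_every_ps: float) -> dict:
    """NPT run with Berendsen coupling at 300 K and 1 bar, PME electrostatics."""
    return {
        "integrator": "md",
        "dt": TIMESTEP_PS,
        "nsteps": round(sim_time_ps / TIMESTEP_PS),
        "nstxout": round(write_every_ps / TIMESTEP_PS),
        "tcoupl": "berendsen",
        "tc-grps": "System",
        "tau_t": 0.1,               # assumption
        "ref_t": 300,
        "pcoupl": "berendsen",
        "tau_p": 2.0,               # assumption
        "ref_p": 1.0,
        "compressibility": 4.5e-5,  # assumption: water at 300 K, in 1/bar
        "coulombtype": "PME",
    }


equilibration = npt_options(sim_time_ps=500, write_every_ps=1.0)   # output interval assumed
production = npt_options(sim_time_ps=20_000, write_every_ps=1.0)   # 20 ns, frames every 1 ps

print(mdp_text(minimization))
print(mdp_text(production))
```

With the assumed time step, the 20 ns production run corresponds to 10,000,000 steps and a coordinate output every 500 steps.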
➢ Why MD?
❖ Assess stability
❖ Investigate binding site movements
➢ Structure
❖ Conformation bound to the nuclear receptor coactivator peptide
[Figure: percentage of native contacts as a function of time (0 to 20 ns)]
➢ Study of native contacts percentage ❖ Around 90% of contacts preserved
[Figure: time course (0 to 20 ns) with a second axis running from 20 to 220]
➢ 12 helices with little variation
➢ Same number of amino acids constituting secondary structures
[Figure: plot against time (ns), with snapshots at 0 ns and 20 ns]
➢ Some changes in Cα backbone
➢ Binding site not affected by the Cα changes
Snapshot 0 ns (gate open), snapshot 20 ns (gate closed)
➢ Globally stable (no major changes)
➢ Binding site (movement of side chains)
➢ Open conformation kept for docking
Docking workflow (receptor structures 3OKH, 1OSV, 3DCU)
➢ Conformer/tautomer generation (36 compounds)
➢ Structure validation: screening with AutoDock Vina, score histograms (all poses)
➢ Binding mode prediction: docking with AutoDock 4.2, pose clustering, identification of best poses (36 poses)
➢ Result evaluation: redocking of experimental poses with AutoDock 4.2, pose clustering, identification of experimental poses
Receptor Preparation (DockPrep, Chimera)
• Removal of water molecules • Polar hydrogen addition (pH = 7.4)
Compounds Preparation (DockPrep, Chimera)
• Polar hydrogen addition (pH = 7.4)
Compounds Preparation (AutoDockTools 1.5.6)
• Addition of Gasteiger charges
Screening (AutoDock Vina)
• Rigid receptor • Box spacing = 1.0 Å • Generation of 10 poses
Docking (AutoDock 4.2)
• Rigid receptor • Box spacing = 0.375 Å • Lamarckian Genetic Algorithm (250 runs)
Redocking (AutoDock 4.2)
• Rigid receptor • Box spacing = 0.210 Å • Lamarckian Genetic Algorithm (500 runs, initial population of 500 at each run)
➢ Energy scores between -14 and -9 kcal/mol
➢ High affinity binding
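To relate docking energies to affinities, a binding free energy can be converted into a predicted inhibition constant with Ki = exp(ΔG / RT). The sketch below assumes T = 298.15 K and R = 1.9872 cal/(mol·K), which is the usual convention for this kind of estimate; the slides do not state the exact settings, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.9872e-3   # kcal/(mol*K)
TEMPERATURE = 298.15       # K, assumed


def ki_from_energy(delta_g_kcal_mol: float) -> float:
    """Predicted inhibition constant (mol/L) from a binding free energy."""
    return math.exp(delta_g_kcal_mol / (GAS_CONSTANT * TEMPERATURE))


# The two ends of the energy range quoted on the slide.
for energy in (-9.0, -14.0):
    ki = ki_from_energy(energy)
    print(f"{energy:6.1f} kcal/mol -> Ki = {ki:.2e} M")
```

Because the relation is exponential, each additional -1.36 kcal/mol or so lowers the predicted Ki by roughly a factor of ten at this temperature.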
➢ Top poses interactions:
❖ 3 hydrogen bonds (SER 336, TYR 373, ARG 335)
❖ Pi stacking (PHE 333)
❖ Proper orientations of hydrophobics (ILE 361 & 356, LEU 291 & 290)
Energies of the top poses: E = -14.3 kcal/mol, E = -13.9 kcal/mol, E = -12.2 kcal/mol
➢ Gives an idea of the essential features a ligand needs to fit the receptor
➢ Development of pharmacophore for best docking poses
RESULTS VALIDATION & CONCLUSION
Results Validation
FXR_27: RMSD = 0.99 Å, Ki pred = 8.59 pM
FXR_18: RMSD = 2.82 Å, Ki pred = 33.67 pM
FXR_13: RMSD = 4.95 Å, Ki pred = 11.63 pM
➢ FXR_27 (RMSD = 0.99 Å): low RMSD
➢ FXR_18 (RMSD = 2.82 Å): slightly higher RMSD, preserved interactions
➢ FXR_13 (RMSD = 4.95 Å): changes in ring positions and interactions
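The RMSD values above compare a docked pose with the experimental pose. A minimal sketch of such a comparison is given below. It assumes both poses are expressed in the same receptor frame (so no superposition is applied), that atoms are listed in the same order, and it ignores molecular symmetry. The coordinates are hypothetical.

```python
import math


def pose_rmsd(predicted, reference):
    """In-place RMSD (Å) between two poses given as lists of (x, y, z) tuples."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


# Hypothetical heavy-atom coordinates of a three-atom fragment.
experimental_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]  # shifted by 1 Å along x

print(f"RMSD = {pose_rmsd(docked_pose, experimental_pose):.2f} Å")
```

A pose is commonly regarded as correctly reproduced when this value stays below about 2 Å, which is a rule of thumb rather than a criterion stated on the slides.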
Conclusion
➢ QSAR model not applicable:
❖ EC50 variability in measurements
❖ Chemical space not representative of 2/3 of compounds
➢ MD allows for identification of open conformation state
➢ Docking grants better ranking
❖ Identification of really high affinity compounds
➢ Hydrophobicity/Hydrophilicity
➢ Aromaticity
➢ Number of hetero atoms
Improvements
➢ QSAR:
❖ Random sampling of descriptors
❖ Different models for a consensus approach
❖ 3D QSAR (CoMFA*, CoMSIA)
➢ MD simulation of receptor-docked ligand complex*
➢ Docking:
❖ Lower spacing, higher LGA population
❖ Minimization of protein-docked ligand complex
❖ Use of more precise scoring functions
❖ Study of water molecules network
*Martínez et al., Molecular dynamics simulations reveal multiple pathways of ligand dissociation from thyroid hormone receptors. Biophysical Journal (2005), 89(3), 2011–23.
REFERENCES
Wang, Y. D., Chen, W. D. & Huang, W. FXR, a target for different diseases. Histol. Histopathol. 23, 621–627 (2008).
Stanton, D. T. QSAR and QSPR model interpretation using partial least squares (PLS) analysis. Current Computer-Aided Drug Design 8, 107–127 (2012). https://doi.org/10.2174/157340912800492357
Kubinyi, H. QSAR and 3D QSAR in drug design. Part 1: Methodology. Drug Discovery Today (1997).
Costantino, G., Entrena-Guadix, A., Macchiarulo, A., Gioiello, A. & Pellicciari, R. Molecular dynamics simulation of the ligand binding domain of farnesoid X receptor. Insights into helix-12 stability and coactivator peptide stabilization in response to agonist binding. J. Med. Chem. 48, 3251–3259 (2005).
Martínez, L., Sonoda, M. T., Webb, P., Baxter, J. D., Skaf, M. S. & Polikarpov, I. Molecular dynamics simulations reveal multiple pathways of ligand dissociation from thyroid hormone receptors. Biophysical Journal 89(3), 2011–2023 (2005).
THANK YOU
ACKNOWLEDGEMENTS
Gautier MOROY & Manon REAU For their guidance, support and precious help!!!
Dhoha TRIKI & Natacha CERISIER For the continuous and wise counseling.
APPENDICES
Why this name?
➢ Identified in 1995
➢ Known at the time as an ‘orphan’ receptor (associated with no ligands)
➢ Found to interact with farnesol and derivatives
synthesis of cholesterol, bile acids, steroids, retinoids, and farnesylated proteins
Forman, B. M. et al. Identification of a nuclear receptor that is activated by farnesol metabolites. Cell 81, 687–693 (1995).
Cholic acid, cholesterol, lithocholic acid, hydrocortisone, isotretinoin
FragFp fingerprint
➢ Similar to MDL keys
➢ Based on predefined dataset of 512 structure fragments
➢ 1 bit for presence of each fragment (0 otherwise)
➢ Heteroatoms replaced with wildcards ❖ a single-atom replacement causes only a small drop in similarity
✓ All fragments occur within typical organic molecule structures ✓ Only little overlap between types of fragments
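Since the fingerprint is a 512-bit presence vector, the similarity between two compounds can be illustrated with a Tanimoto coefficient on the sets of bits that are switched on. The use of Tanimoto here is an assumption (the slides do not name the similarity measure), and the bit indices below are hypothetical.

```python
FINGERPRINT_SIZE = 512  # number of predefined fragments


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    for bits in (bits_a, bits_b):
        if any(not 0 <= index < FINGERPRINT_SIZE for index in bits):
            raise ValueError("bit index outside the fingerprint")
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: fragments present in three compounds.
compound_a = {3, 17, 42, 128, 300}
compound_b = {3, 17, 42, 128, 301}    # one fragment differs
compound_c = {5, 99, 410}

print(f"a vs b: {tanimoto(compound_a, compound_b):.2f}")
print(f"a vs c: {tanimoto(compound_a, compound_c):.2f}")
```

Pairwise similarities of this kind are what a scaffold clustering step operates on.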
Selected descriptors (1/2)
Descriptor  Category          Definition
AECC        Physico-chemical  average eccentricity
AMW         Physico-chemical  average molecular weight
ARR         Physico-chemical  aromatic ratio
CENT        Physico-chemical  centralization
CSI         Physico-chemical  eccentric connectivity index
D/D         Physico-chemical  distance/detour index
D/Dr03      Physico-chemical  distance/detour ring index of order 3
Dz          Physico-chemical  Pogliani index
ECC         Physico-chemical  eccentricity
GMTI        Physico-chemical  Gutman Molecular Topological Index
GMTIV       Physico-chemical  Gutman MTI by valence vertex degrees
Har         Physico-chemical  Harary H index
Har2        Physico-chemical  square reciprocal distance sum index
HyDp        Physico-chemical  hyper-distance-path index
ICR         Physico-chemical  radial centric information index
Jhetm       Physico-chemical  Balaban-type index from mass weighted distance matrix
nBM         Physico-chemical  number of multiple bonds
nBT         Physico-chemical  number of bonds
nSK         Physico-chemical  number of non-H atoms
PHI         Physico-chemical  Kier flexibility index
Pol         Physico-chemical  polarity number
QW          Physico-chemical  quasi-Wiener index (Kirchhoff number)
RBF         Physico-chemical  rotatable bond fraction
RHyDp       Physico-chemical  reciprocal hyper-distance-path index
Rww         Physico-chemical  reciprocal hyper-detour index
S0K         Physico-chemical  Kier symmetry index
S1K         Physico-chemical  1-path Kier alpha-modified shape index
S2K         Physico-chemical  2-path Kier alpha-modified shape index
S3K         Physico-chemical  3-path Kier alpha-modified shape index
SCBO        Physico-chemical  sum of conventional bond orders (H-depleted)
SMTI        Physico-chemical  Schultz Molecular Topological Index (MTI)
SMTIV       Physico-chemical  Schultz MTI by valence vertex degrees
T(O..O)     Physico-chemical  sum of topological distances between O..O
T(O..S)     Physico-chemical  sum of topological distances between O..S
TI1         Physico-chemical  first Mohar index TI1
UNIP        Physico-chemical  unipolarity
VAR         Physico-chemical  variation
VDA         Physico-chemical  average vertex distance degree
W           Physico-chemical  Wiener W index
WA          Physico-chemical  mean Wiener index
Whete       Physico-chemical  Wiener-type index from electronegativity weighted distance matrix
Whetm       Physico-chemical  Wiener-type index from mass weighted distance matrix
Whetp       Physico-chemical  Wiener-type index from polarizability weighted distance matrix
Whetv       Physico-chemical  Wiener-type index from van der Waals weighted distance matrix
WhetZ       Physico-chemical  Wiener-type index from Z weighted distance matrix (Barysz matrix)
ww          Physico-chemical  hyper-detour index
Xt          Physico-chemical  Total structure connectivity index
Selected descriptors (2/2)
Descriptor  Category              Definition
MPC04       Walk and path counts  molecular path count of order 04
MPC05       Walk and path counts  molecular path count of order 05
MPC06       Walk and path counts  molecular path count of order 06
MPC07       Walk and path counts  molecular path count of order 07
MWC02       Walk and path counts  molecular walk count of order 02
MWC03       Walk and path counts  molecular walk count of order 03
MWC06       Walk and path counts  molecular walk count of order 06
MWC07       Walk and path counts  molecular walk count of order 07
MWC09       Walk and path counts  molecular walk count of order 09
MWC10       Walk and path counts  molecular walk count of order 10
SRW08       Walk and path counts  self-returning walk count of order 08
SRW09       Walk and path counts  self-returning walk count of order 09
SRW10       Walk and path counts  self-returning walk count of order 10
TWC         Walk and path counts  total walk count
MLR
Figure 5: Plots of predicted against experimental pEC50 values obtained by multiple linear regression (MLR) on (A) GD and (B) FOD. Each dataset was split into 2/3 for training (blue filled triangles) and 1/3 for testing (red filled circles). R² values were computed for both training and testing with GD (respectively 0.75 and 0.16) and FOD (respectively 0.97 and 0.19).
PLSR
Figure 7: RMSEP and R² values as a function of the number of components used to build the PLS model on FOD. A threshold of 30 components was used in the PLS model since the associated RMSEP value (~1.85) is quite low and the coefficient of determination R² of the model (~0.83) was much better than for MLR and logistic regression.
Equations MD: RMSD, radius of gyration, RMSF, native contacts
Native contacts, symbol definitions:
• X is a conformation,
• r_{ij}(X) is the distance between atoms i and j in conformation X,
• r^0_{ij} is the distance from heavy atom i to j in the native state conformation,
• S is the set of all pairs of heavy atoms (i,j) belonging to residues
• \beta is expressed in Å^{-1},
• \lambda = 1.8 for all-atom simulations
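The symbols defined above match the commonly used soft-cutoff definition of the fraction of native contacts, Q(X) = (1/|S|) Σ_{(i,j)∈S} 1 / (1 + exp[β (r_ij(X) − λ r⁰_ij)]). The equation itself did not survive on the slide, so both this form and the value β = 5 Å⁻¹ used below are assumptions taken from that published definition; the distances are hypothetical.

```python
import math

BETA = 5.0     # 1/Å, assumed smoothing parameter (not legible on the slide)
LAMBDA = 1.8   # tolerance factor for all-atom simulations (from the slide)


def native_contact_fraction(native_distances: dict, current_distances: dict) -> float:
    """Fraction of native contacts Q(X).

    Both dicts map an atom pair (i, j) from the set S to a distance in Å:
    native_distances holds r0_ij, current_distances holds r_ij(X).
    """
    if not native_distances:
        raise ValueError("the set of native pairs S is empty")
    total = 0.0
    for pair, r0 in native_distances.items():
        r = current_distances[pair]
        total += 1.0 / (1.0 + math.exp(BETA * (r - LAMBDA * r0)))
    return total / len(native_distances)


# Hypothetical native pairs and their distances (Å).
native = {(0, 10): 4.0, (1, 12): 4.2, (3, 20): 3.8, (5, 30): 4.4}
frame = dict(native)
frame[(5, 30)] = 12.0   # one contact is broken in this frame

print(f"Q(native) = {native_contact_fraction(native, native):.2f}")
print(f"Q(frame)  = {native_contact_fraction(native, frame):.2f}")
```

Multiplying Q(X) by 100 gives a percentage of native contacts along the trajectory.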
3DCU MD (1/2)
➢ 3DCU Apo and Holo forms
RMSD 1.2 Å, RMSD 1.01 Å, RMSD 1.2 Å
3DCU MD (2/2)
90% of native contacts preserved for both forms
➢ 3DCU Apo and Holo forms
Virtual screening histograms
[Figure panels: 1OSV, 3DCU, 3OKH]
Lamarckian Genetic Algorithm
➢ Combines mapping functions
❖ Genotype to Phenotype ❖ Phenotype to Genotype
➢ Genotype (ligand state variables):
❖ Ligand translation ❖ Ligand orientation (quaternion) ❖ Ligand torsion values
➢ Phenotype:
❖ Ligand coordinates ❖ Goodness of fit (energy evaluation)
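The Lamarckian idea (an individual improved by local search passes the improvement on to its offspring) can be illustrated on a toy problem. This is a minimal sketch, not AutoDock's implementation: the energy function, the genotype-to-phenotype mapping, the population settings and all names are hypothetical.

```python
import random

TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of the toy energy


def decode(genotype):
    """Genotype -> phenotype mapping (toy stand-in for building coordinates)."""
    return [2.0 * gene for gene in genotype]


def energy(phenotype):
    """Toy energy: squared distance of the phenotype to the target."""
    return sum((p - t) ** 2 for p, t in zip(phenotype, TARGET))


def fitness(genotype):
    return energy(decode(genotype))


def local_search(genotype, rng, steps=20, step_size=0.1):
    """Random hill climbing; only improving moves are accepted."""
    best, best_energy = list(genotype), fitness(genotype)
    for _ in range(steps):
        trial = [gene + rng.gauss(0.0, step_size) for gene in best]
        trial_energy = fitness(trial)
        if trial_energy < best_energy:
            best, best_energy = trial, trial_energy
    return best


def lamarckian_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimised genes replace the originals.
        population = [local_search(ind, rng) for ind in population]
        population.sort(key=fitness)
        parents = population[: pop_size // 2]            # selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mother, father)]  # crossover
            child = [gene + rng.gauss(0.0, 0.05) for gene in child]     # mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


best = lamarckian_ga()
print(f"best energy: {fitness(best):.4f}")
```

In a docking context the genes would be the ligand translation, orientation and torsions, and the energy would come from the scoring function evaluated on the resulting coordinates.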
Comparison between Vina and AutoDock
➢ Overestimation of low values by Vina (-10 kcal/mol ~ 2-1 nM)
➢ For D3R
❖ Use of AutoDock poses rather than Vina
❖ Use of predicted score to weight affinity ranking
PCA (GD)
Appendix 32: Principal component analysis performed on the whole learning dataset (GD), comprising 857 compounds and the whole set of 1079 descriptors. The first two dimensions explain 46.60 % of the variance (respectively 31.84 % and 14.78 %). (A) Individual map of the compounds shows the self-organisation of the dataset into 3 distinct clusters (blue, purple and green circles) and identifies one outlier (yellow circle). (B) Variable map on the correlation circle shows a dense distribution along the positive part of the first dimension and a much less dense region with less correlated descriptors along the negative part of the first dimension.
A B
Appendix 4: Principal component analysis performed on the whole learning dataset (GD), comprising 857 compounds and the 290 selected descriptors. The first two dimensions explain 68.78 % of the variance (respectively 49.71 % and 19.07 %). (A) Individual map of the compounds shows the self-organisation of the dataset into 3 distinct clusters (blue, purple and green circles) and identifies one outlier (yellow circle). (B) Variable map on the correlation circle shows a dense distribution along the positive parts of the first and second dimensions and a much less dense region with less correlated descriptors along the negative part of the first dimension.
A B
PCA (FOD)
Appendix 1: Principal component analysis performed on the FRET-only dataset (FOD), comprising 149 compounds and the whole set of 853 descriptors. The first two dimensions explain 60.96 % of the variance (respectively 45.57 % and 15.39 %). (A) Individual map of the compounds shows the self-organisation of the dataset into 5 distinct clusters (coloured circles). (B) Variable map on the correlation circle shows a dense distribution along the positive part of the second dimension and a much less dense region with less correlated descriptors along the negative part of the second dimension.
A B A B
Figure 6: Principal component analysis performed on the FRET-only dataset (FOD), comprising 149 compounds and the 78 selected descriptors. The first two dimensions explain 82.86 % of the variance (respectively 65.07 % and 17.79 %). (A) Individual map of the compounds shows the self-organisation of the dataset into 5 distinct clusters (coloured circles). (B) Variable map on the correlation circle shows the importance of descriptors encoding molecular paths (MPC04-07, MWC07-10), polarity (Pol) and molecular weight (AMW).