13.docking scoring
7/27/2019 13.Docking Scoring
http://slidepdf.com/reader/full/13docking-scoring 1/47
Copyright © 2004 Eli Lilly and Company
Docking & Scoring
EMBO-Course: “Methods for Protein Simulation & Drug Design.” Shanghai, China, September 13-24, 2004.
Qi Chen, Eli Lilly & Company, Indianapolis, Indiana, U.S.A.
September 21, 2004 Copyright © 2004 Eli Lilly and Company 2
Outline
Introduction
Docking Methods: representation of receptor binding site and ligand; sampling of configuration space of the ligand-receptor complex
Scoring Methods: free energy, binding affinity, and docking scores; scoring functions, consensus scoring, and others
Docking Software: existing software (DOCK, FlexX, GOLD, AutoDock, LUDI, Glide, FRED, CDOCKER)
Accuracy, Applications, and Successes
What Are Docking & Scoring?
To place a ligand (small molecule) into the binding site of a receptor in the manner appropriate for optimal interactions with the receptor.
To evaluate the ligand-receptor interactions in a way that can discriminate the experimentally observed binding mode from others and estimate the binding affinity.
[Figure: ligand + receptor -> docking -> candidate complexes -> scoring -> comparison with the X-ray structure & ΔG]
Why Do We Do Docking?
Drug discovery costs are too high: ~$800 million, 8-14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
Drugs interact with their receptors in a highly specific and complementary manner.
Core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
A lead is a compound that
shows biological activity,
is novel, and
has the potential of being structurally modified for improved bioactivity, selectivity, and druggability.
Three Components of Docking
Pre- and/or during docking: representation of receptor binding site and ligand
During docking: sampling of configuration space of the ligand-receptor complex
During docking and scoring: evaluation of ligand-receptor interactions
Receptor Structures & Binding Site Descriptions
PDB (Protein Data Bank, www.rcsb.org/pdb/) structures of proteins or enzymes:
X-ray crystal: >12,000 structures; 788 at ≤1.5 Å resolution, 9,390 between 1.5 and 2.5 Å
NMR: >450 structures; ensemble accuracy of 0.4-1 Å in the backbone region and 1.5 Å in average side-chain position (Billeter 1992; Clore et al. 1993)
(and high-quality homology models built from highly similar sequences)
Limitations of experimental structures (Davis et al. 2003):
Locations of hydrogen atoms, water molecules, and metal ions
Identities and locations of some heavy atoms (e.g., ~1/6 of the N/O of Asn & Gln and the N/C of His are incorrectly assigned in the PDB; up to 0.5 Å uncertainty in position)
Conformational flexibility of proteins
Binding site descriptions: atomic coordinates, surface, volume, points & distances, bond vectors, grids, and various properties such as electrostatic potential, hydrophobic moment, polar/nonpolar character, atom types, etc.
[Figure: DOCK]
Drug, Chemical & Structural Space
Drug-like: MDDR (MDL Drug Data Report) >147,000 entries, CMC (Comprehensive Medicinal Chemistry) >8,600 entries
Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries
Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder
CSD (Cambridge Structural Database, www.ccdc.cam.ac.uk): ~3 million X-ray crystal structures for >264,000 different compounds and >128,00 organic structures
Available compounds:
Available without exclusivity: various vendors (& ACD)
Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma, ChemExplorer, etc.
Corporate databases: a few million compounds in large pharma companies
3D Structural Information & Ligand Descriptions
2D->3D software: CORINA, OMEGA, CONCORD, MM2/3, WIZARD, COBRA (reviewed by Robertson et al. 2001)
CSD: <0.1 Å for small molecules, but may not be the bound conformation in the receptor
PDB: ligand-bound protein structures ~6000 entries
Atoms associated with inter-atom distances, physical and chemical properties, types, charges, pharmacophore, etc.
Flexibility: conformation ensemble, fragment-based
Sampling of Configuration Space of the Ligand-Receptor Complex
Descriptor matching: pattern-recognition geometric methods that match ligand and receptor site descriptors, i.e., geometric, chemical, and pharmacophore properties such as distance pairs, triplets, volumes, vectors, and hydrogen-bond, hydrophobic, and charged groups
Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
Others: GA (genetic algorithm), similarity, fragment-based
Challenges:
The complete conformation and configuration space of the ligand-receptor complex is too large.
Conformational flexibility of both ligand and receptor cannot be ignored.
Shape matching: there is no 'best' method or general solution for describing and matching the molecular shape of irregular objects (Ullman 1976; Salomaa 1991). Shape alone is not a sufficient descriptor to identify low-energy conformations of a ligand-receptor complex (Jorgensen 1991).
Descriptor Matching Methods: DOCK
Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
Descriptor Matching Methods
Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
Interaction site matching in LUDI (Boehm 1992): HBA<->HBD, HYP<->HYP
Pose clustering and triplet matching in FlexX (Rarey et al. 1996): HBA<->HBD, HYP<->HYP
Shape-matching in FRED (Openeye www.eyesopen.com)
Vector matching in CAVEAT (Lauri and Bartlett 1994)
Steric effects-matching in CLIX (Lawrence and Davis 1992)
Shape chemical complementarity in SANDOCK (Burkhard et al. 1998)
Surface complementarity in LIGIN (Sobolev et al. 1996)
H-bond matching in ADAM (Mizutani et al. 1994)
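To make the distance-compatibility idea concrete, here is a minimal Python sketch (an illustration, not DOCK's implementation). It assumes sphere centers and ligand heavy-atom coordinates are already available as 3D tuples. Every (sphere, atom) pairing is a graph node; two pairings are joined when their sphere-sphere and atom-atom distances agree within a tolerance, and the largest clique is a mutually consistent set of matches from which a ligand orientation could be fitted. The tolerance value and the plain Bron-Kerbosch clique search are assumptions chosen for brevity.

```python
import itertools
import math


def compatibility_graph(spheres, atoms, tol=0.5):
    """Build the distance-compatibility graph.

    Nodes are (sphere_index, atom_index) pairings. Two pairings are linked
    when they use different spheres and different atoms and the
    sphere-sphere distance matches the atom-atom distance within `tol` (Angstrom).
    """
    nodes = list(itertools.product(range(len(spheres)), range(len(atoms))))
    edges = {node: set() for node in nodes}
    for (s1, a1), (s2, a2) in itertools.combinations(nodes, 2):
        if s1 == s2 or a1 == a2:
            continue  # each sphere and each atom may be matched only once
        d_site = math.dist(spheres[s1], spheres[s2])
        d_lig = math.dist(atoms[a1], atoms[a2])
        if abs(d_site - d_lig) <= tol:
            edges[(s1, a1)].add((s2, a2))
            edges[(s2, a2)].add((s1, a1))
    return edges


def max_clique(edges):
    """Largest clique via basic Bron-Kerbosch (fine for toy-sized graphs)."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = clique
            return
        for v in list(candidates):
            expand(clique + [v], candidates & edges[v], excluded & edges[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(edges), set())
    return best


# Toy example: a 3-4-5 triangle of sphere centers and the same triangle of
# ligand atoms translated elsewhere; the only consistent match is i -> i.
spheres = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
atoms = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0)]
print(sorted(max_clique(compatibility_graph(spheres, atoms))))  # [(0, 0), (1, 1), (2, 2)]
```

Real matching codes also use chemical labels and distance binning to prune the graph; the exhaustive pairing here grows quickly and is only meant to show the principle.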
Fragment-based Methods
Flexibility and/or de novo design
Identification and placement of the base/anchor fragment are very important
Energy optimization (during or post-docking) is important
Examples
Incremental construction in FlexX with triplet matching and pose clustering to maximize the number of favorable interactions
Growing and/or joining in LUDI from pre-built fragment and linker libraries, maximizing H-bond and hydrophobic interactions
Anchor-based fragment joining in DOCK
Molecular Simulation: MD & MC
Two major components: the description of the degrees of freedom and the energy evaluation
The local movement of the atoms is performed:
due to the forces present at each step in MD (molecular dynamics);
randomly in MC (Monte Carlo)
Usually time-consuming: the search proceeds from a starting orientation to a low-energy configuration
Several simulations with different starting orientations must be performed to obtain a statistically significant result
A grid is used for energy calculation. Larger steps or multiple starting poses are often used in MD for speed and sampling coverage: Di Nola et al. 1994; Mangoni et al. 1999; Pak & Wang 2000; CDOCKER by Wu et al. 2003.
MC-based Docking
Acceptance probability for a move from configuration A to configuration B (Metropolis criterion; moves with E(B) ≤ E(A) are always accepted):
$$P = \exp\left(-\frac{E(B) - E(A)}{k_B T}\right)$$
where T is reduced according to a so-called cooling schedule; a grid can be used for the energy calculation.
An advantage of the MC technique over gradient-based methods (e.g., MD) is that a simple energy function that does not require derivative information can be used, and the search is able to step over energy barriers.
AutoDOCK (Goodsell & Olson 1990), MCDOCK (Liu & Wang 1999), PRODOCK (Trosset & Scheraga 1999), ICM (Abagyan et al. 1994). Simulated annealing is used in DockVision (Hart & Read 1992) and Affinity (Accelrys Inc., San Diego, CA).
Energy minimization is used in QXP (McMartin & Bohacek 1997).
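A minimal Python sketch of MC docking with simulated annealing, assuming a generic energy function over a flat list of coordinates (a real docking code would move the ligand's translation, rotation and torsions and read energies from a grid). The Metropolis test accepts every downhill move and accepts an uphill move with probability exp(-ΔE/k_BT). The starting temperature, step size and geometric cooling factor are illustrative assumptions, not values from any of the programs listed above.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng):
    """Always accept downhill moves; accept uphill ones with exp(-dE/kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def anneal(energy, x0, step=0.5, kT0=2.0, cooling=0.95, n_cycles=50,
           moves_per_cycle=100, seed=0):
    """Monte Carlo simulated annealing over a list of coordinates."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    kT = kT0
    for _ in range(n_cycles):
        for _ in range(moves_per_cycle):
            trial = [xi + rng.uniform(-step, step) for xi in x]  # random move
            e_trial = energy(trial)
            if metropolis_accept(e, e_trial, kT, rng):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = list(x), e
        kT *= cooling  # geometric cooling schedule (an assumed choice)
    return best_x, best_e


# Toy 1D "energy" with a local minimum near x = +1 and the global one near x = -1.
def double_well(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


best_x, best_e = anneal(double_well, [1.0])  # starts in the local minimum
```

Starting in the shallower well at x = +1, the high early temperature lets the walker cross the barrier, which is the "step over energy barriers" advantage noted above.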
Genetic Algorithm Docking
A fitness function decides which individuals (configurations) survive and produce offspring for the next iteration of optimization. Degrees of freedom are encoded into genes or binary strings.
The collection of genes (chromosome) is assigned a fitness based on a scoring function. There are three genetic operators:
mutation randomly changes the value of a gene;
crossover exchanges a set of genes between two parent chromosomes;
migration moves individuals from one sub-population to another.
Requires generating an initial population, whereas conventional MC and MD require a single starting structure in their standard implementations.
GOLD (Jones et al. 1997); AutoDock 3.0 (Morris et al. 1998); DIVALI (Clark & Ajay 1995).
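The three operators can be sketched as a small island-model GA in Python. This is an illustration under assumptions, not GOLD's or AutoDock's implementation: genes are plain real numbers standing in for translation, rotation and torsion values, fitness is any function where higher is better, and the population sizes, mutation settings and ring-topology migration are hypothetical choices.

```python
import random


def evolve(fitness, n_genes, pop_size=20, n_islands=2, generations=60,
           mut_rate=0.1, mut_scale=0.5, migrate_every=10, seed=1):
    """Island-model GA over real-valued genes; higher fitness is better."""
    rng = random.Random(seed)
    # Initial population: one list of chromosomes per island (sub-population).
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(generations):
        for k, pop in enumerate(islands):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]  # the fitter half survives
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
                child = a[:cut] + b[cut:]  # one-point crossover
                child = [g + rng.gauss(0.0, mut_scale) if rng.random() < mut_rate else g
                         for g in child]  # mutation: perturb a gene's value
                children.append(child)
            islands[k] = parents + children
        if n_islands > 1 and (gen + 1) % migrate_every == 0:
            # Migration: each island's best individual replaces the worst
            # individual of the next island (ring topology).
            bests = [max(pop, key=fitness) for pop in islands]
            for k, pop in enumerate(islands):
                pop.sort(key=fitness)
                pop[0] = list(bests[k - 1])
    return max((ind for pop in islands for ind in pop), key=fitness)


# Toy fitness with its optimum at genes (3, 3).
def toy_fitness(genes):
    return -((genes[0] - 3.0) ** 2 + (genes[1] - 3.0) ** 2)


best = evolve(toy_fitness, n_genes=2)
```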
Multiple Method Approach
Similarity-guided MD simulated annealing to improve accuracy (Wu & Vieth 2004).
Shape similarity & clustering to speed up conformational search in docking (Makino & Kuntz 1998).
Better input or constraints for the existing docking engines.
Example workflows:
systematic search of conformations -> rigid DOCK -> minimization -> MD/SA (Wang et al. 1999)
initial poses -> filters -> finer docking -> final scoring (FRED, GLIDE, DOCK)
Scoring Functions
A fast and simplified estimation of binding energies
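$$\Delta G_{binding} = -RT \ln K_{affinity} = \Delta G_{interaction} + \Delta G_{solv/complex} - \Delta G_{solv/ligand} - \Delta G_{solv/protein} - T\Delta S$$
[Figure: -scores plotted over configurations of the complex, with the X-ray structure marked; do the scores correspond to ΔG_binding?]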
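The relation ΔG_binding = -RT ln K_affinity also gives a quick way to translate between affinities and scoring errors. A small Python sketch, assuming T = 298.15 K and a 1 M standard state; it reproduces the conversions quoted in the accuracy section of this talk (1 kcal/mol ≈ 5.4-fold, 1.5-2 log units ≈ 2.05-2.73 kcal/mol).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """dG = -RT ln K_a = RT ln K_d (1 M standard state), in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fold_change(ddg_kcal, temp_k=298.15):
    """Affinity ratio corresponding to a free-energy difference."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def kcal_per_log_unit(temp_k=298.15):
    """Free energy (kcal/mol) corresponding to one log10 unit of affinity."""
    return R_KCAL * temp_k * math.log(10.0)


print(round(dg_from_kd(1e-9), 2))   # 1 nM binder: -12.28 kcal/mol
print(round(fold_change(1.0), 1))   # a 1 kcal/mol error is ~5.4-fold
print(round(1.5 * kcal_per_log_unit(), 2), round(2.0 * kcal_per_log_unit(), 2))  # 2.05 2.73
```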
Types of Scoring Functions
Force field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms
Empirical: multivariate regression methods fit the coefficients of physically motivated structural functions using a training set of ligand-receptor complexes with measured binding affinities
Knowledge-based: statistical atom-pair potentials derived from structural databases as the score
Other: scores and/or filters based on chemical properties, pharmacophore, contacts, and shape complementarity
Consensus scoring functions approach
Force Field Based Scoring Functions
Advantages:
FF terms are well studied and have some physical basis
Transferable, and fast when used on a pre-computed grid
Disadvantages:
Only part of the relevant energy is captured, i.e., potential energies, sometimes enhanced by solvation or entropy terms
Electrostatics often overestimated, leading to systematic problems in ranking complexes
$$E = \sum_{i=1}^{lig}\sum_{j=1}^{rec}\left(\frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332\,\frac{q_i q_j}{D\,r_{ij}}\right)$$
e.g., AMBER FF in DOCK
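A direct Python transcription of the pairwise force-field score, as a sketch: it assumes one A and one B for every atom pair (real force fields such as AMBER use type-pair-specific A_ij and B_ij), charges in units of e, distances in Å, and the factor 332 to give kcal/mol. The dielectric is either a constant D or the distance-dependent D = n·r described on the next slide.

```python
import math

COULOMB_KCAL = 332.0  # e^2/Angstrom -> kcal/mol, the 332 factor in the equation


def ff_score(ligand, receptor, A, B, a=12, b=6, eps=None, n=4.0):
    """Intermolecular force-field score in kcal/mol.

    ligand, receptor: lists of ((x, y, z), charge).
    A, B: repulsive/attractive constants, here shared by all atom pairs
          (a simplification; real force fields use per-type-pair values).
    eps:  constant dielectric D; if None, use the distance-dependent D = n * r.
    """
    total = 0.0
    for xi, qi in ligand:
        for xj, qj in receptor:
            r = math.dist(xi, xj)
            D = n * r if eps is None else eps
            total += A / r ** a - B / r ** b + COULOMB_KCAL * qi * qj / (D * r)
    return total


# Usage: opposite unit charges 2 Angstrom apart, electrostatics only.
lig = [((0.0, 0.0, 0.0), 1.0)]
rec = [((2.0, 0.0, 0.0), -1.0)]
print(ff_score(lig, rec, A=0.0, B=0.0, eps=1.0))  # -166.0 (in vacuo, D = 1)
print(ff_score(lig, rec, A=0.0, B=0.0))           # -20.75 (D = 4r)
```

The two calls show how strongly the choice of D damps the in vacuo Coulomb term, which is one reason electrostatics are often overestimated when D is small.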
FF Scoring: Implementations
AMBER FF: DOCK, FLOG, AutoDOCK
CHARMm FF: CDOCK, MC-approach (Caflisch et al. 1997)
Potential grid: the receptor structure is kept rigid during docking. The grid-based score interpolates from the eight surrounding grid points only, giving a ~100-fold speed-up. Examples: DOCK, CDOCK, and many other docking programs.
Softened VDW: a soft-core vdW potential is needed for kinetic accessibility of the binding site (Vieth et al. 1998). FLOG: 6-9 Lennard-Jones function; GOLD: 4-8 vdW + H-bond, and intraligand energy.
Solvent effect on electrostatics: often approximated by rescaling the in vacuo Coulomb interactions by 1/D, where D = 1-80 or D = n*r (n = 1-4, r = distance).
Solvation and entropy terms: solvation terms are decomposed into nonpolar and electrostatic contributions (e.g., DOCK):
$$E_{bind} = E_{nonbond} + E_{solv,elec} + E_{solv,np}$$
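The grid speed-up comes from evaluating the receptor potential once on a lattice and then interpolating from the eight surrounding grid points during docking. A trilinear-interpolation sketch in Python, assuming the grid is a nested list indexed [i][j][k] with uniform spacing; this layout is a hypothetical choice, not any program's file format.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a precomputed receptor potential at an arbitrary point.

    grid[i][j][k] holds the potential at origin + spacing * (i, j, k).
    Only the 8 grid points surrounding `point` are used; the point must lie
    inside the grid (not on its upper faces), or indexing will fail.
    """
    frac = [(p - o) / spacing for p, o in zip(point, origin)]
    base = [math.floor(c) for c in frac]          # lower-corner indices
    t = [c - b for c, b in zip(frac, base)]       # fractional offsets in [0, 1)
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((t[0] if di else 1.0 - t[0])
                          * (t[1] if dj else 1.0 - t[1])
                          * (t[2] if dk else 1.0 - t[2]))
                value += weight * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return value


# Usage: a 3 x 3 x 3 grid (spacing 0.5) of the linear field x + 2y + 3z,
# which trilinear interpolation reproduces exactly.
spacing = 0.5
grid = [[[i * spacing + 2 * j * spacing + 3 * k * spacing for k in range(3)]
         for j in range(3)] for i in range(3)]
print(trilinear(grid, (0.0, 0.0, 0.0), spacing, (0.3, 0.7, 0.2)))  # ~2.3
```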
Empirical Scoring Functions
Goals: reproduce experimental binding energies, with the function's global minimum corresponding to the X-ray crystal structure
Advantages: fast and direct estimation of binding affinity
Disadvantages:
Only a few complexes have both accurate structures and known binding energies
Discrepancies between binding affinities measured in different labs
Heavy dependence on the placement of hydrogen atoms
Heavy dependence of transferability on the training set
No effective penalty term for bad structures
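$$\Delta G_{bind} = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{hb} \sum_{neutral\ H\text{-}bonds} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{ionic\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{aro\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{lipo\ cont.} f^{*}(\Delta R)$$
LUDI & FlexX (Boehm 1994)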
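A sketch of how a Böhm-type function is evaluated, in Python. Each H-bond, ionic and aromatic interaction is weighted by a geometric penalty f(ΔR, Δα) that equals 1 for ideal geometry and decays linearly to 0. The tolerances (0.2/0.6 Å, 30/80°), the reuse of the distance penalty for lipophilic contacts, and all coefficient values are placeholder assumptions, not the published LUDI or FlexX parameters.

```python
def penalty(dev, tol, max_dev):
    """1 within `tol`, decaying linearly to 0 at `max_dev` (hypothetical tolerances)."""
    if dev <= tol:
        return 1.0
    if dev >= max_dev:
        return 0.0
    return 1.0 - (dev - tol) / (max_dev - tol)


def geometric_weight(d_r, d_alpha):
    """f(dR, d_alpha): distance (Angstrom) and angle (degree) deviations from ideal."""
    return penalty(d_r, 0.2, 0.6) * penalty(d_alpha, 30.0, 80.0)


def bohm_like_score(hbonds, ionic, aromatic, lipo_contacts, n_rot, coef):
    """hbonds/ionic/aromatic: lists of (dR, d_alpha); lipo_contacts: list of dR."""
    return (coef["dG0"]
            + coef["rot"] * n_rot
            + coef["hb"] * sum(geometric_weight(*x) for x in hbonds)
            + coef["io"] * sum(geometric_weight(*x) for x in ionic)
            + coef["aro"] * sum(geometric_weight(*x) for x in aromatic)
            + coef["lipo"] * sum(penalty(d, 0.2, 0.6) for d in lipo_contacts))


# Placeholder coefficients (NOT the published LUDI/FlexX values).
coef = {"dG0": 5.0, "rot": 1.5, "hb": -5.0, "io": -8.0, "aro": -1.0, "lipo": -0.2}
score = bohm_like_score(hbonds=[(0.1, 10.0), (0.0, 0.0)], ionic=[(0.2, 30.0)],
                        aromatic=[], lipo_contacts=[0.0] * 10, n_rot=3, coef=coef)
print(score)  # -10.5 for this ideal-geometry toy pose
```

The positive ΔG_0 and rotatable-bond terms act as penalties; every well-formed interaction pulls the score down in proportion to how close its geometry is to ideal.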
Empirical Scoring: Implementations
Mostly differ by what training set and how many parameters are used
Cerius2/Insight2000: LUDI, ChemScore, PLP, LigScore
SYBYL: FlexX, F-Score
Hammerhead: 17 parameters for hydrophobic, polar complementary, entropy,solvation. sLOO = 1.0 logK for 34 complexes
VALIDATE: 8 parameters for VDW and Coulomb interactions, surfacecomplementarity, lipophilicity, conformational entropy and enthalpy, lipophilic andhydrophilic complementarity between receptor and ligand surfaces
PRO_LEADS: 5 coefficients for lipophilic, metal-binding, H-bond, and a flexibilitypenalty term. sLOO = 2 kcal/mol for 82 complexes
SCORE (Tao & Lai, 2001); ChemScore (GOLD)
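The sLOO values quoted above (Hammerhead, PRO_LEADS) are leave-one-out cross-validated errors: each complex is removed in turn, the coefficients are refitted on the rest, and the held-out affinity is predicted. A dependency-free Python sketch of that procedure for a linear empirical function, assuming a small descriptor matrix and dividing by n (some papers divide by n - p - 1 instead); the toy data below are synthetic.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(map(float, row)) + [float(v[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit(X, y):
    """Least-squares weights [w0, w1, ...] (w0 = intercept) via normal equations."""
    Z = [[1.0] + [float(v) for v in row] for row in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def s_loo(X, y):
    """Leave-one-out standard error: sqrt(PRESS / n)."""
    press = 0.0
    for k in range(len(y)):
        w = fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - predict(w, X[k])) ** 2
    return (press / len(y)) ** 0.5


# Synthetic toy data: two descriptors per complex, affinities generated exactly
# by 0.5 + 2*x1 - x2, so both the fit and s_LOO should be (numerically) exact.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 0]]
y = [0.5 + 2 * x1 - x2 for x1, x2 in X]
print(fit(X, y))     # approximately [0.5, 2.0, -1.0]
print(s_loo(X, y))   # approximately 0.0
```

With real training sets the residuals are never zero, and the gap between the fitted error and sLOO is a quick indicator of over-parameterization.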
Knowledge-based Potentials of Mean Force Scoring Functions (PMF)
Assumptions:
An observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms
The Boltzmann hypothesis converts the frequencies of finding atom A of the ligand at a distance r from atom B of the receptor into an effective interaction energy between A and B as a function of r
Advantages:
Similar to empirical functions, but more general (far more distance data than binding energy data are available)
Disadvantages:
The Boltzmann hypothesis originates from the statistics of a spatially uniform liquid, while a receptor-ligand complex is a two-component, non-uniform medium
PMFs are typically pairwise, while the probability of finding atoms A and B at a distance r is not pairwise and also depends on the surrounding atoms
PMF: Implementations
Verkhivker et al. (1995): 12 atom pairs, 30 complexes (HIV-1 and simian immunodeficiency virus proteases); tested on 7 other HIV-1 protease complexes
Wallqvist et al. (1995): 38 complexes, 21 atom types (10 C, 5 O, 5 N, 1 S); tested on 8 complexes (sd = 1.5 kcal/mol) and 20 complexes (rmsd = 1.0 Å)
Muegge et al. (1999): 697 complexes, 16 atom types from the receptor and 34 from the ligand, 282 statistically significant PMF interactions; tested on 77 diverse compounds (sd = 1.8 log Ki). The PMF was combined with a vdW term to account for short-range interactions in DOCK4 docking.
DrugScore (Gohlke et al. 2000), FlexX, BLEEP
$$\Delta G_{pred} = -\sum_{i}\sum_{j} \ln P_{ij}$$
$$PMF\_score = \sum_{kl,\ r < r^{ij}_{cutoff}} A_{ij}(r)$$
where
$$A_{ij}(r) = -k_B T \ln\left[f^{j}_{Vol\_corr}(r)\,\frac{\rho^{ij}_{seg}(r)}{\rho^{ij}_{bulk}}\right]$$
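Deriving a PMF for one ligand-receptor atom-pair type amounts to histogramming observed pair distances and converting densities to energies via the Boltzmann hypothesis. A Python sketch, assuming the counts are already binned in spherical shells around the receptor atom and kT ≈ 0.593 kcal/mol (298 K), and omitting the volume-correction factor f_Vol_corr; empty bins are returned as infinite, whereas real implementations cap or smooth such values.

```python
import math


def pmf_from_counts(counts, bin_width, kT=0.593):
    """A(r) = -kT ln(rho(r) / rho_bulk) for one ligand-receptor atom-pair type.

    counts[n] = number of observed pair distances in the shell
    [n * bin_width, (n + 1) * bin_width) around the receptor atom.
    rho(r) = counts per shell volume; rho_bulk = all counts per volume of the
    whole reference sphere. The volume-correction factor is omitted here.
    """
    shell_vols = [4.0 / 3.0 * math.pi * (((n + 1) * bin_width) ** 3 - (n * bin_width) ** 3)
                  for n in range(len(counts))]
    rho_bulk = sum(counts) / sum(shell_vols)
    energies = []
    for c, vol in zip(counts, shell_vols):
        if c == 0:
            energies.append(float("inf"))  # never observed: treated as forbidden
        else:
            energies.append(-kT * math.log((c / vol) / rho_bulk))
    return energies


# Counts proportional to shell volume (1:7:19:37) mean no preference: A(r) = 0.
print(pmf_from_counts([1, 7, 19, 37], 1.0))
# Doubling the second shell makes that distance favorable (negative A).
print(pmf_from_counts([1, 14, 19, 37], 1.0)[1])
```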
Consensus Scoring and Others
Too many scoring functions; none prevails in terms of predictivity
Combined approach: one scoring function to sample configuration space, the other(s) to optimize and/or score. 2 docking methods & 13 scoring functions significantly reduced the false-positive rate (Charifson et al. 1999)
Postprocessing of docking results with a filter function followed by re-scoring (Stahl & Bohm 1998)
ADAM, FlexX, Hammerhead
SYBYL Cscore (Tripos): FlexX, PMF, DOCK energy, GOLD score
C2 (Accelrys): LigScore2, PLP, PMF, Ludi, Jain
FRED (OpenEye): ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
DOCK: AMBER FF, PMF, contact scores, ChemScore
Reduce false positives!
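One simple consensus scheme is voting: each scoring function votes for the compounds it ranks in its top fraction, and only compounds with enough votes are kept, which is how consensus cuts false positives that a single function lets through. A Python sketch in the spirit of the intersection-based approach, assuming every function reports lower-is-better scores; the top fraction and vote threshold are hypothetical parameters.

```python
def consensus_vote(scores, top_frac=0.1, min_votes=2):
    """Vote-based consensus scoring.

    scores: {function_name: {compound_id: score}}, lower score = better.
    Each scoring function votes for the compounds in its top `top_frac`;
    compounds with at least `min_votes` votes are returned (sorted by id).
    """
    votes = {}
    for table in scores.values():
        ranked = sorted(table, key=table.get)           # best first
        n_top = max(1, int(len(ranked) * top_frac))
        for compound in ranked[:n_top]:
            votes[compound] = votes.get(compound, 0) + 1
    return sorted(c for c, v in votes.items() if v >= min_votes)


# Hypothetical scores from three functions for five compounds.
scores = {
    "f1": {"a": -9, "b": -8, "c": -1, "d": -2, "e": 0},
    "f2": {"a": -5, "b": 0, "c": -6, "d": 1, "e": 2},
    "f3": {"a": -1, "b": -6, "c": -3, "d": -7, "e": 0},
}
print(consensus_vote(scores, top_frac=0.4, min_votes=2))  # ['a', 'b']
```

Compounds c and d each top one list but are dropped because no second function agrees.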
An Example of Combined Empirical and Knowledge-based Approach
Procedure:
1. Knowledge-based potentials
2. Optimize the ligand position with the scoring function
3. Fit the scores to experimental values
4. Re-optimize ligand positions iteratively until the ligand positions and calibrated parameters have converged.
Scoring function: 7 atom types (1 C, 4 O, 2 N), cutoff 7 Å, 2000 complexes, rmsd < 2 Å, no metal ions, 164 binding energies, sd = 2.1 kcal/mol, rmsd = 0.49 Å
Validation: 36 rigid complexes, AlgoDock, FlexX, GOLD, DOCK, rmsd 0.74-1.68 Å; 25 known binding energies: sd = 2.0 kcal/mol
$$P_{A,B}(r) = \frac{n_{A,B}(r)}{r^{2}}$$
$$P_{A,B}(r) \propto \exp\left(-\frac{F_{A,B}(r)}{T}\right)$$
$$F_{A,B}(r) = -T \log\left(P_{A,B}(r)\right) + C$$
$$\Delta G = \sum_{i,j} F_{A_i,B_j}(r_{i,j})$$
Muryshev et al. 2003
Docking Software
DOCK (Kuntz et al. 1982); DOCK 4.0 (Ewing & Kuntz 1997)
AutoDOCK (Goodsell & Olson 1990); AutoDOCK 3.0 (Morris et al. 1998)
GOLD (Jones et al. 1997)
FlexX (Rarey et al. 1996)
GLIDE (Friesner et al. 2004)
ADAM (Mizutani et al. 1994)
CDOCKER (Wu et al. 2003)
CombiDOCK (Sun et al. 1998)
DIVALI (Clark & Ajay 1995)
DockVision (Hart & Read 1992)
FLOG (Miller et al. 1994)
GEMDOCK (Yang & Chen 2004)
Hammerhead (Welch et al. 1996)
LIBDOCK (Diller & Merz 2001)
MCDOCK (Liu & Wang 1999)
PRO_LEADS (Baxter et al. 1998)
SDOCKER (Wu et al. 2004)
QXP (McMartin & Bohacek 1997)
Validate (Head et al. 1996)
De novo design tools:
LUDI (Boehm 1992)
BUILDER (Roe & Kuntz 1995)
SMOG (DeWitte et al. 1997)
CONCEPTS (Pearlman & Murcko 1996)
DLD/MCSS (Stultz & Karplus 2000)
Genstar (Rotstein & Murcko 1993)
Group-Build (Rotstein & Murcko 1993)
Grow (Moon & Howe 1991)
HOOK (Eisen et al. 1994)
Legend (Nishibata & Itai 1993)
MCDNLG (Gehlhaar et al. 1995)
SPROUT (Gillet et al. 1993)
Docking Software: Important Factors
Sensitivity to, and transferability of, the parameters, including the starting conformation
Adaptability to additional scoring functions, pre- and/or post-docking processing, and filters
Ability to iteratively refine docking parameters/protocols based on new results
Design, components, and results of validation studies
Speed, user interface & control, I/O, structural file formats
User learning curve, customer support, and cost
Code availability and upgrading possibility
DOCK (Kuntz, UCSF)
Receptor structure (X-ray crystal, NMR, homology) -> binding site -> molecular surface of the binding site -> spheres describing the shape of the binding site and favorable locations of potential ligand atoms
Ligands (3D structure, atomic charges, potentials, labeling), with filters
Matching heavy atoms of ligands to sphere centers to generate thousands of binding orientations
Scoring orientations:
1. Energy scoring (vdW and electrostatic)
2. Contact scoring (shape complementarity)
3. Chemical scoring
4. Solvation terms
Virtual screening for MTS/HTS and library design: ligands in the order of their best scores
Binding mode analysis for lead optimization: binding orientations and scores for each ligand
DOCK: Conformational Flexibility
Torsion-drive and anchor-based options (DOCK4.0)
GA to generate ligand conformations inside the binding site (Oshiro et al. 1995)
A ligand anchor fragment is selected and placed in the receptor, followed by rigid-body simplex minimization (Makino & Kuntz 1997)
Ensembles: ~300 conformations are created with the rigid part superimposed. DOCK is applied to the rigid part, and all conformations are tested for overlap and scored (Lorber & Shoichet 1998)
Multiple random ligand conformations (Ewing et al. 2001)
Ensemble of protein structures (Knegtel et al. 1997)
FlexX (Tripos/SYBYL)
Fragment-based, descriptor matching, empirical scoring (Rarey et al. 1996)
Procedure:
Select a small set of base fragments suitable for placement, using a simple scoring function.
Place base fragments with the pose clustering algorithm: rigid, triplet matching of H-bond & hydrophobic interactions, Bohm's scoring function
Build up the remainder of the ligand incrementally from other fragments
Ligand conformations: MIMUMBA model with CSD-derived low-energy torsion angles for each rotatable bond, and ring conformations from CORINA. Multiple conformations for each fragment in the ligand-building steps
Other work: explicit waters are placed into the binding site during the docking procedure using pre-computed water positions (Rarey et al. 1999). Receptor flexibility using discrete alternative protein conformations (Claussen et al. 2001; Claussen & Hindle 2003)
GOLD
GA method, H-bond matching, FF scoring (Jones et al. 1997)
A configuration is represented by two bit strings:
1. The conformation of the ligand and the protein, defined by the torsions;
2. A mapping between H-bond partners in the protein and the ligand.
For fitness evaluation, a 3D structure is created from the chromosome representation. The H-bond atoms are then superimposed onto H-bond site points in the receptor site.
Fitness (scoring) function: H-bond energy, the ligand internal energy, and the protein-ligand van der Waals energy
Rotational flexibility for selected receptor hydrogens, along with full ligand flexibility
Highlights:
Validation test set: 100 complexes, 66 with rmsd < 2 Å.
The structure generation is biased towards intermolecular H-bonds.
Hydrophobic fitting points were added (GOLD 1.2, CCDC, Cambridge, UK 2001).
AutoDock & AutoDock 3.0
Early implementation: MC simulated annealing, AMBER FF-based energy grid, flexible ligands (Goodsell & Olson 1990)
AutoDock 3.0: GA as a global optimizer combined with energy minimization as a local search method; flexible ligand, rigid protein represented as a grid (Morris et al. 1998)
The fitness function:
a Lennard-Jones 12-6 dispersion/repulsion term
a directional 12-10 hydrogen bond term
a Coulombic electrostatic potential
a term proportional to the number of sp3 bonds in the ligand, representing the unfavorable entropy of ligand binding
a desolvation term
LUDI: Matching polar and hydrophobic groups
Calculate protein and ligand interaction sites (H-bond or hydrophobic), defined by centers and surfaces, from:
non-bonded contact distributions based on a search through the CSD,
a set of geometric rules,
the output from the program GRID (Goodford 1985), which calculates binding energies for a given probe with a receptor molecule.
Fit fragments onto the interaction sites using:
distances between interaction sites on the receptor,
an RMSD superposition algorithm,
a hashing scheme to access and match surface triangles onto a triangle query of a ligand interaction center,
a list-merging algorithm that creates all triangles based on lists of fitting triangle edges for two of the three query triangle edges.
Join/grow fragments using the databases of fragments and the same fitting algorithm.
GLIDE (www.schrodinger.com)
Funnel: site point search -> diameter test -> subset test -> greedy score -> refinement -> grid-based energy optimization -> GlideScore.
Approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand.
Hierarchical filters, including a rough scoring function that recognizes hydrophobic and polar contacts, dramatically narrow the search space.
Torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses.
The very best candidates are further refined via MC sampling of pose conformation.
A modified ChemScore (Eldridge et al. 1997) that combines empirical and force-field-based terms.
Validation: 282 complexes, docked from a new ligand conformation; top-ranked pose: 50% < 1 Å, ~33% > 2 Å.
FRED (OpenEye www.eyesopen.com)
Systematic, nonstochastic docking
Directed docking with SMARTS enclosures
ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
Multiple active site comparisons
Multiple simultaneous scoring functions and hit lists
RMS clustering of hit-lists
Refinement of docked poses in the context of the active site using MMFF
On-the-fly OMEGA conformer generation
Robust reading and specification-compliant writing of SDF, MOL, MOL2, PDB, MacroModel, XYZ, and OEBinary file formats
Distributed processing via PVM for most Unix platforms
CDOCKER & SDOCKER
Randomly generate ligand seeds in the binding site
High-temperature MD using a modified version of CHARMM
Locate minima from all of the MD simulations
Full minimization
Cluster on position and geometry
Rank by energy (interaction + ligand conformation)
SDOCKER: X-ray structures of complexes as templates to guide docking
Wu et al. 2003; Wu et al. 2004.
Matrix of Accuracy & Success
Drug <- Quality Novel Lead <- Active
Reproduce binding mode (X-ray crystal structures)
Predict binding affinity (free energies)
Rank diverse set of compounds (by binding affinity)
Enhance hit rate for database mining
Reduce false positives (N_selected - N_hits) and false negatives (N_all_hits - N_hits)
Fast enough for iterative SBDD
$$EF = \frac{H_{VS}}{H_0} = \frac{N^{VS}_{hits} / N^{VS}_{selected}}{N_{all\_hits} / N_{all}}$$
Outcome table (rows: prediction, columns: experiment):
          active   inactive
active    TRUE     FALSE
inactive  FALSE    TRUE
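The enrichment factor and the four outcome counts follow directly from the library size, the number of actives, and the number of docking-selected compounds and hits among them. A Python sketch, using as a worked example the typical-HTS numbers from the accuracy examples (1,000,000 compounds, 3,000 actives) with 2,000 selected compounds at a 10% hit rate.

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """EF = H_VS / H_0: hit rate among selected compounds over library hit rate."""
    return (hits_selected / n_selected) / (hits_total / n_total)


def outcome_counts(hits_selected, n_selected, hits_total, n_total):
    """True/false positives and negatives for a docking-based selection."""
    tp = hits_selected                 # selected and active
    fp = n_selected - hits_selected    # selected but inactive (false positives)
    fn = hits_total - hits_selected    # active but not selected (false negatives)
    tn = n_total - n_selected - fn     # neither selected nor active
    return tp, fp, fn, tn


# Worked example: 1,000,000 compounds with 3,000 actives (H_0 = 0.3%);
# docking selects 2,000 compounds, 200 of which are active (H_VS = 10%).
print(round(enrichment_factor(200, 2000, 3000, 1_000_000), 1))  # 33.3
print(outcome_counts(200, 2000, 3000, 1_000_000))  # (200, 1800, 2800, 995200)
```

Even with a respectable EF of 33, 2,800 of the 3,000 actives are missed, which illustrates the high false-negative rate when only a small fraction of the library is selected.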
Accuracy of Docking
Reality boundary:
Experimental errors: 0.1-0.25 kcal/mol (18-53%), with an MSR (minimum significant ratio) as large as 3-fold (0.65 kcal/mol)
Free energy calculation accuracy: ~1 kcal/mol (5.4-fold), starting from an accurate geometric model with full sampling
Entropy and solvation estimation need a sufficiently long simulation with an accurate force field, an ensemble of explicit water molecules, and full sampling
Current state:
Reproducing the X-ray structure with rmsd < 2 Å: 50-90% achievable
Binding affinity: 1.5-2 log units (32-100-fold, 2.05-2.73 kcal/mol)
Correlation between scores and affinities: r^2 < 0.3
Enthalpy ranking with minimization: ±5 kcal/mol
Hit rate enhancement: 2-50-fold, with hit rates of 1-20% (and a high false-negative rate if 1-5% of total compounds are selected)
(Wang et al. 2003; Erickson et al. 2004; and others.)
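The rmsd < 2 Å criterion compares the docked ligand with the crystal ligand in the receptor frame, so no superposition is applied. A Python sketch, assuming both poses list the same heavy atoms in the same order (symmetry-equivalent atoms such as carboxylate oxygens are not handled here).

```python
import math


def pose_rmsd(pose, reference):
    """Heavy-atom RMSD (Angstrom) without superposition."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(sq / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is within `cutoff`."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)


crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in crystal]   # rigidly shifted by 1 Angstrom
print(pose_rmsd(docked, crystal))                   # 1.0
print(success_rate([0.5, 1.9, 2.5, 3.0]))           # 0.5
```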
Docking Accuracy: Examples
Example 1. Docking of a focused library of 55 PI3Kγ inhibitors that share a common chemotype (IC50 8-20,000 nM); GLIDE docking, scored by LUDI, LigScore, GScore, PMF, and PLP.
r^2 = 0.02-0.15
Straight GLIDE docking: hit rate 0.34%.
Using additional knowledge (keeping only poses whose substructure rmsd < 2.5 Å vs. a co-crystal structure): hit rate 9.8% (J. Klicic, 2004)
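Typical HTS: 1,000,000 compounds tested -> 3,000 actives (0.3%) -> 10 quality novel leads (~0.3% of actives).
To find 5 quality leads using docking, ~1,500 actives are needed: at a 10% hit rate (EF = 33), 15,000 compounds must be selected; if only 200-2,000 compounds are selected, a 75-100% hit rate (EF > 250) would be required.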
Docking Accuracy: More Examples
Example 2. 800 PDB complexes: resolution < 2.5 Å, Ki or Kd known, MW < 1000, non-covalent binding, no cofactor, 200 different proteins; 13 scoring functions from SYBYL, Cerius2, GOLD, etc.
r^2 = 0.02 to 0.32, sd = 1.8 to 2.2 log units (2.5-3.0 kcal/mol)
Best from X-Score, DrugScore, SYBYL ChemScore, Cerius2 PLP (Wang et al. 2004)
Example 3. Compared CDOCKER, DOCK, GOLD, and FlexX for reproducing X-ray crystal structures with rmsd < 2 Å:
The most important factors are the flexibility of protein and ligand
Suggests applying VS only to compounds with < 8 rotatable bonds
Use CORINA for 3D conformation generation
Softer potentials in the beginning (Erickson et al. 2004)
Bottom line: current docking is almost always better than random, but still far too inaccurate to be the sole or dominant approach for lead generation. Multiple CADD & SBDD approaches should be used for any VS/MTS and lead optimization effort.
Docking Applications
Determine the lowest free energy structures for the receptor-ligand complex
Search database and rank hits for lead generation
Calculate the differential binding of a ligand to two different macromolecular receptors
Study the geometry of a particular complex
Propose modifications of a lead molecule to optimize potency or other properties
de novo design for lead generation
Library design
Docking of Combinatorial Libraries
Combinatorial docking problem: given a library of ligands, calculate the docking score (and the geometry of the complex) for each molecule of the library
R-group selection problem: given a library, select molecules for the individual R-groups in order to form a smaller sublibrary with an enriched number of hits
De novo library design: given a catalog of available reagents, design a library (including the rules of synthesis) that will optimize the number of hits
The incremental construction method: PRO_SELECT, CombiDOCK (Sun, Ewing et al. 1998), FlexXc
Docking of the fully enumerated library followed by plate optimization or cherry-picking
Docking to Nucleic Acid Targets
RNA and DNA as potential drug targets: ribosomal RNA structures (Agalarov et al. 2000; Ban et al. 2000; Filikov et al. 2000; Nissen et al. 2000; Wimberly et al. 2000)
Highly charged environments, well-defined binding pocket
DOCK identified compounds that selectively bind to RNA duplexes or DNA quadruplexes (Chen et al. 1996; Chen et al. 1997). The portions of the DOCK suite that calculate electrostatics, including solvation, partial charges, and the scoring function, were recently optimized for RNA targets (Downing et al. 2003; Kang et al. 2004).
An MC minimization and an empirical scoring function that accounts for solvation, isomerization free energy, and changes in conformational entropy were used to rank compounds (Hermann & Westhof 1999).
Challenges to Docking Approach
Binding affinity is only one of many attributes of a drug
Structures of most druggable targets are undetermined
The identification of the binding site
Dependence on protein and ligand structures: source (apo, co-crystal, complex with another inhibitor, NMR, homology model), treatment (hydrogen atoms, optimization), flexibility, starting conformation, structural diversity, protonation state
Similar ligands may unexpectedly bind in quite different modes:
MJ33 in phospholipase A2 (Sekar et al. 1997); BANA113 in influenza virus neuraminidase (Sudbeck et al. 1997).
Docking scores favor larger and more complicated molecules, but contributions to binding free energy from the heavy atoms of the ligand level off at ~15 atoms. Many interactions, including H-bonding, do not always lead to higher binding affinity (Kuntz et al. 1999).
Challenges to Docking Approach
Large energies vs. small energy differences
Find weakly potent compounds in pools of nonbinders
High false positives and false negatives from in silico screen
Explicit waters are needed to account for volume, changes in the shape of the binding site, and bridging interactions
A scoring function that always has its global optimum in agreement with experiment.
Good affinity prediction does not necessarily imply a correct binding mode
Speed and accuracy
Successes of Docking & SBDD
HIV protease inhibitor amprenavir (Agenerase) from Vertex & GSK (Kim et al. 1995)
HIV: nelfinavir (Viracept) by Pfizer (& Agouron) (Greer et al. 1994)
Influenza neuraminidase inhibitor zanamivir (Relenza) by GSK (Schindler 2000)
Widely used and greatly appreciated; identified many hits.
Review articles: Kuntz 1992; Kuntz et al. 1994; Kubinyi 1998; Muegge & Rarey 2001; Blundell 2002; Halperin et al. 2002; Shoichet et al. 2002; Taylor et al. 2002; Waszkowycz 2002; Davis et al. 2003; Schneidman-Duhovny et al. 2004.