Copyright © 2004 Eli Lilly and Company Docking & Scoring EMBO-Course: “Methods for Protein Simulation & Drug Design.” Shanghai, China, September 13-24, 2004. Qi Chen Eli Lilly & Company Indianapolis, Indiana U.S.A. [email protected]


Copyright © 2004 Eli Lilly and Company

Docking & Scoring

EMBO-Course: “Methods for Protein Simulation & Drug Design.” Shanghai, China, September 13-24, 2004.

Qi Chen, Eli Lilly & Company, Indianapolis, Indiana, U.S.A.


September 21, 2004 Copyright © 2004 Eli Lilly and Company 2

Outline

Introduction

Docking Methods

Representation of receptor binding site and ligand

Sampling of configuration space of the ligand-receptor complex

Scoring Methods

Free energy, binding affinity, and docking scores

Scoring functions, consensus scoring, and others

Docking Software

Existing software

DOCK, FlexX, GOLD, AutoDock, LUDI, Glide, FRED, CDOCKER

Accuracy, Applications, and Successes


What Are Docking & Scoring?

To place a ligand (small molecule) into the binding site of a receptor in a manner appropriate for optimal interactions with the receptor.

To evaluate the ligand-receptor interactions in a way that may discriminate the experimentally observed mode from others and estimate the binding affinity.

[Figure: ligand + receptor -> (docking) -> complex, etc. -> (scoring) -> X-ray structure & ΔG]


Why Do We Do Docking?

Drug discovery costs are too high: ~$800 million, 8-14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)

Drugs interact with their receptors in a highly specific and complementary manner.

Core of the target-based structure-based drug design (SBDD) for lead generation and optimization.

A lead is a compound that shows biological activity, is novel, and has the potential of being structurally modified for improved bioactivity, selectivity, and druggability.


Three Components of Docking

Representation of receptor binding site and ligand

pre- and/or during docking:

Sampling of configuration space of the ligand-receptor complex

during docking:

Evaluation of ligand-receptor interactions

during docking and scoring:


Receptor Structures & Binding Site Descriptions

PDB (Protein Data Bank, www.rcsb.org/pdb/) containing proteins or enzymes:

X-ray crystal: >12,000 structures; 788 at ≤ 1.5 Å, 9,390 between 1.5 and 2.5 Å

NMR: >450 structures; ensemble accuracy of 0.4-1 Å in the backbone region, 1.5 Å in average side chain position (Billeter 1992; Clore et al. 1993)

(and high quality homology models built from highly similar sequences)

Limitations of experimental structures (Davis et al. 2003):

Locations of hydrogen atoms, water molecules, and metal ions

Identities and locations of some heavy atoms (e.g., ~1/6 of N/O of Asn & Gln, and N/C of His incorrectly assigned in PDB; up to 0.5 Å uncertainty in position)

Conformational flexibility of proteins

Binding site descriptions: atomic coordinates, surface, volume, points & distances, bond vectors, grid and various properties such as electrostatic potential, hydrophobic moment, polar, nonpolar, atom types, etc.

DOCK


Drug, Chemical & Structural Space

Drug-like: MDDR (MDL Drug Data Report) >147,000 entries, CMC (Comprehensive Medicinal Chemistry) >8,600 entries

Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries

Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder

CSD (Cambridge Structural Database, www.ccdc.cam.ac.uk): ~3 million X-ray crystal structures for >264,000 different compounds and >128,00 organic structures

Available compounds

Available without exclusivity: various vendors (& ACD)

Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma, ChemExplorer, etc.

Corporate databases: a few million compounds in large pharma companies


3D Structural Information & Ligand Descriptions

2D->3D software: CORINA, OMEGA, CONCORD, MM2/3, WIZARD, COBRA. (reviewed by Robertson et al. 2001)

CSD: <0.1 Å for small molecules, but may not be the bound conformation in the receptor

PDB: ligand-bound protein structures ~6000 entries

Atoms associated with inter-atom distances, physical and chemical properties, types, charges, pharmacophore, etc

Flexibility: conformation ensemble, fragment-based


Sampling of Configuration Space of The Ligand-Receptor Complex

Descriptor-matching: using pattern-recognizing geometric methods to match ligand and receptor site descriptors

geometric, chemical, pharmacophore properties, such as distance pairs, triplet, volume, vector, hydrogen-bond, hydrophobic, charged, etc.

Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)

Others: GA (genetic algorithm), similarity, fragment-based

Challenges

The complete conformation and configuration space of the ligand-receptor complex is too large.

Conformational flexibility of both ligand and receptor can't be ignored.

Shape-matching: there is no 'best' method or general solution for describing and matching the molecular shape of irregular objects (Ullman 1976; Salomaa 1991).

Shape alone is not a sufficient descriptor to identify low-energy conformations of a ligand-receptor complex (Jorgensen 1991).


Descriptor Matching Methods: DOCK

Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
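The distance-compatibility idea can be illustrated with a small Python sketch. It assumes brute-force enumeration of (sphere, atom) pairings rather than an efficient clique search, and the function name, tolerance, and toy coordinates are illustrative assumptions, not DOCK's actual code or parameters.

```python
import itertools
import math


def compatible_matches(spheres, atoms, tol=0.5, size=3):
    """Enumerate sets of (sphere_index, atom_index) pairs whose internal
    distances agree within `tol` (a toy stand-in for clique detection)."""
    nodes = [(s, a) for s in range(len(spheres)) for a in range(len(atoms))]

    def compatible(n1, n2):
        (s1, a1), (s2, a2) = n1, n2
        if s1 == s2 or a1 == a2:
            return False  # each sphere and each atom may be used only once
        d_sphere = math.dist(spheres[s1], spheres[s2])
        d_atom = math.dist(atoms[a1], atoms[a2])
        return abs(d_sphere - d_atom) <= tol

    matches = []
    for combo in itertools.combinations(nodes, size):
        # A valid match is a set of nodes that are all pairwise compatible.
        if all(compatible(x, y) for x, y in itertools.combinations(combo, 2)):
            matches.append(combo)
    return matches
```

Each returned match pairs sphere centers with ligand heavy atoms and could then be turned into a ligand orientation by a least-squares superposition, which is omitted here.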


Descriptor Matching Methods

Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms

Interaction site matching in LUDI (Boehm 1992): HBA<->HBD, HYP<->HYP

Pose clustering and triplet matching in FlexX (Rarey et al. 1996): HBA<->HBD, HYP<->HYP

Shape-matching in FRED (Openeye www.eyesopen.com)

Vector matching in CAVEAT (Lauri and Bartlett 1994)

Steric effects-matching in CLIX (Lawrence and Davis 1992)

Shape chemical complementarity in SANDOCK (Burkhard et al. 1998)

Surface complementarity in LIGIN: (Sobolev et al. 1996)

H-bond matching in ADAM (Mizutani et al. 1994)


Fragment-based Methods

Flexibility and/or de novo design

Identification and placement of the base/anchor fragment are very important

Energy optimization (during or post-docking) is important

Examples

Incremental construction in FlexX with triplet matching and pose clustering to maximize the number of favorable interactions

Growing and/or joining in LUDI from pre-built fragment and linker libraries to maximize H-bond and hydrophobic interactions

Anchor-based fragment joining in DOCK


Molecular Simulation: MD & MC

Two major components:

The description of the degrees of freedom

The energy evaluation

The local movement of the atoms is performed:

Due to the forces present at each step in MD (Molecular Dynamics)

Randomly in MC (Monte Carlo)

Usually time consuming:

Search from a starting orientation to a low-energy configuration

Several simulations with different starting orientations must be performed to get a statistically significant result

Grid for energy calculation. Larger steps or multiple starting poses are often used for speed and sampling coverage in MD: Di Nola et al. 1994; Mangoni et al. 1999; Pak & Wang 2000; CDOCKER by Wu et al. 2003.


MC-based Docking

Acceptance probability for a move from configuration A to configuration B:

$$P = \exp\left(-\frac{E(B) - E(A)}{k_B T}\right)$$

where T is reduced according to a so-called cooling schedule, and a grid can be used for energy calculation.

An advantage of the MC technique compared with gradient-based methods (e.g. MD) is that a simple energy function can be used which does not require derivative information, and that it is able to step over energy barriers.

AutoDOCK (Goodsell & Olson 1990), MCDOCK (Liu & Wang 1999), PRODOCK (Trosset & Scheraga 1999), ICM (Abagyan et al. 1994).

Simulated annealing is used in DockVision (Hart & Read 1992) and Affinity (Accelrys Inc., San Diego, CA)

Energy minimization is used in QXP (McMartin & Bohacek 1997).
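The acceptance rule and cooling schedule on this slide can be sketched in Python as follows. This is a toy under stated assumptions: a single degree of freedom `x`, a geometric cooling schedule, and hypothetical parameter names and defaults. No docking program listed here is claimed to work exactly this way.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept downhill moves always; uphill moves with probability
    exp(-(E_new - E_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def anneal(energy, x0, step=0.5, kT_start=5.0, cooling=0.95,
           n_cycles=200, moves_per_cycle=50, seed=0):
    """Monte Carlo simulated annealing on a one-dimensional toy energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    kT = kT_start
    for _ in range(n_cycles):
        for _ in range(moves_per_cycle):
            trial = x + rng.uniform(-step, step)  # random local move
            e_trial = energy(trial)               # no derivatives needed
            if metropolis_accept(e, e_trial, kT, rng):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
        kT *= cooling  # assumed geometric cooling schedule
    return best_x, best_e
```

In a docking setting `x` would be replaced by the ligand's position, orientation, and torsions, and `energy` by a grid-based interaction energy.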


Genetic Algorithm Docking

A fitness function is used to decide which individuals (configurations) survive and produce offspring for the next iteration of optimization. Degrees of freedom are encoded into genes or binary strings.

The collection of genes (chromosome) is assigned a fitness based on a scoring function. There are three genetic operators:

mutation operator randomly changes the value of a gene; crossover exchanges a set of genes from one parent chromosome to another; migration moves individual genes from one sub-population to another.

Requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementation.

GOLD (Jones et al. 1997); AutoDock 3.0 (Morris et al. 1998); DIVALI (Clark & Ajay 1995).
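A minimal genetic-algorithm sketch in Python, assuming real-valued genes (for example torsion angles in degrees), one-point crossover, random-reset mutation, and a single population, so the migration operator is not shown. All names and defaults are illustrative assumptions and do not reproduce GOLD, AutoDock, or DIVALI.

```python
import random


def evolve(fitness, n_genes, pop_size=30, generations=60,
           mutation_rate=0.1, low=-180.0, high=180.0, seed=0):
    """Evolve chromosomes (lists of floats) to maximize `fitness`."""
    rng = random.Random(seed)
    # Initial population: random chromosomes instead of one starting structure.
    pop = [[rng.uniform(low, high) for _ in range(n_genes)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        survivors = pop[: pop_size // 2]   # fitter half survives
        children = [survivors[0][:]]       # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]    # one-point crossover
            # Mutation: occasionally reset a gene to a random value.
            child = [rng.uniform(low, high) if rng.random() < mutation_rate
                     else gene for gene in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness), history
```

Because the best chromosome is copied unchanged into each new generation, the recorded best fitness never decreases. In docking, the fitness would be the negative of a scoring function evaluated on the decoded pose.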


Multiple Method Approach

Similarity-guided MD simulated annealing to improve accuracy (Wu & Vieth 2004).

Shape similarity & clustering to speed up conformational search in docking (Makino & Kuntz 1998).

Better input or constraints for the existing docking engines

[Flowchart (Wang et al. 1999): systematic search -> conformations -> rigid DOCK -> minimization -> MD/SA]

[Flowchart (FRED, GLIDE, DOCK): initial poses -> filters -> finer docking -> final scoring]


Scoring Functions

A fast and simplified estimation of binding energies

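$$\Delta G_{binding} = -RT \ln K_{affinity}$$

$$\Delta G_{binding} = \Delta G_{complex/solv} - \Delta G_{ligand/solv} - \Delta G_{protein/solv} + \Delta G_{interaction} - T\Delta S$$

[Figure: -scores plotted over configurations of the complex, with the X-ray structure marked. Open question: scores <-> ΔG_binding?]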


Types of Scoring Functions

Force field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms

Empirical: multivariate regression methods to fit coefficients of physically motivated structural functions by using a training set of ligand-receptor complexes with measured binding affinity

Knowledge-based: statistical atom pair potentials derived from structural databases as the score

Other: scores and/or filters based on chemical properties, pharmacophore, contact, shape complementary

Consensus scoring functions approach


Force Field Based Scoring Functions

Advantages

FF terms are well studied and have some physical basis

Transferable, and fast when used on a pre-computed grid

Disadvantages

Only parts of the relevant energies, i.e., potential energies, sometimes enhanced by solvation or entropy terms

Electrostatics often overestimated, leading to systematic problems in ranking complexes

e.g. AMBER FF in DOCK:

$$E = \sum_{i=1}^{lig} \sum_{j=1}^{rec} \left( \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332 \frac{q_i q_j}{D\, r_{ij}} \right)$$
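A direct (grid-free) evaluation of a force-field score of this form can be sketched in Python. The atom tuples, the parameter tables `A` and `B`, and the default exponents and dielectric constant are hypothetical inputs chosen for illustration, not actual AMBER parameters.

```python
import math


def ff_score(ligand, receptor, A, B, a=12, b=6, D=4.0):
    """Sum van der Waals and Coulomb terms over all ligand-receptor atom pairs.

    ligand, receptor: lists of (x, y, z, charge, atom_type)
    A, B: dicts keyed by (ligand_type, receptor_type) with repulsion and
          attraction coefficients (hypothetical values supplied by the caller)
    """
    energy = 0.0
    for xi, yi, zi, qi, ti in ligand:
        for xj, yj, zj, qj, tj in receptor:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            vdw = A[ti, tj] / r ** a - B[ti, tj] / r ** b
            # 332 converts e^2/Angstrom to kcal/mol; D is the dielectric.
            coulomb = 332.0 * qi * qj / (D * r)
            energy += vdw + coulomb
    return energy
```

The double loop costs one evaluation per ligand-receptor atom pair, which is the cost that pre-computed potential grids are meant to avoid.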


FF Scoring: Implementations

AMBER FF: DOCK, FLOG, AutoDOCK

CHARMm FF: CDOCK, MC-approach (Caflisch et al. 1997)

Potential Grid: rigid receptor structure upon docking. The grid-based score interpolates from eight surrounding grid points only. 100-fold speed up. Examples: DOCK, CDOCK, and many other docking programs.

Soften VDW: A soft-core vdw potential is needed for the kinetic accessibility of the binding site (Vieth et al. 1998). FLOG: 6-9 Lennard-Jones function; GOLD: 4-8 vdw + H-bond, and intraligand energy.

Solvent Effect on Electrostatic: often approximated by rescaling the in vacuo coulomb interactions by 1/D, where D = 1-80 or = n*r, n = 1-4, r = distance.

Solvation and Entropy Terms: Solvation terms decomposed into nonpolar and electrostatic contributions (e.g., DOCK):

$$E_{bind} = E_{nonbond} + E_{solv,elec} + E_{solv,np}$$
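The "eight surrounding grid points" lookup mentioned under Potential Grid is commonly done by trilinear interpolation. The Python sketch below assumes a regular cubic grid stored as nested lists and performs no bounds checking; the function name and argument layout are illustrative assumptions rather than any program's actual interface.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a pre-computed potential at `point` from the eight
    surrounding grid points. grid[i][j][k] is the value at
    origin + spacing * (i, j, k). The point must lie inside the grid."""
    fx, fy, fz = ((p - o) / spacing for p, o in zip(point, origin))
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - i, fy - j, fz - k  # fractional position in the cell
    value = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value
```

A ligand's grid score is then the sum of such lookups over its atoms, which is independent of the number of receptor atoms once the grid has been filled.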


Empirical Scoring Functions

Goals: reproduce the experimental values of binding energies, with the global minimum of the function directed to the X-ray crystal structure

Advantages: fast & direct estimation of binding affinity

Disadvantages

Only a few complexes with both accurate structures & binding energies known

Discrepancy in the binding affinities measured from different labs

Heavy dependence on the placement of hydrogen atoms

Heavy dependence of transferability on the training set

No effective penalty term for bad structures

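$$\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{HB} \sum_{neutral\ H\text{-}bonds} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{ionic\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{aro\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{lipo\ cont.} f(\Delta R)$$

LUDI & FlexX (Boehm 1994)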
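An empirical function of this additive form can be sketched in Python as a weighted sum of geometry-penalized interaction counts. The ramp thresholds and every coefficient below are placeholders chosen for illustration and must not be read as the published LUDI or FlexX parameters; the aromatic term is left out for brevity.

```python
def ramp(deviation, full, zero):
    """Penalty function: 1 up to `full`, falling linearly to 0 at `zero`."""
    if deviation <= full:
        return 1.0
    if deviation >= zero:
        return 0.0
    return 1.0 - (deviation - full) / (zero - full)


def empirical_score(hbonds, ionic, lipo_contacts, n_rot, coef):
    """hbonds, ionic: lists of (distance deviation in Angstrom, angle
    deviation in degrees); lipo_contacts: list of distance deviations;
    coef: dict of regression coefficients (placeholders here)."""
    def geometric(pairs):
        return sum(ramp(dr, 0.2, 0.6) * ramp(da, 30.0, 80.0) for dr, da in pairs)

    score = coef["dG0"] + coef["dG_rot"] * n_rot
    score += coef["dG_hb"] * geometric(hbonds)
    score += coef["dG_io"] * geometric(ionic)
    score += coef["dG_lipo"] * sum(ramp(dr, 0.2, 0.6) for dr in lipo_contacts)
    return score
```

In a real scoring function the coefficients come from regression against measured binding affinities, which is why the training set matters so much for transferability.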


Empirical Scoring: Implementations

Mostly differ by what training set and how many parameters are used

Cerius2/Insight2000: LUDI, ChemScore, PLP, LigScore

SYBYL: FlexX, F-Score

Hammerhead: 17 parameters for hydrophobic, polar complementary, entropy, solvation. sLOO= 1.0 logK for 34 complexes

VALIDATE: 8 parameters for VDW and Coulomb interactions, surface complementarity, lipophilicity, conformational entropy and enthalpy, lipophilic and hydrophilic complementarity between receptor and ligand surfaces

PRO_LEADS: 5 coefficients for lipophilic, metal-binding, H-bond, and a flexibility penalty term. sLOO= 2 kcal/mol for 82 complexes

SCORE (Tao & Lai, 2001); ChemScore (GOLD)


Knowledge-based Potentials of Mean Force Scoring Functions (PMF)

Assumptions

An observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms

The Boltzmann hypothesis converts the frequencies of finding atom A of the ligand at a distance r from atom B of the receptor into an effective interaction energy between A and B as a function of r

Advantages

Similar to empirical, but more general (much more distance data than binding energy data)

Disadvantages

The Boltzmann hypothesis originates from the statistics of a spatially uniform liquid, while a receptor-ligand complex is a two-component non-uniform medium

PMF are typically pair-wise, while the probability to find atoms A and B at a distance r is non-pairwise and depends also on surrounding atoms


PMF: Implementations

Verkhivker et al.(1995): 12 atom pairs, 30 complexes (HIV-1 and simian immunodeficiency virus). Test on 7 other HIV-1 protease complexes

Wallqvist et al. (1995): 38 complexes, 21 atom types (10 C, 5 O, 5 N, 1 S). Test on 8 complexes sd=1.5 kcal/mol, and 20 complexes rmsd=1.0 A.

Muegge et al. (1999): 697 complexes, 16 atom types from receptor & 34 from ligand, 282 statistically significant PMF interactions. Test on 77 diverse compounds: sd=1.8 log Ki. The PMF was combined with a vdw term to account for short-range interactions for DOCK4 docking:

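$$PMF\_score = \sum_{kl,\; r < r^{ij}_{cutoff}} A_{ij}(r)$$

where

$$A_{ij}(r) = -k_B T \ln\left[ f^{j}_{Vol\_corr}(r)\, \frac{\rho^{ij}_{seg}(r)}{\rho^{ij}_{bulk}} \right]$$

$$\Delta G_{pred} = \sum_i \sum_j [\ldots]_{ij} \ln P_{ij}$$

DrugScore (Gohlke et al, 2000), FlexX, BLEEP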
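The Boltzmann-inversion step behind a pair potential such as A_ij(r) can be sketched in Python. This simplified version assumes a volume-correction factor of 1, uniform radial bins, and kT = 0.593 kcal/mol (roughly room temperature); the function name and defaults are illustrative assumptions.

```python
import math


def pmf_from_distances(distances, r_cut=6.0, bin_width=1.0, kT=0.593):
    """Convert observed distances for one atom-type pair into a potential
    of mean force per radial bin: A(r) = -kT * ln(rho_seg(r) / rho_bulk)."""
    n_bins = int(round(r_cut / bin_width))
    counts = [0] * n_bins
    for r in distances:
        if 0.0 <= r < r_cut:
            counts[min(int(r / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    # Bulk density: all observed pairs spread over the whole cutoff sphere.
    bulk_density = total / (4.0 / 3.0 * math.pi * r_cut ** 3)
    pmf = []
    for b, n in enumerate(counts):
        r_lo, r_hi = b * bin_width, (b + 1) * bin_width
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        if n == 0:
            pmf.append(None)  # no statistics in this bin
        else:
            pmf.append(-kT * math.log((n / shell_volume) / bulk_density))
    return pmf
```

Bins where a pair type is found more often than the bulk average get a negative (favorable) value, and bins where it is rarer get a positive one.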


Consensus Scoring and Others

Too many scoring functions, none prevails in terms of predictivity

Combined approach: one scoring function to sample configuration space, the other(s) to optimize and/or score: 2 docking methods & 13 scoring functions significantly reduce the false positive rate (Charifson et al. 1999)

Postprocessing of docking results with a filter function followed by re-scoring (Stahl & Bohm 1998)

ADAM, FlexX, Hammerhead

SYBYL Cscore (Tripos) : FlexX, PMF, DOCK energy, GOLD score

C2 (Accelrys) : LigScore2, PLP, PMF, Ludi, Jain

FRED (OpenEye) : ChemScore, PB-SA, ChemGauss, PLP, ScreenScore

DOCK: AMBER FF, PMF, contact scores, ChemScore

Reduce false positives!
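One simple way to combine several scoring functions is a rank-by-vote scheme, sketched below in Python. It assumes that lower scores are better and that a compound counts as a consensus hit when enough functions place it in their top fraction. The names and thresholds are illustrative, and this is not necessarily the scheme used by any package listed above.

```python
def consensus_hits(scores, top_fraction=0.2, min_votes=2):
    """scores: {function_name: {compound: score}}, lower score = better.
    Returns the sorted compounds that at least `min_votes` scoring
    functions rank within their top `top_fraction`."""
    votes = {}
    for table in scores.values():
        ranked = sorted(table, key=table.get)          # best score first
        n_top = max(1, int(len(ranked) * top_fraction))
        for compound in ranked[:n_top]:
            votes[compound] = votes.get(compound, 0) + 1
    return sorted(c for c, v in votes.items() if v >= min_votes)
```

Requiring agreement between functions discards compounds that only one function favors, which is how consensus schemes aim to cut false positives at the risk of losing some true hits.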


An Example of Combined Empirical and Knowledge-based Approach

Procedure

1. Knowledge-based potentials

2. Optimize the ligand position with the scoring function

3. Fit the scores to experimental values

4. Re-optimize ligand positions iteratively until the ligand positions and calibrated parameters have converged.

Scoring function: 7 atom types (1 C, 4 O, 2 N), cutoff 7 Å, 2000 complexes, rmsd < 2 Å, no metal ions, 164 binding energies, sd = 2.1 kcal/mol, rmsd = 0.49 Å

Validation: 36 rigid complexes, AlgoDock, FlexX, Gold, Dock, rmsd 0.74-1.68 Å; 25 known binding energies: sd = 2.0 kcal/mol

$$P_{A,B}(r) \propto \frac{n_{A,B}(r)}{r^2}$$

$$P_{A,B}(r) = \exp\left(-\frac{F_{A,B}(r)}{T}\right)$$

$$F_{A,B}(r) = -T \log(P_{A,B}(r)) + C$$

$$\Delta G = \sum_{i,j} F_{A,B}(r_{i,j})$$

Muryshev et al. 2003


Docking Software

DOCK (Kuntz et al. 1982); DOCK 4.0 (Ewing & Kuntz 1997); AutoDOCK (Goodsell & Olson 1990); AutoDOCK 3.0 (Morris et al. 1998); GOLD (Jones et al. 1997); FlexX (Rarey et al. 1996); GLIDE (Friesner et al. 2004); ADAM (Mizutani et al. 1994); CDOCKER (Wu et al. 2003); CombiDOCK (Sun et al. 1998); DIVALI (Clark & Ajay 1995); DockVision (Hart & Read 1992); FLOG (Miller et al. 1994); GEMDOCK (Yang & Chen 2004); Hammerhead (Welch et al. 1996); LIBDOCK (Diller & Merz 2001); MCDOCK (Liu & Wang 1999); PRO_LEADS (Baxter et al. 1998); SDOCKER (Wu et al. 2004); QXP (McMartin & Bohacek 1997); Validate (Head et al. 1996)

de novo design tools

LUDI (Boehm 1992); BUILDER (Roe & Kuntz 1995); SMOG (DeWitte et al. 1997); CONCEPTS (Pearlman & Murcko 1996); DLD/MCSS (Stultz & Karplus 2000); Genstar (Rotstein & Murcko 1993); Group-Build (Rotstein & Murcko 1993); Grow (Moon & Howe 1991); HOOK (Eisen et al. 1994); Legend (Nishibata & Itai 1993); MCDNLG (Gehlhaar et al. 1995); SPROUT (Gillet et al. 1993)


Docking Software: Important Factors

Sensitivity on and transferability of the parameters, including the starting conformation

Adaptability to additional scoring functions, pre- and/or post- docking processing and filters

Ability for iteratively refining docking parameter/protocol based on new results

Design, components, and results of validation studies

Speed, user interface & control, I/O, structural file formats

User learning curve, customer supports, and cost

Code availability and upgrading possibility


DOCK (Kuntz, UCSF)

Receptor Structure

• X-ray crystal

• NMR

• homology

Binding Site

Molecular Surface of Binding Site

Spheres describing the shape of binding site and favorable locations of potential ligand atoms

Matching heavy atoms of ligands to centers of spheres to generate thousands of binding orientations

Scoring Orientations

1. Energy scoring (vdw and electrostatic)

2. Contact scoring (shape complementarity)

3. Chemical scoring

4. Solvation terms

Virtual Screening for MTS/HTS and Library Design: ligands in the order of their best scores

Binding Mode Analysis for Lead Optimization: binding orientations and scores for each ligand

Ligands

• 3D structure

• atomic charges

• potentials

• labeling

Filters


DOCK: Conformational Flexibility

Torsion-drive and anchor-based options (DOCK4.0)

GA to generate ligand conformations inside the binding site (Oshiro et al. 1995)

A ligand anchor fragment is selected and placed in the receptor, followed by rigid body simplex minimization (Makino & Kuntz 1997)

Ensembles: ~300 conformations are created with the rigid part superimposed. DOCK is applied to the rigid part, and all conformations are tested for overlap and scored. (Lorber & Shoichet 1998)

Multiple random ligand conformations (Ewing et al. 2001)

Ensemble of protein structures (Knegtel et al. 1997)


FlexX (Tripos/SYBYL)

Fragment-based, descriptor matching, empirical scoring (Rarey et al. 1996)

Procedures:

Select a small set of base fragments suitable for placement using a simple scoring function.

Place base fragments with the pose clustering algorithm: rigid, triplet matching of H-bond & hydrophobic interactions, Bohm's scoring function

Build up the remainder of the ligand incrementally from other fragments

Ligand conformations:

MIMUMBA model with CSD derived low energy torsional angles for each rotatable bond, and ring conformations from CORINA.

Multiple conformations for each fragment in the ligand building steps

Other works: Explicit waters are placed into the binding site during the docking procedure using pre-computed water positions (Rarey et al. 1999). Receptor flexibility using discrete alternative protein conformations (Claussen et al. 2001; Claussen & Hindle 2003)


GOLD

GA method, H-bond matching, FF scoring (Jones et al. 1997)

A configuration is represented by two bit strings:

1. The conformation of the ligand and the protein defined by the torsions;

2. A mapping between H-bond partners in the protein and the ligand.

For fitness evaluation, a 3D structure is created from the chromosome representation. The H-bond atoms are then superimposed to H-bond site points in the receptor site.

Fitness (scoring) function: H-bond, the ligand internal energy, the protein-ligand van der Waals energy

Rotational flexibility for selected receptor hydrogens along with full ligand flexibility

Highlights: Validation test set: 100 complexes, 66 with rmsd < 2 Å. The structure generation is biased towards inter-molecular H-bonds. Hydrophobic fitting points were added (GOLD 1.2, CCDC, Cambridge, UK 2001).


AutoDock & AutoDock 3.0

Early implementation: MC simulated annealing, AMBER FF-based energy grid, flexible ligands (Goodsell & Olson 1990)

AutoDock 3.0: GA as a global optimizer combined with energy minimization as a local search method, flexible ligand, rigid protein as represented in a grid (Morris et al. 1998)

The fitness function:

a Lennard-Jones 12-6 dispersion/repulsion term

a directional 12-10 hydrogen bond term

a coulombic electrostatic potential

a term proportional to the number of sp3 bonds in the ligand to represent unfavorable entropy of ligand binding

a desolvation term


LUDI: Matching polar and hydrophobic groups

Calculate protein and ligand interaction sites (H-bond or hydrophobic), which are defined by centers and surface, from:

non-bonded contact distributions based on a search through the CSD,

a set of geometric rules,

the output from the program GRID (Goodford 1985), which calculates binding energies for a given probe with a receptor molecule.

Fit fragments onto the interaction sites:

distance between interaction sites on the receptor

an RMSD superposition algorithm

A hashing scheme to access and match surface triangles onto a triangle query of a ligand interaction center.

A list-merging algorithm creates all triangles based on lists of fitting triangle edges for two of the three query triangle edges.

Join/grow fragments using the databases of fragments and the same fitting algorithm.


GLIDE (www.schrodinger.com)

Funnel: site point search -> diameter test -> subset test -> greedy score -> refinement -> grid-based energy optimization -> GlideScore.

Approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand.

Hierarchical filters, including a rough scoring function that recognizes hydrophobic and polar contacts, dramatically narrow the search space

Torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses.

The very best candidates are further refined via a MC sampling of pose conformation.

A modified ChemScore (Eldridge et al. 1997) that combines empirical and force-field-based terms.

Validation: 282 complexes, new ligand conformation, the top-ranked pose: 50% < 1 Å, ~33% > 2 Å.


FRED (OpenEye www.eyesopen.com)

Systematic, nonstochastic, docking

Directed docking with SMARTS enclosures

ChemScore, PB-SA, ChemGauss, PLP, ScreenScore

Multiple active site comparisons

Multiple simultaneous scoring functions and hit lists

RMS clustering of hit-lists

Refinement of docked poses in the context of the active site using MMFF

On-the-fly OMEGA conformer generation

Robust reading and specification-compliant writing of SDF, MOL, MOL2, PDB, MacroModel, XYZ, and OEBinary file formats

Distributed processing via PVM for most Unix platforms


CDOCKER & SDOCKER

Randomly generate ligand seeds in the binding site

High temperature MD using a modified version of CHARMM

Locate minima from all of the MD simulations

Full minimization

Cluster on position and geometry

Rank by energy (interaction + ligand conformation)

SDOCKER: X-ray structure of complex as templates to guide docking

Wu et al. 2003; Wu et al. 2004.


Matrix of Accuracy & Success

Drug <- Quality Novel Lead <- Active

Reproduce binding mode (X-ray crystal structures)

Predict binding affinity (free energies)

Rank diverse set of compounds (by binding affinity)

Enhance hit rate for database mining

Reduce false positive (Nselected-Nhits) and false negative (Nall_hits-Nhits)

Fast enough for iterative SBDD

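Enrichment factor:

$$H_0 = \frac{N_{\mathrm{all\_hits}}}{N_{\mathrm{all}}}, \qquad H_{VS} = \frac{N_{\mathrm{hits}}}{N_{\mathrm{selected}}}, \qquad EF = \frac{H_{VS}}{H_0}$$

Predicted vs. experimental activity:

                  expt. active    expt. inactive
pred. active      TRUE            FALSE
pred. inactive    FALSE           TRUE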
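The hit-rate and enrichment quantities on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are hypothetical and not taken from the slides, while the formulas follow the slide's definitions (EF as the ratio of the virtual-screening hit rate to the baseline hit rate, false positives as N_selected - N_hits, false negatives as N_all_hits - N_hits).

```python
def enrichment_factor(n_hits, n_selected, n_all_hits, n_all):
    """EF = H_VS / H_0, with H_VS = n_hits / n_selected and H_0 = n_all_hits / n_all."""
    if n_selected <= 0 or n_all_hits <= 0 or n_all <= 0:
        raise ValueError("n_selected, n_all_hits and n_all must be positive")
    h_vs = n_hits / n_selected   # hit rate within the selected subset
    h_0 = n_all_hits / n_all     # baseline hit rate of the whole collection
    return h_vs / h_0


def screening_errors(n_hits, n_selected, n_all_hits):
    """Return (false positives, false negatives) as defined on the slide."""
    false_positives = n_selected - n_hits   # selected but inactive
    false_negatives = n_all_hits - n_hits   # active but not selected
    return false_positives, false_negatives


if __name__ == "__main__":
    # Hypothetical screen: 10,000 compounds containing 100 actives;
    # 500 compounds are selected by docking and 25 of them are active.
    ef = enrichment_factor(n_hits=25, n_selected=500, n_all_hits=100, n_all=10_000)
    fp, fn = screening_errors(n_hits=25, n_selected=500, n_all_hits=100)
    print(f"EF = {ef:.1f}, false positives = {fp}, false negatives = {fn}")
```

In this made-up case the selected subset is five times richer in actives than the full collection, yet three quarters of the actives are still missed, which is the trade-off the slide points at.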

Accuracy of Docking

Reality Boundary

Experimental errors: 0.1-0.25 kcal/mol (18-53%), with MSR (maximum significant ratio) as much as 3-fold (0.65 kcal/mol)

Free energy calculation accuracy: ~1 kcal/mol (5.4-fold), starting with an accurate geometric model and full sampling

Entropy and solvation estimation needs a sufficiently long simulation run with an accurate force field, an ensemble of explicit water molecules, and full sampling

Current

Reproduce X-ray structure with rmsd < 2 Å: 50-90% achievable

Binding affinity: 1.5-2 log units (32-100 fold, 2.05-2.73 kcal/mol)

Correlation between scores and affinities: r^2 < 0.3

Enthalpy ranking with minimization: ±5 kcal/mol

Hit rate enhancement: 2-50 fold, with hit rates of 1-20% (and a high false negative rate if 1-5% of total compounds are selected)

(Wang et al. 2003; Erickson et al. 2004; and others.)
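The paired numbers on this slide (kcal/mol next to fold changes or log units) are related through ΔG = RT ln(ratio). The sketch below reproduces that conversion; the temperature of 298.15 K is an assumption, since the slide does not state one, and the function names are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_from_dg(dg_kcal, temperature=298.15):
    """Fold change in affinity corresponding to a free energy difference (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


def dg_from_fold(fold, temperature=298.15):
    """Free energy difference (kcal/mol) corresponding to a fold change in affinity."""
    return R_KCAL * temperature * math.log(fold)


def dg_from_log_units(log_units, temperature=298.15):
    """Free energy difference (kcal/mol) for an error given in log10 units of Ki/Kd/IC50."""
    return dg_from_fold(10.0 ** log_units, temperature)


if __name__ == "__main__":
    # Assumed temperature: 298.15 K (not stated on the slide).
    print(f"1 kcal/mol    -> {fold_from_dg(1.0):.1f}-fold")
    print(f"3-fold (MSR)  -> {dg_from_fold(3.0):.2f} kcal/mol")
    print(f"1.5 log units -> {dg_from_log_units(1.5):.2f} kcal/mol")
    print(f"2.0 log units -> {dg_from_log_units(2.0):.2f} kcal/mol")
```

Under that temperature assumption the values agree with the slide to within rounding: about 5.4-fold per kcal/mol, 0.65 kcal/mol for a 3-fold ratio, and roughly 2.05-2.73 kcal/mol for 1.5-2 log units.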

Docking Accuracy: Examples

Example 1. Docking of a focused library of 55 PI3Kγ inhibitors that share a common chemotype, IC50 8-20,000 nM

GLIDE docking, scored by LUDI, Ligscore, GScore, PMF, PLP: r^2 = 0.02-0.15

Straight GLIDE docking: hit rate 0.34%.

Using additional knowledge (only poses with substructure rmsd < 2.5 Å vs. a co-crystal): hit rate 9.8% (J. Klicic, 2004)

Typical HTS: 1,000,000 tested -> 3,000 actives (0.3%) -> 10 quality novel leads (0.3%)

To find 5 quality leads using docking: 1,500 actives needed

15,000 compounds needed at a 10% hit rate (EF = 33)

If only 200-2,000 selected: hit rate 75-100% (EF > 250)
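The arithmetic behind this funnel can be checked with a short sketch. It assumes that quality leads scale in proportion to confirmed actives (10 leads per 3,000 actives, as in the typical HTS row); that proportionality is an interpretation of the slide, and the function name is hypothetical.

```python
def docking_targets(target_leads, n_selected,
                    n_tested=1_000_000, n_actives=3_000, n_leads=10):
    """Return (actives needed, required hit rate, required EF) for a docking campaign.

    Assumes leads are a fixed fraction of actives (n_leads / n_actives),
    taken from the typical HTS funnel on the slide.
    """
    base_hit_rate = n_actives / n_tested                 # 0.3% for the HTS funnel
    actives_needed = target_leads * n_actives / n_leads  # actives required for the leads
    required_hit_rate = actives_needed / n_selected      # hit rate docking must deliver
    required_ef = required_hit_rate / base_hit_rate      # enrichment over random selection
    return actives_needed, required_hit_rate, required_ef


if __name__ == "__main__":
    for n_selected in (15_000, 2_000):
        actives, rate, ef = docking_targets(target_leads=5, n_selected=n_selected)
        print(f"select {n_selected:>6}: need {actives:.0f} actives, "
              f"hit rate {rate:.0%}, EF {ef:.0f}")
```

Selecting 15,000 compounds requires a 10% hit rate (EF of about 33), while selecting only 2,000 requires 75% (EF of about 250), which is far beyond the hit-rate enhancements quoted for current docking.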

Docking Accuracy: More Examples

Example 2. 800 PDB complexes: resolution < 2.5 Å, Ki or Kd known, MW < 1000, non-covalently bound, no cofactor, 200 different proteins

13 scoring functions from SYBYL, Cerius2, GOLD, etc.

r^2 = 0.02 to 0.32, sd = 1.8 to 2.2 log units (2.5-3.0 kcal/mol)

Best results from X-Score, DrugScore, Sybyl ChemScore, Cerius2 PLP (Wang et al. 2004)

Example 3. Compared CDOCKER, DOCK, GOLD, FlexX for reproducing X-ray crystal structures with rmsd < 2 Å

The most important factors are the flexibility of protein and ligand

Suggests applying VS only to compounds with < 8 rotatable bonds

Use CORINA for 3D conformation generation

Use softer potentials in the beginning (Erickson et al. 2004)

Bottom line: current docking is almost always better than random, but still way too inaccurate to be a sole or dominant approach for lead generation. Multiple CADD & SBDD approaches should be used for any VS/MTS and lead optimization efforts.
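The rmsd < 2 Å criterion used in these comparisons can be sketched as follows. This is a minimal illustration, assuming both poses list the same heavy atoms in the same order and share the receptor's coordinate frame (so no superposition is applied); it ignores ligand symmetry, and the names and coordinates are hypothetical.

```python
import math


def pose_rmsd(docked, reference):
    """Root-mean-square deviation (Å) between two poses given as lists of (x, y, z).

    No superposition is performed: both poses are assumed to be in the
    receptor frame, with atoms listed in the same order.
    """
    if not docked or len(docked) != len(reference):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(docked, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(docked))


def is_reproduced(docked, reference, cutoff=2.0):
    """True if the docked pose is within the rmsd cutoff of the reference pose."""
    return pose_rmsd(docked, reference) < cutoff


if __name__ == "__main__":
    # Made-up three-atom "ligand"; the docked pose is shifted by 1 Å along x.
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(x + 1.0, y, z) for x, y, z in crystal]
    print(f"rmsd = {pose_rmsd(docked, crystal):.2f} Å, "
          f"reproduced = {is_reproduced(docked, crystal)}")
```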

Docking Applications

Determine the lowest free energy structures for the receptor-ligand complex

Search database and rank hits for lead generation

Calculate the differential binding of a ligand to two different macromolecular receptors

Study the geometry of a particular complex

Propose modifications of lead molecules to optimize potency or other properties

de novo design for lead generation

Library design

Docking of Combinatorial Libraries

Combinatorial docking problem: given a library of ligands, calculate the docking score (and the geometry of the complex) for each molecule of the library

R-group selection problem: given a library, select molecules for the individual R-groups in order to form a smaller sublibrary with an enriched number of hits

de novo library design: given a catalog of available reagents, design a library (incl. the rules of synthesis) that will optimize the number of hits

The incremental construction method: PRO_SELECT, CombiDOCK (Sun, Ewing et al. 1998), FlexXc

Docking of the fully enumerated library followed by plate optimization or cherry-picking

Docking to Nucleic Acid Targets

RNA and DNA as potential drug targets

Ribosomal RNA structures (Agalarov et al. 2000; Ban et al. 2000; Filikov et al. 2000; Nissen et al. 2000; Wimberly et al. 2000)

Highly charged environments, well-defined binding pocket

DOCK identified compounds that selectively bind to RNA duplexes or DNA quadruplexes (Chen et al. 1996; Chen et al. 1997). The portions of the DOCK suite that calculate electrostatics, including solvation, partial charges, and the scoring function, were recently optimized for RNA targets (Downing et al. 2003; Kang et al. 2004).

An MC minimization and an empirical scoring function that accounts for solvation, isomerization free energy, and changes in conformational entropy were used to rank compounds (Hermann & Westhof 1999).

Challenges to Docking Approach

Binding affinity is only one of many attributes of a drug

Structures of most druggable targets are undetermined

The identification of the binding site

Dependence on protein and ligand structures: source (apo, co-crystal, complex with another inhibitor, NMR, homology), treatment (hydrogen atoms, optimization), flexibility, starting conformation, structural diversity, protonation state

Similar ligands may unexpectedly bind in quite different modes: MJ33 in phospholipase A2 (Sekar et al. 1997); BANA113 in influenza virus neuraminidase (Sudbeck et al. 1997).

Favors larger & more complicated molecules, but contributions to binding free energy from the heavy atoms of the ligand level off at ~15 atoms. Many interactions, including H-bonding, do not always lead to higher binding affinity (Kuntz et al. 1999).

Challenges to Docking Approach

Large energies vs. small energy differences

Find weakly potent compounds in pools of nonbinders

High false positives and false negatives from in silico screen

Explicit water molecules are needed for: volume, changing the shape of the binding site, bridging interactions

A scoring function that always has its global optimum in agreement with the experiment.

Good affinity prediction does not necessarily lead to the correct binding mode

Speed and accuracy

Successes of Docking & SBDD

HIV protease inhibitor amprenavir (Agenerase) from Vertex & GSK (Kim et al. 1995)

HIV: nelfinavir (Viracept) by Pfizer (& Agouron) (Greer et al. 1994)

Influenza neuraminidase inhibitor zanamivir (Relenza) by GSK (Schindler 2000)

Widely used & greatly appreciated. Identified many hits.

Review articles by Kuntz 1992; Kuntz et al. 1994; Kubinyi 1998; Muegge & Rarey 2001; Blundell 2002; Halperin et al. 2002; Shoichet et al. 2002; Taylor et al. 2002; Waszkowycz 2002; Davis et al. 2003; Schneidman-Duhovny et al. 2004.