Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols

Posted on 20-Jan-2018



Structure-Based Virtual Screening (SBVS) is a proven technique for lead discovery
- Still many areas for improvement
- Many efforts focused on the scoring function, often with little consideration of the assumptions underpinning SBVS

Here we consider a number of these processes in detail from the perspective of our primary SBVS tool (DOCK):
- Ligand conformational search protocols
- Varying site point definitions
- Alteration of sampling variables

- Determine their impact on hit enrichment and search speed
- Analyze implications for future research

Ligand Flexibility Studies: Strategy

- SBVS is CPU intensive
- Conformational searching of the ligand is clearly important
- Sampling is limited so that searches complete in a reasonable time frame
- A test is required to compare different conformational sampling methods
- Ability to reproduce the bioactive conformation tested
- 145 ligands from a 1995 analysis of PDB complexes (Gschwend, UCSF, unpublished)
- 30-compound subset chosen for analysis; selection based on visual and numerical inspection of diversity in ligand flexibility and functionality
- Relatively small sample of molecules used, many peptidic in nature
- Peptidic moieties are among the better parameterized systems, so this is in some ways a best-case scenario

Ligand Flexibility Studies: Procedure

- Multiple sampling techniques chosen: Catalyst-best / Catalyst-fast / Confort / Omega / DOCK
- Variety of sampling levels
- Starting from a Concord structure, conformers were generated and superimposed onto the PDB ligand conformation
- The conformation with the lowest heavy-atom RMS was used as the quality measure
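The quality measure above (lowest heavy-atom RMS after superimposing each conformer onto the PDB ligand) can be sketched in Python with NumPy. This is a minimal illustration, not the tooling used in the study: it assumes atoms are already in the same order in every conformer and in the reference, uses Kabsch (SVD) superposition, and the function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched point sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def best_conformer(conformers, reference, elements):
    """Index and heavy-atom RMS of the conformer closest to the reference pose.

    elements: element symbol per atom (hypothetical input); hydrogens are skipped.
    """
    heavy = np.array([el != "H" for el in elements])
    ref = np.asarray(reference, dtype=float)[heavy]
    scores = [kabsch_rmsd(np.asarray(c, dtype=float)[heavy], ref) for c in conformers]
    best = int(np.argmin(scores))
    return best, scores[best]
```

Superimposing each conformer before measuring keeps the comparison about internal geometry only, which is what the conformer generators are responsible for.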

Ligand Flexibility Studies: Search Settings Employed

- DOCK: conformation_cutoff_factor = 3/5/10; clash_overlap = 0.7 (0.7 times the vdW radius for clash overlap), with customized rules for bond increment settings
- Confort: rough (0.10 kcal) convergence, diverse conformer selection, boat ring search on; sampling at 5/10 confs per single bond + 500 max
- Catalyst Best/Fast: default settings; sampling at 5/10 confs per single bond + 100 max
- Omega: defaults + RMS_CUTOFF=1.0, GP_ENERGY_WINDOW=5.0; sampling at 100 max
- In addition, Concord-generated and Sybyl-minimized ligand X-ray structures were also analyzed as "controls"

Ligand Flexibility Results: Overall Performance - RMS / Rank

[Figure: average RMS deviation and average internal rank for each search type, in chart order: Min xray, CONFORT 500, FAST 100, CONFORT 10, BEST 100, FAST 5, DOCK 10, BEST 5, OMEGA 100, CONFORT 5, DOCK 5, DOCK 3, Concord. RMS data labels as listed on the chart: 0.76, 0.81, 0.88, 0.92, 0.87, 0.97, 0.96, 0.99, 0.99, 1.00, 1.03, 1.13, 1.76.]

Ligand Flexibility Results: Performance vs Flexibility

[Figure: average RMS deviation by ligand flexibility: 3 to 5 single bonds (n = 15), 6 to 8 single bonds (n = 7), 9 to 14 single bonds (n = 8).]
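The flexibility breakdown amounts to grouping each ligand's best RMS by its single-bond count and averaging per bin. A minimal sketch, assuming the per-ligand results are available as (single bonds, best RMS) pairs; the bin edges come from the figure, while the function name and input shape are hypothetical.

```python
from statistics import mean

# Bin edges from the figure: 3-5, 6-8 and 9-14 single bonds.
BINS = [(3, 5), (6, 8), (9, 14)]

def rms_by_flexibility(results):
    """Average best RMS per flexibility bin.

    results: iterable of (n_single_bonds, best_rms) pairs, one per ligand.
    Returns {(lo, hi): (count, mean_rms)} for every bin containing at least one ligand.
    """
    results = list(results)  # allow generators, since the data is scanned once per bin
    summary = {}
    for lo, hi in BINS:
        values = [rms for n_bonds, rms in results if lo <= n_bonds <= hi]
        if values:
            summary[(lo, hi)] = (len(values), mean(values))
    return summary
```

Reporting the count alongside each mean matters here, because the bins are small (7 and 8 ligands for the two more flexible groups).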

Ligand Flexibility Results: The Pain/Gain Ratio

Does the extra noise introduced to scoring functions outweigh this improvement? Is it worth the extra CPU?

[Figure: average conformations per molecule and average RMS deviation for each search type.]

Ligand Flexibility Results: Visual Analysis

- Even at lower RMS, deviation in hydrogen positions is an issue
- As RMS rises (0.9), we begin to see more significant deviations in heavy-atom positions, large enough to possibly prove troublesome for standard force fields

[Figure: example overlays at RMS = 0.65 and RMS = 0.90]

Ligand Flexibility Results: Visual Analysis

- As RMS rises further, hydrogen bond mapping begins to partially break down
- Significant deviation begins to be seen, although general shape complementarity is still reasonable
- DOCKing is tricky; pharmacophore searches are possible with loose tolerances, although site point vector definitions (DISCO / Catalyst) are a no-no

[Figure: example overlays at RMS = 2.19 and RMS = 1.55]

Ligand Flexibility: Conclusions

- At the sampling levels currently used in virtual screening, rough search techniques perform comparably to more exhaustive methods
- DOCK performs quite well, and Fast does slightly better than the comparable Best run
- Results highlight the need for "forgiving" scoring functions and pharmacophore constraint tolerances (especially for flexible molecules)
- Generating the function directly from crystal structure data may not be optimal; use the conformation closest to the biologically relevant structure obtained with the chosen sampling technique
- It may be better to ignore more flexible molecules when possible (more than ~8 bonds)
- Analysis of a more extensive data set might provide a basis for determining whether optimum sampling settings exist (Best/Omega/Confort), for example the coarseness of poling values

Structure-Based Search Protocols: An Analysis of DOCK

- Working within the current DOCK paradigm, which search protocols provide the optimum search criteria?
  - Site point definitions
  - Alteration of sampling variables
  - Different scoring grids
- Comparisons illustrated for 5 test systems with diverse active data sets
- Analysis based on ranking within a list that includes ~10,000 "noise" compounds
- "Random" selection within the bounds of the size and flexibility distribution seen in the in-house database
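One way to draw "random" noise compounds that stay within the size and flexibility distribution of an in-house database is stratified sampling over coarse property bins. The sketch below is an assumption about how such a selection could work, not the procedure used here; the record keys (heavy_atoms, rot_bonds), the bin widths and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def property_bin(cmpd):
    """Coarse (size, flexibility) bin; the widths are illustrative choices."""
    return (cmpd["heavy_atoms"] // 10, min(cmpd["rot_bonds"], 12) // 3)

def stratified_noise(candidates, reference, n, seed=0):
    """Sample about n candidates so that their bin mix follows the reference set.

    Per-bin quotas are rounded, so the total can differ slightly from n,
    and a bin with too few candidates contributes only what it has.
    """
    rng = random.Random(seed)
    target = Counter(property_bin(c) for c in reference)
    total = sum(target.values())
    pools = defaultdict(list)
    for c in candidates:
        pools[property_bin(c)].append(c)
    picked = []
    for b, count in target.items():
        quota = round(n * count / total)
        pool = pools[b]
        picked.extend(rng.sample(pool, min(quota, len(pool))))
    return picked
```

Matching the property mix matters because otherwise size alone can separate actives from noise and inflate apparent enrichment.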

Structure-Based Search Protocols: DOCK Variables

- DOCK contains many variables that affect performance
- Ligand sampling within the site is the primary variant

nodes                        3/4
distance_tolerance           0.5/1.0
distance_minimum             3.0
bump_filter                  4
conformation_cutoff_factor   5
clash_overlap                0.7
maximum_orientations         500/5000
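To illustrate how nodes, distance_tolerance and distance_minimum interact, the brute-force sketch below enumerates ligand-atom to site-point matches whose internal distances agree within the tolerance. It is an illustration under assumptions, not DOCK's clique-based matching code: it uses a fixed node count, and it assumes distance_minimum is the minimum separation between site points within one match. The function name is hypothetical.

```python
import math
from itertools import combinations, permutations

def find_matches(ligand, sites, nodes=3, distance_tolerance=0.5, distance_minimum=3.0):
    """Return matches as tuples of (ligand_atom_index, site_point_index) pairs.

    ligand, sites: lists of (x, y, z) coordinates.
    A match is accepted when every ligand-ligand distance agrees with the
    corresponding site-site distance to within distance_tolerance.
    """
    found = []
    for site_idx in combinations(range(len(sites)), nodes):
        # Skip site-point sets that are too tightly clustered.
        if any(math.dist(sites[i], sites[j]) < distance_minimum
               for i, j in combinations(site_idx, 2)):
            continue
        for lig_idx in permutations(range(len(ligand)), nodes):
            pairs = list(zip(lig_idx, site_idx))
            if all(abs(math.dist(ligand[a], ligand[b]) - math.dist(sites[i], sites[j]))
                   <= distance_tolerance
                   for (a, i), (b, j) in combinations(pairs, 2)):
                found.append(tuple(pairs))
    return found
```

Raising distance_tolerance from 0.5 to 1.0 admits more (and looser) matches, and each accepted match becomes an orientation to score, which is why tolerance and maximum_orientations trade speed against coverage.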

Structure-Based Search Protocols: DOCK and Pharmacophoric Constraints

It is possible to assign fairly sophisticated pharmacophoric (henceforth also known as chemical) definitions:

name acid
# deprotonated carboxyl
definition O.co2 ( C )
# tetrazole
definition N.pl3 ( H ) ( N.2 ( N.2 ( N.2 ( C.2 ) ) ) )
definition N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ( N.2 ) ) ) )
definition N.2 ( N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ) ) ) )
# acyl sulphonamide
definition N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )
definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
definition O.2 ( S ( O.2 ) ( N.am ( H ) ( C.2 ( O.2 ) ) ) )

Current types: heavy atom, donor, acceptor, hydrophobe, aromatic, aromatic_hydrophobic, acid, base, donor_and_acceptor, special (e.g. metal chelator)
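The definition syntax above reads as an atom type followed by parenthesized neighbor patterns, with an optional count (as in 2 O.2). The Python sketch below shows one way such patterns could be parsed and matched against a small molecular graph; it is an illustration, not DOCK's parser. It assumes that a bare element symbol such as C or S matches any SYBYL type of that element, that counted neighbors must be distinct atoms, and it uses a hypothetical dict-based molecule format.

```python
import re

def tokenize(text):
    return re.findall(r"\(|\)|\d+|[A-Za-z][A-Za-z0-9.]*", text)

def parse(tokens, i=0):
    """Parse '[count] TYPE ( child ) ( child ) ...' into (count, type, children)."""
    count = 1
    if tokens[i].isdigit():
        count, i = int(tokens[i]), i + 1
    atom_type, i = tokens[i], i + 1
    children = []
    while i < len(tokens) and tokens[i] == "(":
        child, i = parse(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("unbalanced parentheses")
        children.append(child)
        i += 1
    return (count, atom_type, children), i

def compile_definition(text):
    tokens = tokenize(text)
    pattern, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens in definition")
    return pattern

def type_matches(pattern_type, atom_type):
    # 'O.co2' must match exactly; a bare element such as 'C' matches C.2, C.3, C.ar, ...
    if "." in pattern_type:
        return pattern_type == atom_type
    return atom_type.split(".")[0] == pattern_type

def atom_matches(mol, atom, pattern, parent=None):
    _, ptype, children = pattern
    if not type_matches(ptype, mol["types"][atom]):
        return False
    neighbors = [n for n in mol["bonds"][atom] if n != parent]
    # Expand counted children: '2 O.2' becomes two separate O.2 requirements.
    required = [(1, t, ch) for c, t, ch in children for _ in range(c)]
    return assign(mol, atom, neighbors, required, frozenset())

def assign(mol, atom, neighbors, required, used):
    """Backtracking search giving each requirement its own distinct neighbor."""
    if not required:
        return True
    first, rest = required[0], required[1:]
    return any(
        n not in used
        and atom_matches(mol, n, first, parent=atom)
        and assign(mol, atom, neighbors, rest, used | {n})
        for n in neighbors
    )

def matching_atoms(mol, definition):
    pattern = compile_definition(definition)
    return [a for a in range(len(mol["types"])) if atom_matches(mol, a, pattern)]
```

For example, with an acetate graph whose atoms are typed C.3, C.2, O.co2, O.co2, the definition O.co2 ( C ) flags both carboxylate oxygens. A parser like this also rejects a definition with a missing closing parenthesis instead of silently matching the wrong substructure.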

Structure-Based Search Protocols: Site Points Used in Kinase Search

[Figure: kinase binding site with site point regions]
- Region 1 (+ 4): acceptor / donor
- Region 2: hydrophobic + 2 donors
- Region 3: hydrophobic / any heavy atom

Structure-Based Search Protocols: Test Sets and Site Points Used

- Sphgen used to generate site points for "generic" DOCK searches
- Pharmacophore points derived from a mixture of non-data-set bound ligands and in-house programs that process GRID maps and Connolly surfaces (plus plenty of human intervention)
- Active data sets broken down into chemotypes to prevent the problem of common analogue bias, an underappreciated issue in all validations

Target | Active chemotype definitions | Pharmacophore points / critical regions
2 serine proteases | P1 substituent / P1-P4 linker substituent | P1 (base / hydrophobe) + P4 (hydrophobe) pockets
2 fatty acid binding proteins | Core linking acid moiety to remaining substituents | Acid binding pocket
Kinase | Moiety mimicking adenine / main core of molecules | Adenine binding pocket (donor / acceptor) [+ rear hydrophobic pocket]

Results - Kinase: No. of hits after 50% of chemotypes located by at least one search (400 compounds processed from 96 actives / 18 chemotypes)

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexible dock
- c: m = mm score / c = contact score
- d: nXcr(a.b) = n-node search with X critical regions and a.b distance tolerance

NOTE: poor performance of the 1 critical region searches is due to premature termination

[Figure: number of compounds and chemotypes found for each search type]
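The search type labels follow the a_b_c(_d) key above. A small decoder makes the labels explicit; this Python sketch is hypothetical and leaves the optional d field as a raw string, since its meaning is defined separately on each results slide.

```python
# Field codes from the search type key; the d field is left undecoded.
SITE_POINTS = {"s": "sphgen", "c": "critical", "cc": "chemical-critical"}
LIGAND = {"s": "single conformer", "f": "flexible dock"}
SCORE = {"m": "mm score", "c": "contact score"}

def decode_search_type(label):
    """Split a label such as 'cc_f_c_3' into its named fields."""
    a, b, c, *rest = label.split("_")
    decoded = {"site_points": SITE_POINTS[a], "ligand": LIGAND[b], "score": SCORE[c]}
    if rest:
        decoded["extra"] = "_".join(rest)  # e.g. '3' (3-node search) or '1.0' (distance tolerance)
    return decoded
```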

Results - Fatty Acid Binding Protein 2: No. of hits after 7 chemotypes located by at least one search (500 compounds processed from 28 actives / 8 chemotypes)

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexible dock
- c: m = mm score / c = contact score
- d: 3 = 3-node search / 1.0 = 1.0 distance tolerance / 1.02crit, 32crit = 1.0 distance tolerance or 3-node search with a 2nd critical region (hydrophobic binding pocket) / esp = electrostatic potential included in mm score / acid = all non-acids removed from search lists

[Figure: number of compounds and chemotypes found for each search type]

Missing chemotype: a citrazinate, not covered in the chemical definitions. Easy to fix, and another advantage over electrostatics.

Results - Overall: Compounds Processed for 50% Chemotype Coverage for All Systems

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexible dock
- c: m = mm score / c = contact score
- d: 3 = 3-node search / 1.0 = 1.0 distance tolerance

[Figure: compounds processed (best, mean and worst hit rate) for each search type: s_s_c, s_s_m, c_s_c, c_s_m, cc_s_c, cc_s_m, s_f_c, s_f_m, c_f_c, c_f_m, cc_f_c, cc_f_m, cc_f_c_3, cc_f_c_1.0]
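The coverage metric used in these results (compounds processed before a given fraction of chemotypes has at least one active found) can be computed from a single ranked hit list, as sketched below. This is an assumption about the bookkeeping, not the authors' script: the "by at least one search" aggregation across searches is not modeled, and the input format and function name are hypothetical.

```python
import math

def compounds_for_coverage(ranked_ids, chemotype_of, fraction=0.5):
    """Walk a ranked list (best score first) until `fraction` of chemotypes are found.

    ranked_ids: compound ids in score order, actives and noise mixed.
    chemotype_of: dict mapping each active id to its chemotype label.
    Returns (compounds_processed, actives_found), or None if coverage is never reached.
    """
    n_needed = math.ceil(fraction * len(set(chemotype_of.values())))
    seen, actives_found = set(), 0
    for processed, cid in enumerate(ranked_ids, start=1):
        if cid in chemotype_of:
            actives_found += 1
            seen.add(chemotype_of[cid])
            if len(seen) >= n_needed:
                return processed, actives_found
    return None
```

Counting chemotypes rather than raw actives is what guards against common analogue bias: ten close analogues of one series move the chemotype count by only one. For the kinase set (18 chemotypes), 50% coverage means 9 chemotypes.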

Results Analysis: DOCK Scoring Functions - Shape

- Contact score generally a little more robust than the vdW non-bonded function
  - More controllable bump penalty (no r^n repulsion)
  - Better able to deal with docking inaccuracies
  - More important in tight binding sites with pharmacophore constraints and flexible molecules; a controllable maximum vdW repulsion value mitigates this somewhat
- The vdW function is still useful with less flexible molecules for a more rigorous shape complementarity score
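The difference in bump handling shows up when a single atom pair is scored as it moves into overlap. The sketch compares a 6-12 Lennard-Jones term (optionally capped, mirroring a maximum vdW repulsion value) with a simple contact count that adds a fixed bump penalty. The functional forms are generic, and every parameter value here is hypothetical rather than a DOCK default.

```python
def lennard_jones(r, epsilon=0.2, r_min=3.8, cap=None):
    """6-12 potential in r_min form; optional cap on the repulsive energy."""
    ratio = r_min / r
    energy = epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    return min(energy, cap) if cap is not None else energy

def contact_term(r, contact_cutoff=4.5, bump_distance=2.75, bump_penalty=3.0):
    """Contact-style score: -1 per contact, a flat penalty for a bump, 0 beyond the cutoff."""
    if r < bump_distance:
        return bump_penalty
    if r <= contact_cutoff:
        return -1.0
    return 0.0

for r in (2.0, 2.5, 3.0, 3.8, 5.0):
    print(f"r={r:.1f}  LJ={lennard_jones(r):10.2f}  "
          f"LJ(cap=10)={lennard_jones(r, cap=10.0):6.2f}  contact={contact_term(r):5.1f}")
```

Because the r^12 term grows by orders of magnitude over a fraction of an angstrom, one slightly misplaced atom in a coarse conformer can dominate the uncapped vdW score, while the contact penalty stays bounded.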

Results Analysis: DOCK Scoring Functions - H Bonding

- Electrostatics: many intuitive reasons for caution in explicit treatment
  - Poor charge models / coarse conformations / inability to control ionization states
- Pharmacophore centers provide a better vehicle for H-bonding description
  - Spread points to allow for search approximations / set critical regions based on biological and structural information / faster searches (30-100 times)
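Critical regions and chemical labels can be thought of as filters applied to candidate matches before any scoring. A minimal sketch of such a filter, assuming a match is a list of (ligand atom, site point) pairs, that each site point carries one required label from the current types listed earlier, and that "heavy atom" accepts any ligand atom; all names are hypothetical.

```python
def match_is_valid(match, atom_labels, site_labels, critical_regions):
    """Check chemical compatibility and critical-region coverage of one match.

    match: list of (ligand_atom, site_point) pairs.
    atom_labels: dict atom -> set of chemical labels, e.g. {"acceptor", "acid"}.
    site_labels: dict site point -> required label ("heavy atom" accepts any atom).
    critical_regions: list of sets of site point ids; each set must be used at least once.
    """
    for atom, site in match:
        need = site_labels[site]
        if need != "heavy atom" and need not in atom_labels[atom]:
            return False
    used = {site for _, site in match}
    return all(used & region for region in critical_regions)
```

A match rejected here never reaches the scoring grid, which is also how a critical region encodes known biology (for example, requiring an acid in the acid binding pocket).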

Conclusions

- For maximum impact with current methodology, scoring functions should either:
  - Be designed/utilized with these limitations in mind: forgiving, or targeted at less flexible molecules
  - Improve results by such a high degree that additional sampling (and CPU) is warranted
- In the meantime, the utility of pharmacophoric hypotheses {critical region(s) with pharmacophoric constraints} is clear
  - Better results faster / less sensitivity to model coarseness / allows constraints based on known biology

Acknowledgements

Thank you to my BMS CADD colleagues
