GOLD User Guide & Tutorials Copyright © 2006 The Cambridge Crystallographic Data Centre Registered Charity No 800579

Upload: ramaiyan-dhanapal

Post on 16-Apr-2015

GOLD

User Guide & Tutorials

Copyright © 2006 The Cambridge Crystallographic Data Centre

Registered Charity No 800579


Conditions of Use

GOLD and its associated documentation and software, including SILVER, (together the ‘Program’), are copyright works and all rights are protected. Use of the Program is permitted solely in accordance with a valid Software Licence Agreement and the Program is proprietary. All persons accessing the Program should make themselves aware of the conditions contained in the Software Licence Agreement.

In particular:

• The Program is to be treated as confidential and may NOT be disclosed or re-distributed in any form, in whole or in part, to any third party.

• No representations, warranties, or liabilities are expressed or implied in the supply of the Program by CCDC, its servants or agents, except where such exclusion or limitation is prohibited, void or unenforceable under governing law.

GOLD © 2006 CCDC Software Ltd.

SILVER © 2006 CCDC Software Ltd.

Implementation of ChemScore within GOLD © Astex Technology

All rights reserved

Licences may be obtained from:

CCDC Software Ltd.
12 Union Road
Cambridge CB2 1EZ
United Kingdom

Email: [email protected]
Web: www.ccdc.cam.ac.uk
Telephone: +44-1223-336408

236 Index

Threonine hydroxyls, orientation of 10

Top-ranked docking solution 115

Torsion angle distribution file

adding a new distribution to 83

available choices of 83

editing 84

expand directive 85

format of an individual distribution 85

format of header 84

gold.tordist 83

gold.tordist.new 83

mimumba.tordist 83

period directive 85

selecting in front end 5

Torsion angle distributions

adding a new 83

basic use of 83

distributions file 83

examples 87

matching to ligand torsions 88

Torsion angles, allowing protein side chain flexibility 18

Torsion angles, fixing at input conformation via the gold.conf 66

Tutorials 162

TYPE_DEF (in torsion angle distribution file) 85

U

Use Distributions (check box in front end) 5

User Defined Score

overview of 62

User Defined Score (check box in front end) 5

User-defined scoring function, constructing 62

V

Valence angle, bending energy term for covalent complexes 33

Valence angles 33

Validation of docking predictions

effect of number of ligand atoms, first validation 153

effect of number of ligand atoms, second validation 160

effect of number of ligand H-bonding atoms, first validation 153

effect of number of ligand H-bonding atoms, second validation 160

effect of number of ligand torsions, first validation 153

effect of number of ligand torsions, second validation 160

first series of experiments 153

resolution of protein structure 159

root mean square deviations, first validation 154

second series of experiments 160

subjective analysis compared with rms deviations 158

using the CCDC/Astex validation test set 131

van der Waals (entry box in front end) 5

van der Waals annealing parameter

explanation of 91

setting 5

Van der Waals energy

annealing of 91

external (Goldscore) 46

external, scaling of (Goldscore) 46

internal (Goldscore) 46

listed in ligand log file 119

parameters (Goldscore) 46

Virtual screening 98

Visualisation

grommitt 142

using the front end 4

W

Water molecules 16


Index 235

end) 3

Selection Pressure (entry box in front end) 7

Selection Pressure (genetic algorithm parameter)

default values 96

explanation of 90

setting value of 96

Serine hydroxyls, orientation of 18

Set atom types (check boxes in front end) 4

setting up proteins

heme containing 15

side chain conformations, defining 19

Side chain flexibility 18

side chain rotamer energy, specifying 24

SILVER 125

analysis of docking results 125

exporting results to 117, 125

visualising docking result 124

Slave process 101

smart_rms 143

Soft potentials, using 47

specifying flexible side chains 19

specifying torsion angle tolerances for rotatable side chains 19

specifying torsion angles for rotatable side chains 19

Speed of GOLD

and reliability 97

effect of early termination 93

effect of genetic algorithm parameters 94

number of dockings 93

Split soft potentials, using 47

standard rotamer library, using 20

Starting geometry

of ligand 30

of protein 18

of protein hydroxyl groups 18

Stereochemistry of ligand 31

Sub-directories, creating for output files 112

Submit&Exit (button in front end) 3

Submitting to background 100

Substructure Constraint (menu item in front end) 72

Substructure-based constraints

setting up 72

Sulfoxide

atom type conventions 44

bond type conventions 44

Sulphonamide

atom type conventions 39

bond type conventions 39

Sulphonate

atom type conventions 39

bond type conventions 39

Sulphone

atom type conventions 39

bond type conventions 39

SYB_TYPE (in torsion angle distribution file) 85

Symmetry, handling of in RMSD calculations 143

T

tag names in output files 151

Tags 151

Tautomerism

of histidine 10

of ligand 30

Template Similarity Constraint (menu item in front end) 79

Template similarity constraints

overview 79

setting up 79


Contacting User Support

If you have any technical or scientific queries concerning this CCDC product then please contact User Support who will try to help.

Email: [email protected]
Web: http://www.ccdc.cam.ac.uk/support
Tel: +44 1223 336022

A list of frequently asked questions (FAQs) is available at the website address given above. This resource is continually updated with answers to common questions. Please check the archive for the relevant product before using our email and telephone support service.

If you need to contact User Support, please try to provide the following information:

• The name and version number of the product with which you are having problems.

• The make, model and operating system of the workstation you are using.

• A clear description of the problem and the circumstances under which it occurred.

Also be prepared to email error messages and other output. This information is always useful when trying to determine the cause of a problem.

We try to deal with User Support queries within one working day but sometimes problems can take longer to solve. When this happens we will keep you informed of our progress and try to provide you with an answer as quickly as possible.

234 Index

setting a bond as fully rotatable 38

using 38

Rotatable-bond freezing term, in ChemScore 56

rotating a bond during docking using the rotatable bond override file 38

Run (button in front end) 3

Running GOLD

configuration file, use of 100

directory, use new 99

error messages 124

from command line 100

in background 100

interactive diagnostics 100

interactively 100

parallel mode 101

S

S.a (GOLD internal atom type) 45

S.m (GOLD internal atom type) 45

Save&Exit (button in front end) 3

Scaffold match constraint 80

method 81

setting up 81

Scaffold match constraint, overview 80

Scoring function

angle bending term for covalent complexes 33

apparent increase in during genetic algorithm run 119

bond angle term for covalent complexes 33

bump checking 48

ChemScore

block functions 50

clash penalty 56

constraint terms 58

covalent term 58

explanation of hydrogen-bond terms 52

hydrogen-bond terms 52

ligand torsional strain 56

lipophilic term 54

metal-binding 54

overview of 49

parameter file 58

parameters, altering 58

rotatable-bond freezing term 56

choice of GoldScore, ChemScore, User Defined Score 46

correlation with binding affinity 137

customising parameters 127

GoldScore

atom radii 46

energy parameters 46

external van der Waals energy 46

hydrogen bond directionality parameters 46

hydrogen bond energy, ligand intramolecular 46

hydrogen bond energy, protein-ligand 46

internal van der Waals energy 46

overview of 46

parameter file 48, 59

parameters, altering 59

polarisability parameters 46

scaling of external van der Waals energy 46

van der Waals energy, ligand 46

van der Waals energy, protein-ligand 46

list of, in log file 115

ranking of, for docking solutions 116

torsional parameters 83

User Defined Score 62

overview of 62

valence angle term for covalent complexes 33

scoring function limitations when using flexible side chains 18

Scoring function terms

exporting to SILVER 125

in output files, definition 151

saving to output files 111

Scoring function, adding user terms 62

sd format 31

SD-style 151

SD-style tags 151

Select editing panels, Input (check box in front


Index 233

disabling 101

FAQs 162

log files 103

PVM (Parallel Virtual Machine) 101

R

Radius

of atom, for use in GoldScore fitness function 46

of binding site 24

ranked_structure... mol2 files 112

Ranking of docked solutions 116

Read hydrophobic fitting points (check box in front end) 5

References describing GOLD 147

Region (hydrophobic) constraints 77

Relative ligand energy 62

Reliability of predictions

as function of number of ligand atoms, first validation 153

as function of number of ligand atoms, second validation 160

as function of number of ligand H-bonding atoms, first validation 153

as function of number of ligand H-bonding atoms, second validation 160

as function of number of ligand torsions, first validation 153

as function of number of ligand torsions, second validation 160

binding affinity, alpha chymotrypsin 139

binding affinity, FKBP12 140

binding affinity, influenza A neuraminidase 138

examples 136

methodology (binding affinity tests) 137

methodology (docking orientation tests) 129

resolution of protein structure 159

root mean square deviations in first validation 154

subjective analysis compared with rms deviations 158

validation, first series of experiments 129

validation, second series of experiments 130

REMOVE_HIGH_ENERGY (parameter in torsion angle distribution file) 84

Reordering (message in log file) 119

Rescore log file 117

Rescoring 106

output files 117

overview 106

setting up 106

Rescoring solution file 117

resetting bond types 38

Resolution of protein structure, and prediction accuracy 159

Rigid ligand docking 66

Rings, varying conformation of 64

rms_analysis 144

rnk file 115

rotamer command in gold.conf 19

limitations 19

rotamer library (standard), using 20

rotamer_lib command block in gold.conf 19

rotamer_library.txt file

commenting out unrequired torsions 20

location 20

using 20

rotamers 18

rotatable_bond_override.mol2 file

fixing an angle at its input angle 38

flipping a bond 38

retyping a bond as an amide (am) bond type 38


Table of Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Overview of the GOLD Front End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Control Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.2 Input Parameters and Files Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3 Fitness Function Settings Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.4 Genetic Algorithm Parameters Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.5 Parallel Operation Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Setting Up the Protein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Essential Steps in Setting Up the Protein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States . . . . . . . . . . . . . . . . . . 10

3.3 Metal Ions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.3.1 Preparing a Protein Input File which Contains a Metal Ion . . . . . . . . . . . . . . . . . . . . 10

3.3.2 Automatic Determination of Metal Coordination Geometries . . . . . . . . . . . . . . . . . . 11

3.3.3 Specifying Metal Coordination Geometries Manually . . . . . . . . . . . . . . . . . . . . . . . 12

3.3.4 Defining Custom Metal Coordination Geometries . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3.5 Metal-Ligand Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.3.6 Heme Containing Proteins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4 Water Molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4.1 Methodology For Handling Waters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4.2 Specifying Waters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.5 Rotatable O-H and NH3 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.6 Flexible Side Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.6.1 Introduction to Side-Chain Flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.6.2 Specifying a Flexible Side Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.6.3 Using a Standard Rotamer Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.6.4 Allowing a Localised Backbone Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.6.5 Protein-Protein Clashes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.6.6 Specifying the Energy of a Side-Chain Rotamer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.7 Large Backbone Movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.8 Defining the Binding Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.8.1 Defining a Binding Site from a Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.8.2 Defining a Binding Site from an Atom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

232 Index

Processes, maximum number of 105

Program crash, action required in event of 124

Program speed

and reliability 97

effect of early termination 93

effect of genetic algorithm parameters 94

number of dockings 93

Protein

active site definition 24

aspartic acid 10

atom charges 10

atom labels 9

atom types 36

binding site definition 24

bond types 36

cavity detection 28

charges on atoms 10

conformation 18

disulphide bridges 9

file formats 29

file name definition 29

flexibility 18

glutamic acid 10

histidine 10

hydrogen atoms 10

initialised 112

ionisation states 10

metal ions 10

mol2 format 29

pdb format 29

protonation states 10

radius of binding site 24

resolution, correlation with prediction accuracy 159

selecting, in front end 4

serine 18

setting up 9

tautomeric states 10

threonine 18

water molecules 16

protein

dummy atoms 163

lone pairs 163

metal atoms 163

protonation state 163

setting up 163

Protein (entry box in front end) 4

protein backbone movement (large), defining 24

protein backbone movement (localised), defining 20

protein energy term, in GoldScore 23

Protein flexibility

allowing large backbone movement 24

protein-protein clash penalisation, turning off 23

protein-protein clashes, penalisation 23

scoring function limitations 18

side chain flexibility 18

specifying allowed rotatable side chains 19

specifying the energy 24

using a standard rotamer library 20

Protein flexibility

allowing a localised backbone movement 20

Protein H bond constraints

overview of 74

setting up 75

Protein log file 118

protein-protein clash penalisation, turning off 23

protein-protein clashes, penalising when using rotatable side chains 23

Protonation states

of ligand 30

of protein residues 10

PVM

console 102


Index 231

disabling 101

FAQs 162

log files 103

Parallel Virtual Machine (PVM) 101

Parameter file

ChemScore

editing 58

explanation of 58

GoldScore

editing 48

explanation of 48

selecting in front end 4

Parameter File (entry box in front end) 4

pdb format

for ligand 31

for protein 29

problems of defining bond type 31

Peptide linkages

flipping between cis and trans (in ligands) 65

period (directive in torsion angle distribution file) 85

Phosphate

atom type conventions 39

bond type conventions 39

Planar nitrogen, flipping 65

Polar protein hydrogen atoms

explanation 115

saving to file 111

Polarisability, of atom, for use in GoldScore fitness function 46

Population Size (entry box in front end) 7

Population Size (genetic algorithm parameter)

default values 96

explanation of 89

relationship to program speed 94

setting value of 96

postprocessing of ligand rotatable bonds, switching off 38

Predictions, accuracy of

as function of number of ligand atoms, first validation 153

as function of number of ligand atoms, second validation 160

as function of number of ligand H-bonding atoms, first validation 153

as function of number of ligand H-bonding atoms, second validation 160

as function of number of ligand torsions, first validation 153

as function of number of ligand torsions, second validation 160

binding affinity, alpha chymotrypsin 139

binding affinity, FKBP12 140

binding affinity, influenza A neuraminidase 138

examples 136

methodology (binding affinity tests) 137

methodology (docking orientation tests) 129

resolution of protein structure 159

root mean square deviations in first validation 154

subjective analysis compared with rms deviations 158

validation, first series of experiments 129

validation, second series of experiments 130

Preferences

.gold_preferences 127

ChemScore fitness function parameters 58

default genetic algorithm parameter settings 96

GoldScore fitness function parameters 59

torsion angle distributions 84

Process file 124

Process scheduler, for parallel operation 104

process_tab 88

vi Table of Contents

3.8.3 Defining a Binding Site from a List of Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.8.4 Defining a Binding Site from a Single Residue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.8.5 Defining a Binding Site from a List of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.8.6 Defining a Binding Site from a Reference Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.8.7 Cavity Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.8.8 Output of Cavity Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.9 Protein File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.10 Specifying the Protein File Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 Setting Up Ligands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1 Essential Steps in Setting Up a Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States . . . . . . . . . . . . . . . . . . 30

4.3 Ligand Geometry, Conformation and Stereochemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.4 Ligand File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.5 Specifying the Ligand File(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.6 Setting Up Covalently Bound Ligands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.6.1 Method Used for Docking Covalently Bound Ligands . . . . . . . . . . . . . . . . . . . . . . . 33

4.6.2 Setting Up a Single Covalent Link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.6.3 Setting Up Substructure-Based Covalent Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5 Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.1 Atom and Bond Type Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.2 Automatically Setting Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.3 Manually Setting Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.4 Overriding Automatic Bond Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.5 Atom and Bond Type Conventions for Difficult Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.6 Internal GOLD Atom Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6 Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.1 Choice of Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.2 GoldScore Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der Waals Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6.2.2 Bump Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File . . . . . . . . . . . . . . . 48

6.4 ChemScore Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49


6.4.1 Introduction to ChemScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6.4.2 Block Functions in ChemScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

6.4.3 Hydrogen-Bond Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6.4.4 Metal-Binding and Lipophilic Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

6.4.5 Rotatable-Bond Freezing Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6.4.6 Clash Penalty and Internal Torsion Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6.4.7 Covalent Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6.4.8 Constraint Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File . . . . . . . . . . . . . . 58

6.6 Altering GOLD Parameters: the gold.params File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6.7 Kinase Scoring Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6.8 Heme Scoring Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.9 Internal Energy Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6.10 User Defined Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

7 Ligand Flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7.1 Flipping Ring Corners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7.2 Flipping Amide Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7.3 Flipping Planar Nitrogens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

7.4 Flipping Pyramidal Nitrogens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

7.5 Intramolecular Hydrogen Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

7.6 Protonated Carboxylic Acids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

7.7 Fixing Rotatable Bonds at Their Input Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

8 Setting and Releasing Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

8.1 Using the Constraint Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

8.2 Distance Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

8.2.1 Setting Up a Distance Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

8.2.2 Method Used for Substructure-Based Distance Constraints . . . . . . . . . . . . . . . . . . . 71

8.2.3 Setting Up Substructure-Based Distance Constraints . . . . . . . . . . . . . . . . . . . . . . . . 72

8.3 Hydrogen Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

8.3.1 Setting Up Hydrogen Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

8.3.2 Method Used for Protein H Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

8.3.3 Setting up Protein H Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

8.4 Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

230 Index

explanation of 91

setting values of 96

Output

controlling amount 109

controlling information written to files 111

tag names 151

Output Directory (entry box in front end) 4

Output files

active_atoms, set in gold_protein.mol2 112

atom type errors 124

best docking solution 115

bestranking.lst 116

cluster analysis 144

comparison of docking solutions 120

directories 112

docked ligand files 112

donor_hydrogens, set in gold_protein.mol2 112

energy values 119

error messages 124

fitness function may appear to increase 119

fitness function scores 115

formats same as input files 112

gold.err 124

gold.pid 124

gold_ligand.mol2 112

gold_protein.log 118

gold_protein.mol2 112

gold_solution... mol2 files 118

hydrogen bond energy 119

initialised ligand file 112

initialised protein file 112

ligand log file 118

links, symbolic, between ligand docking files 112

log file 118

lone_pairs, set in gold_protein.mol2 112

naming conventions 109

process file 124

protein log file 118

ranked_structure... mol2 files 112

ranking of docked solutions 116

reordering (message in log file) 118

rescore log file 117

rescore.mol2 file 117

rms comparison of docked solution 120

rnk 115

sub-directories 112

symbolic links between ligand docking files 112

van der Waals energy 119

overriding ligand bond types 38

Overview

of fitness functions 46

of front end 3

of genetic algorithm 89

of GOLD 1

of torsion angle distributions 83

Oxygen, anionic

atom type conventions 39

bond type conventions 39

P

Parallel (check box in front end) 3

Parallel mode of running

host 101

how it works 101

maximum number of processes 105

multi-processor machines 101

PVM 101

PVM log files 103

selecting and deselecting machines 104

using the console 102

Parallel Operation (panel in front end) 8

Parallel Virtual Machine

console 102


Index 229

processing 101

Mutate (entry box in front end) 7

Mutate (genetic algorithm parameter)

default values 96

explanation of 91

setting value of 96

N

N.acid (GOLD internal atom type) 45

N.plc (GOLD internal atom type) 45

N_BINS (parameter in torsion angle distribution file) 84

Naming conventions for ligand output files 109

NEIGHBOURS (in torsion angle distribution file) 85

Neuraminidase binding affinity 138

Niche Size (entry box in front end) 7

Niche Size (genetic algorithm parameter)

default values 96

explanation of 91

setting value of 96

Niching 7

Nitro

atom type conventions 39

bond type conventions 39

Nitrogen, anionic

atom type conventions 39

bond type conventions 39

Nitrogen, cationic

atom type conventions 39

bond type conventions 39

NODE (in torsion angle distribution file) 85

Non-bonded contacts, allowing short 48

N-oxide

atom type conventions 39

bond type conventions 39

Number of Constraints (display box in front end) 5

Number of dockings

early termination 93

effect on program speed 93

setting 93

Number of Islands (entry box in front end) 7

Number of Islands (genetic algorithm parameter)

default values 96

explanation of 90

setting value of 96

Number of ligand atoms, effect on prediction accuracy

first validation 153

second validation 160

Number of Ligand Bumps (display box in front end) 5

Number of ligand H-bonding atoms, effect on prediction accuracy

first validation 153

second validation 160

Number of ligand torsions, effect on prediction accuracy

first validation 153

second validation 160

Number of Ligands (display box in front end) 4

Number of Operations (entry box in front end) 7

Number of Operations (genetic algorithm parameter)

default values 96

explanation of 90

relation to program speed 94

setting value of 96

O

Operator weights in genetic algorithm

default values 96

viii Table of Contents

8.4.1 Method Used for Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . 77

8.4.2 Setting Up Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

8.5 Template Similarity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8.5.1 Method Used for Template Similarity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8.5.2 Setting Up a Template Similarity Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8.6 Scaffold Match Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8.6.1 Method Used for Scaffold Match Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

8.6.2 Setting Up Scaffold Match Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

9 Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

9.1 Basic Use of Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

9.2 Choice of Torsion Angle Distribution Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

9.3 Editing Torsion Angle Distribution Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

9.3.1 Format of Torsion Angle Distribution File Header . . . . . . . . . . . . . . . . . . . . . . . . . . 84

9.3.2 Format of Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

9.3.3 Example Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database . . 88

9.4 Matching Torsion Angle Distributions at Run Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

10 Genetic Algorithm Parameter Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

10.1 Genetic Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

10.2 Population Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

10.3 Selection Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

10.4 Number of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

10.5 Number of Islands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

10.6 Niche Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

10.7 Operator Weights: Migrate, Mutate, Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

10.8 Van der Waals and Hydrogen Bonding Annealing Parameters . . . . . . . . . . . . . . . . . . . . . 91

10.9 Hydrophobic Fitting Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

11 Balancing Reliability and Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

11.1 Number of Dockings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

11.2 Early Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

11.3 Controlling Reliability and Speed with GA Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

11.3.1 Relationship between GA Parameters and Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

11.3.2 Using Automatic GA Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94


Table of Contents ix

11.3.3 Using Pre-Defined GA Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings . . . . . . 97

11.3.5 GA Parameter Settings for Virtual Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

12 Running GOLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

12.1 Required Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

12.2 Starting GOLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

12.3 Running Interactively; Interactive Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

12.4 Submitting a GOLD job to the Background from the Front End . . . . . . . . . . . . . . . . . . . 100

12.5 Running GOLD from the Command Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

12.6 Running in Parallel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

12.6.1 Parallel Virtual Machine (PVM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

12.6.2 Using the PVM Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

12.6.3 Diagnosis of PVM Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

12.6.4 Selecting and Deselecting Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

12.6.5 Setting the Maximum Number of Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

12.6.6 Using GOLD with your own PVM Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

13 Rescoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

13.1 Rescoring Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

13.2 Setting Up a Rescoring Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

14 Output Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

14.1 Controlling the Amount of Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

14.2 Controlling the Information Written to Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

14.3 Specifying Directories for Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

14.4 Files Containing the Initialised Protein and Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

14.5 Files Containing the Docked Ligand(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

14.6 Files Containing Protein Binding-Site Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

14.7 Files Containing Fitness Function Rankings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

14.7.1 File Containing Ranked Fitness Scores for an Individual Ligand . . . . . . . . . . . . . . 115

14.7.2 File Containing Ranked Fitness Scores for a Set of Ligands . . . . . . . . . . . . . . . . . . 116

14.8 Files Containing the Results of Rescoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

14.8.1 Rescore Solution File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

14.8.2 Rescore Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

14.9 Protein Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

228 Index

setting up 30

starting geometry 30

stereochemistry 31

tautomeric states 30

valence angles 33

Ligand Editor (window in front end) 32

Ligand energy 62

Ligand energy correction 62

Ligand input files 4

formats 31

multiple ligands 32

Ligand internal torsional strain, in ChemScore 56

Ligand log file 118

best docking solution 115

cluster analysis 120

comparison of docking solutions 120

energy values 119

fitness function may appear to increase 119

fitness function scores 115

reordering (message in log file) 119

rms comparison of docked solutions 120

Ligand output files

best docking solution 115

directories 112

docked ligand files 112

formats same as input files 112

gold_ligand.mol2 112

gold_solution... mol2 files 118

initialised ligand file 112

links, symbolic, between ligand docking files 112

naming conventions 109

ranked_structure... mol2 files 112

rnk 115

sub-directories 112

symbolic links between ligand docking files 112

limitations (scoring function) when using flexible side chains 18

LINKAGE (in torsion angle distribution file) 85

Links, symbolic, between ligand docking files 112

Lipophilic term, in ChemScore 54

Literature references describing GOLD 147

Log file

ligand 118

protein 118

lone_pairs (set in gold_protein.mol2) 112

M

Maximum number of distributed processes (entry box in front end) 8

Maximum number of processes, setting 105

Metal ions

custom coordination geometries 14

determination of coordination geometries 11

preparation of input files 10

specifying coordination geometries 12

Metal ligand interactions 15

Metal-binding term, in ChemScore 54

Migrate (entry box in front end) 7

Migrate (genetic algorithm parameter)

default values 96

explanation of 91

setting value of 96

mimumba.tordist 83

mol format 31

mol2 format

for ligands 31

for multiple ligands 32

for protein 29

Multiple ligands, docking of 116

Multi-processor machines, use in parallel


Index 227

I

Identify ligand, utility 145

improper torsions, defining 20

Influenza A neuraminidase binding affinity 138

Initial geometry 31

Initialised ligand 112

Initialised protein 112

Input files 99

Input Parameters and Files (panel in front end) 4

Interactive use

run-time diagnostics 100

Internal energy of ligand 62

Internal H-Bonds (menu item in front end) 66

Internal ligand energy offset 62

Internal van der Waals energy (Goldscore) 46

Interrupt GA (button in GOLD Output window) 100

Intramolecular hydrogen bonds in ligand

switching on and off 66

Introduction

to fitness functions 46

to front end 3

to genetic algorithm 89

to GOLD 1

to torsion angle distributions 83

Ionisation states

of ligand 30

of protein residues 10

K

kinase scoring function (ChemScore), using 59

L

Lennard-Jones potentials, using localised soft potentials 47

Library screening 98

Library screening settings (menu item in front end) 98

Ligand

Add Ligand (window in front end) 32

Add/Delete Ligand (button in front end) 4

atom charges 30

atom types 36

bond angles 31

bond lengths 31

bond types 36

bond types, specifying in pdb files 31

charges, atomic 30

chiral 31

conformation 31

diastereomers 31

enantiomers 31

file formats 31

file name definition 32

flexibility 64

geometry 31

hydrogen atoms 30

initialised 112

input files 31

ionisation states 30

Ligand Editor (window in front end) 32

mol format 31

mol2 format 31

output files 112

pdb format 31

prediction accuracy, as function of number of atoms 153

prediction accuracy, as function of number of H-bonding atoms 153

prediction accuracy, as function of number of torsions 153

protonation states 30

rings 64

sd format 31

selecting, in front end 4

x Table of Contents

14.10 Ligand Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

14.10.1 Information on the Progress of Docking Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

14.10.2 Comparison of Docking Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

14.10.3 Identification of Different Binding Modes (Clustering of Ligand Poses) . . . . . . . . 122

14.11 File Containing Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

14.12 Process File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

14.13 Viewing Docked Solutions in SILVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

14.14 Exporting Fitness-Function Data to SILVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

15 Saving and Reusing Program Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

15.1 Saving and Re-using Program Settings in Configuration Files . . . . . . . . . . . . . . . . . . . . 126

15.2 Customising Fitness Function Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

15.3 Customising the Torsion Angle Distribution File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

15.4 Creating Customised Default Genetic Algorithm Parameter Settings . . . . . . . . . . . . . . . 127

16 Accuracy of Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

16.1 Correlation between Predicted and Observed Ligand Positions . . . . . . . . . . . . . . . . . . . . 129

16.1.1 Initial Validation of Docking Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

16.1.2 Follow-Up Validation of Docking Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

16.1.3 Validation using the CCDC/Astex Test Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

16.1.4 Examples of GOLD Dockings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

16.2 Correlation between Fitness Function and Biological Activity . . . . . . . . . . . . . . . . . . . . 137

16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase . . . . . . . . . . . . . . . . 138

16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin . . . . . . . . . . . . . . . . . . . . . . 139

16.2.3 Prediction of Binding Affinity to FKBP12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

17 Context-Dependent Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

18 Utility Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

18.1 grommitt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

18.2 smart_rms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

18.3 rms_analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

18.4 identify_ligand.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

19 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

20 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

21 Appendix A: List of Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


Table of Contents xi

22 Appendix B: Additional Tags in Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

23 Appendix C: GOLD Predictions in First Series of Validation Tests . . . . . . . . . . . . . . . . . 153

24 Appendix D: GOLD Predictions in Second Series of Validation Tests . . . . . . . . . . . . . . . 160

25 Appendix E: GOLD Tutorials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

26 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

226 Index

gold_ligand.mol2 112

gold_protein.log 118

gold_protein.mol2 112

gold_solution... mol2 files 118

GoldScore

atom radii 46

energy parameters 46

heme scoring function, using 48

hydrogen bond energy, ligand intramolecular 46

hydrogen bond energy, protein-ligand 46

internal van der Waals energy 46

overview of 46

parameter file 48, 59

parameters, altering 59

polarisability parameters 46

rescoring with 106

scaling of external van der Waals energy 46

torsional energy of ligand 46

van der Waals energy, ligand 46

van der Waals energy, protein-ligand 46

GoldScore (check box in front end) 5

GoldScore fitness terms, in output files 151

goldscore.p450_csd.params file 60

goldscore.p450_pdb.params file 60

goldscore.params file 48

grommitt 142

Guanidinium

atom type conventions 39

bond type conventions 39

H

Help, context sensitive 141

heme scoring function

different parameter files 60

making planar heme N atoms lipophilic (ChemScore) 60

using 60

using CSD data 60

using PDB data 60

heme-containing proteins, how to set up 15

Histidine, defining ionisation and tautomeric state of 10

Host file name (button in process scheduler) 104

Host machines, selecting 104

Hydrogen atoms

constraining to form hydrogen bonds 66

histidine 10

ionisable groups 10

ligand 30

necessity of including 10, 30

protein 10

serine hydroxyls 10

threonine hydroxyls 10

hydrogen bond directionality parameters 46

Hydrogen bond energy

annealing of 91

annealing parameters 91

directionality parameters (Goldscore) 46

ligand intramolecular (Goldscore) 46

ligand intramolecular, switching on and off 66

listed in ligand log file 119

parameters 91

protein-ligand (Goldscore) 46

Hydrogen Bonding (entry box in front end) 5

Hydrogen bonding annealing parameter

explanation of 91

setting 5

Hydrogen-bond terms, in ChemScore 52

Hydrophobic fitting points

explanation of 92

setting value of 92


Index 225

introduction to 18

overview 18

protein-protein clash penalisation, turning off 23

protein-protein clashes, penalisation 23

rotamer command in gold.conf 19

rotamer command limitations 19

rotamer_lib block in gold.conf 19

scoring function limitations 18

specifying 19

specifying the energy 24

using a standard rotamer library 20

flexible side chain conformations, defining 19

Flip Amide Bonds (menu item in front end) 64

Flip Planar N (menu item in front end) 65

Flip Protonated Carboxylic Acid (menu item in front end) 66

Flip Pyramidal N (menu item in front end) 66

Flip Ring Corners (menu item in front end) 64

flipping a bond during docking using the rotatable bond override file 38

Formats

of ligand files 31

of output files 112

of protein files 29

problems with pdb 31

FRAGMENT (in torsion angle distribution file) 85

Front end, overview 3

G

GA (check box in front end) 3

Genetic algorithm

and accuracy 129

and speed 94

annealing parameters 91

automatic determination of optimal settings 94

basic description 89

benchmarking of default parameter sets 97

chromosome 89

crossover 91

customising default settings 127

FINAL_VIRTUAL_PT_MATCH_MAX 91

FINISH_VDW_LINEAR_CUTOFF 91

hydrogen bonding, annealing parameter 91

hydrophobic fitting points 92

library screening parameters 98

migrate 91

mutate 91

niche size 91

number of islands 90

number of operations 90

operator weights 91

overview 89

parameter settings for virtual screening 98

population size 89

prediction accuracy 129

selection pressure 90

setting parameters 96

van der Waals, annealing parameter 91

virtual screening 98

Genetic Algorithm Parameters (panel in front end) 7

Geometry, starting

of ligand 30

of protein 18

of protein hydroxyl groups 18

Glutamic acid, defining ionisation state of 10

gold.conf 126

gold.err 124

gold.params 59

gold.pid 124

gold.tordist 83

gold.tordist.new 83

xii Table of Contents


GOLD User Guide 1

GOLD User Guide

1. Introduction (see page 1)
2. Overview of the GOLD Front End (see page 3)
3. Setting Up the Protein (see page 9)
4. Setting Up Ligands (see page 30)
5. Atom and Bond Types (see page 36)
6. Fitness Functions (see page 46)
7. Ligand Flexibility (see page 64)
8. Setting and Releasing Constraints (see page 68)
9. Torsion Angle Distributions (see page 83)
10. Genetic Algorithm Parameter Definitions (see page 89)
11. Balancing Reliability and Speed (see page 93)
12. Running GOLD (see page 99)
13. Rescoring (see page 106)
14. Output Options (see page 109)
15. Saving and Reusing Program Settings (see page 126)
16. Accuracy of Predictions (see page 129)
17. Context-Dependent Help (see page 141)
18. Utility Programs (see page 142)
19. References (see page 147)
20. Acknowledgments (see page 148)

1. Introduction

• GOLD (Genetic Optimisation for Ligand Docking) is a genetic algorithm for docking flexible ligands into protein binding sites.

• A version of SILVER is supplied with GOLD. SILVER has two purposes. First, it serves as a browser for visualising protein-ligand dockings from GOLD. Second, it allows you to define and calculate a wide variety of descriptors (parameters that describe dockings), which may be used to analyse the results of a docking run. For further information, refer to the SILVER User Guide.

• GOLD provides all the functionality required for docking ligands into protein binding sites from prepared input files (see Section 3.1, page 9 and Section 4.1, page 30). GOLD will typically be used in conjunction with a modelling program, because you will need to create and edit starting models, e.g. to add all hydrogen atoms, including those necessary for defining the correct ionisation and tautomeric states of the residues. Commonly used molecular modelling environments include:

• SYBYL (http://www.tripos.com/)

• Insight II or Cerius2 (http://www.accelrys.com/).

• Predicting how a small molecule will bind to a protein is difficult, and no program can guarantee

224 Index

flipping of planar nitrogen 65

flipping of protonated carboxylic acids 66

flipping of ring corners 64

flipping on pyramidal nitrogen 66

intramolecular hydrogen bonds 66

Fitness Flags (button in front end) 5

Fitness function

angle bending term for covalent complexes 33

apparent increase in during genetic algorithm run 119

bond angle term for covalent complexes 33

ChemScore 49

block functions 50

clash penalty 56

constraint terms 58

covalent term 58

explanation of hydrogen-bond terms 52

hydrogen-bond terms 52

ligand torsional strain 56

lipophilic term 54

metal-binding term 54

overview of 49

parameter file 58

parameters, altering 58

rotatable-bond freezing term 56

choice of GoldScore, ChemScore, User Defined Score 46

correlation with binding affinity 137

customising parameters 127

GoldScore 46

atom radii 46

bump checking 48

energy parameters 46

external van der Waals energy 46

hydrogen bond directionality parameters 46

hydrogen bond energy, ligand intramolecular 46

hydrogen bond energy, protein-ligand 46

internal van der Waals energy 46

overview of 46

parameter file 48

parameters, altering 48

polarisability parameters 46

scaling of external van der Waals energy 46

torsional energy of ligand 46

van der Waals energy, ligand 46

van der Waals energy, protein-ligand 46

list of, in log file 115

ranking of, for docking solutions 116

torsional parameters 83

User Defined Score 62

overview of 62

valence angle term for covalent complexes 33

Fitness Function (check box in front end) 3

Fitness Function Settings (panel in front end) 5

Fitness function, limitations when using flexible side chains 18

Fitness terms

Chemscore, definition 151

exporting to SILVER 125

Goldscore, definition 151

in output files, definition 151

saving to output files 111

fixing a bond at its input angle using the rotatable bond override file 38

Fixing rotatable bonds at input conformation via the gold.conf 66

FKBP12 binding affinity 140

Flexibility, treatment of

for ligands 89

for protein hydroxyl groups 89

for proteins 18

for rings 64

flexible groups 163

dummy atom 163

set as rigid 163

Flexible protein side chains

chi command in gold.conf 19

chi command limitations 19

commenting out unrequired rotamer lines 20

defining torsion tolerances 19

defining torsions 19


Index 223

valence angle term for covalent complexes 33

Energy values

docking solutions ranked by 119

listed in log file 119

Enolate

atom type conventions 39

bond type conventions 39

Error messages

atom typing 124

during interactive use 124

gold.err 124

Examples of docking results 136

Exit (button in front end) 3

External van der Waals energy 91

F

FAQs 162

File formats

for ligands 31

for output files 112

for proteins 29

problems with pdb 31

File names

conventions for ligand output files 112

specifying for ligand 32

specifying for protein 29

File, configuration 126

Files, input 4

Files, output

active_atoms, set in gold_protein.mol2 112

atom type errors 124

best docking solution 115

bestranking.lst 116

cluster analysis 144

comparison of docking solutions 120

directories 112

docked ligand files 112

donor_hydrogens, set in gold_protein.mol2 112

energy values 119

error messages 124

fitness function may appear to increase 119

fitness function scores 115

formats same as input files 112

gold.err 124

gold.pid 124

gold_ligand.mol2 112

gold_protein.log 118

gold_protein.mol2 112

gold_solution... mol2 files 118

hydrogen bond energy 119

initialised ligand file 112

initialised protein file 112

ligand log file 118

links, symbolic, between ligand docking files 112

log file 118

lone_pairs, set in gold_protein.mol2 112

naming conventions 109

process file 124

protein log file 118

ranked_structure... mol2 files 112

ranking of docked solutions 116

reordering (message in log file) 119

rms comparison of docked solutions 120

rnk 115

sub-directories 112

symbolic links between ligand docking files 112

van der Waals energy 119

FINAL_VIRTUAL_PT_MATCH_MAX 91

FINISH_VDW_LINEAR_CUTOFF 91

Fit point file (button in front end) 92

Fitness flags

flipping of amide bonds 64

2 GOLD User Guide

success. The next best thing is to measure as accurately as possible the reliability of the program, i.e. the chance that it will make a successful prediction in a given instance. For that reason, GOLD has been tested on a large number of complexes extracted from the Protein Data Bank (see Section 16.1, page 129). The overall conclusion of these tests was that the top-ranked GOLD solution was correct in 70-80% of cases.

• GOLD offers a choice of scoring functions: GoldScore (see Section 6.2, page 46), ChemScore (see Section 6.4, page 49) and User Defined Score, which allows users to modify an existing function or implement their own scoring function (see Section 6.10, page 62). Either GoldScore or ChemScore may give a successful prediction where the other fails, but their overall success rates are about the same (see Section 16, page 129).

• Different values of the genetic algorithm parameters may be used to control the balance between the speed of GOLD and the reliability of its predictions (see Section 11, page 93). GOLD will only produce reliable results if it is used properly; correct atom typing for both protein and ligand is particularly important (see Section 5, page 36).

• GOLD may be used in serial or parallel modes (see Section 12.6, page 101).
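The trade-off between speed and reliability mentioned above can be illustrated with a toy example. The sketch below is not GOLD's algorithm or fitness function: it is a minimal steady-state genetic algorithm on a made-up numerical problem, and every name in it (`toy_ga`, `population_size`, `n_operations`, the 50/50 crossover/mutation split) is a hypothetical choice made for illustration only. It shows the general idea that each genetic operation costs one fitness evaluation, so more operations mean a longer run but can only keep or improve the best score found.

```python
import random


def toy_ga(fitness, n_genes, population_size, n_operations, seed=0):
    """Toy steady-state GA (illustrative only, not GOLD's implementation).

    Each operation builds one child by crossover or mutation and lets it
    replace the worst member of the population if it scores higher.
    Returns (best_individual, best_score, number_of_fitness_evaluations).
    Requires n_genes >= 2 and population_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(population_size)]
    scores = [fitness(ind) for ind in population]
    evaluations = population_size

    def pick_parent():
        # Binary tournament: the fitter of two random members is chosen.
        i, j = rng.sample(range(population_size), 2)
        return i if scores[i] >= scores[j] else j

    for _ in range(n_operations):
        a, b = pick_parent(), pick_parent()
        if rng.random() < 0.5:
            # One-point crossover between the two parents.
            cut = rng.randrange(1, n_genes)
            child = population[a][:cut] + population[b][cut:]
        else:
            # Mutation: perturb one gene of a copy of the first parent.
            child = list(population[a])
            k = rng.randrange(n_genes)
            child[k] += rng.gauss(0.0, 0.1)
        child_score = fitness(child)
        evaluations += 1
        worst = min(range(population_size), key=lambda i: scores[i])
        if child_score > scores[worst]:
            population[worst] = child
            scores[worst] = child_score

    best = max(range(population_size), key=lambda i: scores[i])
    return population[best], scores[best], evaluations


if __name__ == "__main__":
    # Hypothetical fitness: higher is better, optimum at all genes = 0.
    score = lambda genes: -sum(g * g for g in genes)
    for ops in (20, 200, 2000):
        _, best_score, n_eval = toy_ga(score, 3, 10, ops)
        print(ops, n_eval, best_score)
```

In this toy, the cost of a run grows linearly with the number of operations, which is the kind of relationship to keep in mind when choosing between faster and more thorough parameter settings.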


GOLD User Guide 3

2. Overview of the GOLD Front End

• The GOLD front end consists of five panels, not all of which may necessarily be on display at the same time. These are:

• Control panel (see Section 2.1, page 3)

• Input Parameters and Files panel (see Section 2.2, page 4)

• Fitness Function Settings panel (see Section 2.3, page 5)

• Genetic Algorithm Parameters panel (see Section 2.4, page 7)

• Parallel Operation panel (see Section 2.5, page 8)

2.1 Control Panel

• The Control panel of the GOLD front end contains the following buttons, entry boxes and check boxes:

• Run: Starts an interactive GOLD job.

• Settings: Offers a choice of genetic algorithm parameter settings (see Section 11.3.3, page 96).

• Save&Exit: Saves the current parameter settings in a configuration file for later use, and closes the front end (see Section 15.1, page 126).

• Submit&Exit: Starts a GOLD run in the background (and also saves the parameter settings as a configuration file), then closes the front end.

• Exit: Closes the front end without saving the current parameter settings.

• Configuration File: Reads parameter settings from a previously saved configuration file and loads the parameter values into the front end. The name of the required configuration file must be typed into the entry box.

• Help: Brings up help documentation.

• Select editing panels:

• Input: Switches on and off the display of the Input Parameters and Files panel (see Section 2.2, page 4).

• Fitness Function: Switches on and off the display of the Fitness Function Settings panel (see Section 2.3, page 5).

• GA: Switches on and off the display of the Genetic Algorithm Parameters panel (see Section

222 Index

ring conformations 64

Context sensitive help 141

Control panel 3

Correction term, ligand energy 62

Covalent (check box in front end) 4

Covalent constraints

angle-bending term in 33

method used 33

overview 33

Covalent substructure-based constraints, setting up 34

Covalent term, in ChemScore 58

Crash, action required in event of 124

Create output sub-directories (check box in front end) 4

Crossover (entry box in front end) 7

Crossover (genetic algorithm parameter)

default values 96

explanation of 91

setting value of 96

Customising

default genetic algorithm parameter settings 127

fitness function parameter file 127

torsion angle distribution file 127

D

Default (button in front end) 3

Default settings, of genetic algorithm parameters 96

Define active site from (buttons in front end) 4

DELTA_E (parameter in torsion angle distribution file) 84

Detect Cavity (check box in front end) 4

Diastereomers 31

DIRECTIVE (in torsion angle distribution file) 85

Directory

for input 32

for output 112

output sub-directories 112

Display/Output Options (button in front end) 4

Distributed processes, setting maximum number of 105

Distributions File (button in front end) 5

Disulphide bridges 9

Docking solutions

examples 136

geometrical comparison 120

ranking of 116

donor_hydrogens (set in gold_protein.mol2) 112

E

Edit Constraints (button in front end) 5

Edit Distributions (button in front end) 5

Edit Parameters (button in front end) 4

Enantiomers 31

energy (rotatable side chain), specifying 24

Energy parameters

angle bending term for covalent complexes 33

bond angle term for covalent complexes 33

ChemScore

parameter file 58

parameters, altering 58

GoldScore 46

altering 59

atom radii 46

overview of 46

parameter file 59

polarisability parameters 46

scaling of external van der Waals energy 46

torsional 46

van der Waals 46

hydrogen bond 73


Index 221

using for GOLD validation 131

C-H...O interactions, accounting for 59

Charges, atomic

for ligand 30

for protein 10

ChemScore

block functions 50

clash penalty 56

constraint terms 58

covalent term 58

explanation of hydrogen-bond terms 52

heme scoring function

making heme N atoms lipophilic 60

using 60

hydrogen-bond terms 52

kinase scoring function, using 59

ligand torsional strain 56

lipophilic term 54

metal binding term 54

parameter file 58

parameters, altering 58

rescoring with 106

rotatable-bond freezing term 56

weak CH...O bonding term 59

ChemScore (check box in front end) 5

ChemScore fitness terms, in output files 151

chemscore.p450_csd.params file 60

chemscore.p450_pdb.params file 60

chi command in gold.conf 19

limitations 19

Chiral ligands 31

Choose machines (entry box in front end) 8

Chromosome 89

Clash penalty, in ChemScore 56

Cluster analysis

calculation with rms_analysis 144

in ligand log file 120

Command line, running GOLD from 100

Comparison of docking solutions 120

Conditions of use i

Configuration file

creating with front end 3

description 126

use in command-line mode 100

Configuration File (entry box in front end) 3

Conformation

of ligand 31

of protein 18

of protein hydroxyl groups 18

of rings 64

Consensus scoring 106

Constraint editor 68

Constraint terms, in ChemScore 58

Constraints

covalent, overview 33

distance 69

Fixing rotatable bonds via the gold.conf 66

hydrogen bonds, forcing between protein and ligand 73

region (hydrophobic) 77

scaffold match constraint, overview 80

scaffold match constraint, setting up 81

scaffold match, method 81

scaffold match, setting up 81

substructure-based covalent, setting up 34

substructure-based, setting up 72

template similarity, overview 79

template similarity, setting up 79

Constraints, relaxing

amide conformations 64

hydrogen bonds, ligand intramolecular 66

planar nitrogens 65

Protonated carboxylic acid conformations 66

pyramidal nitrogen conformations 66


2.4, page 7).

• Parallel: Switches on and off the display of the Parallel Operation panel (see Section 2.5, page 8).

2.2 Input Parameters and Files Panel

• The Input Parameters and Files panel contains the following buttons, entry boxes, check boxes, etc.:

• Protein: Allows specification of the protein input file (see Section 3.10, page 29).

• Edit Ligand File List: Allows selection of input ligand file(s) (see Section 4.5, page 32).

• Waters: Specification of water molecules. GOLD allows waters to switch on and off (i.e. to be bound or displaced) and to rotate around (to optimise hydrogen bonding) during docking (see Section 3.4, page 16).

• Metals: Allows specification of metal coordination geometries (see Section 3.3, page 10).

• Set atom types: Controls whether atom types will be set manually or automatically for (a) the ligand(s) and (b) the protein (see Section 5., page 36).

• Allow early termination: If switched on, instructs GOLD to terminate docking on a given ligand if a user-specified criterion is met (see Section 11.2, page 93). The criterion will be that the n top-ranked answers obtained so far are within x Å rms deviation of one another, where n and x are user-defined quantities.

• Define active site from: Allows specification of the position of the binding site with respect to a point, a protein atom close to the centre of the site, a set of protein atoms lining the site, or a reference ligand (see Section 3.8, page 24).

• Active site radius: Allows specification of the radius of the binding site, in Å (see Section 3.8, page 24).

• Detect Cavity: Switches cavity detection on and off (if switched on, the calculation will be confined to concave regions in the vicinity of the binding-site) (see Section 3.8.7, page 28).

• Covalent: Allows specification of a protein-ligand covalent bond (see Section 4.6, page 33).


• Display: Allows docking solutions to be viewed in SILVER visualiser (see Section 14.13, page 124).

• Output: Provides control over the amount, format and directory structure of GOLD output (see Section 14., page 109).

• Edit Parameters: Copies the default parameter file to a user area so that, e.g., GoldScore fitness-function parameters and other GOLD settings can be modified (see Section 6.3, page 48).

• Parameter File: Specifies which parameter file will be used; this contains parameters used by the GoldScore fitness function together with parameters that control the general operation of GOLD (see Section 6.6, page 59).
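The early-termination criterion described under Allow early termination (the n top-ranked answers lying within x Å rms deviation of one another) can be sketched as follows. This is only an illustration of the test, not GOLD's implementation: the function name, the rmsd callable and the example values are all assumptions made for this sketch.

```python
from itertools import combinations


def should_terminate_early(ranked_ids, rmsd, n, x):
    """Return True if the n top-ranked solutions all lie within x Å RMSD
    of one another.

    ranked_ids: solution identifiers sorted best-first (hypothetical).
    rmsd: callable giving the RMSD in Å between two solutions (hypothetical).
    """
    if len(ranked_ids) < n:
        return False  # not enough solutions yet to apply the test
    top = ranked_ids[:n]
    return all(rmsd(a, b) <= x for a, b in combinations(top, 2))


# Illustrative pairwise RMSD values (Å); not taken from a real GOLD run.
_PAIRS = {
    frozenset(("s1", "s2")): 0.8,
    frozenset(("s1", "s3")): 1.2,
    frozenset(("s2", "s3")): 1.4,
}


def example_rmsd(a, b):
    """Look up the illustrative RMSD for a pair of solutions."""
    return _PAIRS[frozenset((a, b))]
```

With these example values, the three solutions satisfy the criterion for x = 1.5 but not for x = 1.0.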

2.3 Fitness Function Settings Panel

• The Fitness Function and Search Settings panel contains the following buttons, entry boxes, check boxes, etc.

• GoldScore, ChemScore, User Defined Score: Provides control over which fitness function is to be used (see Section 6., page 46). The appearance of the rest of the panel will depend on which function is selected.

• Appearance if GoldScore selected:

• Appearance if ChemScore selected:


API 62

Aromatic bond type 37

Aromatic nitrogen

atom type conventions 39

bond type conventions 39

Aspartic acid, defining ionisation state of 10

Astex/CCDC validation test set

using for GOLD validation 131

Atom charges

for ligand 30

for protein 10

Atom polarisabilities, for use in GoldScore fitness function 46

Atom radii, for use in GoldScore fitness function 46

Atom types

automatic assignment 36

errors, reporting of 36

manual assignment 37

ATOM_DEF (in torsion angle distribution file) 85

automatic bond settings, overriding 38

Automatic GA parameter settings 94

B

backbone movement (large), dealing with 24

backbone movement (localised), allowing 20

Background, submitting GOLD job to 100

Basic group, defining ionisation state of

in ligand 30

in protein 10

Best docking solution 115

bestranking.lst 116

Binding affinity

alpha chymotrypsin 139

correlation with fitness function 137

FKBP12 140

influenza A neuraminidase 138

Binding site

cavity detection 28

defining from a point 25

defining from a set of atoms 26

defining from an atom 25

radius of 24

Biological activity

alpha chymotrypsin 139

correlation with fitness function 137

FKBP12 140

influenza A neuraminidase 138

Block functions, in ChemScore 50

Bond angle, bending energy term for covalent complexes 33

Bond angles 31

Bond lengths 31

Bond types

amides 37

aromatic 36

of difficult groups 39

specifying in pdb files 31

bond types (ligand), overriding 38

Bump checking 48

C

Cambridge Structural Database, extracting torsion angle distributions from 83

Carboxylate

atom type conventions 39

bond type conventions 39

Cationic nitrogen

atom type conventions 39

bond type conventions 39

Cavity detection 28

CCDC/Astex validation test set


Index

Numerics

3D visualisation with grommitt 142

A

account for topology (check box in front end) 69

Accuracy of predictions 129

as function of number of ligand atoms, first validation 153

as function of number of ligand atoms, second validation 160

as function of number of ligand H-bonding atoms, first validation 153

as function of number of ligand H-bonding atoms, second validation 160

as function of number of ligand torsions, first validation 153

as function of number of ligand torsions, second validation 160

binding affinity, alpha chymotrypsin 139

binding affinity, FKBP12 140

binding affinity, influenza A neuraminidase 138

examples 136

methodology (binding affinity tests) 137

methodology (docking orientation tests) 129

resolution of protein structure 159

root mean square deviations in first validation 154

subjective analysis compared with RMS deviations 158

validation, first series of experiments 129

validation, second series of experiments 130

Acidic group, defining ionisation state of

in ligand 30

in protein 10

Acknowledgements 148

Active site

cavity detection 28

defining from a point 25

defining from a reference ligand 28

defining from a residue 26

defining from a set of atoms 26

defining from a set of residues 27

defining from an atom 25

radius of 24

Active site radius (entry box in front end) 4

active_atoms (set in gold_protein.mol2) 112

Activity, biological

correlation with fitness function 137

Add Ligand (window in front end) 32

Add/Delete Ligand (button in front end) 4

Allow early termination (check box in front end) 4

Alpha chymotrypsin binding affinity 139

amide bond, retyping using the rotatable bond override file 38

Amide linkages

bond type 36

conformation around 31

flipping between cis and trans (in ligands) 64

Amidinium

atom type conventions 39

bond type conventions 39

Anionic nitrogen

atom type conventions 39

bond type conventions 39

Anionic oxygen

atom type conventions 39

bond type conventions 39

Annealing parameters 91

FINAL_VIRTUAL_PT_MATCH_MAX 91

FINISH_VDW_LINEAR_CUTOFF 91

hydrogen bonding 91

van der Waals 91


• Appearance if User Defined Score selected:

• Rescore: Used to rescore a docked ligand pose with an alternative scoring function (see Section 13., page 106).

• Fitness and Search Options: Used to control:

• Ligand flexibility during docking, including: whether ligand ring conformations are varied, whether torsion angles around ligand amide bonds and bonds to trigonal nitrogen are allowed to vary during docking, whether intramolecular hydrogen bonds are permitted between ligand atoms, and whether protonated carboxylic acids are permitted to rotate or flip (see Section 7., page 64).

• The use of an internal energy offset. This will offset the internal energy of the ligand (internal torsion, van der Waals and hydrogen bonding terms, if applicable) by the best internal energy found, i.e. when enabled, the internal energy will be taken relative to a near-optimal reference state. This allows any internal energy that is implicit in the structure, i.e. cannot be removed by a change in conformation, to be ignored (see Section 6.9, page 62).

• The use of torsional distributions. These can be used by GOLD to restrict ligand conformational searches to regions of torsion-angle space that are observed in small-molecule crystal structures (see Section 9., page 83).

• Use of hydrophobic fitting points. This allows specification of a fit point file, i.e. a file of customised hydrophobic fitting points (see Section 10.9, page 92).

• Edit Constraints: Allows specification of distance constraints, hydrogen bond constraints, regional (hydrophobic) constraints, and binding mode similarity constraints (see Section 8., page 68).

• Constraints: Displays the number of constraints currently set.

• Number of Ligand Bumps: Instructs GOLD to allow up to n short protein-ligand contacts, where n is user-specified (see Section 6.2.2, page 48) (not available with ChemScore).

• Van der Waals: Only available if GoldScore selected. Allows specification of the van der Waals annealing parameter (see Section 10.8, page 91).

• Hydrogen Bonding: Only available if GoldScore selected. Allows specification of the hydrogen-bonding annealing parameter (see Section 10.8, page 91).

• ChemScore Parameter File: Only available if ChemScore selected. Allows the default file containing ChemScore parameters to be replaced by a user-specified file.

• Scoring Function Shared Object Name (UNIX) or Scoring Function DLL Name (Windows): Only available if User Defined Score selected. Allows selection of user’s own scoring function by specifying a path to a dynamically loadable shared object library.
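The internal energy offset described above amounts to measuring each ligand internal energy relative to the best value found. A minimal sketch of that arithmetic follows. It assumes that lower energies are better and that the energies are plain numbers; neither the function name nor the example values come from GOLD.

```python
def apply_internal_energy_offset(internal_energies):
    """Express each internal energy relative to the best (lowest) one found.

    Assumption for this sketch: lower energy is better, so the reference
    state is the minimum of the values seen.
    """
    reference = min(internal_energies)
    return [energy - reference for energy in internal_energies]


# Illustrative values only: the conformation with energy 9.5 becomes the
# zero of the scale, so strain that no conformational change can remove
# no longer contributes.
relative = apply_internal_energy_offset([12.0, 9.5, 15.0])
```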

2.4 Genetic Algorithm Parameters Panel

• The Genetic Algorithm Parameters panel contains the following buttons, entry boxes, check boxes, etc. (see Section 10., page 89):

• Select GA Presets and Automatic Settings: Allows specification of speed and accuracy of docking runs. Either select from a range of preset GA settings, or use automatic settings which will optimise the number of GA operations for each ligand docked (settings will be determined automatically according to the number of rotatable bonds, number of flexible ring corners, size of binding site etc.) (see Section 11.3, page 94).

• Population Size: Allows specification of the population size (i.e. the number of chromosomes that will be used on each island) (see Section 10.2, page 89).

• Selection Pressure: Allows specification of the selection pressure (see Section 10.3, page 90).

• Number of Operations: Allows specification of the total number of operations to be performed in a genetic algorithm run (this is the key determinant of program calculation time) (see Section 10.4, page 90).

• Number of Islands: Allows the genetic algorithm to be split over n islands, where n is user specified (see Section 10.5, page 90).

• Niche Size: Allows specification of the niche size to be used (see Section 10.6, page 91). Niching is a method for trying to keep diversity within the population by avoiding generation of > n very similar chromosomes, where n is user defined. Niching is switched off after 90% of the GA run.

• Migrate/Mutate/Crossover: Controls the relative frequencies with which the three types of genetic operations occur. Migrate should be zero if Number of Islands is one, since it refers to migration of chromosomes from one island to another (see Section 10.7, page 91).

Note: You are recommended to use automatic settings, or one of the default parameter sets offered in the GOLD front end (see Section 11.3, page 94).
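The idea of relative operator frequencies can be illustrated with a weighted random choice. The sketch below is not GOLD's genetic algorithm; the function name and the weights used in the example are hypothetical. It also forces the migrate weight to zero when there is only one island, which is a design choice of this sketch that mirrors the advice above.

```python
import random


def choose_operator(migrate, mutate, crossover, n_islands, rng=random):
    """Pick one genetic operation with probability proportional to its weight.

    Migration moves chromosomes between islands, so its weight is ignored
    when only one island exists (an assumption of this sketch).
    """
    weights = {
        "migrate": migrate if n_islands > 1 else 0,
        "mutate": mutate,
        "crossover": crossover,
    }
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Hypothetical weights; with a single island, migrate is never selected.
rng = random.Random(0)
picks = {choose_operator(10, 95, 95, n_islands=1, rng=rng) for _ in range(1000)}
```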


mode still appears in some solutions but these invariably have lower scores.

This ends the tutorial.


3. Cross-Docking into 1x7r with a Soft Potential applied to Leu 346

• View the file gold_1x7r_1l21_SP.conf using a text editor. This file has been set up so that a soft VdW potential with 2-4 functional form has been applied to one residue only, Leu346. This replaces the default 4-8 functional form that applies to the rest of the protein.

• The Keyword that has been introduced to set the soft potential for Leu346 is at the end of the file and is reproduced below. The numeral in brackets, (1) in this case, indicates that a 2-4 form has been applied. If this number were (2) then a softer 1-2 functional form would have been applied. Further information is available (see Section 6.2.1, page 47).

• Run the docking job gold_1x7r_1l21_SP.conf and analyse the results using SILVER. We recommend you run this job using the command line. Instructions are available on using the command line in Windows or under Unix (see Section 12.5, page 100).

• This time you should find that the highest scoring solutions correspond very closely with the 1l2i binding mode (see below). These solutions will have scores of 43-45. The reversed binding


2.5 Parallel Operation Panel

• The Parallel Operation panel contains the following buttons, entry boxes, check boxes, etc.:

• Maximum number of distributed processes: Shows the number of GOLD processes that will run simultaneously. This should normally be set equal to the number of processors available for the GOLD job to run on (see Section 12.6.5, page 105).

• Choose machines: Allows specification of the machines on which the GOLD job is to be run (see Section 12.6.4, page 104).

Note: The Parallel operation Panel is only accessible if PVM has been set-up (see Section 12.6, page 101).


3. Setting Up the Protein

3.1 Essential Steps in Setting Up the Protein (see page 9)
3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States (see page 10)
3.3 Metal Ions (see page 10)
3.4 Water Molecules (see page 16)
3.5 Rotatable O-H and NH3 Groups (see page 18)
3.6 Flexible Side Chains (see page 18)
3.7 Large Backbone Movements (see page 24)
3.8 Defining the Binding Site (see page 24)
3.9 Protein File Formats (see page 29)
3.10 Specifying the Protein File Name (see page 29)

3.1 Essential Steps in Setting Up the Protein

• You can either input the whole protein structure to GOLD, or just those residues that are in the active site region. The latter leads to somewhat shorter run times, since both protein initialisation and cavity detection will be quicker.

• If you input only the region of interest around the binding site, you must ensure that all the residues you include are complete. You should also include all residues within a 5 Å radius from the solvent-accessible surface of the cavity.

• Add all hydrogen atoms, including those necessary to define the correct ionisation and tautomeric states of residues such as Asp, Glu and His (see Section 3.2, page 10).

• Ensure that all bond types are correct. If they are, and hydrogen atoms have been placed on the correct atoms, GOLD will deduce atom types automatically (see Section 5.2, page 36). This also applies to PDB input files but only for known residues (i.e. there is no HET group library).

• GOLD connects atoms within residues on the basis of proximity. Double bonds are assigned as appropriate for the naturally occurring protein residues.

• Residues should be in sequence order, and correctly named.

• All atoms should be properly labelled (CA, CB etc.).

• Any unusual bonds (disulphide bridges, etc.) should have CONECT records.

• If a metal ion is present, ensure that all bonds between the ion and coordinating protein or water atoms are deleted (GOLD will re-find them automatically). Metals should be within bonding distance of at least two protein and/or water atoms in the active site so that GOLD can infer likely coordination geometries (see Section 3.3, page 10).

• Save the protein in, e.g., MOL2 format.

• GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct. However, if for any reason, GOLD is unable to deduce an atom type, then the atom in question will be replaced with a dummy atom type Du. If this is the case a warning message will be given in the gold_protein.log file.


• Most of the binding site is well superimposed. However above the ligands you can see that there is movement of a protein loop that brings Leu346 closer in to the ligand in 1x7r than in 1l2i. This superposition suggests that a clash would exist if the ligand from 1l2i were docked into 1x7r. This might prevent the correct binding mode being rated highly if using a scoring function such as GoldScore, with a clash term that increases sharply with proximity to the protein. Other residues such as Met343 also do not superimpose well as a consequence of this loop movement. However these residue shifts appear to have less of an impact on the size of the active site than does that of Leu346.

• You can view this superposition yourself by opening SILVER or another protein visualiser and reading in the file 1x7r_1l2i_sup.mol2 .
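This tutorial contrasts a default 4-8 functional form with a softer 2-4 form for the VdW clash term. The sketch below uses a generic n-m potential to show why a lower-exponent form penalises close contacts less sharply. It is an assumption-laden illustration: the formula is a textbook Mie-type expression, "4-8" is read here as an attractive exponent of 4 and a repulsive exponent of 8, and neither the function nor its parameters are taken from GOLD.

```python
def n_m_potential(r, r0, depth, n, m):
    """Generic n-m potential with a minimum of -depth at r = r0 (n > m).

    n is the repulsive exponent and m the attractive exponent. This is a
    textbook form used for illustration, not GOLD's parameterisation.
    """
    ratio = r0 / r
    return depth / (n - m) * (m * ratio**n - n * ratio**m)


# Compare the two forms for a contact compressed to 80% of the optimum
# separation (illustrative units: r0 = 1, well depth = 1).
hard = n_m_potential(0.8, 1.0, 1.0, n=8, m=4)   # "4-8" form
soft = n_m_potential(0.8, 1.0, 1.0, n=4, m=2)   # "2-4" form
```

Both forms have the same well depth at the optimum separation, but at the compressed distance the softer form gives a lower energy than the harder one. This is the qualitative behaviour that lets a residue such as Leu346 be approached more closely.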

2. Cross-Docking into 1x7r with no Soft Potential Applied

• The files 1l2i_prot.mol2 and 1x7r_prot.mol2 are the protein models derived from the pdb entries 1l2i and 1x7r, respectively. 1l2i_lig.mol2 is the ligand structure obtained from, and in the same frame of reference as, 1l2i.

• The GOLD configuration file gold_1l2i_1l2i.conf is set up to dock the 1l2i ligand back into the 1l2i protein structure. Run this GOLD job and analyse the results in SILVER to check that the crystallographic binding mode is indeed retrieved. Read in the file 1l2i_lig.mol2 to make the comparison.

• The GOLD configuration file gold_1x7r_1l21.conf is set up to dock the 1l2i ligand into the 1x7r protein structure. Run this GOLD job and analyse the results in SILVER. Read in the file 1x7r_1l2i_sup.mol2 to compare the docked poses with the binding mode found in 1l2i. You may find that there are some solutions which have approximately the right binding mode and which return scores of between 23 and 25. However, there should also exist higher ranking poses with scores of between 28 and 32. These poses have the ligand rotated through 180 degrees along the long axis as shown in the superposition below (crystallographic binding mode colour coded orange, GOLD docking pose colour coded green).


Tutorial 7: Docking using Localised Soft Potentials

1. Introduction (see page 215)
2. Cross-Docking into 1x7r with no Soft Potential Applied (see page 216)
3. Cross-Docking into 1x7r with a Soft Potential applied to Leu 346 (see page 217)

1. Introduction

• The object of this tutorial is to demonstrate how to employ the Localised Soft Potential option that is available when using GoldScore. This option allows you to soften the VdW clash component of the GoldScore for one or more residues in the protein. We will examine the docking of a ligand to two different crystal structures of Estrogen Receptor Alpha. The structures differ in that a small loop movement constrains the binding site of one of the structures (pdb code 1x7r) slightly more than for the other structure (pdb code 1l2i).

• All files referred to in this tutorial can be found in <GOLD_DIR>/examples/tutorial7 where <GOLD_DIR> is the location of your GOLD installation.

• The figure below shows the superposition of both protein structures. 1x7r corresponds to the protein colour coded light blue and the ligand colour coded green, 1l2i corresponds to the protein colour coded orange with the ligand colour coded yellow.


• The presence of dummy atoms should not significantly affect the docking prediction since dummy atoms are considered neither donors nor acceptors.

3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States

• GOLD uses an all-atom model, so the protein must have all hydrogen atoms added.

• The precise geometrical positions of Ser, Thr and Tyr hydroxyl hydrogen atoms or Lys NH3 hydrogen atoms do not matter as their orientation will be optimised during the GOLD run.

• GOLD deduces the hydrogen-bonding abilities of protein residues from the presence or absence of hydrogen atoms. For example, you can control the protonation and tautomeric state of Asp, Glu and His residues by adding or removing appropriate hydrogen atoms.

• If incorrect ionisation or tautomeric states are inferred by the program, it is unlikely that correct protein-ligand binding modes will be predicted. GOLD will not vary tautomeric or ionisation states during docking; if you are unsure about, e.g., the tautomeric state of a His residue, you should perform separate GOLD runs using the different possibilities.

• GOLD ignores atom charges, both formal and partial. It deduces whether an atom is charged by counting the bond orders of the bonds that it forms and comparing the result with the atom’s normal valency.
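The valency-counting idea in the last point can be sketched as follows. This is a simplified illustration rather than GOLD's algorithm: the valency table, the function name and the restriction to nitrogen and oxygen are assumptions of the sketch, and aromatic or delocalised bond orders are not handled.

```python
# Normal valencies for the two elements covered by this sketch (assumed).
NORMAL_VALENCY = {"N": 3, "O": 2}


def inferred_charge(element, bond_orders):
    """Infer a charge by comparing the summed bond orders with the
    element's normal valency.

    A positive result suggests a cation (e.g. N with four single bonds),
    a negative result an anion (e.g. O with one single bond).
    """
    return sum(bond_orders) - NORMAL_VALENCY[element]


ammonium_n = inferred_charge("N", [1, 1, 1, 1])   # protonated amine nitrogen
anionic_o = inferred_charge("O", [1])             # oxygen with one single bond
carbonyl_o = inferred_charge("O", [2])            # neutral carbonyl oxygen
```

This is why adding or removing hydrogen atoms, as described above, is enough to control the ionisation state that is inferred.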

3.3 Metal Ions

3.3.1 Preparing a Protein Input File which Contains a Metal Ion (see page 10)
3.3.2 Automatic Determination of Metal Coordination Geometries (see page 11)
3.3.3 Specifying Metal Coordination Geometries Manually (see page 12)
3.3.4 Defining Custom Metal Coordination Geometries (see page 14)
3.3.5 Metal-Ligand Interactions (see page 15)
3.3.6 Heme Containing Proteins (see page 15)

3.3.1 Preparing a Protein Input File which Contains a Metal Ion

• There are some additional requirements when preparing a protein input file which contains a metal ion.

• The metal ion must be coordinated to at least two protein atoms or water molecules so that GOLD can predict the coordination geometry (see Section 3.3.2, page 11).

• In the protein input file, the metal ion should not have any bonds to coordinating atoms. If these are present in the original PDB file, they must be deleted.

• In order to model metal ions within SYBYL, you need to load a parameter file (otherwise, all metal ions will be assigned dummy-atom types). Add the following line to your ~/.sybylrc file:

parameter open $TA_ROOT/demo/metals.tpd


• There may be problems in the way that SYBYL handles metal ions: they are not always well behaved in the minimiser, and typically have valencies of 4 or 6, which may mean that hydrogen atoms are added to the metal when you add hydrogen atoms to the protein.

• Note: GOLD can only handle the hardcoded metal atom types (see Section 3.3.2, page 11); it is not possible to add user defined metal atom types.

3.3.2 Automatic Determination of Metal Coordination Geometries

• GOLD is able to recognise the following metal coordination geometries:

• In order to determine the coordination geometry of a particular metal atom GOLD performs a permuted superimposition of coordination geometry templates onto the coordinating atoms found in the protein (e.g. if there are only two coordinating atoms in the protein then every unique pair of coordinating template atoms are selected and superimposed on the system in the protein).

• Coordination fitting points are then generated using the template that gives the best fit (based on RMSd).

• The geometry templates used for given metals are defined in the gold.params file in the section headed # Metals (for explanation of parameters refer to comments in the gold.params file):

Template   Geometry                 Coordination number
TETR       Tetrahedral              n=4
TBP        Trigonal bipyramidal     n=5
OCT        Octahedral               n=6
CTP        Capped trigonal prism    n=7
PBP        Pentagonal bipyramidal   n=7
SQAP       Square prism             n=8
ICO        Icosahedral              n=10
DOD        Dodecahedral             n=12


this means we have to be careful to pick a Gln192 rotamer that is folded away from the binding region but also does not clash with the arginine residue. A way round this is to add the command penalise_protein_clashes = 0 to the rotamer_lib command block (place it anywhere between rotamer_lib and end_rotamer_lib). This will switch off calculation of clashes between flexible side-chain atoms and neighbouring protein atoms, allowing Gln192 to approach nearby residues closely. While physically unrealistic, this is a pragmatic tactic that might well work (and is not as egregious as it sounds, since, in reality, Arg143 can probably move away from Gln192 if it needs to).

• Obviously, you can experiment with these options if you wish.

This ends the tutorial.


• Also, the best solution from the flexible run has a much higher GoldScore value (75.7161) than was obtained from the rigid run.

• Again, you can view these results in SILVER if you wish. The movements of the flexible side chain Gln192 can be seen more effectively if the show protein hydrogens tick box is deactivated and the Gln192 residue is selected as a protein subset (Descriptors, Define a protein subset, By residue...). The newly defined subset can be selected by picking it from the Subset highlighting pull-down menu in SILVER.

5. Choosing Side-Chain Rotamers

• Two decisions must be made when using the flexible side-chain facility: (a) which side chains are made flexible; (b) how flexible is each side chain made? It is important to recognise that the more flexibility is introduced, the larger the search space becomes. Particularly with high-throughput runs, when relatively little time can be allowed per ligand, this may seriously decrease the chance of finding the global minimum.

• A sensible strategy is therefore to make a side chain flexible only if you have some a priori reason to suppose that it will move, as we have (from X-ray structures) in the tutorial example.

• On the other hand, we probably allowed Gln192 more movement than necessary in the above experiments. As long as it can adopt the native 1fax position and one other position in which it is folded away from the binding site, that might well have been enough.

• One problem is that, in some conformations, Gln192 tends to clash with Arg143. At first sight,


• For example, for a Zn atom GOLD will attempt to match coordination geometries 4, 5 and 6 (tetrahedral, trigonal bipyramidal, and octahedral templates) onto the coordinating atoms found in the protein.

• The template that gives the best match will then be used to generate coordination fitting points.

• Details of the coordination geometry determination are given in the gold_protein.log file.

• The output file gold_protein.mol2 will contain a number of dummy atoms representing idealised coordination positions. These dummy atoms will be connected to the metal ion. Any unoccupied coordination points will then be available for ligand binding (see Section 3.3.5, page 15).
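The permuted template matching described in this section can be sketched in a simplified form. GOLD is described as superimposing templates and ranking them by RMSd; the sketch below substitutes a simpler, rotation-independent measure (the rms difference between corresponding inter-vector angles) so that it needs only the standard library. The template coordinates, function names and scoring are assumptions of this sketch, and only two of the templates are included.

```python
import math
from itertools import combinations, permutations

# Idealised template vertices (metal at the origin); assumed for this sketch.
TEMPLATES = {
    "TETR": [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)],
    "OCT": [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)],
}


def angle(u, v):
    """Angle in degrees between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def angle_misfit(observed, vertices):
    """RMS difference between corresponding pairwise angles."""
    pairs = list(combinations(range(len(observed)), 2))
    squared = [
        (angle(observed[i], observed[j]) - angle(vertices[i], vertices[j])) ** 2
        for i, j in pairs
    ]
    return math.sqrt(sum(squared) / len(squared))


def best_template(observed):
    """Try every ordered selection of template vertices and return the
    (template name, misfit) pair with the smallest misfit.

    observed: metal-to-coordinating-atom vectors; at least two are needed.
    """
    if len(observed) < 2:
        raise ValueError("at least two coordinating atoms are required")
    scores = {}
    for name, vertices in TEMPLATES.items():
        if len(observed) <= len(vertices):
            scores[name] = min(
                angle_misfit(observed, selection)
                for selection in permutations(vertices, len(observed))
            )
    name = min(scores, key=scores.get)
    return name, scores[name]
```

For example, three mutually perpendicular coordinating vectors match the octahedral template in this sketch, whereas two vectors separated by the tetrahedral angle match the tetrahedral one.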

3.3.3 Specifying Metal Coordination Geometries Manually

• It is possible to manually specify coordination geometries for particular metal atoms. This can be used to allow non-standard metal coordination geometries, or to limit the number of possible geometries that GOLD checks (i.e. it is possible to overrule the default geometries for the corresponding metal type defined in the gold.params file (see Section 3.3.2, page 11)).

• Click on the Metals button in the Input Parameters and Files section of the GOLD front-end. The Metal Selection window will appear:

H-Bonding   Sybyl       Atom type (default   Donor (D), Acceptor   Allowed coordination   Coordination
type        atom type   or elucidated)       (A), or Metal (M)     geometries             distance (Å)
MGD         Mg          DEF                  M                     4, 6                   2.05
ZND         Zn          DEF                  M                     4, 5, 6                2.09
MND         Mn          DEF                  M                     4, 6                   2.06
FED         Fe          DEF                  M                     4, 6                   1.98
CAD         Ca          DEF                  M                     6, 7                   2.44
COBD        Co.oh       DEF                  M                     6                      2.09
GDD         Gd          DEF                  M                     6                      2.44


• Type in the atom number of the metal (as it appears in the protein input file), then select the allowed coordination geometries from the list.

Note: If the list of pre-defined coordination geometries does not contain a suitable geometry then you can define a custom metal coordination geometry (see Section 3.3.4, page 14).

• Once the allowed geometries have been selected for a particular metal atom click on the Add metal or Update selected metal button to add the selection to the Current Metal Settings.

• Repeat the above procedure if you want to specify coordination geometries for additional metal atoms.

• To edit a Current Metal Setting (e.g. to change the allowed coordination geometries) highlight the corresponding entry in the Current Metal Settings list, make the required change and then hit the Add metal or Update selected metal button.

• To remove an entry from the Current Metal Settings highlight the entry and hit the Delete Selection button, or to remove all entries hit the Clear List button.

• Click on Done in the Metal Selection window when you are satisfied with the chosen metals and their allowed coordination geometries. When you finish, the count of Metals will be updated in the GOLD front end.


• You can see this for yourself in SILVER by the following sequence of operations:

• Open SILVER.

• Select File followed by Load GOLD run results... and use the file browser to select the file non_flexible.conf.

• In the SILVER interface, select the ligand docking with the highest GoldScore (scores are given at the end of each line in the Ligands list box).

• Switch off the Clear ligands on loading check box (near the bottom right of the SILVER window).

• Select File followed by Load a ligand... and pick 1fax_1lpg_super.mol2 from the file browser.

• Switch on the Display multiple ligands check box and then select LIM::IMA_301_pdb1lpg_1 (the experimental position of the 1lpg ligand) from the Ligands list (it should be item 14 in the list). You can now see the discrepancy between the experimental pose and the top-ranked solution from the non-flexible run.

• Read the flexible.conf into SILVER in the way described above and compare the top-ranked solution with the experimental position of the 1lpg ligand. In contrast, the top-ranked solution from the flexible run is much better. It is not perfect - in particular, the benzamidine moiety is somewhat displaced - but the benzyloxy side chain is now roughly in the right position, the Gln192 side chain having moved out of the way:


first rotamer line specifies a side-chain conformation with chi1 = 62 (plus or minus 13) degrees, chi2 = 180 (plus or minus 14), chi3 = 20 (plus or minus 16).

Note: GOLD will round any tolerance values that are not multiples of ten up to the next 10, thus GOLD will process the first line as chi1 = 62 (plus or minus 20) degrees, chi2 = 180 (plus or minus 20), chi3 = 20 (plus or minus 20).

• The next line defines the exact rotamer chi1 = 70, chi2 = -75, chi3 = 0; and so on.

• During docking, Gln192 will be allowed to take up any conformation that falls within any of the rotamer definitions. The rotamers are based on a library of highly-populated side-chain conformations described in S. C. Lovell, J. M. Word, J. S. Richardson & D. C. Richardson, Proteins, 40, 389-408, 2000. A digest of this information in a format suitable for copy-and-paste into GOLD configuration files is available in <GOLD_DIR>/gold/rotamer_library.txt (see Section 3.6.3, page 20).

• The final line, end_rotamer_lib, closes this flexible side-chain definition. Had we wished, we could have added further rotamer_lib command blocks to specify other flexible side chains, up to a maximum of 10.
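The tolerance rounding described in the note above (13, 14 and 16 all becoming 20) can be reproduced with a short sketch. The function names are hypothetical, and the periodic wrap-around used when testing whether an angle falls inside a rotamer range is an assumption of this sketch rather than documented GOLD behaviour.

```python
import math


def rounded_tolerance(tolerance):
    """Round a tolerance in degrees up to the next multiple of ten.

    Values that are already multiples of ten (including 0 for an exact
    rotamer) are left unchanged.
    """
    return int(math.ceil(tolerance / 10.0) * 10)


def angle_within(chi, centre, tolerance):
    """Return True if chi lies within the rounded tolerance of centre,
    treating angles as periodic over 360 degrees (assumed)."""
    difference = (chi - centre + 180.0) % 360.0 - 180.0
    return abs(difference) <= rounded_tolerance(tolerance)


# The first rotamer line discussed above: chi1 = 62 (plus or minus 13).
accepted = angle_within(80.0, 62.0, 13)   # 18 degrees away, inside +/- 20
rejected = angle_within(90.0, 62.0, 13)   # 28 degrees away, outside +/- 20
```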

4. Comparison of Flexible and Non-Flexible Results

• If you wish, you can run the two GOLD jobs using the configuration files described in the preceding section. Alternatively, you can view the results that we have generated. Since GOLD is non-deterministic, any results that you get might differ from ours, but the general trends are likely to be the same.

• The results of our docking runs with rigid and flexible Gln192 side chain are in the directories non_flexible and flexible, respectively.

• As expected, none of the solutions produced in our non-flexible run is correct; all have the benzyloxy side chain seriously misplaced. The top-ranked docking has a GoldScore of 63.8592 and is shown below with the true ligand position for reference:


3.3.4 Defining Custom Metal Coordination Geometries

• It is possible to specify custom metal coordination geometries which can subsequently be used to derive ligand binding points around particular metal atoms.

• GOLD will normalise the size of the custom polyhedron to the appropriate metal-chelator distance before matching it to the metal and the coordinating atoms found in the protein.

• Click on the Metals button in the Input Parameters and Files section of the GOLD front-end then, in the Metals Selection window, click on the Set Up Custom Metal Polyhedrons button. The Define Custom Metal Coordination geometries window will appear:

• A custom metal polyhedron may contain up to nine points. Each point in the custom polyhedron must be specified using a vector (assuming the centre of your polyhedron is at the origin).

• For example, to set up a custom square planar geometry you must specify four points using the following vectors:

0, 1, 0
1, 0, 0
-1, 0, 0
0, -1, 0


• Assuming the metal is on the origin (0,0,0), GOLD will then attempt to match the specified vectors onto the metal-to-protein-atom vectors found in the protein (vectors are normalised to a metal-to-chelator distance of 2.0 Å).

• Once vectors for each point in the polyhedron have been defined click on the Add metal coordination polyhedron or Update selected polyhedron button to add the custom definition to the Current Metal Polyhedron Settings.

• Repeat the above procedure if you want to specify additional custom polyhedra. Up to three custom metal polyhedra can be set up.

• To edit a Current Metal Polyhedron Setting highlight the corresponding entry in the Current Metal Polyhedron Settings list, make the required change and then hit the Add metal or Update selected metal button.

• To remove an entry from the Current Metal Polyhedron Settings highlight the entry and hit the Delete Selection button, or to remove all entries hit the Clear List button.

• Click on Done in the Define Custom Metal Coordination geometries window when you are satisfied with the custom coordination geometries that have been defined.

• The count of Custom Metal Polyhedron will be updated and the custom geometries will be available for selection from the Metal Selection window (see Section 3.3.3, page 12).
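The normalisation of custom polyhedron vectors described above can be illustrated with a short sketch. This is not GOLD code: the function names are hypothetical, and the only values taken from the text are the nine-point limit, the 2.0 Å metal-to-chelator distance and the square planar vectors.

```python
import math

METAL_CHELATOR_DISTANCE = 2.0  # Angstrom, the normalisation distance quoted in the text
MAX_POINTS = 9                 # a custom polyhedron may contain up to nine points


def normalise(vector, length=METAL_CHELATOR_DISTANCE):
    """Scale a 3D vector to the requested length, keeping its direction."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0.0:
        raise ValueError("a polyhedron point cannot sit on the metal (the origin)")
    return tuple(length * c / norm for c in vector)


def normalise_polyhedron(points):
    """Check the point count and normalise every vector (hypothetical helper)."""
    if not 1 <= len(points) <= MAX_POINTS:
        raise ValueError("a custom polyhedron needs between 1 and 9 points")
    return [normalise(p) for p in points]


# Square planar geometry from the example, with the metal at the origin.
square_planar = [(0, 1, 0), (1, 0, 0), (-1, 0, 0), (0, -1, 0)]
scaled = normalise_polyhedron(square_planar)
for point in scaled:
    print(point)
```

Because only the direction of each vector matters once the length has been normalised, the input vectors do not have to be unit vectors.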

3.3.5 Metal-Ligand Interactions

• Metal coordination in GOLD is modelled as 'pseudo-hydrogen bonding'.

• Metal-ligand interactions will typically involve the metal binding to, for example, carboxylate ions, deprotonated histidines (i.e. negatively charged), and phenolates. Therefore metals can be considered to bind to H-bond acceptors and the metal will compete with H-bond donors for interaction.

• Consequently, GOLD uses the following approach for handling metals:

• Virtual coordination points are added at locations where GOLD is missing a coordination site.

• These coordination points are then used as fitting points that can bind to acceptors.

3.3.6 Heme Containing Proteins

• The paper Kirton et al, Proteins: Structure, Function, and Bioinformatics, 58, 836-844, 2005 describes the use of ligand specific iron parameters in the context of docking to heme-containing proteins. This extended metal parameterisation is available for the fine-tuning of metal interactions, so that e.g. metal-ligand interactions can specifically be addressed depending on the metal contact.

• The protein does not need to be set up in a special way to make use of these parameters; however, the standard set-up should be followed (see Section 3.3.1, page 10).

• Further information on setting up a GOLD run with these settings is available (see Section 6.8, page 60).


GOLD conventions.

• These two files may be viewed in SILVER if desired.

3. Preparation of Configuration Files

• Two GOLD configuration files have been prepared. The first, non_flexible.conf, was set up in the normal way using the GOLD graphical user interface. It corresponds to a standard docking of the 1lpg ligand into the 1fax binding site, using slow search settings (100,000 GA operations) and allowing no side-chain flexibility. The considerations outlined in the preceding part of this tutorial suggest that this docking protocol is unlikely to give good results.

• The second file, flexible.conf, defines a docking in which the Gln192 side chain is allowed to move. It was set up by editing the original configuration file, non_flexible.conf, in a text editor. Currently, side-chain flexibility is not available via the GOLD graphical interface; you must directly edit additional command lines into the .conf file (see Section 3.6.2, page 19) and run GOLD via the command line.

• Comparing the two configuration files, you will see that the flexible version contains the following additional lines at the end:

rotamer_lib
  name gln_192
  chi1 2817 2818 2821 2822
  chi2 2818 2821 2822 2823
  chi3 2821 2822 2823 2825
  rotamer 62 (13) 180 (14) 20 (16)
  rotamer 70 -75 0
  ... several more rotamer lines ...
end_rotamer_lib

• Collectively, these lines define the torsional flexibility that the Gln192 side-chain will be allowed to have during docking.

• The first line specifies that a rotamer_lib command block is beginning.

• The second line specifies a unique name for this rotamer_lib command block - any text can be used, but a useful convention is to use the name of the side chain to which the command block pertains.

• The next three lines specify the torsion angles that are to be made variable. If you open 1fax_protein.mol2 in a text editor, you will see that the atom numbers 2817, 2818, 2821, 2822, 2823 and 2825 correspond to N, CA, CB, CG, CD and NE2, respectively, of Gln192. This means that chi1, chi2 and chi3 correspond, respectively, to rotation around the Cα-Cβ, Cβ-Cγ and Cγ-Cδ bonds. When defining these torsion angles, you must start from the atom nearest the backbone and move out along the side chain, e.g. chi3 2825 2823 2822 2821 would be invalid.

• The rotamer lines define the allowed values or ranges of values for chi1, chi2 and chi3. Thus, the


right-hand corner of the plot, Gln192, adopts a variety of positions according to which ligand is bound. The Gln192 position highlighted in purple is taken from 1lpg, that shown in orange is taken from 1fax.

• The next figure was produced by superimposing 1lpg and 1fax. It shows the 1fax binding site and the 1lpg ligand. Gln192 is highlighted in orange. It is immediately clear that the 1lpg ligand cannot be docked accurately into the 1fax binding site if Gln192 is not allowed to move, since there is a severe steric clash between these two.

• To see this more clearly, you can open SILVER and read in the file 1fax_1lpg_super.mol2 from <GOLD_DIR>/examples/tutorial6 via File, Load a ligand; this is the superposition from which the above figure was generated. Superimpose the 1lpg ligand with the 1fax protein fragment by selecting the Display multiple ligands tickbox then clicking on PCN::pdb1fax-A_1 and LIM::IMA_301_pdb1lpg_1 from the Ligands list.

2. Preparation of Input Files

• The file 1fax_protein.mol2 contains the binding site from 1fax. It has been set up for docking in the normal way. Parts of the protein remote from the binding site have been deleted in order to speed up the calculation, and hydrogen atoms have been placed on the protein in order to ensure that ionisation and tautomeric states are defined unambiguously (see Section 3.1, page 9).

• The ligand from 1lpg has also been set up for docking (see Section 4.1, page 30). It is stored in 1lpg_ligand.mol2. Again, attention has been given to protonation states (e.g. the benzamidine group has been built in its protonated form) and the bond types have been set in accordance with


3.4 Water Molecules

3.4.1 Methodology For Handling Waters
3.4.2 Specifying Waters

3.4.1 Methodology For Handling Waters

• Water molecules often play key roles in protein-ligand recognition. Water molecules can either form mediating hydrogen bonds between protein and ligand, or be displaced by the ligand on binding.

• GOLD allows waters to switch on and off (i.e. to be bound or displaced) and to rotate around their three principal axes (to optimise hydrogen bonding) during docking.

• To predict whether a specific water molecule should be bound or displaced, GOLD estimates the free-energy change, $\Delta G_b$, associated with transferring a water molecule from the bulk solvent to its binding site in a protein-ligand complex. $\Delta G_b$ for a given water molecule W is defined as:

$$\Delta G_b(W) = \Delta G_p(W) + \Delta G_i(W)$$

• $\Delta G_p(W)$ is a constant penalty added for each water molecule that is switched on and represents the loss of rigid-body entropy on binding to the target (hence rewarding water displacement).

Note: $\Delta G_p$ values were optimised against a training set of 58 protein-ligand complexes for four targets (HIV-1 protease, factor Xa, thymidine kinase and the oligopeptide-binding protein Opp A) where water molecules play key roles in the recognition. Further details can be found in Modeling Water Molecules in Protein-Ligand Docking Using GOLD (see References, page 147).

• $\Delta G_i(W)$ represents the intrinsic binding affinity of a water molecule and contains contributions resulting from interactions that the water forms with the protein and ligand (changes in the interactions between protein and ligand caused by introduction of the water are also accounted for).

• Therefore, for a water molecule to be bound to a protein-ligand complex, its intrinsic binding affinity needs to outweigh the loss of rigid-body entropy on binding.

3.4.2 Specifying Waters

• GOLD allows you to switch specific water molecules on or off (i.e. you can specify whether a particular water should be present or absent in the protein). Alternatively, GOLD can automatically determine whether a specific water should be bound or displaced by toggling it on and off during the docking run. The orientation of the water hydrogen atoms can also be optimised by GOLD during docking.
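The balance between the entropy penalty and the intrinsic binding affinity that decides whether a toggled water is bound or displaced can be illustrated with a few lines of Python. This is only a sketch: the sign convention (a positive penalty, a negative value for a favourable intrinsic affinity, bound when the sum is below zero) is an assumption, and the numbers are hypothetical and are not GOLD parameters.

```python
def water_binding_free_energy(penalty, intrinsic):
    """Free-energy change for binding a water: penalty term plus intrinsic term."""
    return penalty + intrinsic


def is_bound(penalty, intrinsic):
    """Assumed convention: a water is kept only if the total is favourable (< 0)."""
    return water_binding_free_energy(penalty, intrinsic) < 0.0


# Hypothetical values in arbitrary units, for illustration only.
PENALTY = 2.0
print(is_bound(PENALTY, -3.5))  # strong interactions outweigh the penalty
print(is_bound(PENALTY, -1.0))  # weak interactions do not, so the water is displaced
```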


• To specify settings for key water molecules, click on the Waters button in the Input Parameters and Files section of the GOLD front-end. The Water Selection window will appear:

• For each water molecule that you want to either include or exclude from the docking, you need to specify:

• The atom number of the water oxygen atom (as defined in the protein input MOL2 file).

• The state of the water, available options are:
On: use the water for docking (i.e. present)
Off: do not use the water for docking (i.e. absent)
Toggle: have GOLD decide whether the water should be present or absent (i.e. bound or displaced) during docking.

• The orientation of the water hydrogen atoms, available options are:
Freeze: use the orientation specified in the input file
Spin: have GOLD automatically optimise the orientation of the hydrogen atoms.

• Once the allowed state and orientation of a water molecule have been specified, click on the Add water or Update selected water button to add the water molecule to the Current Water Settings.

• Repeat the above procedure if you want to specify additional water molecules.

• To edit a Current Water Setting (e.g. to change the state) highlight the corresponding entry in the Current Water Settings list, make the required change and then hit the Add water or Update selected water button.

• To remove an entry from the Current Water Settings highlight the entry and hit the Delete


Tutorial 6: Docking with a Flexible Side Chain

1. Introduction
2. Preparation of Input Files
3. Preparation of Configuration Files
4. Comparison of Flexible and Non-Flexible Results
5. Choosing Side-Chain Rotamers

1. Introduction

• The object of this tutorial is to demonstrate how to dock a ligand into a binding site which is known to contain a flexible side chain. The example will involve docking the ligand from PDB entry 1lpg into the protein binding site taken from 1fax. These structures are of blood coagulation factor Xa, complexed with two different ligands.

• All files referred to in this tutorial can be found in <GOLD_DIR>/examples/tutorial6, where <GOLD_DIR> is the location of your GOLD installation.

• The figure below shows a superposition of several experimental determinations of the factor Xa binding site, complexed with a variety of different ligands (not shown). Only a small part of the binding site is displayed.

• While it is clear that parts of the binding site are rigid, their positions hardly moving from one structure to the next, other parts are more inclined to move. In particular, the residue at the top


6. Changing the Scoring function

• You may wish to stop the tutorial here. However, optionally, you can run through the tutorial again, this time having the GoldScore option set in the Fitness Function and Search Settings box.

• You will find that similar results are obtained. When all waters are turned off, two binding modes are generally found that score similarly well. One of these binding modes actually superimposes the reference ligand very well, and allowing waters to toggle does not significantly improve the superimposition in this case. However, allowing the waters to toggle does result in only this one binding mode being returned. The second, spurious binding mode is successfully eliminated.

This ends the tutorial.


Selection button, or to remove all entries hit the Clear List button.

• Click on Done in the Water Selection window when you are satisfied with the waters specified and their allowed state and orientations. When you finish, the count of Waters will be updated in the GOLD front end.

• Any unspecified waters that are part of the protein are considered to be On, automatically.
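The water options described in this section can be summarised in a small data model. The sketch below is hypothetical Python, not part of GOLD, and does not reproduce any configuration-file syntax; it only encodes the three states, the two orientation options and the rule that unspecified waters are treated as On. The atom numbers are those used in Tutorial 5.

```python
from dataclasses import dataclass

STATES = ("On", "Off", "Toggle")
ORIENTATIONS = ("Freeze", "Spin")


@dataclass(frozen=True)
class WaterSetting:
    oxygen_atom: int   # atom number of the water oxygen in the protein MOL2 file
    state: str         # one of STATES
    orientation: str   # one of ORIENTATIONS

    def __post_init__(self):
        if self.state not in STATES:
            raise ValueError(f"unknown water state: {self.state}")
        if self.orientation not in ORIENTATIONS:
            raise ValueError(f"unknown water orientation: {self.orientation}")


def water_state(oxygen_atom, settings):
    """Return the state for a water; waters with no entry default to 'On'."""
    for setting in settings:
        if setting.oxygen_atom == oxygen_atom:
            return setting.state
    return "On"


settings = [
    WaterSetting(2168, "On", "Spin"),
    WaterSetting(2169, "On", "Spin"),
    WaterSetting(2176, "Toggle", "Spin"),
]
print(water_state(2176, settings))
print(water_state(9999, settings))  # not listed, so treated as On
```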

3.5 Rotatable O-H and NH3 Groups

• The torsion angles of Ser, Thr and Tyr hydroxyl groups will be optimised by GOLD so their starting positions do not matter. Specifically, each Ser, Thr and Tyr OH will be allowed to rotate to optimise its hydrogen-bonding to the ligand, unless it is held in place by strong H-bonds to neighbouring protein residues. Lysine NH3+ groups are similarly optimised.

3.6 Flexible Side Chains

3.6.1 Introduction to Side-Chain Flexibility
3.6.2 Specifying a Flexible Side Chain
3.6.3 Using a Standard Rotamer Library
3.6.4 Allowing a Localised Backbone Movement
3.6.5 Protein-Protein Clashes
3.6.6 Specifying the Energy of a Side-Chain Rotamer

3.6.1 Introduction to Side-Chain Flexibility

• You may specify that one or more protein side chains are to be treated as flexible. Each flexible side chain will be allowed to undergo torsional rotation around one or more of its acyclic bonds.

• This option is only available if you are using the GoldScore scoring function (see Section 6.2, page 46).

• Making a side chain flexible can make docking more difficult because it increases the search space that must be explored. It may also increase the chance of false positives (i.e. ligands that appear to dock well but do not actually bind). Therefore, you should only make a side chain flexible if you have good reason to believe (e.g. from X-ray data) that it is likely to move in response to ligand binding.

• At present, side chains may only be made flexible by directly editing the GOLD configuration file (see Section 15.1, page 126), i.e. the option is not available via the GOLD graphical user interface. Therefore, you need to set up the GOLD job as normal in the graphical interface, hit Save & Exit to save the configuration file (which, by default, is named gold.conf), then manually edit this file as detailed below.


3.6.2 Specifying a Flexible Side Chain

• For each side chain that you want to make flexible, you should add a rotamer_lib block of commands to the end of the gold.conf file. This specifies the name of the side chain, the torsion angles that are permitted to vary, and the allowed values or ranges of values for those torsion angles. You can have up to 10 rotamer_lib blocks in a given configuration file, each one pertaining to a particular protein side chain.

• For example, consider the following rotamer_lib command block:

rotamer_lib
  name tyr370
  chi1 497 498 501 502
  chi2 498 501 502 503
  rotamer 60 90
  rotamer -65 (10) -85 (10:15)
end_rotamer_lib

• The text following name is a unique identifier of this rotamer_lib command block. Any text can be used but the obvious choice is the name of the side chain that the command block refers to, in this case Tyr370.

• The chi1 command specifies the atom numbers of the atoms defining the first rotatable torsion. In the example, this corresponds to rotation around Cα-Cβ, so the atoms will be the backbone N (= atom 497), CA (498), CB (501) and CG (502). It is necessary to specify the atoms from the backbone outwards, i.e. chi1 502 501 498 497 would be invalid.

• The chi2 command specifies the second rotatable torsion. In this example, this corresponds to rotation around Cβ-Cγ, so the atoms are CA (498), CB (501), CG (502) and CD1 (503).

• You may specify up to 8 chi commands in a given rotamer_lib block.

• Each rotamer line describes one allowed conformation for the side chain.

• Thus, the first rotamer command specifies the first set of allowed values for chi1 and chi2. In the example, this is chi1 = 60, chi2 = 90.

• The second rotamer command specifies the second set of allowed values. The format x (y) specifies the range (x - y) to (x + y), while x (y:z) specifies the range (x - y) to (x + z). Note: in practice, because the torsion angle distribution is divided into 10 degree bins, users may see angles outside the specified input range, as range boundaries are rounded up or down to the next bin. For instance, the actual sampled range for chi2 will be -100 to -70 degrees.

• In summary, the effect of this rotamer_lib command block is therefore to allow Tyr370 to adopt the conformation of precisely chi1 = 60, chi2 = 90, or any conformation in the range chi1 = -80 to -50, chi2 = -100 to -70, with a preference for those angles in the centre of the range.

• You can have up to 50 rotamer commands in a rotamer_lib block.
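The rotamer notation can be checked with a short script. The sketch below is hypothetical Python, not GOLD code: it parses a rotamer line in the x, x (y) and x (y:z) formats described above and, as an assumption based on the note about 10 degree bins, widens each range outwards to the nearest bin boundary.

```python
import math
import re

# One angle, optionally followed by "(y)" or "(y:z)".
ANGLE = re.compile(
    r"(-?\d+(?:\.\d+)?)"
    r"(?:\s*\(\s*(\d+(?:\.\d+)?)(?:\s*:\s*(\d+(?:\.\d+)?))?\s*\))?"
)


def parse_rotamer(line):
    """Turn 'rotamer x (y) x (y:z) ...' into a list of (low, high) ranges."""
    fields = line.split(None, 1)
    if len(fields) != 2 or fields[0] != "rotamer":
        raise ValueError("expected a line starting with 'rotamer'")
    ranges = []
    for match in ANGLE.finditer(fields[1]):
        centre = float(match.group(1))
        minus = float(match.group(2)) if match.group(2) else 0.0
        plus = float(match.group(3)) if match.group(3) else minus
        ranges.append((centre - minus, centre + plus))
    return ranges


def to_bins(ranges, width=10):
    """Assumed rule: round each boundary outwards to a multiple of the bin width."""
    return [
        (math.floor(low / width) * width, math.ceil(high / width) * width)
        for low, high in ranges
    ]


exact = parse_rotamer("rotamer -65 (10) -85 (10:15)")
print(exact)
print(to_bins(exact))
```

For the second rotamer line of the Tyr370 example this gives chi1 = -80 to -50 and chi2 = -100 to -70, the ranges quoted in the summary above.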


docking solutions are found, none of which closely resemble the correct binding mode.

5.2 All waters turned off

• From Load GOLD run results ... read in the gold.conf file corresponding to your second set of results (Alternatively, turn off the Clear ligands on loading flag in the SILVER window already open, and read the results into there. This will allow you to directly compare docks from different runs).

• Click on Display Multiple ligands and then display the reference ligand.

• Check each solution in turn against that of the reference ligand. Now it is likely that only one docking mode is represented. This docking mode is close to that of the reference ligand. It is not a perfect superposition though, as the ligand attempts to contact the protein, along its edge, more closely than it does in reality. The values of the docking scores for this run are higher than those of the previous run.

5.3 All waters toggled.

• From Load GOLD run results ... read in the gold.conf file corresponding to your third set of results (Alternatively, read the results into a SILVER window containing the other two sets of docks.)

• Click on Display Multiple ligands and then display the reference ligand.

• Again only one docking mode should be observed. This docking mode is now much closer to that of the reference ligand. Also, the scores for this run are higher than those of the two previous runs. Notice that the two waters able to interact with NH2 of the ligand have also been able to optimise their interactions with H-bond acceptor functionality in the protein, so that both are making three good H-bonds. The third water has been excluded in all the reported docking poses.


Any warning messages produced will be displayed in a separate GA Program Error Message window. Select Dismiss to close this window.

• Once the job is complete the message GA Done will appear.

4.2 All waters turned off

• Now access the Water Selection pane again. Double click the top Water atom Number so it becomes displayed above the Add water or Update selected water button. Change the Water State to ‘Off’ and then hit Add water or Update selected water.

• Repeat for both the other waters, so that they are now all turned off. Return to the main interface.

• Go to the Output... pane and change the output sub-directory to a new one.

• Edit the name of the GOLD configuration file in the Configuration File text box in the top pane to gold2.conf.

• Hit Run. There is no need to change any other settings.

4.3 All waters toggled

• Return to the Water Selection pane again. Change the state of each water to ‘toggle’ and ensure that the water orientation is set to spin. Return to the main interface.

• Go to the Output... pane and change the output sub-directory to a new one.

• Edit the name of the GOLD configuration file in the Configuration File text box in the top pane to gold3.conf.

• Hit Run. There is no need to change any other settings for this job. Because allowing the waters to toggle on or off normally increases the size of the search necessary to find a good docking mode, it is generally recommended to increase the search time allowed per ligand, when toggling waters. The search problem becomes harder the more waters that are included. In this case because of the small size of the binding site and the fact the ligand has no rotatable bonds, the search problem is not large and the same settings can be used throughout.

5. Analysis of results

5.1 All waters turned on
5.2 All waters turned off
5.3 All waters toggled

5.1 All waters turned on

• Open SILVER and from Load GOLD run results ... read in the gold.conf file corresponding to your first set of results.

• Click on Display Multiple ligands and then display the reference ligand.

• Check each solution in turn against that of the reference ligand. You should find that several


3.6.3 Using a Standard Rotamer Library

• The file <GOLD_DIR>/gold/rotamer_library.txt contains information taken from the paper The Penultimate Rotamer Library, S. C. Lovell, J. M. Word, J. S. Richardson & D. C. Richardson, Proteins, 40, 389-408, 2000. It is a compilation of the most commonly observed side-chain conformations for the naturally occurring amino acids.

• To make use of the rotamer information for a given residue, copy and paste the relevant rotamer_lib section into the GOLD configuration file. The residue name should be changed to something more meaningful, e.g. name 1qon_TYR370. The atom numbers that define each torsion angle (starting with the residue backbone N atom) should be entered on the lines starting chi, e.g. the opening lines of the template:

rotamer_lib
  name tyrosine
  chi1 <at1 at2 at3 at4>
  chi2 <at1 at2 at3 at4>
  rotamer 62 (13) 90 (13)
  rotamer -177 (11) 80 (11)
  rotamer -65 (11) -85 (11)
  rotamer -65 (11) -30 (18)
end_rotamer_lib

might be edited to:

rotamer_lib
  name 1qon_TYR370
  chi1 497 498 501 502
  chi2 498 501 502 503
  etc.

• All defined torsion angles, i.e. rotamer lines, can be used if required. Rotamer lines that are not needed can be deleted or commented out (by inserting the character # at the start of the line). Tolerances (in brackets) can be edited or deleted altogether. These tolerances allow some leeway for torsion angles (see Section 3.6.2, page 19) and in the file represent the positions of peak half-height either side of the torsion distribution peak, as determined by Lovell et al.
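Filling in a library template by hand is easy to get wrong, so the same step can be sketched as a small script. The function below is a hypothetical helper, not something shipped with GOLD. It uses only the keywords shown in this section (rotamer_lib, name, chiN, rotamer, end_rotamer_lib) and the limits of 8 chi commands and 50 rotamer commands given in Section 3.6.2; the two-space indentation is a layout assumption.

```python
MAX_CHI = 8        # up to 8 chi commands per rotamer_lib block
MAX_ROTAMERS = 50  # up to 50 rotamer commands per rotamer_lib block


def rotamer_lib_block(name, chi_atoms, rotamers):
    """Build the text of one rotamer_lib block.

    chi_atoms: list of 4-tuples of atom numbers, backbone outwards.
    rotamers:  list of strings such as '62 (13) 90 (13)', copied from the library.
    """
    if not 1 <= len(chi_atoms) <= MAX_CHI:
        raise ValueError("between 1 and 8 chi definitions are required")
    if not 1 <= len(rotamers) <= MAX_ROTAMERS:
        raise ValueError("between 1 and 50 rotamer lines are required")
    lines = ["rotamer_lib", f"  name {name}"]
    for index, atoms in enumerate(chi_atoms, start=1):
        if len(atoms) != 4:
            raise ValueError(f"chi{index} needs exactly four atom numbers")
        lines.append(f"  chi{index} " + " ".join(str(a) for a in atoms))
    lines.extend(f"  rotamer {rotamer}" for rotamer in rotamers)
    lines.append("end_rotamer_lib")
    return "\n".join(lines)


block = rotamer_lib_block(
    "1qon_TYR370",
    [(497, 498, 501, 502), (498, 501, 502, 503)],
    ["62 (13) 90 (13)", "-65 (11) -85 (11)"],
)
print(block)
```

The resulting text can then be pasted at the end of the configuration file in a text editor.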

3.6.4 Allowing a Localised Backbone Movement

• Quite often, a side-chain rotation is accompanied by a small change in the local backbone conformation. For example, the figure below shows a detail from an overlay of two PDB structures (1qon, 1dx4) of the same enzyme:


• Not only has the Tyr side chain rotated around Cα-Cβ and Cβ-Cγ, but there has also been a small backbone movement, primarily affecting the position of the Cα atom.

• Although minor (the two Cα positions are only 0.6 Å apart), this movement is extremely important because it alters the vector direction Cα-Cβ, and this can have a big leverage effect on the positions of atoms further down the side chain. In this case, it is impossible to overlay the Tyr370 side chain of 1dx4 closely onto that of 1qon simply by rotating around the Cα-Cβ and Cβ-Cγ bonds. This is about as close as one can get:


be using both these options shortly.

• Click on the Add water or Update selected water button.

• Repeat for waters 2169 and 2176.

• Click Done.

4. Running GOLD Dockings

4.1 All waters turned on
4.2 All waters turned off
4.3 All waters toggled

4.1 All waters turned on

• The gold.conf file for this docking job uses the ChemScore scoring function. The Genetic Algorithm parameters used are the defaults.

• Ensure that the allow early termination flag is set on.

• Hit Output... on the Input Parameters and Files page and put the name of an appropriate sub-directory in the Output directory... box. Click done to get back to the main interface.

• Edit the name of the GOLD configuration file in the Configuration File text box in the top pane to gold1.conf.

• Click on the Run button in the Control panel of the GOLD front end; this will start the GOLD job interactively. As the job progresses, output will be displayed in the GOLD Output window.


the hydrogen positions on the waters have been optimised for maximal hydrogen bonding. This doesn’t matter as the water hydrogen positions can be optimised during docking.

3. Setting up protein bound waters

• A configuration file gold.conf has been provided for this tutorial which will automatically load most of the settings and parameter values for this tutorial into the GOLD front end.

• Open GOLD and click on the Configuration File button within the Control panel of the GOLD front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial5 and hit Open.

• We will need to identify the waters in the binding site that we particularly want to consider, and set up their chosen states. Click on the Waters button on the Input Parameters and Files pane. This will take you to the Water Selection pane.

• The atom IDs for the waters that we need, are those of the water oxygens in the protein.mol2 file. The relevant atom IDs are 2168 and 2169 for the waters that are not occluded by the ligand, and 2176 for the water that is so occluded.

• Type ‘2168’ in the water atom no. (oxygen) box and ensure that the water state is set to ‘On’ and water orientation is set to ‘Spin’. This sets this water to be always present in the binding site and allows the hydrogen positions to vary during docking, in order to maximise the hydrogen bonding score both from interactions with the protein and the ligand. The ‘Off’ water state option allows a water to be removed from consideration during docking. The ‘Toggle’ option sets a water up so that it may either be removed, or kept and made use of in terms of hydrogen bonding, depending on which arrangement scores most highly for a given ligand pose. We will


• The backbone movement can be mimicked by allowing the Cα atom and the attached side chain to rotate around the N-C vector, where N and C are the backbone atoms on either side of the Cα atom. This is defined as a rotation of the improper torsion defined by the atom sequence CA-N-C-CA (atom numbers 498, 497, 499 and 498 in this example):

rotamer_lib
  name tyr370
  chi1 498 497 499 498
  chi2 497 498 501 502
  chi3 498 501 502 503
  rotamer 0 (30) 62 (11) 90 (11)
  rotamer 0 (30) -65 (11) -85 (21)
end_rotamer_lib

• This is the rotamer_lib block used as an example earlier (see Section 3.6.2, page 19), except that an additional improper torsion has been defined as chi1 and the original chi1 and chi2 have been renamed as chi2 and chi3. The specification 0 (30) for the improper torsion angle will allow a rotation of (+ or -)30 degrees around the N-C vector, the zero angle corresponding to the Cα position given in the protein input file.

• It is not easy to decide on suitable rotation limits for improper torsions - a trial and error approach is normally required - but they often need to be quite large. For example, an improper rotation of about +40 degrees has to be applied to Tyr370 of 1dx4 for it to be possible to overlay the side chain closely onto the 1qon Tyr370 position.
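The geometric effect of an improper rotation can be explored outside GOLD before choosing limits. The sketch below rotates a point about the axis through two other points using Rodrigues' rotation formula, which is a standard technique and not necessarily how GOLD implements the move. The coordinates are hypothetical, and the sense of a positive angle (right-handed about the N to C direction) is an assumption.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_degrees):
    """Rotate `point` about the line from axis_start to axis_end (Rodrigues' formula)."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    length = math.sqrt(sum(a * a for a in axis))
    if length == 0.0:
        raise ValueError("axis_start and axis_end must be different points")
    k = [a / length for a in axis]                       # unit vector along the axis
    v = [p - s for p, s in zip(point, axis_start)]       # point relative to the axis origin
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(r + s for r, s in zip(rotated, axis_start))


# Hypothetical backbone coordinates (Angstrom), for illustration only.
n_atom = (0.0, 0.0, 0.0)
c_atom = (2.4, 0.0, 0.0)
ca_atom = (1.2, 0.8, 0.0)

moved_ca = rotate_about_axis(ca_atom, n_atom, c_atom, 30.0)
print(moved_ca)
print(math.dist(ca_atom, moved_ca))  # displacement of the alpha carbon
```

Applying the same rotation to every side-chain atom shows how a small displacement of the alpha carbon is amplified further along the side chain.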

3.6.5 Protein-Protein Clashes

• By default, when a flexible side chain is moved during docking, GOLD checks whether any of its atoms clash with atoms in neighbouring residues. This gives rise to an extra Protein Energy term which contributes to the total GoldScore value.

• The term is computed by summing the van der Waals interactions of all pairs of protein atoms which satisfy the following conditions: (a) at least one of the protein atoms is in a flexible side chain; (b) the van der Waals term for that pair of atoms is repulsive. The van der Waals interactions will be estimated using the same potential as is used for the protein-ligand vdw term (by default, this is a 4-8 potential).

• The protein-protein clash term can be switched off by including the command penalise_protein_clashes = 0 anywhere in a rotamer_lib block, e.g.

rotamer_lib
  name tyr370
  penalise_protein_clashes = 0
  chi1 497 498 501 502
  chi2 498 501 502 503
  rotamer 62 (13) 90 (13)
  rotamer -65 (11) -85 (11)
end_rotamer_lib

• This will switch off calculation of the protein-protein clash term for all flexible side chains, not just the one corresponding to the rotamer_lib block in which you have placed the penalise_protein_clashes = 0 command.
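The two conditions that select atom pairs for the clash term can be written out as a short sketch. It is illustrative only: the functional form used for the 4-8 potential, the parameters r0 and eps, and the coordinates are all assumptions, and no attempt is made to exclude bonded neighbours or to reproduce how GOLD combines the result with the rest of the GoldScore.

```python
import math
from itertools import combinations


def vdw_4_8(r, r0, eps):
    """Assumed 4-8 form: minimum of depth eps at separation r0, repulsive at short range."""
    x = r0 / r
    return eps * (x ** 8 - 2.0 * x ** 4)


def clash_energy(atoms, r0=3.5, eps=1.0):
    """Sum the repulsive pair terms that involve at least one flexible atom.

    atoms: list of ((x, y, z), is_flexible) tuples for protein atoms.
    """
    total = 0.0
    for (p, p_flexible), (q, q_flexible) in combinations(atoms, 2):
        if not (p_flexible or q_flexible):
            continue                      # condition (a): one atom must be flexible
        energy = vdw_4_8(math.dist(p, q), r0, eps)
        if energy > 0.0:                  # condition (b): only repulsive terms count
            total += energy
    return total


# Hypothetical atoms: one flexible side-chain atom and two rigid neighbours.
atoms = [
    ((0.0, 0.0, 0.0), True),
    ((2.0, 0.0, 0.0), False),   # too close to the flexible atom, so it clashes
    ((10.0, 0.0, 0.0), False),  # far away, attractive term is ignored
]
print(clash_energy(atoms))
```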


Tutorial 5: Docking with Water in the Binding Site

1. Introduction
2. Preparation of Input Files
3. Setting up protein bound waters
4. Running GOLD Dockings
5. Analysis of results
6. Changing the Scoring function

1. Introduction

• The object of this tutorial is to investigate docking to a binding site that contains water molecules which a ligand may either displace, or alternatively, make use of through hydrogen bond interactions.

• The protein used here is acetylcholine esterase (PDB entry code 1ACJ), the protein that, more than any other, is essential for the correct transmittal of nerve impulses in the brain and around the body. The ligand is tacrine, an acetylcholine esterase inhibitor used as a drug to treat Alzheimer’s disease. The active site of the enzyme has been modelled with three water molecules in it, each of which makes hydrogen bonds with the protein.

• This tutorial will illustrate the requirements for setting up and running dockings in which the protein binding site features one or more water molecules. The example chosen mimics the situation where a researcher has a crystal structure of a protein binding site, and is unsure which and how many of the waters in that binding site should be included in the model for use in an inhibitor design effort.

2. Preparation of Input Files

• Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial5.

• The acetylcholine esterase file, protein.mol2, has already been set up in accordance with the guidelines for the preparation of protein input files (see Section 3, page 9).

• The full protein is not displayed. Parts of the protein remote from the binding site have been deleted in order to speed up the calculation (see Section 3.1, page 9). Hydrogen atoms have been placed on the protein in order to ensure that the ionisation and tautomeric states are defined unambiguously (see Section 3.2, page 10).

• Read in the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial5. You will be able to see how the ligand tacrine chooses to bind. You will find that two of the waters in the active site are within hydrogen bonding distance of the NH2 of the ligand, as well as of hydrogen bond acceptors on the protein. The third water is at a position where it can make a hydrogen bond to the same histidine backbone carbonyl as the protonated ring nitrogen of the ligand. This water cannot be accommodated if tacrine takes up its normal binding mode. None of


gold_soln_ligands_m1_3.mol2).

• Using SILVER, read in the docking results by specifying the gold.conf file.

• The position and orientation of the terminal sulphonamide groups in the docked solutions should be similar to that observed in the co-crystallised ETS inhibitor (i.e. coordinated to the zinc within the protein via the sulphonamide nitrogen).

• In the example below the terminal sulphonamide group of GOLD’s top-ranked solution can be seen to satisfy the specified constraint and reproduces the known binding mode of the co-crystallised ETS inhibitor:

This ends the tutorial.


3.6.6 Specifying the Energy of a Side-Chain Rotamer

• An energy may be assigned to a given rotamer, e.g. as follows:

rotamer_lib
name tyr370
chi1 497 498 501 502
chi2 498 501 502 503
rotamer 62 (11) 90 (11)
energy 10
rotamer -65 (11) -85 (18)
end_rotamer_lib

• This will penalise (i.e. reduce) the GoldScore value by 10 units if the Tyr370 side chain is placed in the chi1 = 62, chi2 = 90 conformation. In other words, it makes this conformation less favourable.

• Had the command energy -10 been included, its effect would have been to improve (i.e. increase) the GoldScore value.

3.7 Large Backbone Movements

• The only way of dealing with large backbone movements in GOLD is to perform separate docking runs on different binding-site conformations.

• Small backbone movements in the vicinity of a flexible side chain may be allowed by including the improper torsion angle CA-N-C-CA in a rotamer_lib command block (see Section 3.6.4, page 20). Another option you can try is to apply a Localised Soft Potential to one or more residues in the loop (see Section 6.2.1, page 47) (GoldScore only).

3.8 Defining the Binding Site

• You must specify the approximate centre and extent of the binding site. This can be done in several ways:

• from a point (see Section 3.8.1, page 25);

• from a protein atom (see Section 3.8.2, page 25);

• from a file containing a list of atoms (see Section 3.8.3, page 26);

• from a protein residue (see Section 3.8.4, page 26);

• from a file containing a list of residues (see Section 3.8.5, page 27);

• from a reference ligand (see Section 3.8.6, page 28).

• You can use cavity detection to confine the calculation to regions enclosed within concave parts of the binding site surface (see Section 3.8.7, page 28).

• The cavity volume, as determined by the cavity detection algorithm, can also be output (see Section 3.8.8, page 29).


3.8.1 Defining a Binding Site from a Point

• Switch on the button labelled Point in the GOLD front end.

• In the three boxes, type the orthogonal x,y,z coordinates of a single solvent-accessible point approximately at the centre of the active site in the protein.

• The approximate radius of the binding site must also be specified. By default the binding site radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.

• If r is the radius, the binding site will be defined as all atoms within r Å of the specified point.

• The radius should be large enough to contain any possible binding mode of the ligand.
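The radius rule described above can be sketched in a few lines of Python. This is only an illustration of the geometric test: the function name, the dictionary-based atom representation and the sample coordinates are assumptions made for the example, not part of GOLD.

```python
import math

def atoms_within_radius(atoms, centre, radius=10.0):
    """Return the numbers of all atoms lying within `radius` Å of `centre`.

    atoms  : dict mapping atom number -> (x, y, z) orthogonal coordinates
    centre : (x, y, z) coordinates of the user-specified point
    """
    return sorted(n for n, xyz in atoms.items()
                  if math.dist(xyz, centre) <= radius)

# Hypothetical coordinates, for illustration only
atoms = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (12.0, 0.0, 0.0)}
selected = atoms_within_radius(atoms, centre=(0.0, 0.0, 0.0), radius=10.0)
print(selected)
```

Atom 3 lies 12 Å from the point and is therefore excluded, which shows why the radius should be chosen generously enough to cover every plausible binding mode.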

3.8.2 Defining a Binding Site from an Atom

• Switch on the button labelled Atom in the GOLD front end.

• Type in the atom number (as it appears in the protein input file) of a single solvent-accessible protein atom close to the centre of the active site of the protein.

• The approximate radius of the binding site must also be specified. By default the binding site radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.

• If r is the radius, the binding site will be defined as all atoms within r Å of the specified protein atom.

• The radius should be large enough to contain any possible binding mode of the ligand.


• The file gives total fitness scores and a breakdown of the fitness into its constituent energy terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand), an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand intramolecular).

• An additional constraint scoring term S(con) is also listed. For docking solutions which satisfy the specified distance constraint the contribution from this scoring term will be 0.00. However, for solutions in which the constrained distances lie outside the specified bounds a negative S(con) score will be applied, thus reducing the overall fitness.

• Further details relating to substructure-based constraints are given within individual ligand log files. Your output directory should contain ten ligand log files gold_ligands_m#.log, one for each ligand.

• Open and inspect the ligand log file corresponding to the first ligand in the input file, i.e. gold_ligands_m1.log. This file will contain the distance bounds as specified in the constraint and the actual distance observed in the docked solution:

• From your bestranking.lst file identify GOLD’s top ranked solution for the ligand with the best total fitness score (in the example bestranking.lst file given above this would be


• As with standard distance constraints, the fitness score is reduced for solutions which do not satisfy the constraint. The amount by which the score is reduced is determined by a user-defined weight term. Set the value of the Spring const. to 20.0, then click on the Add constraint to update selected constraint button to add the constraint to the Current Constraints list. Hit Done to close the Constraint Editor.

4. Running GOLD

• The time taken by GOLD to dock ligands can be controlled by altering the values of the genetic algorithm (GA) parameters (see Section 10., page 89). GOLD runs for a fixed number of genetic operations (crossover, migration, mutation). Reducing the number of GA operations performed during the course of a run will therefore make GOLD run faster, but the search will be less exhaustive.

• GOLD can decide on the optimal settings to use for a given ligand (see Section 11.3, page 94).

• To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the Settings selector window, click on Use automatic settings. Ensure the Search efficiency is set to 100%, then hit Done.

• Click on the Output button within the Input Files and Parameters panel, then hit the Output Directory... button. Specify a directory to which you have write permission; this is where the GOLD output files will be written. Select Ok to close the Output preferences window.

• Click on the Run button in the Control panel of the GOLD front end. This will start the GOLD job interactively. As the job progresses, output will be displayed in the GOLD Output window. Any warning messages produced will be displayed in a separate GA Program Error Message window. Select Dismiss to close this window.

• Once the job is complete the message GA Done will appear.

5. Analysis of Output

• A file called bestranking.lst is written for batch jobs on multiple ligands. Open and inspect the file bestranking.lst from your specified output directory in a text editor. This file gives a continuous summary of the best solution that has been obtained for each docked ligand.

• The listed file names correspond to the names of the files containing the best solution found for each ligand. For example, in the file below, gold_soln_ligands_m1_3.mol2 contains the best answer found for the first ligand (m1) in the input file:


3.8.3 Defining a Binding Site from a List of Atoms

• Switch on the button labelled Atom nos. in the GOLD front end.

• A file which contains a list of protein atom numbers must be specified.

• Multiple atom numbers are permitted on each line in the file; it is therefore possible to re-use an existing active site definition by using the list of active atoms printed in the protein.log file. Example file format is shown below:

• Each index is an index of an atom in the input protein.

• The list should contain all the solvent-accessible atoms which are required to explicitly define the protein active site since all acceptor and donor hydrogen atoms available to the ligand are taken from the list.

3.8.4 Defining a Binding Site from a Single Residue

• The ability to define a binding site from a single residue is not available from the GOLD front end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126) and add the commands:

floodfill_atom_no = <atom number>
floodfill_center = residue

• The atom number given can be any atom within the residue you want to define the active site from. GOLD will then get the substructure ID from that atom and find all other atoms that belong to the same substructure.


Note: in order to define the active site in this way, the amino acid substructures must be properly defined in the protein input file.

• The approximate radius of the binding site must also be specified. By default the binding site radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.

• If r is the specified radius, all protein atoms within r Å of each atom in the selected residue are found, then all these atoms plus the atoms of their associated residues are used for the active site definition.

3.8.5 Defining a Binding Site from a List of Residues

• The ability to define a binding site from a list of residues is not available from the GOLD front end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126) and add the commands:

floodfill_center = list_of_residues
cavity_file = <path to text file>

• GOLD will then read in the specified text file e.g. list_of_residues.txt, and extract the residues listed.

• The list of residues can be extracted from any text file, including a standard GOLD solution file (GOLD writes the active site residues list to the solution files if output of rotatable hydrogens is turned on).

• The following formatting restrictions apply:

• The list must begin with the following tag on its own line:

> <Gold.Protein.ActiveResidues>

• The list must end with a blank line (or the end of the text file).

• GOLD will read multiple residue names from one line, but lines must not exceed 250 characters in length.

• Residue names must be separated by a space, for example:

> <Gold.Protein.ActiveResidues>
HIS69 ARG71 GLU72 ARG127 ASN144 ARG145 GLY155 ALA156 GLU163
THR164 HIS196 SER197 TYR198 SER199 LEU201 LEU203 ILE243 ILE244
ILE247 TYR248 GLN249 ALA250 GLY253 SER254 ILE255 THR268 GLU270
PHE279 ZN309

• The list should contain all the residues which are required to explicitly define the protein active site since all acceptor and donor hydrogen atoms available to the ligand are taken from the list.
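The formatting restrictions listed above can be checked before a file is handed to GOLD. The following Python sketch reads a residue list according to those rules (tag on its own line, space-separated names, list terminated by a blank line or the end of the file, lines of at most 250 characters). The function name and the sample text are hypothetical, and the sketch is not GOLD's own parser.

```python
def read_active_residues(text, tag="> <Gold.Protein.ActiveResidues>", max_len=250):
    """Extract residue names from a block that follows the stated rules."""
    residues = []
    in_list = False
    for line in text.splitlines():
        if not in_list:
            # Skip everything until the tag appears on its own line
            in_list = line.strip() == tag
            continue
        if not line.strip():
            break  # a blank line ends the list
        if len(line) > max_len:
            raise ValueError("residue line exceeds %d characters" % max_len)
        residues.extend(line.split())
    return residues

# Hypothetical file content, for illustration only
sample = """some other text
> <Gold.Protein.ActiveResidues>
HIS69 ARG71 GLU72
PHE279 ZN309

text after the blank line is ignored
"""
print(read_active_residues(sample))
```

Because the reader skips everything before the tag, the same function can be pointed at any text file containing the list, which matches the statement that the list can be extracted from a standard GOLD solution file.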


• Click on the Substructure file name button, then select the file substructure.mol2 from <GOLD_DIR>/examples/tutorial4 and hit Open.

• Enter the Protein atom number and Substructure atom number to which the constraint applies. These are 2041 (the zinc atom number in the protein.mol2 input file) and 4 (the sulphonamide nitrogen atom in the substructure.mol2 file) respectively.

• Specify the allowed range of separation by entering a Maximum separation of 2.50 and a Minimum separation of 1.50 (distances are in Å).


• When setting up a distance constraint the protein and ligand atom numbers, as defined in the MOL2 input files, must be used. The maximum and minimum separation of the constrained atoms must also be entered (distances are in Å).

• During a GOLD run, if a constrained distance is found to lie outside the specified bounds, a spring energy term is used to reduce the fitness score. The spring energy term is $E = kx^2$, where x is the difference between the distance and the closest constraint bound and k is a user-defined spring constant.

• Select Cancel to close the Constraint Editor.
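The spring energy term described above can be written out as a short Python function. This is a minimal sketch of the stated formula only; the function name and the numerical values in the example are illustrative assumptions, and how GOLD combines the term with the rest of the fitness score is not shown.

```python
def spring_penalty(distance, minimum, maximum, k):
    """Spring energy E = k * x**2 for a constrained distance.

    x is the difference between the distance and the closest constraint
    bound; it is zero when the distance lies within the bounds.
    """
    if distance < minimum:
        x = minimum - distance
    elif distance > maximum:
        x = distance - maximum
    else:
        x = 0.0
    return k * x * x

# Illustrative values: bounds of 1.50 and 2.50 Å, spring constant 20.0
print(spring_penalty(2.0, 1.5, 2.5, 20.0))  # within bounds, no penalty
print(spring_penalty(3.0, 1.5, 2.5, 20.0))  # 0.5 Å beyond the upper bound
```

Because the penalty grows with the square of the violation, small excursions outside the bounds cost little while large ones are strongly disfavoured.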

3.2 Substructure-Based Distance Constraints

• It is possible to apply a distance constraint to multiple ligands which have a common substructure or functional group.

• In order to use a substructure-based distance constraint it is first necessary to create a file containing the common substructure in MOL2 format.

• The substructure-based constraint forces GOLD to limit the distance between a protein atom and one atom of this functional group.

• During docking the constraint will be applied to any ligands which contain the specified substructure (matching is performed on the basis of the atom types and 2D connectivity) and the resulting solutions will be biased towards the specified distance range.

• A substructure file containing a sulphonamide group has been provided for this tutorial. Open and inspect the file substructure.mol2 from <GOLD_DIR>/examples/tutorial4 within SILVER. When creating your own substructure files it is recommended that you set atom types manually (see Section 8.2.3, page 72) since an incomplete fragment can cause problems with automatic atom-typing.

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Substructure Constraint from the list of constraint types:


• Note: It is possible to use cavity detection when defining the active site from a list of residues. With cavity detection enabled, the cavity definition will be restricted to those specified atoms that are solvent-accessible (see Section 3.8.7, page 28).

3.8.6 Defining a Binding Site from a Reference Ligand

• Switch on the button labelled Ligand in the GOLD front end.

• Enter the name of a file which contains a reference ligand. This could be a ligand in a known binding mode, or the co-crystallised ligand.

• By default all protein atoms within 5.0 Å of each ligand atom are found, then all these atoms plus the atoms of their associated residues are used for the active site definition.

• These default settings can yield very large binding site definitions. To use only those protein atoms within the cavity distance threshold of each ligand atom (i.e. do not also include all atoms of their associated residues), edit the gold.conf file (see Section 15.1, page 126) and enter the keyword atom on the following line:

floodfill_center cavity_from_ligand <distance> atom

Note: the cavity distance threshold can also be changed by specifying a new <distance> value.

• Note: It is possible to use cavity detection when defining the active site from a reference ligand. With cavity detection enabled, the cavity definition will be restricted to those specified atoms that are solvent-accessible (see Section 3.8.7, page 28).
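The difference between the default behaviour (atoms near the ligand plus all atoms of their residues) and the atom keyword (near atoms only) can be illustrated with a small Python sketch. The function name, the tuple layout and the coordinates below are assumptions made for the example; this is not GOLD's implementation.

```python
import math

def site_from_ligand(protein_atoms, ligand_coords, distance=5.0, whole_residues=True):
    """Select protein atoms for an active site defined from a reference ligand.

    protein_atoms  : list of (atom_number, residue_id, (x, y, z))
    ligand_coords  : list of (x, y, z) for the reference ligand atoms
    whole_residues : True mimics the default; False mimics the `atom` keyword
    """
    near = {no for no, _res, xyz in protein_atoms
            if any(math.dist(xyz, lig) <= distance for lig in ligand_coords)}
    if not whole_residues:
        return sorted(near)
    # Expand the selection to every atom of each residue that was touched
    residues = {res for no, res, _xyz in protein_atoms if no in near}
    return sorted(no for no, res, _xyz in protein_atoms if res in residues)

# Hypothetical data, for illustration only
protein = [(1, "HIS69", (0.0, 0.0, 0.0)),
           (2, "HIS69", (8.0, 0.0, 0.0)),
           (3, "ARG71", (20.0, 0.0, 0.0))]
ligand = [(1.0, 0.0, 0.0)]
print(site_from_ligand(protein, ligand))
print(site_from_ligand(protein, ligand, whole_residues=False))
```

In this toy case atom 2 is 7 Å from the ligand, so it is included only because it belongs to the same residue as atom 1. This is the mechanism by which the default settings can produce very large binding site definitions.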

3.8.7 Cavity Detection

• A cavity detection algorithm (Hendlich, Rippmann and Barnickel, LIGSITE: Automatic and efficient detection of potential small molecule binding sites in proteins, Merck technical report, 1997) is used to restrict the region of interest to concave, solvent-accessible surfaces.

• Cavity detection is enabled by switching on the button labelled Detect Cavity.


3.8.8 Output of Cavity Volume

• The cavity volume, as determined by the cavity detection algorithm, can be output. To do this, you need to edit the gold.params file and add the command:

CAVITY_VOLUMES=1

The volume of the cavity will be written to the gold_protein.log file, e.g.

Volume of docking regions (stochastic sampling)
Box volume: 39060.00
Acc. volume: 26703.33
Surf. volume: 15492.00
Probe rad. 1.400, Samples: 117180
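The phrase "stochastic sampling" in the log suggests that the volumes are estimated by sampling points within the box. How GOLD performs this internally is not described here. The Python sketch below shows only the generic sampling estimate (box volume multiplied by the fraction of sample points that fall inside the region). The hit count of 80110 is an assumption back-calculated from the figures in the log above, not a value that GOLD reports.

```python
def sampled_volume(box_volume, hits, samples):
    """Estimate a region's volume from the fraction of sample points inside it."""
    return box_volume * hits / samples

# Box volume and sample count are taken from the example log;
# the hit count is a back-calculated assumption.
acc = sampled_volume(39060.00, 80110, 117180)
print(f"{acc:.2f}")  # prints 26703.33
```

Under this reading, the example log corresponds to three sample points per cubic ångström (117180 / 39060), so the reported volumes are resolved to about a third of a cubic ångström.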

3.9 Protein File Formats

• Acceptable protein file formats are PDB and MOL2.

3.10 Specifying the Protein File Name

• Click on the Protein button in the GOLD front-end. The file selection window will appear, e.g.

• Use the file selection window to choose the protein data file. When you have finished, the file name will appear in the entry box next to the Protein button in the GOLD front end.


3. Distance Constraints

3.1 Standard Distance Constraints (see page 196)
3.2 Substructure-Based Distance Constraints (see page 197)

• Any distance between a ligand atom and a protein atom can be constrained, or restrained, to lie between minimum and maximum distance bounds.

• GOLD features two types of distance constraint:

• A standard distance constraint for use with individual ligands (see Section 3.1, page 196).

• A substructure-based distance constraint for use with multiple ligands which have a common functional group (see Section 3.2, page 197).

3.1 Standard Distance Constraints

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Distance Constraint from the list of constraint types:


• The terminal sulphonamide nitrogen atom of the ligand clearly coordinates to the zinc. We can attempt to reproduce this known binding mode within GOLD with the introduction of a distance constraint during docking.

• Ten ligands, each structurally similar to the ETS inhibitor, will be screened using GOLD. These ligands were identified using Relibase+, a program for search and analysis of protein-ligand complexes (http://www.ccdc.cam.ac.uk/products/life_sciences/relibase/).

• These ligands, ligand.mol2, are available from <GOLD_DIR>/examples/tutorial4. Note that each of the ten ligands in this file features a terminal sulphonamide group.

• A configuration file (gold.conf) has been provided for this tutorial which will automatically load the settings and parameter values for this tutorial into the GOLD front end.

• Open GOLD and click on the Configuration File button within the Control panel of the GOLD front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial4 and hit Open.


4. Setting Up Ligands

4.1 Essential Steps in Setting Up a Ligand (see page 30)
4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States (see page 30)
4.3 Ligand Geometry, Conformation and Stereochemistry (see page 31)
4.4 Ligand File Formats (see page 31)
4.5 Specifying the Ligand File(s) (see page 32)
4.6 Setting Up Covalently Bound Ligands (see page 33)

4.1 Essential Steps in Setting Up a Ligand

• Add all hydrogen atoms, including those necessary to define the correct ionisation and tautomeric states (see Section 4.2, page 30).

• Ensure that all bond types are correct. If they are, and hydrogen atoms have been placed on the correct atoms, GOLD will deduce atom types automatically when atom typing is turned on (see Section 5.2, page 36).

• GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct. However, if for any reason, GOLD is unable to deduce an atom type, then the atom in question will be replaced with a dummy atom type Du. If this is the case a warning message will be given in the gold_protein.log file.

• The presence of dummy atoms should not significantly affect the docking prediction since dummy atoms are considered neither as donors nor as acceptors.

• There is usually a right and a wrong way to code groups which can be drawn in more than one way (i.e. have more than one canonical form), such as nitro, carboxylate and amidinium (see Section 5.5, page 39).

• The starting geometry of the ligand should be reasonably low in energy, since GOLD will not alter bond lengths or angles, or rotate rigid bonds such as amide linkages, double bonds and certain bonds to trigonal nitrogens. However, GOLD will optimise the values of torsion angles around rotatable bonds.

• Save the ligand as a MOL2 file (i.e. Tripos format) or a MOL file (i.e. MDL SD format). It is also possible (but not recommended) to use PDB format. If using PDB format CONECT records should also be included (see Section 4.4, page 31).

4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States

• GOLD uses an all-atom model, so the ligand must have all hydrogen atoms added.

• The precise geometrical positions of rotatable (e.g. hydroxyl and amino) hydrogen atoms do not matter, as they will be optimised during the GOLD run.

• GOLD deduces hydrogen-bonding abilities from the presence or absence of hydrogen atoms. For example, you can control the protonation state of a carboxylic acid group by adding or removing the ionisable hydrogen atom.


• If incorrect ionisation or tautomeric states are inferred by the program, it is unlikely that correct protein-ligand binding modes will be predicted. If you are unsure about, e.g., the preferred ionisation state of the ligand, you should perform separate GOLD runs using the different possibilities.

• GOLD ignores atom charges, both formal and partial. It deduces whether an atom is charged by counting the bond orders of the bonds that it forms and comparing the result with the atom’s normal valency.
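The idea of deducing a charge from bond orders rather than from a charge field can be illustrated as follows. This Python sketch is a deliberate simplification limited to nitrogen and oxygen; the valency table, the function name and the sign convention are assumptions made for the example, and GOLD's actual atom-typing rules are more detailed than this.

```python
# Normal valencies for the two elements covered by this simplified sketch
NORMAL_VALENCY = {"N": 3, "O": 2}

def deduce_charge(element, bond_orders):
    """Compare the summed bond orders of an atom with its normal valency.

    A surplus of bonds is read as a positive charge and a deficit as a
    negative charge. Explicit hydrogens must be included in `bond_orders`.
    """
    return sum(bond_orders) - NORMAL_VALENCY[element]

print(deduce_charge("N", [1, 1, 1, 1]))  # ammonium-type nitrogen
print(deduce_charge("O", [1]))           # deprotonated hydroxyl oxygen
print(deduce_charge("O", [1, 1]))        # neutral hydroxyl oxygen
```

This also shows why adding or removing an ionisable hydrogen is enough to control the protonation state: the hydrogen changes the bond count, and the charge follows from it.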

4.3 Ligand Geometry, Conformation and Stereochemistry

• The ligand conformation will be varied by GOLD during docking. The starting conformation therefore does not matter.

• GOLD will not alter bond lengths or angles. These parameters should therefore be set to reasonably optimum values. A good practice is to build the ligand in an arbitrary conformation and then perform a few cycles of molecular-mechanics minimisation to take the ligand close to its local potential-energy minimum.

• Ring conformations and the torsion angles around rigid bonds such as amide linkages, double bonds and certain bonds to trigonal nitrogens will normally be fixed at their starting values. However, you can use the Fitness and Search Options button in the GOLD front end to enable some of these features to vary (see Section 7., page 64).

• GOLD will not alter stereochemistry. If you are unsure about the stereochemistry of the ligand, you must generate all alternatives and dock each separately. It is meaningful to make comparisons between fitness scores for dockings of different stereoisomers.

4.4 Ligand File Formats

• Acceptable ligand file formats are MOL2 (i.e. Tripos format), MOL (i.e. MDL SD format) and PDB (although we do not recommend the use of PDB format). Files in MOL format may also have the extension .mdl or .sdf.

• Only MOL2 may be used if you wish to set ligand atom types manually.

• An extension to the PDB file format is required if it is used for storing the ligand structure. Specifically, a bond specified twice in a single CONECT record is assumed to be a double bond, and a bond specified three times in a single CONECT record is assumed to be a triple bond. For example, the following CONECT records both specify a double bond between atoms with serial numbers 25 and 26:

CONECT 25 20 26 30 26
CONECT 26 25 27 52 25

• This mechanism for specifying bond orders is forced by the lack of a bond-order field in the standard PDB format, and seems to offer lots of scope for users to commit errors. For that reason, we recommend that the PDB format is not used for ligands.
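The repeated-entry convention can be made concrete with a short Python sketch that counts how often each bonded atom appears in a CONECT record. The function name is hypothetical, and splitting on whitespace is an assumption that suits records like the example above; real PDB files use fixed-width columns, which can run together for large serial numbers.

```python
from collections import Counter

def conect_bond_orders(record):
    """Return (atom, {bonded_atom: bond_order}) for one CONECT record.

    A bonded atom listed twice is read as a double bond and one listed
    three times as a triple bond, following the extension described above.
    """
    fields = record.split()
    if not fields or fields[0] != "CONECT":
        raise ValueError("not a CONECT record")
    atom = int(fields[1])
    return atom, dict(Counter(int(f) for f in fields[2:]))

print(conect_bond_orders("CONECT 25 20 26 30 26"))
print(conect_bond_orders("CONECT 26 25 27 52 25"))
```

Both records report a bond order of 2 between atoms 25 and 26. A useful consistency check on a hand-edited file is that the two records agree, since a repeat omitted from one of them is exactly the kind of error this mechanism invites.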


Tutorial 4: Use of Substructure Based Distance Constraints

1. Introduction (see page 194)
2. Input Files (see page 194)
3. Distance Constraints (see page 196)
4. Running GOLD (see page 199)
5. Analysis of Output (see page 199)

1. Introduction

• The object of this tutorial is to assess the binding of a small number of structurally related ligands to carbonic anhydrase II (PDB entry code 1cil). In the ETS inhibitor a terminal sulphonamide nitrogen atom is observed to coordinate to a zinc atom within the protein binding site.

• This tutorial will illustrate how GOLD can be used to screen a number of compounds in order to identify ligands with potential activity. The use of constraints in order to bias solutions towards the observed binding mode of the inhibitor will also be demonstrated, as well as the use of automatic speed settings.

2. Input Files

• Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial4. The original protein PDB file <GOLD_DIR>/examples/tutorial4/1CIL.pdb has also been provided should you wish to set up the protein for yourself.

• The carbonic anhydrase II (1cil) file, protein.mol2, has already been set up in accordance with the guidelines for the preparation of protein input files (see Section 3.2, page 10).

• Upon inspection of the protein you will see that the zinc atom is coordinated to three histidine groups; the one remaining zinc coordination site is available for binding to the ligand.

• Read in the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial4. Inspect the crystallographically observed position of the ETS inhibitor (shown in green) within the protein binding site:


• Interactions between the cyclic urea inhibitor and HIV-1 protease can be divided into two groups: those that anchor the scaffold in the active site and those that fix the substituents in the target subsites.

• Confirm that the hydrogen bonds specified in the constraints are formed as expected to the cyclic urea scaffold by measuring the relevant contact distances. Identify any additional hydrogen bonding interactions between the benzimidazole substituents and the target subsites within the protein.

This ends the tutorial.


4.5 Specifying the Ligand File(s)

• Any number of ligands can be specified, either by selecting several individual files, or by selecting a directory containing several ligand files, or by selecting a single file containing several ligands (i.e. a multi-MOL2 or SD file). GOLD will dock each in turn.

• Click on the Edit Ligand File List button in the GOLD front-end. The Ligand Selection for docking run window will appear:

• Click on the Filename button. In the resulting dialog select the required file or directory and hit Open.

• Specify the number of times each ligand is to be docked by entering a value in the No. of GA runs box (see Section 11.1, page 93).

• When using a single file containing several ligands (i.e. a multi-MOL2 or SD file) it is possible to dock only specific ligands in that file. Enable the Specify ligand numbers check-box and specify which ligand you wish to start and finish docking at (by entering the number relating to the position of the ligand within the file). Note: Unless specified otherwise GOLD will, by default, start at the first ligand and finish at the last ligand in the file.

• Once a selection has been made click on either the Add file or Update selected file button or Add all files in directory button to add the chosen ligand file or directory to the Current Ligand File Selection.

• Repeat the above procedure if you want to select further ligands for docking.


• To edit a Ligand File Selection (e.g. to change the number of times the ligand will be docked) highlight the ligand file with the mouse, make the required change and hit the Add file or Update selected file button.

• To remove a file from the Current Ligand File Selection highlight the ligand file with the mouse and hit the Delete Selection button, or to remove all files hit the Clear List button.

• Click on Done in the Ligand Selection window when you are satisfied with the selected ligands. When you finish, the count of ligands will be updated in the front end.
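When choosing start and finish ligand numbers for a multi-MOL2 file, it can help to check which entries those positions correspond to. The Python sketch below splits a multi-MOL2 text on the @<TRIPOS>MOLECULE record marker and selects a 1-based, inclusive range, mirroring the start and finish boxes described above. The function names and the stub file content are hypothetical; GOLD performs this selection itself and does not require such a script.

```python
def split_mol2(text):
    """Split multi-MOL2 text into one string per molecule."""
    blocks = []
    for line in text.splitlines(keepends=True):
        if line.startswith("@<TRIPOS>MOLECULE"):
            blocks.append([line])      # a new molecule starts here
        elif blocks:
            blocks[-1].append(line)    # lines before the first marker are dropped
    return ["".join(block) for block in blocks]

def select_ligands(blocks, start=1, finish=None):
    """Return ligands start..finish (1-based, inclusive); default is all."""
    finish = len(blocks) if finish is None else finish
    return blocks[start - 1:finish]

# Stub content for illustration only (not complete MOL2 records)
text = ("@<TRIPOS>MOLECULE\nlig1\n"
        "@<TRIPOS>MOLECULE\nlig2\n"
        "@<TRIPOS>MOLECULE\nlig3\n")
chosen = select_ligands(split_mol2(text), start=2, finish=3)
print(len(chosen))
```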

4.6 Setting Up Covalently Bound Ligands

• GOLD is able to dock covalently bound inhibitors, but only if you specify which ligand atom is bonded to which protein atom. GOLD supports two types of covalent link:

• A covalent link for use with individual ligands (see Section 4.6.2, page 34).

• A substructure-based covalent link for use with multiple ligands which have a common functional group (see Section 4.6.3, page 34).

4.6.1 Method Used for Docking Covalently Bound Ligands (see page 33)
4.6.2 Setting Up a Single Covalent Link (see page 34)
4.6.3 Setting Up Substructure-Based Covalent Links (see page 34)

4.6.1 Method Used for Docking Covalently Bound Ligands

• GOLD is able to dock covalently bound inhibitors, but only if you specify which ligand atom is bonded to which protein atom.

• The program assumes that there is just one atom linking the ligand to the protein (e.g. the O in a serine residue). Both protein and ligand files are set up with the link atom included (so, if the serine O is the link atom, it will appear in both the protein and ligand input files). Ideally the link atom, in both the ligand and the protein, will have a free valence available through which the link can be made. If the link atom on the ligand does not have a free valence, having a hydrogen instead, then the docking will proceed and the hydrogen will be ignored in terms of its contribution to the fitness score. It will however still be displayed when docking poses are visualised.

• Inside the GOLD least-squares fitting routine, the link atom in the ligand will be forced to fit onto the link atom in the protein.

• In order to make sure that the geometry of the bound ligand is correct, the angle-bending potential from the Tripos Force Field has been incorporated into the fitness function. On evaluating the score for the docked ligand, the angle-bending energy for the link atom is included in the calculation of the fitness score.

• This seems to work well in the systems on which GOLD was validated. However, since the protein is held rigid (apart from hydroxyl hydrogen atoms), it does require that the position of the link atom in the protein is sensible.


terms for each docking performed on the ligand. In the example below the fitness score for the solution found on the first docking attempt (gold_soln_ligand_m1_1.mol2) is shown:

• A constraint scoring term S(con) is listed for each docking. If a solution predicted by GOLD satisfies all of the protein H bond constraints then the contribution from this scoring term will be 0.00. However, for solutions in which not all of the constraints are satisfied, a penalty will be applied to the fitness score for each constrained H bond that is not formed. The value of this penalty is the Constraint weight previously specified (see Section 3.2.1, page 187).

• The details of each specified protein H bond constraint satisfied in the solution are listed and an overall constraint score is given. A list of all hydrogen bonds formed between ligand and protein is also provided in the ligand log file.

• From your ligand_m1.rnk output file identify GOLD's top-ranked solution. Docking attempts are listed in decreasing order of fitness, so the best solution is placed first. Load the GOLD results into SILVER and display the top-ranked solution.

• Inspect how well the docked inhibitor fits within the protein binding site as predicted by GOLD:


• Once all of these protein H bond constraints have been set up the Constraints Editor window should contain four individual constraints:

• Hit Done to close the Constraint Editor window.

4. Running GOLD

• Click on the Output button within the Input Files and Parameters panel, then hit the Output Directory... button. Specify a directory to which you have write permission; this is where the GOLD output files will be written. Select Ok to close the Output preferences window.

• Click on the Run button in the Control panel of the GOLD front end. This will start the GOLD job interactively. As the job progresses, output will be displayed in the GOLD Output window.

• Once the job is complete the message GA Done will appear.

5. Analysis of Output

• Open the file gold_ligand_m1.log, from your specified output directory, using a text editor.

• This file will give a total fitness score and a breakdown of the fitness into its constituent energy


4.6.2 Setting Up a Single Covalent Link

• Set up the protein and ligand structures so that they both contain the link atom (see Section 4.6.1, page 33).

• In the GOLD front end, click on the Covalent button in the Input Parameters and Files panel.

• In the resulting Covalently-bound ligand settings dialog enable the Apply covalent docking check-box and select Atom to atom for the type of link.

• Enter the atom numbers (as defined in the Sybyl MOL2 input files, or use PDB sequence numbers if PDB input is used) of the link atom in the protein and ligand into the appropriate boxes:

• Hit Done to accept the current selections and close the Covalently-bound ligand settings dialog.

• Any constraint created can be altered by selecting the relevant line in the Constraint Editor window, clicking on the Add or Edit button in this window and selecting Edit selected to edit the constraint.

4.6.3 Setting Up Substructure-Based Covalent Links

• It is possible to apply a covalent link to multiple ligands which have a common functional group. During docking the link will be applied to any ligands which contain a specified substructure (matching is performed on the basis of the atom types and 2D connectivity).

• Note: the substructure must be a sub-graph rather than a complete molecule.

• To use a substructure-based covalent link, first create a file containing the substructure in MOL2 format (e.g. substructure.mol2). It is recommended that you set atom types manually (see Section 5.3, page 37) since an incomplete fragment can cause problems with automatic atom-typing. The actual conformation of the group in this file is not important, as only the atom types and 2D connectivity will be used.

• Click on the Covalent button in the Input Parameters and Files panel of the GOLD front-end. This will open the Covalently-bound ligand settings dialog:

• Enable the Apply covalent docking check-box and select Atom to substructure for the type of link.

• Click on the Substructure file button, then select the required substructure file and hit Open.

• Enter the Protein atom number and Substructure atom number to which the covalent link applies (numbering as in the MOL2 files).

• Enable the Topology matching check-box if the constraint refers to a substructure atom (and therefore a ligand atom) which is topologically equivalent to other atoms (e.g. it is one of the oxygen atoms of an ionised carboxylate group). GOLD will then use whichever of the equivalent atoms gives the best result.

• Hit Done to accept the current selections and close the Covalently-bound ligand settings dialog.
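
The matching rule described in this section (atom types plus 2D connectivity) can be illustrated with a small brute-force sketch. This is not GOLD's own matching code; the function name, the 0-based atom indices and the acetate-like example ligand are all assumptions made for illustration (MOL2 files number atoms from 1).

```python
from itertools import permutations

def find_matches(sub_types, sub_bonds, lig_types, lig_bonds):
    """Return every mapping of substructure atoms onto ligand atoms that
    preserves atom types and 2D connectivity (bond orders are ignored)."""
    lig_bond_set = {frozenset(bond) for bond in lig_bonds}
    matches = []
    # Try every ordered choice of ligand atoms, one per substructure atom.
    for candidate in permutations(range(len(lig_types)), len(sub_types)):
        if any(lig_types[l] != sub_types[s] for s, l in enumerate(candidate)):
            continue  # atom types differ
        if all(frozenset((candidate[a], candidate[b])) in lig_bond_set
               for a, b in sub_bonds):
            matches.append(candidate)
    return matches

# Hypothetical substructure: a carboxylate carbon bonded to two O.co2 atoms.
sub_types = ["C.2", "O.co2", "O.co2"]
sub_bonds = [(0, 1), (0, 2)]

# Hypothetical acetate-like ligand: C.3 bonded to C.2, which carries two O.co2.
lig_types = ["C.3", "C.2", "O.co2", "O.co2"]
lig_bonds = [(0, 1), (1, 2), (1, 3)]

matches = find_matches(sub_types, sub_bonds, lig_types, lig_bonds)
# Substructure atom 1 (an oxygen) can map onto either ligand oxygen.
equivalent_link_atoms = sorted({m[1] for m in matches})
```

The two mappings returned for the carboxylate show why Topology matching matters: the link atom in the substructure corresponds to two topologically equivalent ligand oxygens, and either could be used.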


• By default the Constraint weight and Minimum H bond geometry weight should be 10.0 and 0.005 respectively. Select Add constraint or update selected constraint to accept these values. The specified constraint should now appear in the Current Constraints list.

• Specify protein H bond constraints for the three remaining key hydrogen bonding interactions as outlined in the table below:

Protein H bonding group   Atom number(s)   Constraint weight   Minimum H bond geometry weight
Ile50                     1388             10.0                0.005
Ile50’                    468              10.0                0.005
Asp25’                    1161 or 1162     10.0                0.005


• Protein H bond constraints can be used in order to attempt to reproduce these key interactions during docking.

• Specify that either oxygen atom of the carboxylate group of Asp25 should form a hydrogen bond to the ligand by entering the corresponding Protein atom(s) required to form H-bond in the Constraint Editor window:


5. Atom and Bond Types

5.1 Atom and Bond Type Overview (see page 36)
5.2 Automatically Setting Atom and Bond Types (see page 36)
5.3 Manually Setting Atom and Bond Types (see page 37)
5.4 Overriding Automatic Bond Settings (see page 38)
5.5 Atom and Bond Type Conventions for Difficult Groups (see page 39)
5.6 Internal GOLD Atom Types (see page 45)

5.1 Atom and Bond Type Overview

• Each protein and ligand atom must be assigned an atom type which is used, for example, to determine whether the atom is capable of forming hydrogen bonds.

• GOLD atom typing is based on SYBYL atom types. Internally, GOLD also uses some additional atom types (see Section 5.6, page 45).

• SYBYL bond types are also used.

• Correct assignment of atom and bond types is crucial.

• GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct. However, if for any reason, GOLD is unable to deduce an atom type, then the atom in question will be replaced with a dummy atom type Du. If this is the case a warning message will be given in the gold_protein.log file.

• The presence of dummy atoms should not significantly affect the docking prediction since dummy atoms are considered neither donors nor acceptors.

• Atom types may be set manually, provided you are using MOL2 input files (see Section 5.3, page 37).

• Alternatively, they can be set automatically (see Section 5.2, page 36). Unless you are an expert GOLD user or are dealing with a very unusual ligand structure, you are recommended to use this option. However, you still need to input the ligand and protein structures correctly, e.g. with correct bond orders and appropriate protonation states.
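
Since untypable atoms end up with the dummy type Du, it can be useful to check a MOL2 file for atoms that already carry that type. The following is a minimal sketch, assuming the standard Tripos MOL2 atom record layout (id, name, x, y, z, type, ...); the function name and the example molecule are hypothetical, and which file to scan is left to the user.

```python
def dummy_atoms(mol2_text):
    """Return (atom_id, atom_name) for every atom typed Du in a MOL2 string."""
    found = []
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only the ATOM section holds atom records.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atoms or not stripped:
            continue
        fields = stripped.split()
        # The sixth field of an atom record is the SYBYL atom type.
        if len(fields) >= 6 and fields[5] == "Du":
            found.append((int(fields[0]), fields[1]))
    return found

# Hypothetical three-atom fragment in which atom 3 is a dummy atom.
example = """@<TRIPOS>MOLECULE
fragment
 3 2 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.3     1 LIG   0.0000
      2 O1          1.4000    0.0000    0.0000 O.3     1 LIG   0.0000
      3 X1          2.4000    0.0000    0.0000 Du      1 LIG   0.0000
@<TRIPOS>BOND
     1     1     2    1
     2     2     3    1
"""

dummies = dummy_atoms(example)
```

An empty result means no atom in the file is typed Du; the warnings in gold_protein.log remain the authoritative record of what GOLD itself could not type.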

5.2 Automatically Setting Atom and Bond Types

• Unless you are an expert GOLD user or are dealing with a very unusual ligand structure, you are recommended to use the automatic atom-type assigner. This requires that the Set atom types check buttons are switched on in the GOLD front end.

• GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct (see Section 5.5, page 39). However, if for any reason, GOLD is unable to deduce an atom type, then the atom in question will be replaced with a dummy atom type Du.

• It does not matter whether the bonds in an aromatic ring are coded as aromatic (ar) or alternate single and double, as the GOLD atom-type assigner will automatically assign the special SYBYL bond type ar where appropriate.

• The atom-type assigner will also detect amide linkages and assign them the SYBYL bond type am.

• Care should be taken when using the type-assignment software on protein input files. In particular, the software is likely to be unreliable if protein residues have been partially deleted, so that some atoms appear to have free valencies. This situation can be avoided by ensuring that all residues included in the input file are complete.

• There is usually a right and a wrong way to code groups which can be drawn in more than one way (i.e. have more than one canonical form), such as nitro, carboxylate and amidinium. A list of correct bond types for some of the common, difficult groups is available (see Section 5.5, page 39).

• Because correct atom typing is so important, any messages from the type checker are logged in both the gold_protein.log file and the gold.err file. These errors will also be displayed in a separate window if GOLD is run through the front end.

5.3 Manually Setting Atom and Bond Types

• If you do not want to use the automatic atom- and bond-type assignment available in GOLD, you can define the atom and bond types yourself, provided that you use MOL2 format. This option is useful when you want to set unusual atom types or user-defined types.

• GOLD atom typing is based on SYBYL atom types (see Appendix A: List of Atom and Bond Types, page 149).

• SYBYL bond types are also used (see Appendix A: List of Atom and Bond Types, page 149).

• Even if atom types are set manually, the automatic atom-type assignment software is still run to check the ligand structure for inconsistencies. Any errors will be recorded in both the log file and the error file. In most cases, input types will not be reset.

• If for any reason GOLD is unable to deduce an atom type, then the atom in question will be replaced with a dummy atom type Du.

• Bond types must be correctly set (see Section 5.5, page 39). This is normally just a case of checking single and double bonds. However, the amide bond must be set to the am bond type. Also, the ar bond type is used for delocalised bonds (e.g. in carboxylate, phosphate and guanidinium ions) as well as for aromatic bonds.

• Atom types should conform to those expected in SYBYL. In particular, sp2 oxygen is atom type O.2, sp3 oxygen is O.3, tetrahedral nitrogen is N.3 (or N.4 if protonated), planar (non-amide) nitrogen is N.pl3 and the planar amide nitrogen is N.am. The atom type O.co2 should be used for the oxygens of carboxylate and phosphate ions or the singly-charged oxygen of phenolates.

• If an atom is mis-typed, it is possible that GOLD will assign it the wrong H-bond donor or acceptor properties. Therefore, correct atom-type assignment is crucial. An N.3 donor (tetrahedral nitrogen), is very different from an N.4 (protonated nitrogen) or an N.pl3 (planar trigonal nitrogen) donor. The assignment of rotatable bonds may also be affected. If a bond has


GOLD. The Minimum H bond geometry weight takes values in the range 0 to 1; by default it is set to 0.005.

3.2.2 Specifying Multiple Constraints

• Using the Constraint Editor it is possible to specify several different protein H bond constraints, with different weights for each constraint. Simply specify the protein atom number and required weight and click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints. Repeat the procedure to set up further constraints; each constraint will be displayed on a separate line in the Constraint Editor window.

• For a given protein H bond constraint more than one protein atom number can be entered in the Protein atom(s) required to form H-bond input box. This will instruct GOLD to use an ‘either-or’ type of constraint during docking. For example, specifying two protein atoms m and n, separated by a space, will result in the constraint being satisfied if an H bond is formed to either m or n during docking. This is of particular use when defining constraints involving, for example, carboxylates where it is not important which oxygen atom forms an H bond, provided one does.
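
The ‘either-or’ behaviour and the per-constraint penalty can be sketched as follows. This is an illustration rather than GOLD's implementation: the function name and the set of H-bonded protein atoms passed in are assumptions, while the atom numbers and weights are those used for the HIV-1 protease example in this tutorial.

```python
def constraint_penalty(constraints, bonded_protein_atoms):
    """Sum the weights of all protein H-bond constraints that are not satisfied.

    constraints: list of (atom_numbers, weight). A constraint counts as
    satisfied when at least one of its atoms is H-bonded to the ligand.
    """
    penalty = 0.0
    for atom_numbers, weight in constraints:
        if not any(atom in bonded_protein_atoms for atom in atom_numbers):
            penalty += weight
    return penalty

constraints = [
    ((1388,), 10.0),       # Ile50
    ((468,), 10.0),        # Ile50'
    ((1161, 1162), 10.0),  # Asp25', either carboxylate oxygen
]

# Hypothetical docking results: protein atoms found H-bonded to the ligand.
all_met = constraint_penalty(constraints, {1388, 468, 1162})
one_missed = constraint_penalty(constraints, {1388, 1161})
```

In the first case every constraint is met and the penalty is zero; in the second the Ile50’ constraint is missed, so one Constraint weight is charged.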

3.2.3 Defining the Protein H Bond Constraints

• The crystal structures of HIV-1 protease in complex with a number of cyclic urea inhibitors have been determined. It has been observed that the central urea moiety is anchored in the active site of the protease by six key hydrogen bonds:

• Two hydrogen bonds between the urea oxygen atom and the protein backbone peptide groups of Ile50 and Ile50’ (shown below).

• Four hydrogen bonds between the cyclic urea diol and the carboxylates of the catalytic aspartate residues of the protein (ASP25’) (shown below).


3.2.1 General Methodology

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Protein H-Bond Constraint from the list of constraint types:

• When specifying a protein hydrogen bond constraint the protein atom number, as defined in the MOL2 input file, must be entered.

• GOLD will then be biased towards finding solutions in which the specified protein atom forms hydrogen bonds. However, as with standard hydrogen bond constraints such a solution is not guaranteed.

• During the GOLD run the fitness score of a given docking will be penalised for every protein H-bond constraint that is not satisfied.

• The Constraint weight is the strength of bias applied to the formation of a specified hydrogen bond in the least squares mapping algorithm within GOLD. The Constraint weight is also the value of the penalty applied to the fitness score for each constrained H bond that is not formed.

• The Minimum H bond geometry weight is a user defined score that determines how good a hydrogen bonding interaction has to be in order for it to be considered a hydrogen bond by


the wrong type, it may be inappropriately allowed to rotate freely.

• The BUILD menu in SYBYL has a MODIFY sub-menu for altering atom/bond types. There is also a dialogue box for displaying atom- and bond-type labels.

• A list of atom and bond type conventions for some common, difficult groups is available (see Section 5.5, page 39).

5.4 Overriding Automatic Bond Settings

• When using fitness flags, e.g. Flip amide bonds (see Section 7.2, page 64) or Flip all planar R-NR1R2 (see Section 7.3, page 65), the bond in question is treated in a specific manner at ligand initialisation to prepare it for the docking run (in both the aforementioned cases, the bond is flattened at ligand initialisation prior to it being flipped during docking).

• If, for example, a bond should rotate freely rather than flip during docking, this fine-grained control can be achieved by using the rotatable_bond_override.mol2 file, found in the $GOLD_DIR/gold/ directory. Some fragments are already provided (and can be edited); user-specific ones may also be added. Instructions on how to do this, as well as further information, can be found in the file itself.

• This is particularly useful if further control is sought over more than one ligand with a common substructure in a ligand library file.

• This feature is only available if using GOLD via the command line.

• If the rotatable_bond_override.mol2 file is to be used, lines of the following type should be inserted into the gold.conf:

postprocess_bonds = 1
rotatable_bond_override_file = <full_path_to_rotatable_bond_override_file>

• The new bond type(s) are specified in the rotatable_bond_override.mol2 file, in the @<TRIPOS>COMMENT part of the molecule file. The following format should be used:

RESET_BOND_TYPE <bond_number> <fix | flip | 1 | am>

• fix keeps the bond at its input angle. This option can also be specified for a single ligand docking via the gold.conf (see Section 7.7, page 66).

• flip causes 180 degree turns of the input angle geometry.

• 1 re-types the bond to a single bond, thus it is treated as fully rotatable.

• am re-types the bond as an amide bond.

• A report detailing what has been matched can be found in the gold_ligand.log file:


• Postprocessing is done by default, even if the line postprocess_bonds = 1 is not present in the gold.conf. Postprocessing can be switched off by adding the line postprocess_bonds = 0 to the gold.conf and running GOLD via the command line.

• If using the postprocess instruction and rotatable bond override file, the geometry is overruled whether the associated fitness flag is on or off.

• If a torsion distribution can be found and matched, this will be used to bias the geometry of the re-typed bond.

• Care should be taken to ensure the correct substructure is defined in the rotatable_bond_override.mol2 file. If a substructure cannot be matched, the bond override will not be used.
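
As a rough aid when preparing override files, the RESET_BOND_TYPE directives and the two gold.conf lines described in this section can be checked and generated with a short script. This is a sketch only: the function names, the example bond numbers and the file path are hypothetical, and GOLD's own parser may accept or reject input differently.

```python
ALLOWED_TYPES = {"fix", "flip", "1", "am"}

def parse_reset_bond_types(comment_lines):
    """Map bond number -> new bond type for each RESET_BOND_TYPE directive."""
    overrides = {}
    for line in comment_lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] != "RESET_BOND_TYPE":
            continue  # not a directive line
        bond_number, new_type = int(fields[1]), fields[2]
        if new_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported bond type: {new_type}")
        overrides[bond_number] = new_type
    return overrides

def gold_conf_lines(override_path):
    """Return the two gold.conf lines that enable the override file."""
    return [
        "postprocess_bonds = 1",
        f"rotatable_bond_override_file = {override_path}",
    ]

# Hypothetical @<TRIPOS>COMMENT content for one fragment.
comment = ["RESET_BOND_TYPE 3 1", "RESET_BOND_TYPE 5 flip"]
overrides = parse_reset_bond_types(comment)
conf = gold_conf_lines("/path/to/rotatable_bond_override.mol2")
```

Here bond 3 would be re-typed as a single (fully rotatable) bond and bond 5 would be allowed to flip.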

5.5 Atom and Bond Type Conventions for Difficult Groups

• Use of correct atom and bond types in GOLD is important for producing good results.

• In order for the GOLD atom-type assigner to work correctly, it is necessary for the input structures to have correct bond orders. This can be difficult when a ligand contains a group that can be drawn in more than one way (i.e. a group which has more than one canonical form). In such cases, there is usually a right and a wrong way for GOLD, and you need to know which is which.

• This section explains how to set the bond orders of some common difficult groups. It also shows the atom types that GOLD will assign if bond types are set correctly (or that you must assign if you are setting atom types manually).

• Amidinium (see page 40)

• Carboxylate (see page 40)

• Enolate/phenolate oxygen (see page 40)

• Guanidinium (see page 41)


front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial3 and hit Open.

3. Hydrogen Bonding Constraints

• GOLD features two types of hydrogen bonding constraints:

• A standard hydrogen bond constraint can be used to force a hydrogen bond between a specific protein atom and a specific ligand atom (see Section 3.1, page 186).

• A protein hydrogen bond constraint can be used to specify that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section 3.2, page 186).

3.1 Standard Hydrogen Bond Constraints

• A standard hydrogen bond constraint allows a particular ligand atom to be constrained to form a hydrogen bond to a particular protein atom.

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select H-Bond Constraint from the list of constraint types.

• When specifying a hydrogen bond constraint the ligand and protein atom numbers, as defined in the MOL2 input files, must be entered (if PDB input files are used, specify the sequence number).

• One of the atoms must be an H-bond donor and the other should be an acceptor. The protein atom must also be available for ligand binding (i.e. solvent accessible).

• Once defined, an H-bond constraint is incorporated into the least-squares fitting routine used by GOLD to dock the ligand. The constraint has a weight of 5 relative to a normal hydrogen bond. Thus, the docking will be biased towards solutions which include the specified hydrogen bond.

• The hydrogen bond constraint weighting can be altered within the Fitness Function section of the GOLD parameters file by changing the value of the parameter CONSTRAINT_WT.

• Select Cancel to close the Constraint Editor.

3.2 Protein Hydrogen Bond Constraints

3.2.1 General Methodology (see page 187)
3.2.2 Specifying Multiple Constraints (see page 188)
3.2.3 Defining the Protein H Bond Constraints (see page 188)

• A protein hydrogen bond constraint can be used to specify that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section 8.3.3, page 75).


Tutorial 3: Use of Hydrogen Bonding Constraints

1. Introduction (see page 185)
2. Input Files (see page 185)
3. Hydrogen Bonding Constraints (see page 186)
4. Running GOLD (see page 191)
5. Analysis of Output (see page 191)

1. Introduction

• The design of new and more potent antiretroviral agents for the human immunodeficiency virus (HIV) continues to be the focus of much attention. The crystal structures of HIV-1 protease in complex with a number of cyclic urea inhibitors have been determined in order to identify the key interactions responsible for the high potency of this class of inhibitor (see: Jadhav et al. J. Med. Chem., (1997) 40, 181). The C2 symmetric cyclic urea scaffold is well suited to interact with the viral protease; it has been observed that these inhibitors are anchored in the active site of the protease by six key hydrogen bonds.

• The object of this tutorial is to investigate the binding mode of a cyclic urea inhibitor with HIV-1 protease, PDB entry code 1qbt. The use of hydrogen bonding constraints in order to reproduce these key interactions will also be illustrated.

2. Input Files

• Open SILVER and read in and inspect the file protein.mol2 from <GOLD_DIR>/examples/tutorial3. The original PDB file <GOLD_DIR>/examples/tutorial3/1QBT.pdb has also been provided should you wish to set up the protein for yourself.

• HIV-1 protease, protein.mol2, has already been set up in accordance with the guidelines for the preparation of protein input files (see Section 3., page 9).

• An important feature of cyclic urea inhibitors is their ability, upon binding, to displace a structural water molecule present within the active site of the protein. In this example, all water molecules have been deleted from protein.mol2. However, in other complexes you may not know whether water molecules should form mediating hydrogen bonds, or be displaced by the ligand on binding. GOLD allows waters to switch on and off (i.e. to be bound or displaced) and to rotate (to optimise hydrogen bonding) during docking (see Section 3.4, page 16).

• The cyclic urea inhibitor has already been prepared in accordance with the requirements for setting up the ligand (see Section 4., page 30).

• Open the file ligand.mol2 from <GOLD_DIR>/examples/tutorial3 within SILVER and inspect the structure.

• A configuration file (gold.conf) has been provided for this tutorial which will automatically load the settings and parameter values for this tutorial into the GOLD front end.

• Open GOLD and click on the Configuration File button within the Control panel of the GOLD


• N-oxide (see page 41)

• Nitro (see page 42)

• Nitrogen (anionic) (see page 42)

• Nitrogen (cationic, aromatic) (see page 42)

• Oxygen (anionic) (see page 43)

• Phosphate (bridging) (see page 43)

• Phosphate (terminal) (see page 43)

• Sulphonamide (see page 44)

• Sulphonate (see page 44)

• Sulphone (see page 44)

• Sulfoxide (sulfinyl) (see page 44)

Amidinium

Carboxylate

Enolate/phenolate oxygen


or:

Guanidinium

N-oxide

or:

or:


• Metal coordination in GOLD is modelled as ’pseudo-hydrogen bonding’. Metal-ligand interactions will typically involve the metal binding to, for example, carboxylate ions, deprotonated histidines (i.e. negatively charged), and phenolates. Therefore metals can be considered to bind to H-bond acceptors and the metal will compete with H-bond donors for interaction.

This ends the tutorial.


5.1 Protein Log File

• Open and inspect the file gold_protein.log, from the output directory specified (see Section 3., page 179), using a text editor.

• The gold_protein.log file will contain details of the parameterisation of the protein and the determination of the ligand binding site. Information relating to the metal and the determination of the coordination geometry will also be given:

• Check to see that the coordination geometry has been correctly overruled, and that the matched geometry is tetrahedral. Further information about the contents of the gold_protein.log file is given elsewhere (see Section 14.9, page 118).

5.2 Files Containing the Protein and Docked Ligands

• Open and inspect the file gold_protein.mol2, located within your specified output directory (see Section 3., page 179), using SILVER. The protein file now contains a number of dummy atoms representing idealised metal coordination positions. These dummy atoms will be connected to the metal ion.

• At locations where GOLD is missing a coordination site (i.e. coordination points not bound to the protein) virtual coordination points are added. These coordination points are then used as fitting points that can bind to acceptors.

• From your specified output directory identify the top-ranked solution predicted by GOLD, ranked_ligand_m1_1.mol2 and open this file within SILVER.

• Inspect how well the docked benzyl succinate inhibitor fits within the protein binding site.

• The zinc (shown in blue) is coordinated to the protein via two histidine residues and a carboxylate group. In the example shown below, the remaining zinc coordination site is used to bind the benzyl succinate inhibitor (shown coloured in green) via interaction with a carboxylate ion acceptor along the direction of the carbonyl oxygen lone pair:


Nitro

Nitrogen (anionic)

• For example, an anionic imidazole ring would be:

Nitrogen (cationic, aromatic)

• For example, the pteridine ring system in methotrexate (PDB code 4DFR) would be:


Oxygen (anionic)

• For example, in a serine protease transition-state analogue this would be:

Phosphate (bridging)

Phosphate (terminal)


• Click on the Add metal or Update selected metal button to add the selection to the Current Metal Settings. Hit Done to close the Metal Selection window.

5. Running GOLD and Analysis of Output

5.1 Protein Log File (see page 183)
5.2 Files Containing the Protein and Docked Ligands (see page 183)

• Click on the Run button in the Control panel of the GOLD front end; this will start the GOLD job interactively. As the job progresses, output written to the gold_ligand_m1.log file will also be displayed in the GOLD Output window. Once the job is complete, the message GA Done will appear.


• Open and inspect the GOLD parameters file by clicking on Edit Parameters within the Input Files and Parameters panel of the front end and then selecting Yes in the Copy parameter file? window.

• The parameters used by GOLD for each metal are listed; for an explanation of the parameters, refer to the comments in the gold.params file. Additional metal parameterisation can also be found within the H_BOND TABLE.

• For our Zn atom GOLD will therefore attempt to match coordination geometries 4, 5 and 6 (tetrahedral, trigonal bipyramidal, and octahedral templates) onto the coordinating atoms found in the protein. The template that gives the best match will then be used to generate coordination fitting points.
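
The selection of candidate templates for a metal can be sketched as a simple lookup. The values below are copied from the metal parameter table and the coordination template table given in this tutorial; the dictionary and function names are hypothetical, and the optional override argument only mimics the manual restriction described in Section 4.2.

```python
# Template name -> coordination number (from the template table).
TEMPLATES = {
    "TETR": 4, "TBP": 5, "OCT": 6, "CTP": 7,
    "PBP": 7, "SQAP": 8, "ICO": 10, "DOD": 12,
}

# Sybyl atom type -> allowed coordination numbers (from the metal table).
ALLOWED_COORDINATION = {
    "Mg": (4, 6), "Zn": (4, 5, 6), "Mn": (4, 6), "Fe": (4, 6),
    "Ca": (6, 7), "Co.oh": (6,), "Gd": (6,),
}

def candidate_templates(sybyl_type, override=None):
    """List the templates that would be tried for a metal.

    override: optional tuple of coordination numbers that replaces the
    defaults, as when a geometry is specified manually.
    """
    numbers = override if override is not None else ALLOWED_COORDINATION[sybyl_type]
    return [name for name, n in TEMPLATES.items() if n in numbers]

zn_default = candidate_templates("Zn")
zn_tetrahedral_only = candidate_templates("Zn", override=(4,))
ca_default = candidate_templates("Ca")
```

Restricting the zinc atom to coordination number 4 leaves only the tetrahedral template, so the best-fit search has nothing else to compare against.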

4.2 Manually Specifying Metal Coordination Geometries

• It is possible to manually specify coordination geometries for particular metal atoms. This can be useful in allowing non-standard metal coordination geometries, or to limit the number of possible geometries that GOLD checks (i.e. to overrule the default geometries for the corresponding metal type defined in the gold.params file).

• In this example, the zinc atom is clearly tetrahedral (the Zn is coordinated to two histidine residues and a carboxylate group in the protein, the fourth coordination site is available to bind to the benzyl succinate inhibitor). We can therefore instruct GOLD to match against the tetrahedral template only when determining the coordination geometry.

• Click on the Metals button in the Input Parameters and Files section of the GOLD front-end. In the resulting Metal Selection window specify the Metal atom no. 2096 (this is the Zn atom number as defined in the protein input MOL2 file), and select (4) tetrahedral from the list of allowed metal coordinations:

H-Bonding type  Sybyl atom type  Atom type (default or elucidated)  Donor (D), Acceptor (A), or Metal (M)  Allowed coordination geometries  Coordination distance
MGD             Mg               DEF                                M                                      4, 6                             2.05
ZND             Zn               DEF                                M                                      4, 5, 6                          2.09
MND             Mn               DEF                                M                                      4, 6                             2.06
FED             Fe               DEF                                M                                      4, 6                             1.98
CAD             Ca               DEF                                M                                      6, 7                             2.44
COBD            Co.oh            DEF                                M                                      6                                2.09
GDD             Gd               DEF                                M                                      6                                2.44


Sulphonamide

• GOLD will treat the nitrogen atom as a planar, trigonal nitrogen, i.e. not capable of accepting a hydrogen bond. However, if the geometry read into GOLD is pyramidal, sulphonamide nitrogen atoms are now typed as N.3 rather than N.pl3 and are treated as H-bond acceptors (i.e. they have a fitting point), allowing them to coordinate metal groups.

Sulphonate

Sulphone

Sulfoxide (sulfinyl)


5.6 Internal GOLD Atom Types

• GOLD uses four internal atom types which are not recognised by SYBYL. These are N.plc (nitrogen donors in a protonated delocalised system, such as a guanidinium ion), N.acid (acidic nitrogen, e.g. in tetrazole or sulphonamide ions), S.a (sulphur acceptors) and S.m (charged sulphur atoms). You should not really need to know about these, but all assignments of the N.plc, N.acid, S.a and S.m atom types are logged in the gold.log file, so you can check to see if everything is working as you would expect.


4. The Handling and Parameterisation of Metals in GOLD

4.1 Automatic Determination of Metal Coordination Geometries (see page 180)
4.2 Manually Specifying Metal Coordination Geometries (see page 181)

• GOLD is able to predict binding to seven metal ions: Mg, Zn, Fe, Mn, Ca, Co and Gd.

• No special instructions are needed to dock to metal ions; they will be handled automatically when present in the protein binding site.

4.1 Automatic Determination of Metal Coordination Geometries

• GOLD will automatically recognise the following metal coordination geometries:

• In order to determine the coordination geometry of a particular metal atom GOLD performs a permuted superimposition of coordination geometry templates onto the coordinating atoms found in the protein. Coordination fitting points are then generated using the template that gives the best fit (based on RMSd).

• The geometry templates used for a given metal are defined in the gold.params file in the section headed # Metals:

Template  Geometry                Coordination number
TETR      Tetrahedral             n=4
TBP       Trigonal bipyramidal    n=5
OCT       Octahedral              n=6
CTP       Capped trigonal prism   n=7
PBP       Pentagonal bipyramidal  n=7
SQAP      Square prism            n=8
ICO       Icosahedral             n=10
DOD       Dodecahedral            n=12


• The benzyl succinate inhibitor has also been set up in accordance with the guidelines for the preparation of ligands (see Section 4.1, page 30).

• Open and inspect the file ligand.mol2 from <GOLD_DIR>/examples/tutorial2

3. The GOLD Configuration File

• All of the parameters and settings required to define a particular GOLD job may be saved as a configuration file (gold.conf) (see Section 15., page 126). This text file will include details of the ligand, the protein binding site, the fitness-function parameter file to be used, the torsion distribution file to be used, and the genetic algorithm parameters. Therefore there is no need to specify protein.mol2 and ligand.mol2 input files, as these will be read in upon opening gold.conf.

• A configuration file has been provided for this tutorial. Open GOLD and click on the Configuration File button within the Control panel of the GOLD interface, then select the file gold.conf from <GOLD_DIR>/examples/tutorial2 and hit Open. This will automatically load the settings and parameter values for this tutorial into the GOLD front end.

• Click on the Output button within the Input Files and Parameters panel, then hit the Output Directory... button. Specify a directory to which you have write permission; this is where the GOLD output files will be written. Select Ok to close the Output preferences window.


6. Fitness Functions

6.1 Choice of Fitness Functions (see page 46)
6.2 GoldScore Fitness Function (see page 46)
6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File (see page 48)
6.4 ChemScore Fitness Function (see page 49)
6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File (see page 58)
6.6 Altering GOLD Parameters: the gold.params File (see page 59)
6.7 Kinase Scoring Function (see page 59)
6.8 Heme Scoring Function (see page 60)
6.9 Internal Energy Offset (see page 62)
6.10 User Defined Fitness Function (see page 62)

6.1 Choice of Fitness Functions

• GOLD offers a choice of fitness functions: GoldScore (see Section 6.2, page 46), ChemScore (see Section 6.4, page 49) and User Defined Score.

• With respect to use of either the GoldScore or ChemScore functions, both are about equally reliable although, on any given problem, one may give a good prediction and the other not. Therefore, when screening large numbers of compounds, rescoring docking poses with alternative scoring functions and considering the best results from each (consensus scoring) can have a favourable impact on the overall rank ordering of ligands (see Section 13., page 106).

• GoldScore is the original GOLD scoring function and is selected by default.

• User Defined Score allows users to implement their own scoring function (or modify an existing scoring function) by specifying a path to a dynamically loadable shared object library (see Section 6.10, page 62).

6.2 GoldScore Fitness Function

• The GOLD fitness function is made up of four components:

• protein-ligand hydrogen bond energy (external H-bond)

• protein-ligand van der Waals (vdw) energy (external vdw)

• ligand internal vdw energy (internal vdw)

• ligand torsional strain energy (internal torsion)

• Optionally, a fifth component, ligand intramolecular hydrogen bond energy (internal H-bond), may be added.

• If any constraints have been specified, then an additional constraint scoring contribution S(con) will be made to the final fitness score. Similarly, when docking covalently bound ligands a covalent term S(cov) will be present.

• Note: By default, output files will contain a single internal energy term S(int) which is the sum of the internal torsion and internal vdw terms. To write these component terms to output files you will need to edit the gold.params file (see Section 6.3, page 48) to include the following line:

VERBOSE_SCORE = 1

• Empirical parameters used in the fitness function (hydrogen bond energies, atom radii and polarisabilities, torsion potentials, hydrogen bond directionalities, etc.) are taken from the GOLD parameter file. They can be customised by copying the file, editing the copy, and instructing GOLD to use the edited file (see Section 6.3, page 48).

• The fitness score is taken as the negative of the sum of the component energy terms, so that larger fitness scores are better.

• The external vdw score is multiplied by a factor of 1.375 when total fitness score is computed. This is an empirical correction to encourage protein-ligand hydrophobic contact.

• During a docking run, the fitness score may appear to get worse as the docking proceeds. This is due to the fact that the effects of poor H-bond geometry and close nonbonded contacts are artificially down-weighted at early stages of the docking (annealing) (see Section 10.8, page 91). Only the final fitness score (i.e. from the completed docking) has any meaning.

• The fitness function has been optimised for the prediction of ligand binding positions rather than the prediction of binding affinities, although some correlation with the latter has been found (see Section 16.2, page 137).
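The way the GoldScore components combine can be sketched in a few lines of Python. This is a minimal illustration of the bullets above, not GOLD's own code: the class and function names are hypothetical, the inputs are assumed to be the component energies (lower is better), and the constraint and covalent contributions S(con) and S(cov) are left out.

```python
from dataclasses import dataclass

# Empirical weight applied to the external vdw term, as quoted in the text.
EXTERNAL_VDW_WEIGHT = 1.375


@dataclass
class GoldScoreEnergies:
    """Component energies of one docked pose (hypothetical container)."""
    external_hbond: float
    external_vdw: float
    internal_vdw: float
    internal_torsion: float
    internal_hbond: float = 0.0  # optional fifth component


def goldscore_fitness(e: GoldScoreEnergies) -> float:
    """Negative of the weighted sum of energies, so larger is better."""
    total_energy = (
        e.external_hbond
        + EXTERNAL_VDW_WEIGHT * e.external_vdw
        + e.internal_vdw
        + e.internal_torsion
        + e.internal_hbond
    )
    return -total_energy
```

With illustrative energies of -10.0 (external H-bond), -20.0 (external vdw), 2.0 (internal vdw) and 3.0 (internal torsion), the sketch returns a fitness of 32.5.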

6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der Waals Contribution (see page 47)
6.2.2 Bump Checking (see page 48)

6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der Waals Contribution

• GoldScore uses Lennard-Jones functional forms for both the External and Internal Van der Waals contributions to the Fitness Function. By default a 6-12 potential is applied to the Internal Van der Waals contribution and a 4-8 potential is applied to the External Van der Waals contribution. These defaults are defined in the gold.params file (see Section 6.3, page 48).

• The 4-8 potential form for the External contribution is selected as being optimum for general use. However, there are cases where this potential form may be too severe in the short contact (i.e. the clash) component. This would arise, for instance, where part of the binding site is made up of a loop which is known to be able to move aside slightly to accommodate large ligands. In such cases it is possible to apply a softer 'Split Van der Waals Potential' for certain selected residues. Two alternative soft 'Split Potential' forms are parameterised in the gold.params file:


Tutorial 2: Handling of Metals in GOLD

1. Introduction (see page 178)
2. Preparation of Input Files (see page 178)
3. The GOLD Configuration File (see page 179)
4. The Handling and Parameterisation of Metals in GOLD (see page 180)
5. Running GOLD and Analysis of Output (see page 182)

1. Introduction

• The object of this tutorial is to investigate the binding mode of a benzyl succinate inhibitor with carboxypeptidase A, PDB entry code 1cbx. In this example, the benzyl succinate inhibitor is known to coordinate to a zinc atom within the ligand binding site of the protein.

• This tutorial will illustrate the requirements for setting up and running a docking in which the protein binding site features a metal ion. Additional information will also be provided on the handling and parameterisation of metals in GOLD.

2. Preparation of Input Files

• Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial2. The original PDB file <GOLD_DIR>/examples/tutorial2/1CBX.pdb has also been provided should you wish to set up the protein for yourself (please note that the inhibitor coordinates will need to be deleted when preparing the protein).

• The carboxypeptidase A, protein.mol2, has already been set up in accordance with the guidelines for the preparation of protein input files (see Section 3., page 9).

• Upon inspection of protein.mol2 you should notice that parts of the protein remote from the binding site have been deleted in order to speed up the calculation (see Section 2.1, page 163), and that hydrogen atoms have been placed on the protein in order to ensure that the ionisation and tautomeric states are defined unambiguously (see Section 3.2, page 10).

• There are some additional requirements when preparing a protein input file which contains a metal ion:

• In the protein input file it is essential that the metal ion is coordinated to at least two protein atoms or water molecules so that GOLD can determine the correct coordination geometry.

• In the protein input file, the metal ion must not have any bonds to coordinating atoms. If these are present in the original protein file, they must be deleted.

• On closer inspection of the protein.mol2 input file you will see that the zinc atom is coordinated to two histidine residues and a carboxylate group. All bonds to coordinating atoms have been removed:


• Using this methodology GOLD has been validated against a large number of protein-ligand complexes taken from the PDB. Further details and the entire validation test set are available for download (see Section 16.1.3, page 131).

This ends the tutorial.


EXTERNAL_POTENTIAL(1) = 4-8 2-4 - Form 1
EXTERNAL_POTENTIAL(2) = 4-8 1-2 - Form 2

• The first term of each form describes long range interactions, the second term describes short range interactions. The point of change-over is at the 4-8 potential minimum and the second term is set such that both terms take the same value at this point. The function therefore remains continuous and the minimum point is the same as with the default 4-8 potential.

• To apply one of these two soft potentials to a single residue, edit the gold.conf file (see Section 15.1, page 126) and add the following instruction:

alt_residues(form) = <residue>

Where form is the 'Split Potential' form to be applied (i.e. 1 or 2), and <residue> is the residue to which the split potential is to be applied. e.g. specifying

alt_residues(1) = ALA148

will apply the split potential of form 1 to the residue Ala 148.

• More than one residue can be specified, and both potential forms can be used in the same gold run. In the example below two residues are assigned split potentials of form 1, and one is assigned a split potential of form 2.

alt_residues(1) = ALA148 ARG150
alt_residues(2) = ARG149
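The idea behind the split potentials in this section, a 4-8 form at long range joined at its minimum to a softer 2-4 or 1-2 form at short range, can be illustrated numerically. The sketch below is an assumption-laden illustration rather than GOLD's parameterisation: the generic n-m expression, the normalisation to a well depth at r_min, and the function names are all hypothetical; the actual coefficients live in gold.params.

```python
def nm_potential(r, r_min, depth, n, m):
    """Generic n-m potential (n > m) with its minimum, -depth, at r_min.

    Assumed functional form, chosen so that every (n, m) pair shares the
    same minimum position and value.
    """
    ratio = r_min / r
    return depth * (m / (n - m) * ratio**n - n / (n - m) * ratio**m)


# Short-range exponents (repulsive, attractive) for the two soft forms.
SOFT_FORMS = {1: (4, 2), 2: (2, 1)}


def split_potential(r, r_min, depth, form):
    """4-8 potential beyond the minimum, softer form inside it."""
    if r >= r_min:
        return nm_potential(r, r_min, depth, 8, 4)
    n, m = SOFT_FORMS[form]
    return nm_potential(r, r_min, depth, n, m)
```

Because both branches equal -depth at r_min, the function stays continuous and keeps the minimum of the default 4-8 potential, while the short-range wall is much less steep (at half the minimum distance this sketch gives 8 and 0 well depths for forms 1 and 2, against 224 for the plain 4-8 form).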

6.2.2 Bump Checking

• Normally, a bump check is made to guard against unreasonably close contacts between ligand and protein atoms. However, if (and only if) GoldScore is being used, you can permit n ligand atoms to penetrate the protein by entering n in the No. of Ligand Bumps entry box, e.g.

6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File

• A GoldScore parameter file, goldscore.params, is provided in the $GOLD_DIR/gold directory.

• The goldscore.params file is used by default.

• Instructions on how to make use of the extended metal parameters are given elsewhere (see Section 6.8, page 60).


• To make use of the new metal parameters, either rename goldscore.p450_<csd|pdb>.params so that it replaces the default file, or specify one of the p450 params files via the GOLD interface or in gold.conf.

6.4 ChemScore Fitness Function

6.4.1 Introduction to ChemScore (see page 49)
6.4.2 Block Functions in ChemScore (see page 50)
6.4.3 Hydrogen-Bond Terms (see page 52)
6.4.4 Metal-Binding and Lipophilic Terms (see page 54)
6.4.5 Rotatable-Bond Freezing Term (see page 56)
6.4.6 Clash Penalty and Internal Torsion Terms (see page 56)
6.4.7 Covalent Term (see page 58)
6.4.8 Constraint Terms (see page 58)

6.4.1 Introduction to ChemScore

• The ChemScore scoring function is published in:

• M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425-445 (1997).

• C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead and M. D. Eldridge, Proteins, 33, 367-382 (1998).

• ChemScore was derived empirically from a set of 82 protein-ligand complexes for which measured binding affinities were available.

• Unlike GoldScore, the ChemScore function was trained by regression against measured affinity data, although there is no clear indication that it is superior to GoldScore in predicting affinities.

• ChemScore estimates the total free energy change that occurs on ligand binding as:

$$\Delta G_{binding} = \Delta G_{0} + \Delta G_{hbond} + \Delta G_{metal} + \Delta G_{lipo} + \Delta G_{rot}$$

• Each component of this equation is the product of a term dependent on the magnitude of a particular physical contribution to free energy (e.g. hydrogen bonding) and a scale factor determined by regression, i.e.


11.3 Files Containing The Docked Ligand (gold_soln_ligand_m#_n.mol2)

• The N-phosphonacetyl-L-aspartate ligand will have been docked a number of times, so a set of files will have been produced, each containing the results of a separate docking attempt.

• The result of each docking attempt is written out as gold_soln_ligand_m1_n.mol2, where n is the number of the docking solution 1,2,3 ... and m1 is an index to the ligand (in this example, only one ligand was docked).

• Note that the file gold_soln_ligand_m1_1.mol2 is not the best GOLD prediction; it is just the solution found in the first docking attempt. However, as GOLD proceeds, symbolic links are created: ranked_ligand_m1_1.mol2 will point to the current top-ranked solution, ranked_ligand_m1_2.mol2 will point to the second-best solution, and so on.

• Open and inspect the top ranked solution predicted by GOLD within your visualisation software package.

• A simple test of the effectiveness of a docking program is to take a protein-ligand complex from the PDB and extract the ligand. The docking program can then be used to predict the binding mode of the ligand and a comparison made with the crystallographically observed position. The crystallographically observed conformation of the docked N-phosphonacetyl-L-aspartate ligand is provided. Open the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial1 and compare this with the solution predicted by GOLD.

• In the figure below the crystallographically observed reference structure ligand_reference.mol2 (shown in green) is compared with the top-ranked solution predicted by GOLD (shown coloured by element):


fitness score, while the solution found for docking attempt number 1 has the worst fitness:

11.2 Fitness Function Rankings Files (ligand_m1.rnk and bestranking.lst)

• Open and inspect the file ligand_m1.rnk in a text editor. This file contains a summary of the fitness scores for all the docking attempts on the N-phosphonacetyl-L-aspartate ligand.

• The docking attempts are listed according to fitness score, so the best solution is placed first.

• The file gives total fitness scores and a breakdown of the fitness into its constituent energy terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand), an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand intramolecular). For example:

• A file called bestranking.lst is written when running dockings with multiple ligands. This gives a continuous summary of the best solution that has been obtained for each completed ligand. The file gives total fitness scores and a breakdown of the fitness into its constituent energy terms.


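$$
\begin{aligned}
\Delta G_{0} &= v_{0} \\
\Delta G_{hbond} &= v_{1} P_{hbond} \\
\Delta G_{metal} &= v_{2} P_{metal} \\
\Delta G_{lipo} &= v_{3} P_{lipo} \\
\Delta G_{rot} &= v_{4} P_{rot}
\end{aligned}
$$

• Here, the v terms are the regression coefficients and the P terms represent the various types of physical contributions to binding.

• The final ChemScore value is obtained by adding in a clash penalty and internal torsion terms, which militate against close contacts in docking and poor internal conformations. Covalent and constraint scores may also be included:

$$ChemScore = \Delta G_{binding} + P_{clash} + c_{internal} P_{internal} + (c_{covalent} P_{covalent} + P_{constraint})$$

6.4.2 Block Functions in ChemScore

• ChemScore uses block functions throughout its implementation to describe contact terms of various types.

• A block function is of the following form:

$$
B(x, x_{ideal}, x_{max}) =
\begin{cases}
1 & \text{if } x \le x_{ideal} \\
1.0 - \dfrac{x - x_{ideal}}{x_{max} - x_{ideal}} & \text{if } x_{ideal} < x \le x_{max} \\
0 & \text{if } x > x_{max}
\end{cases}
$$

• This functional form looks like:

[Figure: plot of the block function; vertical axis from 0.0 to 1.0, horizontal axis marked at x_ideal and x_max]

• In the GOLD implementation of ChemScore, the block function is sometimes convoluted with a Gaussian function:

$$B'(x, x_{ideal}, x_{max}, \sigma) = \frac{\int_{-\infty}^{\infty} B(x - u, x_{ideal}, x_{max})\, g(u, \sigma)\, du}{\int_{-\infty}^{\infty} g(u, \sigma)\, du}$$

$$g(u, \sigma) = e^{-u^{2}/2\sigma^{2}}$$

• The effect is to smooth the function, e.g.:

[Figure: plot of the smoothed block function; vertical axis from 0.0 to 1.0, horizontal axis marked at x_ideal and x_max]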
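The block function and its Gaussian-smoothed counterpart are simple to evaluate numerically. The following Python sketch follows the definitions of B and B' in this section; the function names, the truncation of the integral at five sigma, and the uniform summation are assumptions made for illustration, since the text does not describe how GOLD evaluates the convolution.

```python
import math


def block(x, x_ideal, x_max):
    """Block function B(x, x_ideal, x_max): 1 up to x_ideal, then a
    linear ramp down to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def smoothed_block(x, x_ideal, x_max, sigma, n_sigma=5.0, steps=2001):
    """B'(x, x_ideal, x_max, sigma): B convoluted with a Gaussian.

    Assumption: the integrals are approximated by a uniform sum over
    u in [-n_sigma * sigma, +n_sigma * sigma].
    """
    if sigma <= 0.0:
        return block(x, x_ideal, x_max)
    du = 2.0 * n_sigma * sigma / (steps - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        u = -n_sigma * sigma + k * du
        weight = math.exp(-u * u / (2.0 * sigma * sigma))
        numerator += block(x - u, x_ideal, x_max) * weight
        denominator += weight
    return numerator / denominator
```

Far inside the plateau the smoothed value stays at 1, far beyond x_max it stays at 0, and the sharp corners at x_ideal and x_max are rounded off; at x = x_ideal, for example, the smoothed value is already slightly below 1.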


11.1 The Ligand Log File (gold_ligand_m1.log)

• Ten docking runs have been set up for this ligand and, for each of these docking runs, the progress of the genetic algorithm is displayed in the GOLD Output window. This information is also recorded in the ligand log file gold_ligand_m1.log (where m1 is the index to the number of the ligand in the input file).

• Open and inspect gold_ligand_m1.log using a text editor (a section of an example ligand log file is shown):

• Following the completion of all docking runs on the ligand, the results from the different runs are compared. The end of the gold_ligand_m1.log file will include a matrix of root mean square deviations (rmsd) between the various docked ligand positions (see Section 14.10.2, page 120). A clustering report is also given which can be used to identify different binding modes (see Section 14.10.3, page 122). It is possible that fewer than the specified ten dockings were completed due to the Allow early termination option being selected (see Section 5., page 167). In the example output shown below, the solution found for docking attempt number 2 has the best


• Any error or warning messages produced will be displayed in a separate GA Program Error Message window (this might normally contain a number of warning messages relating to the GOLD atom type assigner). These messages can be safely ignored.

• Once the job is complete the message GA Done will appear in the GOLD Output window. The output displayed is also written to the ligand.log file but can be saved under a different filename by selecting the Save Output button.

• Dismiss the GOLD Output window by clicking on the Dismiss button.

11. Analysis of Output

11.1 The Ligand Log File (gold_ligand_m1.log) (see page 174)
11.2 Fitness Function Rankings Files (ligand_m1.rnk and bestranking.lst) (see page 175)
11.3 Files Containing The Docked Ligand (gold_soln_ligand_m#_n.mol2) (see page 176)

• The specified output directory (see Section 9., page 170) will contain a number of files including:

• Files containing the initialised protein and ligand (gold_protein.mol2 and gold_ligand.mol2)

• Files containing the docked ligand (gold_soln_ligand_m1_n.mol2)

• Files containing fitness function rankings (ligand_m1.rnk and bestranking.lst)

• Protein and ligand log files (gold_protein.log and gold_ligand_m1.log)

• Files containing error messages (gold.err); this file will be empty if no errors are found.

• Some of these output files will be dealt with in detail below. Further information on the content of all these output files is available (see Section 14., page 109).
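A quick way to take stock of an output directory is to list the docked-solution files and check whether gold.err is empty. The sketch below is a hypothetical helper, not part of GOLD; it relies only on the file names quoted in the list above and makes no assumption about the contents of any file.

```python
import re
from pathlib import Path

# File-name pattern from the text: gold_soln_ligand_m<ligand>_<solution>.mol2
SOLUTION_RE = re.compile(r"^gold_soln_ligand_m(\d+)_(\d+)\.mol2$")


def summarise_output(directory):
    """Collect docked-solution files and report whether gold.err is non-empty."""
    out_dir = Path(directory)
    solutions = []
    for path in out_dir.iterdir():
        match = SOLUTION_RE.match(path.name)
        if match:
            ligand = int(match.group(1))
            solution = int(match.group(2))
            solutions.append((ligand, solution, path.name))
    solutions.sort()  # numeric order: ligand index, then solution number
    err_file = out_dir / "gold.err"
    has_errors = err_file.exists() and err_file.stat().st_size > 0
    return {"solutions": solutions, "has_errors": has_errors}
```

Sorting on the parsed integers avoids the usual pitfall of solution 10 being listed before solution 2. Note that the solution number is the docking attempt, not the rank.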


6.4.3 Hydrogen-Bond Terms

• The hydrogen-bond term is computed as a sum over all possible donor-acceptor pairs, such that one atom belongs to the protein and the other to the ligand:

$$P_{hbond} = \sum_{\text{all donor-acceptor pairs}} B'(\Delta r, \Delta r_{ideal}, \Delta r_{max}, \sigma_{r}) \cdot B'(\Delta\alpha, \Delta\alpha_{ideal}, \Delta\alpha_{max}, \sigma_{\alpha}) \cdot B'^{*}(\Delta\beta, \Delta\beta_{ideal}, \Delta\beta_{max}, \sigma_{\beta})$$

• Each term in the summation is the product of three Gaussian-smoothed block functions (see Section 6.4.2, page 50). The purpose of the block functions is to reduce the contribution of a hydrogen bond according to how much its geometry deviates from (a) ideal H...A distance, (b) ideal D-H...A angle and (c) ideal directionality with respect to the acceptor atom. The maximum contribution of a given donor-acceptor pair to the summation is 1; this will occur if the pair form a hydrogen bond of “ideal” geometry.

• The tables below describe the various parameters in this equation, their meanings, and what they are called in the ChemScore parameter file (see Section 6.5, page 58).

D-H..A distance parameters (D = Donor, A = Acceptor)

Term | Meaning | Name in ChemScore file | Default value
r | The ideal hydrogen..acceptor (H...A) distance (in Å) | R_IDEAL | 1.85
Δr | The absolute deviation of the actual H..A separation from r | Calculated for each H-bond | -
Δr_ideal | The tolerance window around the H..A distance, r, within which the H-bond is regarded as ideal | DELTA_R_IDEAL | 0.25
Δr_max | The maximum possible deviation from the ideal distance; above this, the interaction is not regarded as an H-bond | DELTA_R_MAX | 0.65
σ_r | The Gaussian smearing sigma associated with this term | HBOND_R_SIGMA | 0.1

D-H..A angle parameters (D = Donor, A = Acceptor)

Term | Meaning | Name in ChemScore file | Default value
α | The ideal D-H..A angle (in degrees) | ALPHA_IDEAL | 180.0
Δα | The absolute deviation of the actual D-H..A angle from α | Calculated for each H-bond | -
Δα_ideal | The tolerance window around the D-H..A angle, α, within which the H-bond is regarded as ideal | DELTA_ALPHA_IDEAL | 30.0
Δα_max | The maximum possible deviation from the ideal D-H..A angle; above this, the interaction is not regarded as an H-bond | DELTA_ALPHA_MAX | 80.0
σ_α | The Gaussian smearing sigma associated with this term | HBOND_ALPHA_SIGMA | 10.0

DH..A-X acceptor-centred angle parameters (D = Donor, A = Acceptor, X = Heavy atom attached to A)

Term | Meaning | Name in ChemScore file | Default value
β | The ideal H..A-X angle (in degrees) | BETA_IDEAL | 180.0


• Filter out all solutions with fitness scores lower than a specified value

• By default the Keep all solutions option from the Selecting Docked Solutions panel in the Output preferences window should be selected:

• Select Done to close the Output preferences window.

10. Running GOLD

• The main Control panel of the GOLD front end contains a number of options, including:

• The Run button, which will start a GOLD job, and display the output to the screen until completion of the job.

• Save&Exit which will save all the settings defined in the GOLD front end in a configuration file (gold.conf) and then close the front end. The configuration file includes details of the ligand, the protein binding site, the fitness-function parameter file to be used, the torsion distribution file to be used, and the genetic algorithm parameters (see Section 15., page 126).

• Submit&Exit which will start a GOLD run in the background (and also save a configuration file), then close the front end.

• The Configuration File button which enables the settings from a previously saved configuration file to be opened. This will automatically load the saved parameter values into the front end (see Section 15., page 126).

• Click on the Run button in the GOLD front end.

• As the job progresses output will be displayed in a GOLD Output window:


• Ensure that the Save rnk files and Save solution log files check boxes are switched on; this will instruct GOLD to retain output files listing fitness-function rankings and ligand log files. The content of these files is discussed later (see Section 11., page 173).

• By default, docking solutions will be written out in the same format as was used for input (i.e. MOL2 format); ensure that the Same as input output file format option is selected.

• Click on the Output Directory... button and specify a directory to which you have write permission; this is where the GOLD output files will be written.

• It is possible to write additional information to docked solution files. This information is written to SD file tags; for MOL2 files, these tags are written to comment blocks. This information is particularly important for post-processing docking results with SILVER. For the purpose of this tutorial the Information in File settings can be left at their default settings.

• GOLD can produce a large amount of output. However, it is possible to cut this down by applying output filter options. These options can be used to:

• Specify that all docking solutions are saved

• Retain only the n best docking solutions

• Save the top-ranked solution for the best m ligands only


• The third block function in the H-bond equation, B´*, is the sum of all possible values for a given hydrogen bond. For example, a tertiary amine acceptor has three covalently-bound atoms that could be deemed as the “X” atom: in this case, the term added for an H-bond to the amine is the product of the block-function values for all three possible H..A-X angles.

• Hydrogen bonds have a regression coefficient associated with them, v1 (see Section 6.4.1, page 49). By default, this is set to –3.34. The name of this coefficient in the ChemScore parameter file (see Section 6.5, page 58) is HBOND_COEFFICIENT.
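The contribution of a single donor-acceptor pair can be sketched from the default values listed in the hydrogen-bond parameter tables of this section. The Python below is a simplified illustration, with two assumptions that should be kept in mind: the Gaussian smearing is omitted (plain block functions are used, i.e. the limit of vanishing sigma), and only one H..A-X angle is considered, so the treatment of acceptors with several candidate X atoms is not reproduced. All function names are hypothetical.

```python
# Default values as listed in the ChemScore hydrogen-bond parameter tables.
HBOND_PARAMS = {
    "R_IDEAL": 1.85, "DELTA_R_IDEAL": 0.25, "DELTA_R_MAX": 0.65,
    "ALPHA_IDEAL": 180.0, "DELTA_ALPHA_IDEAL": 30.0, "DELTA_ALPHA_MAX": 80.0,
    "BETA_IDEAL": 180.0, "DELTA_BETA_IDEAL": 70.0, "DELTA_BETA_MAX": 80.0,
    "HBOND_COEFFICIENT": -3.34,
}


def block(x, x_ideal, x_max):
    """Unsmoothed block function (Gaussian smearing left out for brevity)."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def hbond_pair_term(r, alpha, beta, p=HBOND_PARAMS):
    """Product of the distance, D-H..A angle and H..A-X angle block terms.

    r is the H..A distance in Angstrom; alpha and beta are in degrees.
    """
    d_r = abs(r - p["R_IDEAL"])
    d_alpha = abs(alpha - p["ALPHA_IDEAL"])
    d_beta = abs(beta - p["BETA_IDEAL"])
    return (
        block(d_r, p["DELTA_R_IDEAL"], p["DELTA_R_MAX"])
        * block(d_alpha, p["DELTA_ALPHA_IDEAL"], p["DELTA_ALPHA_MAX"])
        * block(d_beta, p["DELTA_BETA_IDEAL"], p["DELTA_BETA_MAX"])
    )


def hbond_energy(pairs, p=HBOND_PARAMS):
    """v1 multiplied by the sum of pair terms; pairs holds (r, alpha, beta)."""
    return p["HBOND_COEFFICIENT"] * sum(hbond_pair_term(r, a, b, p) for r, a, b in pairs)
```

A pair with ideal geometry (1.85 Å, 180°, 180°) contributes exactly 1 to the sum and therefore –3.34 to the energy, while a pair whose distance deviates from the ideal by more than DELTA_R_MAX contributes nothing.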

DH..A-X acceptor-centred angle parameters (continued)

Term | Meaning | Name in ChemScore file | Default value
Δβ | The absolute deviation of the actual H..A-X angle from β | Calculated for each H-bond | -
Δβ_ideal | The tolerance window around the H..A-X angle, β, within which the H-bond is regarded as ideal | DELTA_BETA_IDEAL | 70.0
Δβ_max | The maximum possible deviation from the ideal H..A-X angle; above this, the interaction is not regarded as an H-bond | DELTA_BETA_MAX | 80.0
σ_β | The Gaussian smearing sigma associated with this term | HBOND_BETA_SIGMA | 10.0

6.4.4 Metal-Binding and Lipophilic Terms

• The metal-binding term in ChemScore is computed as a sum over all possible metal-ion ... acceptor pairs, where the acceptor is an atom in the ligand that is capable of binding to a metal:

$$P_{metal} = \sum_{\text{all ligand acceptors}} \; \sum_{\text{all protein metals}} B'(r_{aM}, R_{ideal}, R_{max}, \sigma_{metal})$$

• Each term in the summation is a Gaussian-smoothed block function (see Section 6.4.2, page 50) whose purpose is to reduce the contribution of the metal-acceptor interaction if the geometry is not ideal.

• The table below describes the various parameters in this equation, their meanings, and what they are called in the ChemScore parameter file (see Section 6.5, page 58).


Metal-binding parameters in ChemScore

Term | Meaning | Name in ChemScore file | Default value
r_aM | The actual acceptor-metal distance (in Å) | Calculated for each acceptor-metal pair | -
R_ideal | The ideal acceptor-metal distance | METAL_R1 | 2.6
R_max | The maximum acceptor-metal distance to be considered a binding interaction | METAL_R2 | 3.0
σ_metal | The Gaussian smearing sigma associated with this term | METAL_R_SIGMA | 0.1

• The metal-binding term has a regression coefficient associated with it, v2 (see Section 6.4.1, page 49). By default, this is set to –6.03. The name of this coefficient in the ChemScore parameter file (see Section 6.5, page 58) is METAL_COEFFICIENT.

• The lipophilic term is defined in a similar way:

$$P_{lipo} = \sum_{\text{all ligand lipophilic atoms}} \; \sum_{\text{all protein lipophilic atoms}} B'(r_{ll}, R_{ideal}, R_{max}, \sigma_{lipo})$$

• The table below describes the various parameters in this equation, their meanings, and what they are called in the ChemScore parameter file (see Section 6.5, page 58).

Lipophilic parameters in ChemScore

Term | Meaning | Name in ChemScore file | Default value
r_ll | The actual distance between the pair of lipophilic atoms (in Å) | Calculated for each atom-atom pair | -
R_ideal | The ideal atom...atom distance separation | LIPO_R1 | 4.1
R_max | The maximum separation, beyond which no interaction is deemed to occur | LIPO_R2 | 7.1
σ_lipo | The Gaussian smearing sigma associated with this term | LIPO_R_SIGMA | 0.1
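Both contact terms reduce to the same operation: sum a block function over a list of pair distances, then scale by the regression coefficient. The sketch below uses the default values from the two tables above. As a simplifying assumption it applies the plain block function (no Gaussian smearing), and it expects the caller to supply the pair distances; how GOLD enumerates the pairs is not covered here. The function names are hypothetical.

```python
def block(x, x_ideal, x_max):
    """Unsmoothed block function (Gaussian smearing left out for brevity)."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def contact_term(distances, r_ideal, r_max):
    """Sum of block-function values over all pair distances (in Angstrom)."""
    return sum(block(d, r_ideal, r_max) for d in distances)


def metal_energy(acceptor_metal_distances, coefficient=-6.03):
    """v2 * P_metal with METAL_R1 = 2.6 and METAL_R2 = 3.0."""
    return coefficient * contact_term(acceptor_metal_distances, 2.6, 3.0)


def lipo_energy(lipophilic_pair_distances, coefficient=-0.117):
    """v3 * P_lipo with LIPO_R1 = 4.1 and LIPO_R2 = 7.1."""
    return coefficient * contact_term(lipophilic_pair_distances, 4.1, 7.1)
```

The much wider window of the lipophilic term (4.1 to 7.1 Å, against 2.6 to 3.0 Å for metals) means that many more atom pairs contribute to it, each with a small weight.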


are shown):

• Care should be taken when altering these parameter settings and you are recommended to use one of the pre-defined parameter sets offered. Alternatively, GOLD can decide on the optimal settings to use for a given ligand (see Section 11.3, page 94).

• To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the Settings selector window, click on Use automatic settings. Ensure the Search efficiency is set to 100%, then hit Done.

• The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible ring corners, flippable nitrogens, etc., the volume of the protein binding site, and the number of water molecules considered during docking. Details of the exact settings used will be given in the ligand log file gold_ligand_m1.log (see Section 14.10, page 118).

9. Setting Output Preferences

• Select the Output... button in the GOLD front end to open the Output preferences window:


• GoldScore is the original GOLD scoring function and is made up of four components:

• protein-ligand hydrogen bond energy (external H-bond)

• protein-ligand van der Waals (vdw) energy (external vdw)

• ligand internal vdw energy (internal vdw)

• ligand torsional strain energy (internal torsion)

• It is possible to alter the empirical parameters used in the fitness function (hydrogen bond energies, atom radii and polarisabilities, torsion potentials, hydrogen bond directionalities, etc.) within the GOLD parameters file. The default GOLD parameters file (gold.params) can be found in:

• UNIX: $GOLD_DIR/gold.params

• Windows: <InstallDir>/GOLD/gold.params
where <InstallDir> is usually C:/Program Files/CCDC

• For the purpose of this tutorial ensure that the Parameter File entry box in the Input Parameters and Files section of the GOLD front end is set to gold.params, or DEFAULT when used for the first time.

• Torsion angle distributions, extracted from the Cambridge Structural Database (CSD), can be used to restrict the ligand conformational space sampled by the genetic algorithm. Using torsion angle distributions in this way may improve the chances of GOLD finding the correct answer by biasing the search towards ligand torsion-angle values that are commonly observed in crystal structures. It may also improve convergence and so make GOLD usable with faster settings (see Section 9.1, page 83).

• By default the use of torsion angle distributions should be enabled. Click on the Fitness & Search options button in the GOLD front end. In the resulting window ensure the check box labelled Use torsion angle distributions from the CSD is switched on.

8. Genetic Algorithm Parameter Settings

• GOLD optimises the fitness score using a genetic algorithm (GA) (see Section 10., page 89).

• A number of parameters control the precise operation of the genetic algorithm. Genetic algorithm parameter settings can be specified in the GOLD front end (standard default settings


• The difference between the metal and lipophilic parameterisation is that the lipophilic term is scored over a much longer range.

• Lipophilic atoms are defined as non-accepting sulphurs, non-polar carbon atoms (polar carbon atoms are carbon atoms attached to two or more polar atoms), and non-ionic chlorine, bromine and iodine atoms.

• The lipophilic term has a regression coefficient associated with it, v3 (see Section 6.4.1, page 49). By default, this is set to –0.117. The name of this coefficient in the ChemScore parameter file (see Section 6.5, page 58) is LIPO_COEFFICIENT.

6.4.5 Rotatable-Bond Freezing Term

• The following formula is used to estimate the entropic loss that occurs when single, acyclic bonds in the ligand become non-rotatable upon binding:

$$P_{rot} = 1 + \left(1 - \frac{1}{N_{rot}}\right) \sum_{r} \frac{P_{nl}(r) + P'_{nl}(r)}{2}$$

• N_rot is the number of frozen rotatable bonds in the ligand (a bond is considered frozen if at least one atom on each side of the rotatable bond is in contact with the protein). The expression is deemed to have a value of zero if there are no rotatable bonds in the ligand.

• P_nl(r) and P'_nl(r) are the percentages of non-hydrogen atoms on either side of the rotatable bond that are not lipophilic. For example, if there are 10 non-hydrogen atoms on one side of the bond, of which 3 are not lipophilic, and there are 20 non-hydrogen atoms on the other side, of which 2 are not lipophilic, then P_nl(r) and P'_nl(r) are 30% and 10%, respectively.

• The regression coefficient associated with this term, v4 (see Section 6.4.1, page 49), has the default value 2.56. The name of this coefficient in the ChemScore parameter file (see Section 6.5, page 58) is ROT_COEFFICIENT.

6.4.6 Clash Penalty and Internal Torsion Terms

• Clashes between protein and ligand atoms and ligand internal torsional strain are accommodated by penalty terms.

• These terms are included to prevent poor geometries in docking.

• The clash penalty terms in ChemScore differ according to the nature of the contact, i.e. whether it is a hydrogen-bonding contact, a metal-binding contact or neither of these.

• Any hydrogen bond with an H...A distance shorter than r_hbond Å contributes a clash term of:


$$P^{hbond}_{clash} = \frac{20.0}{\Delta G_{hbond}} \cdot \frac{r - r_{hbond}}{r_{hbond}}$$

• The value of r_hbond (default = 1.6Å) can be changed by altering the parameter CLASH_RADIUS_HBOND in the ChemScore file (see Section 6.5, page 58).

• Any metal coordination contact shorter than r_metal Å contributes a clash term of:

$$P^{metal}_{clash} = \frac{20.0}{\Delta G_{metal}} \cdot \frac{r - r_{metal}}{r_{metal}}$$

• The value of r_metal (default = 1.3*Å) can be changed by altering the parameter CLASH_RADIUS_METAL in the ChemScore file (see Section 6.5, page 58).

• All other ligand-protein interatomic contacts contribute clash terms of the following form:

$$P^{other}_{clash} = 1.0 + 4.0\,\frac{r_{clash} - r}{r_{clash}}$$

• r_clash varies with contact type: for contacts to protein sulphur atoms, it is set to 3.35Å; for all other contacts, it is set to 3.10Å. These settings correspond to the parameters CLASH_RADIUS_SULPHUR and CLASH_RADIUS_GENERAL in the ChemScore file (see Section 6.5, page 58).

• Internal ligand strain is accommodated by clash terms in combination with torsional strain terms of the form:

$$P_{internal} = \sum_{\text{all rotatable bonds } i} A\left[1 - \cos(n\omega_{i} - \omega_{0})\right]$$
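The rotatable-bond term, the general clash penalty and the torsional strain term are straightforward to evaluate once their inputs are known. The Python sketch below is illustrative only and rests on several assumptions: P_nl values are passed as fractions (0.30 rather than 30%); the general clash penalty is taken to apply only when r is shorter than r_clash; torsion angles are in radians, with A, n and the phase supplied by the caller because their values are not given here; v0 must also be supplied by the caller. The hydrogen-bond and metal clash terms are left out, and all function names are hypothetical.

```python
import math


def rot_term(frozen_bonds):
    """P_rot from a list of (P_nl, P'_nl) fractions, one per frozen bond."""
    n_rot = len(frozen_bonds)
    if n_rot == 0:
        return 0.0  # assumption: no frozen rotatable bonds gives zero
    total = sum((p_nl + p_nl_prime) / 2.0 for p_nl, p_nl_prime in frozen_bonds)
    return 1.0 + (1.0 - 1.0 / n_rot) * total


def general_clash(r, r_clash=3.10):
    """Clash penalty for contacts that are neither H-bonds nor metal contacts."""
    if r >= r_clash:
        return 0.0  # assumption: no penalty at or beyond r_clash
    return 1.0 + 4.0 * (r_clash - r) / r_clash


def torsion_term(bonds):
    """Sum of A * [1 - cos(n * omega - omega0)] over (A, n, omega, omega0)."""
    return sum(a * (1.0 - math.cos(n * omega - omega0)) for a, n, omega, omega0 in bonds)


def delta_g_binding(v0, p_hbond, p_metal, p_lipo, p_rot,
                    v1=-3.34, v2=-6.03, v3=-0.117, v4=2.56):
    """v0 + v1*P_hbond + v2*P_metal + v3*P_lipo + v4*P_rot (default v1..v4)."""
    return v0 + v1 * p_hbond + v2 * p_metal + v3 * p_lipo + v4 * p_rot
```

Note that a ligand with a single frozen bond gives P_rot = 1 regardless of its lipophilicity, because the factor (1 - 1/N_rot) vanishes. For contacts to protein sulphur atoms, pass r_clash=3.35 to general_clash.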


• The orthogonal x, y, z coordinates of a solvent accessible point approximately at the centre of the active site should be entered. The binding site in 1acm has already been centred on the origin, so in this case the coordinates can be left as 0.0, 0.0, 0.0.

• The approximate radius of the binding site must also be specified. By default the binding site radius is set to 10.0 Å; ensure that this is the case. This radius should be large enough to contain any possible binding mode of the N-phosphonacetyl-L-aspartate ligand.

• A cavity detection algorithm, LIGSITE, is used to restrict the region of interest to concave, solvent-accessible surfaces. Ensure that cavity detection is enabled by switching on the button labelled Detect Cavity:
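The centre-plus-radius definition of the binding site amounts to a simple distance test. The Python sketch below shows only that geometric step, with hypothetical names and a made-up input format of (label, coordinates) pairs; the LIGSITE cavity detection that GOLD applies afterwards is not reproduced.

```python
import math


def atoms_in_site(atoms, centre=(0.0, 0.0, 0.0), radius=10.0):
    """Return the labels of atoms lying within radius (in Angstrom) of centre.

    atoms is an iterable of (label, (x, y, z)) pairs.
    """
    return [label for label, xyz in atoms if math.dist(xyz, centre) <= radius]
```

A check of this kind can be useful when choosing the radius: if atoms of a known ligand binding mode fall outside the sphere, the radius is too small.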

7. Fitness Function and Search Settings

• During a docking run the solutions found by GOLD are scored according to a fitness function (see Section 6., page 46).

• GOLD offers a choice of three fitness functions, GoldScore (see Section 6.2, page 46), ChemScore (see Section 6.4, page 49) and User Defined Score (see Section 6.10, page 62).

• The User Defined Score allows you to modify existing scoring functions, or to implement a completely new scoring function using an Applications Programming Interface (API). A good knowledge of the C programming language is required together with some experience in using GOLD. Full documentation for the GOLD Scoring Function API is provided:

• UNIX: $GOLD_DIR/gold/api_doc/index.html

• Windows: <InstallDir>/GOLD/gold/api_doc/index.html

where <InstallDir> is usually C:/Program Files/CCDC

• Ensure that the default GoldScore scoring function is selected within the Fitness Function and Search Settings panel of the GOLD front end (see Section 6., page 46):


• Add single ligands

• Select a complete directory of ligand files.

• Specify a single file containing several ligands (i.e. a multi-MOL2 or SD file).

• Click on the Filename button and select ligand.mol2 from <GOLD_DIR>/examples/tutorial1.

• The number of dockings to be performed on each ligand is specified by entering a value for the No. of GA runs. By default this should be set to ten; if it is not, set the number of docking runs to ten.

• Click on Add file or Update selected file. The filename of the selected ligand and the number of dockings are now displayed in the Current Ligand File Selection list. Hit Done to close the Ligand selection for docking run window.

5. Input Parameters and Files Settings

• The specified protein input file should be displayed within the Input Parameters and Files panel of the GOLD front end, and the Ligands Count should be displayed as 1.

• By default the Set atom types check button for the Ligand only should be switched on in the Input Parameters and Files panel. If this is not the case, then enable the Set atom types option for the Ligand. Further information on atom type assignment is provided elsewhere (see Section 5.1, page 36).

• By default the Allow early termination check box should be switched on and contain the following early termination criteria:

• This will instruct GOLD to terminate the docking if, at any point, the best three solutions found are all within 1.5 Å rmsd of each other. In this case, it is probable that the answer is correct and further docking runs will not be required.
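The early termination criterion can be sketched in a few lines of Python. This is a hedged illustration rather than GOLD's implementation: it assumes that each solution is a (fitness, coordinates) pair, that a higher fitness is better, that "within 1.5 Å rmsd of each other" means every pair among the best three, and that the rmsd is computed over atoms in matching order without any superposition. All function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((p - q) ** 2
                for atom_a, atom_b in zip(coords_a, coords_b)
                for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


def should_terminate(solutions, top_n=3, threshold=1.5):
    """Return True if the best `top_n` solutions all lie within `threshold`.

    `solutions` is a list of (fitness, coords) pairs; higher fitness is
    assumed to be better.
    """
    if len(solutions) < top_n:
        return False
    best = sorted(solutions, key=lambda s: s[0], reverse=True)[:top_n]
    return all(rmsd(best[i][1], best[j][1]) <= threshold
               for i in range(top_n)
               for j in range(i + 1, top_n))


if __name__ == "__main__":
    base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    near = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
    nearer = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(should_terminate([(60.0, base), (58.0, near), (55.0, nearer)]))
```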

6. Defining the Ligand Binding Site

• It is necessary to specify the approximate centre and extent of the protein binding site. This can be done in a number of ways, including:

• from a point (see Section 3.8.1, page 25);

• from a protein atom (see Section 3.8.2, page 25);

• from a file containing a list of atoms (see Section 3.8.3, page 26);

• from a protein residue (see Section 3.8.4, page 26);

• from a file containing a list of residues (see Section 3.8.5, page 27);

• from a reference ligand (see Section 3.8.6, page 28).

• For this example, switch on the button labelled Point in the GOLD front end:


• Bonds are deemed to be rotatable if they are single and acyclic and involve pairs of atoms with hybridisation states sp3-sp3, sp3-sp2 or sp2-sp2.

• The parameters A, n and ω0 in the above equation are set in the ChemScore file (see Section 6.5, page 58). The relevant lines are SP3_SP3_BOND, SP3_SP2_BOND, SP2_SP2_BOND and UNKNOWN_BOND. The syntax is of the form:

SP3_SP3_BOND A n ω0

For example:

SP3_SP3_BOND 0.18750 3.0 3.1515926

• The overall contribution of intramolecular strain to the scoring function is scaled by the coefficient called INTRA_COEFFICIENT in the ChemScore file (see Section 6.5, page 58)
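As a worked illustration of the torsional strain term, the Python sketch below parses a bond-type line and evaluates the contribution of a single rotatable bond. It assumes the per-bond form A [1 - cos(n ω - ω0)] with angles in radians; the function names are hypothetical, and the example line is the one quoted above.

```python
import math


def parse_bond_line(line):
    """Split a line such as 'SP3_SP3_BOND A n omega0' into its four fields."""
    keyword, a, n, omega0 = line.split()
    return keyword, float(a), float(n), float(omega0)


def torsion_strain(omega, a, n, omega0):
    """Assumed per-bond strain: A * (1 - cos(n * omega - omega0)), radians."""
    return a * (1.0 - math.cos(n * omega - omega0))


if __name__ == "__main__":
    keyword, a, n, omega0 = parse_bond_line("SP3_SP3_BOND 0.18750 3.0 3.1515926")
    # Scan a few torsion angles to see where the assumed penalty is lowest.
    for degrees in (0, 60, 120, 180):
        strain = torsion_strain(math.radians(degrees), a, n, omega0)
        print(keyword, degrees, round(strain, 4))
```

With n = 3 and ω0 equal to π, this form gives zero strain at the staggered angles (60°, 180°, 300°) and a maximum of 2A at the eclipsed angles.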

6.4.7 Covalent Term

• When covalent bonding is switched on (see Section 4.6, page 33) the ChemScore function is modified in the following ways:

• The clash term (see Section 6.4.6, page 56) is reduced so that no clash is registered for 1-2 or 1-3 contacts around the link atoms in the protein and ligand.

• Torsion terms (see Section 6.4.6, page 56) are added for the rotatable parts of the linkage.

• A valence-angle bending term is added to the overall energy to penalize poor link geometries.

• The weight of the covalent link energy in the ChemScore function is controlled by the parameter called LINK_BEND_COEFFICIENT in the ChemScore parameter file (see Section 6.5, page 58).

6.4.8 Constraint Terms

• Constraints (see Section 8., page 68) are implemented in ChemScore in the same way as they are in GoldScore.

6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File

• The ChemScore parameter file is stored in the GOLD distribution directory. It contains all the parameters used by the GOLD implementation of ChemScore. A full description of the meaning of the various parameters is given elsewhere (see Section 6.4, page 49).

• The ChemScore file can be customised by copying it, editing the copy, and instructing GOLD to use the edited file.

• A copy of the default file will be placed in your current directory (where it will be called chemscore.params) if you click on the ChemScore File button in the GOLD front end.


• The entry box next to the ChemScore File button in the GOLD front end should say DEFAULT if you want to use the default ChemScore parameter file. If you want to use a customised version of the file, click on the ChemScore File button to select the required file or directly type the file name into the entry box.

• The format of the ChemScore file is quite strict: incorrect editing may cause GOLD to behave in unexpected ways or even to crash. Because of the large number of parameters, no guarantee can be given that the program will behave reliably with anything other than the default parameterisation.

6.6 Altering GOLD Parameters: the gold.params File

• The parameter file gold.params is stored in the GOLD distribution directory. It contains all of the parameters used by GOLD (e.g. hydrogen bond energies, atom radii and polarisabilities, torsion potentials, hydrogen bond directionalities, etc.) other than those which are specified in the configuration file (i.e. can be set via the GOLD front end).

• It also contains parameters that control the general behaviour of GOLD, e.g. whether the final solution from a genetic algorithm run is to be minimised via a Simplex procedure before being saved.

• The parameter file can be customised by copying it, editing the copy, and instructing GOLD to use the edited file.

• Click on the Edit Parameters button to edit the parameter file. If the parameter file is set to DEFAULT then the standard GOLD distribution parameter file is copied to the current directory.

• GOLD gets the location of the parameter file from the configuration file line param_file = <parameter file location>. This is most easily defined using the Parameter File button in the front end.

• The Parameter File entry box in the GOLD front end should say DEFAULT if you want to use the default GOLD parameter file. You can click on the button to pick an alternative parameter file, or directly type a file name into the entry box.

• The format of the parameter file is quite strict: incorrect editing may cause GOLD to behave in unexpected ways or even to crash. Because of the large number of parameters, no guarantee can be given that the program will behave reliably with anything other than the default parameterisation.

• For more information see the comments in the parameter file, gold.params.
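For users who prepare configuration files with a script rather than through the front end, the param_file setting can be updated with a small helper such as the Python sketch below. Only the line format param_file = <parameter file location> comes from the text above; the helper name, the second setting in the example and the file name my_gold.params are hypothetical.

```python
def set_param_file(conf_text, location):
    """Return `conf_text` with its param_file line set to `location`.

    An existing 'param_file = ...' line is replaced; if none is present,
    a new line is appended. Other lines are left untouched.
    """
    new_line = f"param_file = {location}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.split("=")[0].strip() == "param_file":
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 'some_other_setting' is a placeholder, not a real GOLD keyword.
    original = "some_other_setting = 1\nparam_file = DEFAULT\n"
    print(set_param_file(original, "my_gold.params"))
```

Working on a copy of the configuration file keeps the default parameterisation available if the edited parameter file causes problems.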

6.7 Kinase Scoring Function

• Weak CH...O interactions can be accounted for by inclusion of a ChemScore term that calculates a contribution for weak hydrogen bonds. This term can be useful when dealing with particular proteins, e.g. most kinases contain weak N-heterocycle CH...O hydrogen bonds.

• This term can be enabled by editing the chemscore.params file (see Section 6.5, page 58). The


• The ligand has been minimised into a low-energy starting conformation and the atom types have been checked for accuracy (see Section 4.3, page 31).

2.3 Atom Type Assignment

• Each protein and ligand atom must be assigned an atom type which is used to determine whether the atom is capable of forming hydrogen bonds. GOLD atom typing is based on SYBYL (http://www.tripos.com/) atom types. SYBYL bond types are also used.

• GOLD will automatically assign atom types provided the Set atom types check buttons are switched on in the Input Parameters and Files panel of the GOLD front end.

• GOLD deduces atom types from the information about element types and bond orders in the input structure file. It is therefore crucial that both the protein and ligand input files are prepared according to the guidelines provided (see Section 5.1, page 36).

3. Specifying the Protein Input File

• Open GOLD and click on the Protein button in the Input Parameters and Files section of the front end to bring up the file selection window.

• Select protein.mol2 from <GOLD_DIR>/examples/tutorial1, then click on Open.

4. Specifying the Ligand Input File

• Click on the Edit Ligand File List button in the GOLD front end. The Ligand selection for docking run window will appear:

• From here it is possible to:


• All other parts of the protein will be kept rigid, so the only way of dealing with a truly flexible binding site is to perform separate GOLD runs on different binding-site conformations.

2.2 Preparing the Ligand Input File

• The N-phosphonacetyl-L-aspartate ligand has already been prepared in accordance with the requirements for setting up the ligand (see Section 4., page 30).

• Within SILVER read in the file ligand.mol2 from <GOLD_DIR>/examples/tutorial1 and inspect the structure:

• Acceptable ligand input file formats are MOL2 (i.e. Tripos format) or MOL (i.e. MDL SD format). PDB files can also be used, although we do not recommend the use of PDB format for ligands (see Section 4.4, page 31).

• All hydrogen atoms must be present in the ligand input file (see Section 4.2, page 30). In this example, all hydrogen atoms have been added thus ensuring that the ionisation and tautomeric states are defined unambiguously.

• Certain groups can be represented in more than one way (i.e. have more than one canonical form), such as nitro, carboxylate and amidinium. In such cases, there is usually a right and a wrong representation for use in GOLD. The conventions used for some common difficult groups and further help on setting up the ligand are provided elsewhere (see Section 4., page 30).


following parameters are used:

# CH...O PARAMETERS
# ================
CHO_COEFFICIENT -2.00
# OFF no CHO term
# SPECIAL only CH adjacent to heteroatoms
# ARO all aromatic CH
CHO_TYPE OFF
#CHO_TYPE SPECIAL
CHO_R_IDEAL 2.35
CHO_DELTA_R_IDEAL 0.25
CHO_DELTA_R_MAX 0.65
CHO_ALPHA_IDEAL 180.0
CHO_DELTA_ALPHA_IDEAL 50.0
CHO_DELTA_ALPHA_MAX 100.0
CHO_BETA_IDEAL 180.0
CHO_DELTA_BETA_IDEAL 70.0
CHO_DELTA_BETA_MAX 80.0

• To enable calculation of a weak CH...O hydrogen bonding term S(cho), the parameter CHO_TYPE should be set to SPECIAL. This will enable the recognition of activated CH groups for hydrogen bonding. Activated CH groups are those in aromatic rings next to nitrogens (e.g. the CH groups in an imidazole ring). These groups are recognised in both the ligand and the protein active site.

• For further details please refer to Virtual Screening Using Protein-Ligand Docking: Avoiding Artificial Enrichment (see Section 19., page 147).
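Switching the CH...O term on amounts to changing one line in a copy of the chemscore.params file. The Python sketch below shows one way this could be scripted. The helper name is hypothetical, and it assumes that each parameter occupies its own line as a whitespace-separated keyword/value pair, with comment lines starting with #.

```python
ALLOWED_CHO_TYPES = ("OFF", "SPECIAL", "ARO")


def set_cho_type(params_text, value="SPECIAL"):
    """Return `params_text` with the active CHO_TYPE line set to `value`.

    Commented-out lines such as '#CHO_TYPE SPECIAL' are left unchanged,
    because their first field is not exactly 'CHO_TYPE'.
    """
    if value not in ALLOWED_CHO_TYPES:
        raise ValueError(f"CHO_TYPE must be one of {ALLOWED_CHO_TYPES}")
    out = []
    for line in params_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "CHO_TYPE":
            out.append(f"CHO_TYPE {value}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    snippet = "CHO_COEFFICIENT -2.00\nCHO_TYPE OFF\n#CHO_TYPE SPECIAL\n"
    print(set_cho_type(snippet))
```

Because the format of the parameter file is strict, it is safest to apply such an edit to a copy and to leave every other line byte-for-byte unchanged.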

6.8 Heme Scoring Function

• The heme scoring function is available for both GoldScore (see Section 6.2, page 46) and ChemScore (see Section 6.4, page 49).

• By default GOLD makes no distinction between different H-bond acceptors in terms of their strength of interaction with the metal. A recent publication by Kirton et al (S. B. Kirton, C. W. Murray, M. L. Verdonk and R. D. Taylor, Proteins: Structure, Function, and Bioinformatics, 58, 836-844, 2005) demonstrated how metal parameters can be set up in GOLD for both GoldScore and ChemScore, to take account of different H-bond acceptor types. Kirton et al described the use of ligand specific iron parameters in the context of docking to heme containing proteins and demonstrated improved performance. It is now possible in GOLD to optionally use these parameters.

• The parameters are derived from contact statistics obtained from the CSD and PDB databases. Parameters were derived for both GoldScore and ChemScore.

• These parameters can be used by choosing the appropriate .params file from those that have been supplied with the GOLD installation. The .params files that are available are:

• goldscore.p450_csd.params

• goldscore.p450_pdb.params

• chemscore.p450_csd.params

• chemscore.p450_pdb.params

• The files are located within the $GOLD_DIR/gold directory. The graphic below shows the iron parameters for GoldScore, derived from the CSD, as displayed in the goldscore.p450_csd.params file.

• To employ one of the files, click on either the GoldScore Parameter File: button (if using GoldScore) or the ChemScore Parameter File: button (if using ChemScore), navigate to the $GOLD_DIR/gold directory, select the required file, then click on Open.

• It was found necessary by Kirton et al to assign the planar nitrogens in the heme molecules as lipophilic when using the ChemScore scoring function. In order to bring this about the chemscore.p450 parameter files therefore contain the additional keyword:

MAKE_PLANAR_N_LIPO 1

NOTE: Use of this keyword has only been validated for nitrogen atoms within heme containing proteins. Improvements in docking performance when used with non-heme containing proteins are not guaranteed.


• Acceptable protein input file formats for GOLD are PDB and MOL2.

• The protein input file may be the entire protein structure, or consist of just those residues that are in the region of the ligand binding site. GOLD searches for contacts out to a distance of 20.0 Å. In this example, parts of the protein remote from the binding site have been deleted, in order to speed up the calculation. The protein has been cut down to a radius of 20.0Å around the ligand binding site thus ensuring that enough of the protein has been retained so that all of the residues that might reasonably interact with the ligand are present.

• All hydrogen atoms must be present in the protein input file (see Section 3.2, page 10). In this example, hydrogen atoms have been placed on the protein (using a molecular modelling program (see Section 1., page 1)) in order to ensure that ionisation and tautomeric states are defined unambiguously. Obviously, this involved making hypotheses about the protonation states of residues such as His, Glu and Asp.

• GOLD allows for partial protein flexibility. Specifically, the torsion angles of Ser, Thr and Tyr hydroxyl groups will be allowed to rotate during docking in order to optimise their hydrogen-bonding to the ligand. Lysine NH3+ groups are similarly optimised.

Note: the optimised positions of polar protein hydrogen atoms that are generated during docking (these will usually be different for each docked ligand pose) can be saved to the docked solution file (see Section 14.2, page 111)


Tutorial 1: A Step-By-Step Guide to Using GOLD

1. Introduction (see page 163)
2. Preparation of Input Structures for Use in GOLD (see page 163)
3. Specifying the Protein Input File (see page 166)
4. Specifying the Ligand Input File (see page 166)
5. Input Parameters and Files Settings (see page 167)
6. Defining the Ligand Binding Site (see page 167)
7. Fitness Function and Search Settings (see page 168)
8. Genetic Algorithm Parameter Settings (see page 169)
9. Setting Output Preferences (see page 170)
10. Running GOLD (see page 172)
11. Analysis of Output (see page 173)

1. Introduction

• This tutorial aims to provide a step-by-step guide to using GOLD. To illustrate this, the procedure for setting up and running an example docking will be explained and additional information will be provided on related issues.

• In this example GOLD will be used to determine the binding mode of N-phosphonacetyl-L-aspartate with the aspartate carbamoyltransferase, PDB entry code 1acm.

2. Preparation of Input Structures for Use in GOLD

2.1 Preparing the Protein Input File (see page 163)
2.2 Preparing the Ligand Input File (see page 165)
2.3 Atom Type Assignment (see page 166)

• GOLD will only produce reliable results if the protein and ligand input files are set up correctly. It is therefore essential that a number of key steps are followed when preparing any input structure for use in GOLD ((see Section 3.1, page 9) and (see Section 4.1, page 30)).

2.1 Preparing the Protein Input File

• The aspartate carbamoyltransferase, 1acm, has already been prepared in accordance with the requirements for setting up the protein (see Section 3.1, page 9).

• Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial1 and inspect the structure:


6.9 Internal Energy Offset

• Click on the Fitness & Search Options button and switch on the Offset internal ligand energy by best energy that is encountered during run check-box.

• Enabling this option will result in the internal energy terms (internal torsion, internal vdw, and internal Hbond) being corrected according to the best energy encountered for these terms during the run.

• By applying this correction the internal energy will be calculated with respect to that of a close to optimal non-bound structure, thereby taking into account any irreducible internal energy.

• The internal energy offset can be used with both GoldScore and ChemScore. For ChemScore the ligand energy correction value is written to the docked solution files in the tag <Gold.Chemscore.Internal.Correction>. This is the best (i.e. minimum energy) value encountered. For GoldScore the correction value is written to the docked solution files in the tag <Gold.Goldscore.Internal.Correction>. This is the best score (i.e. the maximum value) encountered. In both cases, the best value encountered is subtracted from the ligand score (or energy) value before being passed to the final GoldScore or ChemScore energy term. Note: The final ChemScore energy is converted to a ChemScore score by taking the negative.

• Note: The .rnk file is corrected at the end of a run with the best energy encountered after all docking attempts on a particular ligand (individual solution files are not). Therefore you may observe small deviations for the best energy found between the solutions and rank file. Increasing the number of dockings or the number of GA operations in each docking will result in the discrepancy being less pronounced.
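The offset described above is simple bookkeeping, as the Python sketch below illustrates. It assumes that the internal term of each docking is available as a plain number and that "best" means the minimum for ChemScore energies and the maximum for GoldScore scores, as stated in the text. The function name and the example values are hypothetical.

```python
def offset_internal_terms(values, higher_is_better):
    """Subtract the best internal term seen in a run from every value.

    Returns (best, corrected_values). Use higher_is_better=True for
    GoldScore-style scores and False for ChemScore-style energies.
    """
    if not values:
        raise ValueError("at least one value is required")
    best = max(values) if higher_is_better else min(values)
    return best, [v - best for v in values]


if __name__ == "__main__":
    # Hypothetical ChemScore-style internal energies from three dockings.
    print(offset_internal_terms([3.5, 1.5, 2.0], higher_is_better=False))
    # Hypothetical GoldScore-style internal scores from three dockings.
    print(offset_internal_terms([-4.0, -1.0, -2.5], higher_is_better=True))
```

Because the best value can only be known once all dockings have finished, a correction applied at the end of the run may differ from the one in force when an individual solution was written.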

6.10 User Defined Fitness Function

• In addition to the choice of scoring functions currently provided, i.e., GoldScore and ChemScore, users can now implement their own scoring function, which can be accessed from the GOLD front end by selecting User Defined Score:


• The GOLD scoring function Application Programming Interface (API) allows users to modify the GOLD scoring-function mechanism in order to:

• Calculate and write out additional data after each docking

• Add extra terms to the scoring function

• Implement a completely new scoring function

• Full documentation for the GOLD Scoring Function Application Programming Interface (API) is provided with the GOLD distribution:

• UNIX: $GOLD_DIR/gold/api_doc/index.html

• Windows: <InstallDir>/GOLD/gold/api_doc/index.html

where <InstallDir> is usually C:/Program Files/CCDC

• see: GOLD Scoring Function Application Programming Interface (API) documentation.

• A good knowledge of the C programming language is required together with some experience in using GOLD.

• Selecting Scoring Function Shared Object Name (UNIX) or Scoring Function DLL Name (Windows) from the Fitness Function Settings panel enables you to specify a path to a dynamically loadable shared object library.

• GOLD uses shared objects (or dynamically loadable libraries) to allow new or modified scoring functions to be plugged in. Two shared object files are relevant:

• The main GOLD shared object, which is called libgold.so (UNIX) or gold.dll (Windows)

• The scoring-function shared objects which, by default, are called libfitfunc_dll.so (UNIX), goldscore.dll or chemscore.dll (Windows)

• On UNIX the file libgold.so is included in the GOLD distribution, together with two versions of libfitfunc_dll.so, one implementing the normal GOLD scoring function and the other implementing the ChemScore function.

• On Windows the file gold.dll is included in the GOLD distribution, together with two files called goldscore.dll, for implementing the normal GOLD scoring function, and chemscore.dll, for implementing the ChemScore function.

• It effectively provides a mechanism by which data may be intercepted and modified during docking. Users may therefore post-process the results of a docking, or modify the GOLD function, or implement their own scoring function, by building their own versions of libfitfunc_dll.so (UNIX) or, e.g. goldscore.dll (Windows).


Appendix E: GOLD Tutorials

• In order to familiarise yourself with GOLD it is recommended that you work through the tutorial examples provided. Tutorial 1 will go through the process of setting up and running an example docking in some detail; subsequent tutorials will be more concise but will introduce other, more advanced, aspects of the program.

• For the purpose of these tutorials it is assumed that the user has access to either SILVER (supplied with GOLD) or another visualisation program (for instructions on how to use SILVER refer to the SILVER User Guide). In addition, if you wish to set up your own protein and ligand input files ((see Section 3.1, page 9) and (see Section 4.1, page 30)) then you will need access to a molecular modelling program. Full details of the software requirements needed in order to use GOLD are given elsewhere (see Section 1., page 1).

• Please note: due to the non-deterministic nature of GOLD, results may vary from those described in the tutorials.

Tutorial 1: A Step-By-Step Guide to Using GOLD (see page 163)
Tutorial 2: Handling of Metals in GOLD (see page 178)
Tutorial 3: Use of Hydrogen Bonding Constraints (see page 185)
Tutorial 4: Use of Substructure Based Distance Constraints (see page 194)
Tutorial 5: Docking with Water in the Binding Site (see page 202)
Tutorial 6: Docking with a Flexible Side Chain (see page 208)
Tutorial 7: Docking using Localised Soft Potentials (see page 215)


Errors or Wrong     53.9   38.5   3.6

Correlation of prediction quality with number of flexible torsions in ligand

Prediction Result   Max    Avg    Min
Good or Close       24     9.0    0
Errors or Wrong     14     8.4    3


7. Ligand Flexibility

7.1 Flipping Ring Corners (see page 64)
7.2 Flipping Amide Bonds (see page 64)
7.3 Flipping Planar Nitrogens (see page 65)
7.4 Flipping Pyramidal Nitrogens (see page 66)
7.5 Intramolecular Hydrogen Bonds (see page 66)
7.6 Protonated Carboxylic Acids (see page 66)
7.7 Fixing Rotatable Bonds at Their Input Conformation (see page 66)

7.1 Flipping Ring Corners

• Click on the Fitness & Search Options button and switch on the Flip ring corners check-box to allow free corners of ligand rings to flip. This will result in GOLD performing a limited conformational search of cyclic systems by allowing free corners of rings to flip above or below the plane of their neighbouring atoms.

• If the Flip ring corners check box is not switched on then rings will be held rigid at the input conformation during docking.

• The rules governing flipping of ring corners in GOLD are given in: A. W. R. Payne and R. C. Glen, J. Mol. Graphics, 1993, 10, 74-91

7.2 Flipping Amide Bonds

• During initialisation of the ligand, amides (including thioamides, ureas, and thioureas) will be set to the trans conformation.

• Click on the Fitness & Search Options button and switch on the Flip amide bonds check box to allow amides, thioamides, ureas, and thioureas in the ligand to flip between cis and trans.

• In order to flip between cis and trans conformations the CO-NRR' torsion is first made planar (at the initialised trans conformation).

Note: N,N disubstituted amides are not made planar; CO-NH2 will be set so that the NH2 group is in plane with the CO (care must be taken that the input RNH2 group itself is planar since GOLD will not change this).

• On occasion this flattening of the CO-NRR' torsion may result in clashes in the initialised structure. If this occurs, it is advisable to turn off normalisation of amide bonds using the FLATTEN_BONDS keyword in the gold.params file. In this case it is recommended to fix the bond by switching off Flip amide bonds, or by explicitly specifying that the appropriate rotatable bonds are held at their input conformation (see Section 7.7, page 66).

• If the use of torsion angle distributions has been enabled (see Section 9., page 83) GOLD will attempt to match amide torsions against the torsion angle distributions file. If an amide torsion matches, this will override the Flip amide bonds flag setting.

• Note: Data in the CSD show that both cis and trans conformations occur in ureas. It is therefore recommended that amide flipping be turned on in order to sample R-N-C(O)-N torsions of 0 degrees when docking ureas.

7.3 Flipping Planar Nitrogens

• Click on the Fitness & Search Options button and switch on the Flip all planar R-NR1R2 check box to allow planar trigonal nitrogens in the ligand (bound to sp2 carbons) to flip between cis and trans conformations during docking (otherwise, they will be held fixed at the input geometry).

• It is possible to further specify whether or not ring-NHR and ring-NRR' groups are also allowed to flip (i.e. rotate 180 deg.).

• When running GOLD from the command line a number of keyword modifiers can be specified after the flip_planar_n command in the gold.conf file:

flip_planar_n = <1|0> <keyword>

These keywords allow further control over the behaviour of this flag. The following keywords can be used:

flip_ring_NRR
flip_ring_NHR

This allows flipping of ring-NHR or ring-NRR’ groups and is equivalent to using the including ring-NHR and including ring-NRR’ settings in the interface.

fix_ring_NRR
fix_ring_NHR

This fixes these bonds at their input conformation and is equivalent to using the do not flip ring-NHR and do not flip ring-NRR’ settings in the interface.

rot_ring_NRR
rot_ring_NHR

Use these keywords to allow free rotation of ring-NHR or ring-NRR’ groups.

• For example, setting flip_planar_n = 1 fix_ring_NRR will allow all planar R3N groups to flip, but will fix ring-NRR’ groups.
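When gold.conf files are generated by a script, a small helper can guard against mistyped keyword modifiers. The Python sketch below builds the flip_planar_n line from the six keywords listed above. The helper name is hypothetical, and two behaviours are assumptions rather than documented GOLD rules: that several modifiers may follow the flag on one line, and that at most one modifier per ring group (NRR or NHR) makes sense.

```python
RING_KEYWORDS = {
    "flip_ring_NRR", "flip_ring_NHR",
    "fix_ring_NRR", "fix_ring_NHR",
    "rot_ring_NRR", "rot_ring_NHR",
}


def flip_planar_n_line(enabled, keywords=()):
    """Build a 'flip_planar_n = <1|0> <keyword>...' configuration line.

    Raises ValueError for an unknown keyword or for two keywords that
    target the same ring group (the latter check is an assumption).
    """
    seen_groups = set()
    for keyword in keywords:
        if keyword not in RING_KEYWORDS:
            raise ValueError(f"unknown keyword: {keyword}")
        group = keyword.rsplit("_", 1)[1]   # "NRR" or "NHR"
        if group in seen_groups:
            raise ValueError(f"conflicting keywords for ring-{group}")
        seen_groups.add(group)
    parts = ["flip_planar_n", "=", "1" if enabled else "0", *keywords]
    return " ".join(parts)


if __name__ == "__main__":
    print(flip_planar_n_line(True, ["fix_ring_NRR"]))
```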


Appendix D: GOLD Predictions in Second Series of Validation Tests

• 3D plots of individual predictions are available on the CCDC web page.

• The tables in this section list:

• Subjective classification of GOLD predictions (see page 153)

• Correlation of prediction quality with number of heavy atoms in ligand (see page 158)

• Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen bonds (see page 158)

• Correlation of prediction quality with number of flexible torsions in ligand (see page 158)

Subjective classification of GOLD predictions

Subjective Result         No.   PDB Codes
Good                      12    1BMA 1CIL 1FRP 2GBP 1GLP 1LAH 1LPM 1MMQ 1MRG 1TRK 1TNL 1WAP
Close                     13    1ATL 1BBP 1BYB 1CBS 1COM 1FEN 1HFC 1IMB 1LCP 1NCO 1TNG 1TNI 1TPH
Some significant errors   6     2CMD 1CTR 2LGS 1LNA 1SNC 1UKZ
Wrong                     3     1CDG 1LMO 1TYL

Correlation of prediction quality with number of heavy atoms in ligand

Prediction Result   Max    Avg    Min
Good or Close       48     21.2   8
Errors or Wrong     29     19.9   10

Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen bonds

Prediction Result   Max    Avg    Min
Good or Close       60.0   29.5   0.0


Correlation of prediction quality with protein resolution

Resolution (Å)   Total No.   No. Good + No. Close   No. Errors + No. Wrong
> 1.0, <= 1.5    2           2                      0
> 1.5, <= 2.0    44          34                     10
> 2.0, <= 2.5    32          24                     8
> 2.5, <= 3.0    20          11                     9
> 3.0            1           0                      1


7.4 Flipping Pyramidal Nitrogens

• Click on the Fitness & Search Options button and switch on the Flip pyramidal N check box to allow pyramidal (i.e. non-planar sp3) nitrogens to invert during docking (otherwise, they will be held fixed at the input geometry).

• Given a non-planar group RR’R”N or tetrahedrally surrounded RR’R”NH, the Flip pyramidal N switch enables flipping of the local stereochemistry around the nitrogen (the energy barrier for this umbrella-like change of geometry around the nitrogen is low).

• Flipping only changes the stereochemistry around RR’R”N and RR’R”NH nitrogens. It does not affect other chiral centers.

7.5 Intramolecular Hydrogen Bonds

• Click on the Fitness & Search Options button and switch on the Internal H-bonds check box to allow intramolecular hydrogen bonds in the ligand to be formed during docking.

• Use this with care as it can make ligands like methotrexate curl up.

7.6 Protonated Carboxylic Acids

• Click on the Fitness & Search Options button and switch on the Protonated carboxylic acids check box. Protonated carboxylic acids can then either be allowed to flip (i.e. rotate 180 deg.) or rotate freely during docking.

• If the Protonated carboxylic acids check box is not switched on then these groups will be held rigid at their input conformation.

7.7 Fixing Rotatable Bonds at Their Input Conformation

• GOLD was designed to dock flexible ligands into protein binding sites. However, sometimes it can be useful to fix the geometry of part or all of the ligand e.g. in order to study the possible binding of a pre-determined ligand geometry.

• The ability to fix rotatable bonds at their input conformation is not available from the GOLD front end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126). The following options are available:

• To fix the rotatable bond between two specified atoms, add the following line to the gold.conf file:

fix_rotatable_bond = <atom number 1> <atom number 2>

(numbering as in the input file).

Note: The ability to fix rotatable bonds at their input conformation is also available using the rotatable_bond_override.mol2 file (see Section 5.4, page 38). That approach is particularly useful when docking a library of ligands that have a common substructure, whereas the method above is more suitable when docking an individual ligand.

• To fix all rotatable bonds in the ligand at their input conformation, add the following line to the gold.conf file:

fix_rotatable_bond = all

• To fix all non-terminal rotatable bonds (i.e. not -CH3, -OH, etc.), add the following line to the gold.conf file:

fix_rotatable_bond = all_but_terminal

• Note: When fixing all rotatable bonds at their input conformation (i.e. performing a rigid ligand docking) GOLD will try to find the best orientation of the ligand in the binding site by mapping donor-acceptor (as well as hydrophobic-hydrophobic) fitting points. However, GOLD will not perform a local optimisation (simplex) on the final solution. This may lead to penalisation of near-optimal conformations. Performing a few cycles of molecular-mechanics minimisation before docking may help to take the ligand close to its local potential-energy minimum.
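The three forms of the fix_rotatable_bond setting can be generated programmatically, for example when preparing many gold.conf files. The Python sketch below is a hedged illustration: the helper name is hypothetical, atom numbers are assumed to start at 1 as in the input file, and emitting one line per bond when several bonds are to be fixed is an assumption rather than documented behaviour.

```python
def fix_rotatable_bond_lines(spec):
    """Return gold.conf lines for fixing rotatable bonds.

    `spec` is either the string 'all' or 'all_but_terminal', or a list of
    (atom_number_1, atom_number_2) pairs numbered as in the input file.
    """
    if isinstance(spec, str):
        if spec not in ("all", "all_but_terminal"):
            raise ValueError(f"unsupported setting: {spec}")
        return [f"fix_rotatable_bond = {spec}"]
    lines = []
    for atom_1, atom_2 in spec:
        if atom_1 < 1 or atom_2 < 1 or atom_1 == atom_2:
            raise ValueError("atom numbers must be distinct and start at 1")
        lines.append(f"fix_rotatable_bond = {atom_1} {atom_2}")
    return lines


if __name__ == "__main__":
    # Hypothetical atom numbers for two bonds in a single ligand.
    for line in fix_rotatable_bond_lines([(3, 7), (12, 15)]):
        print(line)
    print(fix_rotatable_bond_lines("all_but_terminal")[0])
```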


Correlation of subjective classification with rms deviation

Rms Devn. (Å)    Total No.   No. Good   No. Close   No. Errors   No. Wrong
<= 0.5           8           8          0           0            0
> 0.5, <= 1.0    27          24         3           0            0
> 1.0, <= 1.5    20          7          13          0            0
> 1.5, <= 2.0    11          2          9           0            0
> 2.0, <= 2.5    2           0          2           0            0
> 2.5, <= 3.0    3           0          2           1            0
> 3.0            28          0          1           8            19

Correlation of prediction quality with number of heavy atoms in ligand

Prediction Result   Max    Avg    Min
Good or Close       52     20.4   6
Errors or Wrong     55     24.3   9

Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen bonds

Prediction Result   Max    Avg    Min
Good or Close       66.7   31.9   8.7
Errors or Wrong     53.9   25.1   4.8

Correlation of prediction quality with number of flexible torsions in ligand

Prediction Result   Max    Avg    Min
Good or Close       28     7.9    0
Errors or Wrong     40     11.4   0


PDB   Rms Devn. of  Rms Devn. of  Average Rms   Rms Devn. of  Subjective
Code  Top-Ranked    Closest       Devn. of All  Worst         Rating
      Solution      Solution      Solutions     Solution
1ETR  4.23          1.55          5.65          12.81         Errors
1NIS  4.29          3.49          3.99          4.31          Wrong
2MCP  4.37          2.45          4.43          8.26          Wrong
6RSA  4.42          4.29          4.50          5.24          Errors
1RDS  4.78          1.49          6.00          11.00         Errors
1ACK  4.99          3.82          4.95          10.10         Errors
2AK3  5.08          2.41          5.43          10.20         Wrong
3CLA  5.45          2.22          5.59          6.88          Wrong
4FAB  5.69          1.24          3.60          6.69          Wrong
1BAF  6.12          4.96          5.76          6.17          Errors
1MCR  6.23          3.40          5.32          6.73          Wrong
2RO7  8.23          8.23          11.32         17.12         Wrong
1ICN  8.63          4.14          9.92          16.98         Wrong
1IGJ  9.42          9.08          10.43         13.21         Wrong
2MTH  10.12         0.90          4.65          10.12         Wrong
1TDB  10.48         4.47          8.57          12.06         Wrong
1HDC  10.49         1.65          10.64         13.50         Errors
1LIC  10.78         6.32          12.88         15.65         Errors
1ETA  11.21         7.19          9.69          12.84         Wrong
1IDA  12.12         1.41          6.84          14.43         Close
1EED  12.43         2.87          10.06         13.78         Wrong
1AAQ  12.85         1.52          7.04          15.35         Wrong
2PLV  13.92         9.11          12.65         16.21         Wrong
1HRI  14.01         11.70         14.40         16.97         Wrong


8. Setting and Releasing Constraints

8.1 Using the Constraint Editor (see page 68)
8.2 Distance Constraints (see page 69)
8.3 Hydrogen Bond Constraints (see page 73)
8.4 Region (Hydrophobic) Constraints (see page 77)
8.5 Template Similarity Constraints (see page 79)
8.6 Scaffold Match Constraint (see page 80)

8.1 Using the Constraint Editor

• Click on the Edit Constraints button within the Fitness Function and Search Settings panel of the GOLD front end. This will open the Constraints Editor:

• To define a constraint, select a constraint type from those listed and specify the required settings. The following constraint types are available:

• Distance constraint, for use with individual ligands (see Section 8.2, page 69).


• Substructure based distance constraint, for use with multiple ligands that have a common substructure or functional group (see Section 8.2, page 69).

• Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand atom and a particular atom in the protein (see Section 8.3, page 73).

• Protein hydrogen bond constraint, for specifying that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section 8.3, page 73).

• Region (hydrophobic) constraint, for biasing the docking towards solutions in which particular regions of the binding site are occupied by specific ligand atoms or types of ligand atom (see Section 8.4, page 77).

• Template similarity constraint, for biasing the conformation of docked ligands towards a given solution, or template (see Section 8.5, page 79).

• Once the settings for a constraint have been specified, click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints.

• Repeat the above procedure if you want to specify additional constraints.

• To edit a constraint, highlight the corresponding entry in the Current Constraints list, make the required change and then hit the Add constraint or Update selected constraint button.

• To remove a constraint from the Current Constraints list, highlight the entry and hit the Delete Selection button; to remove all entries, hit the Clear List button.

• It is possible to instruct GOLD not to dock ligands when the specified constraint is physically impossible to satisfy (e.g. if no suitable group is present in the ligand to form the required H-bond constraint). This is done by selecting the Never dock a ligand when a constraints is physically impossible check box in the Constraint Editor.

• Click on Done in the Constraints Setup window when you are satisfied with the constraints specified. The count of Constraints will be updated in the GOLD front end.

• Note: When using constraints GOLD will be biased towards finding solutions in which the specified constraint is satisfied. However, it is important to remember that such a solution is not guaranteed (i.e. it is not possible to force a constraint to be satisfied in the final solution).

8.2 Distance Constraints

• Any distance between a ligand and protein atom (or between two ligand atoms) can be constrained to lie between minimum and maximum distance bounds. GOLD features two types of distance constraint:

• A standard distance constraint for use with individual ligands (see Section 8.2.1, page 70).

• A substructure-based distance constraint for use with multiple ligands which have a common functional group (see Section 8.2.3, page 72).


PDB   Rms Devn. of  Rms Devn. of  Average Rms   Rms Devn. of  Subjective
Code  Top-Ranked    Closest       Devn. of All  Worst         Rating
      Solution      Solution      Solutions     Solution
1GLQ  1.35          0.97          3.77          9.47          Close
1PHG  1.35          1.35          3.57          4.59          Close
4EST  1.38          1.04          2.76          4.96          Close
1DRI  1.41          1.04          1.35          1.43          Close
4DFR  1.44          0.80          3.98          10.85         Good
1GHB  1.45          1.22          2.59          4.80          Close
5P2P  1.55          1.24          6.15          11.69         Close
4CTS  1.57          1.56          1.57          1.61          Close
3CPA  1.58          0.90          1.47          1.89          Close
1APT  1.62          1.62          6.50          9.97          Close
1TMN  1.68          1.46          5.25          10.61         Close
1DWD  1.71          1.71          6.50          9.56          Close
1FKG  1.81          1.67          6.26          11.32         Good
1HEF  1.87          1.87          10.01         14.04         Good
1TKA  1.88          0.86          2.54          5.09          Close
1BLH  1.95          0.53          1.60          2.31          Close
1RNE  2.00          1.79          6.70          10.90         Close
1EPB  2.08          2.03          6.50          12.91         Close
1IVE  2.16          1.23          2.05          2.17          Close
1AZM  2.52          2.25          2.46          2.56          Close
3GCH  2.64          1.67          1.99          2.64          Close
1EAP  3.00          1.33          3.78          10.48         Errors
1DID  3.72          0.51          3.59          5.88          Wrong
1ROB  3.75          0.80          3.83          7.43          Errors
1MUP  3.96          3.41          4.10          4.58          Wrong
1ACJ  4.00          0.23          3.73          5.52          Wrong


PDB   Rms Devn. of  Rms Devn. of  Average Rms   Rms Devn. of  Subjective
Code  Top-Ranked    Closest       Devn. of All  Worst         Rating
      Solution      Solution      Solutions     Solution
1ABE  0.86          0.73          1.12          3.06          Good
1ACO  0.86          0.80          1.49          3.43          Good
1COY  0.86          0.54          3.15          6.63          Good
8GCH  0.86          0.86          5.84          8.54          Good
1LST  0.87          0.47          0.84          1.07          Good
1XID  0.92          0.92          1.95          2.38          Close
2SIM  0.92          0.73          1.20          1.56          Good
1HDY  0.94          0.79          1.30          2.08          Good
3PTB  0.96          0.64          0.91          1.78          Good
1HSL  0.97          0.63          0.81          0.97          Good
2CGR  0.99          0.82          0.98          1.05          Good
1LDM  1.00          1.00          1.00          1.00          Close
1MRK  1.01          0.74          1.45          5.86          Good
1DIE  1.03          0.86          1.94          3.82          Close
6ABP  1.08          0.27          0.99          3.05          Close
1HYT  1.10          1.01          1.11          1.15          Good
1AEC  1.11          0.35          1.42          6.07          Good
4PHV  1.11          1.02          5.74          12.87         Good
3HVT  1.12          1.12          4.25          4.81          Close
1DBB  1.17          0.43          4.86          11.48         Good
2YHX  1.19          1.12          2.99          8.58          Close
6RNT  1.20          0.72          4.16          8.17          Close
1PHA  1.24          0.86          2.88          6.14          Close
1POC  1.27          1.20          2.73          12.37         Good
2DBL  1.31          1.29          8.65          16.31         Close
2PK4  1.34          1.11          1.83          7.01          Close


8.2.1 Setting Up a Distance Constraint (see page 70)
8.2.2 Method Used for Substructure-Based Distance Constraints (see page 71)
8.2.3 Setting Up Substructure-Based Distance Constraints (see page 72)

8.2.1 Setting Up a Distance Constraint

• A distance between a specified ligand and protein atom (or between two ligand atoms) can be constrained to lie between minimum and maximum distance bounds.

• During a GOLD run, if a constrained distance is found to lie outside its bounds, a spring energy term is used to reduce the fitness score, i.e.

E = kx^2

where x is the difference between the distance and the closest constraint bound, and k is a user-defined spring constant.

• To constrain a distance, click on the Edit Constraints button to bring up the Constraint Editor. Then, select Distance Constraint from the list of constraint types.

• Specify the required settings using the protein and ligand atom numbers as defined in the MOL2 input files (if PDB input is used, use the sequence number). The maximum and minimum separation of the constrained atoms must be entered (distances are in Å), and the spring constant must also be specified. For example:


• If the specified ligand atom is topologically equivalent to other atoms in the ligand (e.g. it is one of the oxygen atoms of an ionised carboxylate group), then GOLD will compute the constraint term using whichever of the equivalent atoms gives the best value automatically.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68).
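The spring term described in this section can be illustrated with a short calculation. This is a sketch under stated assumptions rather than GOLD's own code: the function names are hypothetical, the penalty is taken to be zero whenever the distance lies within the bounds, and "best value" for topologically equivalent atoms is interpreted as the smallest penalty.

```python
def distance_constraint_penalty(distance, min_sep, max_sep, spring_constant):
    """Spring energy E = k * x**2 for one constrained distance (in Å).

    x is the difference between the distance and the closest bound; it is
    assumed to be zero when the distance already lies within the bounds.
    """
    if min_sep > max_sep:
        raise ValueError("minimum separation exceeds maximum separation")
    if distance < min_sep:
        x = min_sep - distance
    elif distance > max_sep:
        x = distance - max_sep
    else:
        x = 0.0
    return spring_constant * x * x


def best_equivalent_penalty(distances, min_sep, max_sep, spring_constant):
    """Penalty when several topologically equivalent ligand atoms could be used.

    Assumption: the 'best value' is the smallest penalty over those atoms.
    """
    return min(
        distance_constraint_penalty(d, min_sep, max_sep, spring_constant)
        for d in distances
    )


if __name__ == "__main__":
    # Illustrative numbers only: bounds of 2.0 to 3.5 Å, spring constant 5.
    print(distance_constraint_penalty(4.5, 2.0, 3.5, 5.0))
    print(best_equivalent_penalty([4.5, 3.0], 2.0, 3.5, 5.0))
```

Because the term grows with the square of the violation, a larger spring constant penalises small excursions outside the bounds more strongly.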

8.2.2 Method Used for Substructure-Based Distance Constraints

• It is possible to apply a distance constraint to multiple ligands which have a common functional group.

• The constraint forces GOLD to limit the distance between a protein atom and one atom of this functional group. Docking solutions will be biased towards the specified distance range.

• During docking the constraint will be applied to any ligands which contain the specified substructure (matching is performed on the basis of the atom types and 2D connectivity) and the resulting solutions will be biased towards the specified distance range. GOLD always accounts for topology in the substructure.


Rms deviations between GOLD predictions and observed ligand positions

PDB   Rms Devn. of  Rms Devn. of  Average Rms   Rms Devn. of  Subjective
Code  Top-Ranked    Closest       Devn. of All  Worst         Rating
      Solution      Solution      Solutions     Solution
1ULB  0.32          0.32          0.38          0.53          Good
2CTC  0.32          0.24          0.38          1.94          Good
1MDR  0.36          0.36          0.50          0.65          Good
2ADA  0.40          0.40          0.47          6.20          Good
1SRJ  0.42          0.42          4.86          1.11          Good
3AAH  0.42          0.36          0.66          0.49          Good
1TPP  0.43          0.37          0.43          0.61          Good
1ASE  0.49          0.36          0.60          1.31          Good
1AHA  0.51          0.51          0.51          0.51          Good
1CBX  0.54          0.49          0.53          0.58          Good
1PBD  0.57          0.18          0.45          0.70          Good
2CHT  0.59          0.57          0.62          0.85          Good
1STP  0.69          0.56          0.67          0.98          Good
1XIE  0.69          0.69          2.20          4.93          Good
1FKI  0.71          0.71          1.81          6.22          Good
1DBJ  0.72          0.39          4.16          6.13          Good
2PHH  0.72          0.63          0.68          0.73          Good
1SLT  0.78          0.78          6.64          8.43          Good
7TIM  0.78          0.64          0.81          1.71          Good
3TPI  0.80          0.36          0.91          1.98          Good
1ACM  0.81          0.79          1.01          1.23          Good
1CPS  0.84          0.60          1.91          6.56          Good
1PHD  0.85          0.32          0.85          2.15          Good


Appendix C: GOLD Predictions in First Series of Validation Tests

• 3D plots of individual predictions are available on the CCDC web page.

• The tables in this section list:

• Subjective classification of GOLD predictions (see page 153)

• Rms deviations between GOLD predictions and observed ligand positions (see page 154)

• Correlation of subjective classification with rms deviation (see page 158)

• Correlation of prediction quality with number of heavy atoms in ligand (see page 158)

• Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen bonds (see page 158)

• Correlation of prediction quality with number of flexible torsions in ligand (see page 158)

• Correlation of prediction quality with protein resolution (see page 159)

Subjective classification of GOLD predictions

Subjective Result No. PDB Codes

Good 41 1ABE 1ACM 1ACO 1CBX 1COY 1CPS 1DBB 1DBJ 1FKG 1FKI 1HDY 1HEF 1HYT 1LST 1MDR 1MRK 1PBD 1PHD 1POC 1SRJ 1STP 1TPP 1ULB 1XIE 2ADA 2CGR 2CHT 2CTC 2PHH 2SIM 3AAH 3PTB 3TPI 4DFR 4PHV 7TIM 8GCH 1AEC 1AHA 1ASE 1HSL

Close 30 1BLH 1DIE 1DR1 1DWD 1EPB 1GHB 1GLQ 1IDA 1IVE 1LDM 1PHA 1PHG 1RNE 1SLT 1TKA 1TMN 1XID 2DBL 2PK4 2YHX 3CPA 3GCH 3HVT 4CTS 5P2P 6ABP 6RNT 1APT 1AZM 4EST

Some significant errors 9 1BAF 1EAP 1ETR 1HDC 1LIC 1RDS 1ROB 6RSA 1ACK

Wrong 19 1AAQ 1ACJ 1DID 1EED 1ETA 1HRI 1ICN 1IGJ 1MCR 1MUP 2R07 1NIS 1TDB 2AK3 2MTH 2PLV 3CLA 4FAB 2MCP


• Note: the substructure must be a sub-graph rather than a complete molecule.

• As with normal distance constraints (see Section 8.2.1, page 70), the score is reduced for unfavourable ligand solutions. The amount of decrease in the score is determined by a weight term that the user must supply.

8.2.3 Setting Up Substructure-Based Distance Constraints

• To use a substructure-based distance constraint, first create a file containing the substructure in MOL2 format (e.g. substructure.mol2). It is recommended that you set atom types manually (see Section 5.3, page 37) since an incomplete fragment can cause problems with automatic atom-typing. The actual conformation of the group in this file is not important, as only the atom types and 2D connectivity will be used.

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Substructure Constraint from the list of constraint types.

• Click on the Substructure file name button, then select the substructure file and hit Open.

• Enter the Protein atom number and Substructure atom number to which the distance constraint applies (numbering as in the MOL2 files).

• Specify the allowed range of separation by entering a Maximum separation and a Minimum separation (distances are in Å).

• Enter the spring constant (i.e. the weight of the term). This causes a spring-based distance constraint to be added for the specified substructure atom and protein atom. The weight specifies the spring energy term; usually, a weight in the range of 5 to 10 will work well.

• It is possible to define a distance constraint from a centroid of a ring in the ligand. To do this, specify an atom within the ring of interest and enable the Use ring center nearest to selected atom in ligand check-box. The closest ring center to the selected atom will be used. Note: when defining a distance constraint involving a ring center, ensure that the maximum and minimum separations are adjusted accordingly.

• If the constraint refers to a substructure atom (and therefore a ligand atom) which is topologically equivalent to other atoms (e.g. it is one of the oxygen atoms of an ionised carboxylate group), GOLD will automatically compute the constraint term using whichever of the equivalent atoms gives the best value.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68).

8.3 Hydrogen Bond Constraints

• Two types of hydrogen bond constraints may be specified:

• A hydrogen bond constraint: H Bond Constraint (see Section 8.3.1, page 73), which can be used to force a hydrogen bond between a particular protein atom and a particular ligand atom.

• A protein hydrogen bond constraint: Protein H Bond Constraint (see Section 8.3.3, page 75), which can be used to specify that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom.

8.3.1 Setting Up Hydrogen Bond Constraints (see page 73)
8.3.2 Method Used for Protein H Bond Constraints (see page 74)
8.3.3 Setting up Protein H Bond Constraints (see page 75)

8.3.1 Setting Up Hydrogen Bond Constraints

• A ligand atom may be constrained to form a hydrogen bond to a particular protein atom. One atom should be a donatable hydrogen atom (you must give the number of the hydrogen atom, not the O or N atom to which it is attached) and the other should be an acceptor. The protein atom should be available for ligand binding (i.e. solvent accessible). Note that this constraint does not work with metals.

• The constraint is incorporated into the least-squares fitting routine used by GOLD. Thus, when least-squares fitting is used to dock the ligand (by attempting to form hydrogen bonds encoded within the chromosome) the constraint is added to the least-squares mapping. The constraint has a weight of 5 relative to a normal hydrogen bond taken from the chromosome.

Appendix B: Additional Tags in Output Files

• Solution output files for the docked ligand(s) can contain additional information such as the scoring function terms and the rotated protein hydrogen atom positions that were generated during the docking.

• This information can be written to SD file tags; for MOL2 files, these tags are written to comment blocks. This additional information is particularly important when post-processing docking results with SILVER. It is possible to control the information written to solution files from the Output Preferences window (see Section 14.2, page 111).

• The table below lists the tag names that you are likely to see in GOLD solution files:

Name                                Explanation  See
Gold.Protein.ActiveResidues         List of protein residues used to define the binding site. (see Section 3.8.5, page 27)
Gold.Protein.RotatedAtoms           Optimised positions of polar protein hydrogen atoms that are generated during docking. (see Section 14.6, page 115)
Gold.Protein.RotatedWaterAtoms      Optimised positions of water hydrogen atoms generated during docking (see Section 3.4, page 16)
Gold.Protein.RotatedTorsions        Optimised torsions for rotatable bonds in the ligand. Also for protein side chain torsions which have been specified as being allowed to rotate during docking (see Section 3.6, page 18)
Gold.Id.Protein                     Enabling the association of a solution with its protein
Gold.Goldscore.Fitness              Total GoldScore fitness value of docked ligand (see Section 6.2, page 46)
Gold.Goldscore.External.Hbond       Protein-ligand H-bond contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.External.Vdw         Protein-ligand vdw contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.Internal.Hbond       Internal ligand intramolecular H-bond contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.Internal.Vdw         Internal ligand vdw contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.Internal.Torsion     Internal ligand torsion-strain contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.Covalent.Energy      Covalent bonding contribution to Goldscore value (see Section 6.2, page 46)
Gold.Goldscore.Constraint.Score     Constraint contribution to GoldScore value (see Section 6.2, page 46)
Gold.Goldscore.Internal.Correction  Internal ligand energy offset (see Section 6.9, page 62)
Gold.Chemscore.ZeroCoef             The Chemscore zero coefficient (see Section 6.4.1, page 49)
Gold.Chemscore.Rot                  Rotatable-bond freezing term contribution to Chemscore value (see Section 6.4.5, page 56)
Gold.Chemscore.Fitness              Total Chemscore fitness value of docked ligand (see Section 6.4.1, page 49)
Gold.Chemscore.Hbond                Protein-ligand H-bond contribution to Chemscore value (see Section 6.4.3, page 52)
Gold.Chemscore.Lipo                 Protein-ligand lipophilic contribution to the Chemscore value (see Section 6.4.4, page 54)
Gold.Chemscore.Metal                Metal-binding contribution to Chemscore value (see Section 6.4.4, page 54)
Gold.Chemscore.internal_Hbond       Internal ligand intramolecular H-bond contribution to Chemscore value (see Section 6.4.3, page 52)
Gold.Chemscore.DEClash              Protein-ligand clash penalty to the Chemscore value (see Section 6.4.6, page 56)
Gold.Chemscore.DEInternal           Internal ligand torsional strain penalty to the Chemscore value (see Section 6.4.6, page 56)
Gold.Chemscore.DG                   Free energy change (that occurs on ligand binding) contribution to Chemscore value (see Section 6.4.1, page 49)
Gold.Chemscore.Covalent             Covalent bonding contribution to Chemscore value (see Section 6.4.7, page 58)
Gold.Chemscore.Constraint           Constraint contribution to Chemscore value (see Section 6.4.8, page 58)
Gold.Chemscore.CHOScore             Contribution for weak CH...O H-bonds (see Section 6.7, page 59)
Gold.Chemscore.Internal.Correction  Internal ligand energy offset (see Section 6.9, page 62)

• Note: Certain docking-score terms are the product of a term dependent on the magnitude of a particular physical contribution (e.g. hydrogen bonding) and a scale factor determined e.g. by a regression coefficient.

• The docking-score term descriptors included in the output file can therefore consist of weighted terms, non-weighted terms or both (as specified in the GOLD Output Preferences).

• Weighted terms will be indicated as such in the tag name, e.g. Gold.Chemscore.Hbond.Weighted.
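For post-processing outside SILVER, the tags that GOLD writes to SD solution files can be collected with a few lines of code. The sketch below is illustrative only: it assumes the standard SD-file layout for data items (a header line beginning with ">" that carries the tag name in angle brackets, followed by the value and a blank line, with "$$$$" ending the record), the function name is hypothetical, and the numbers in the sample record are invented rather than real GOLD output.

```python
import re

# Header line of an SD data item, e.g. ">  <Gold.Goldscore.Fitness>"
_TAG_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_tags(record_text):
    """Collect the data items of one SD record into a dict of tag -> text."""
    tags = {}
    name = None
    values = []
    for line in record_text.splitlines():
        match = _TAG_HEADER.match(line)
        if match:
            if name is not None:
                tags[name] = "\n".join(values).strip()
            name = match.group(1)
            values = []
        elif line.startswith("$$$$"):
            break  # end of this record
        elif name is not None:
            values.append(line)
    if name is not None:
        tags[name] = "\n".join(values).strip()
    return tags


if __name__ == "__main__":
    # Made-up values for illustration; the molecule block is omitted.
    sample = (
        "M  END\n"
        ">  <Gold.Goldscore.Fitness>\n55.21\n\n"
        ">  <Gold.Goldscore.External.Hbond>\n6.10\n\n"
        "$$$$\n"
    )
    for tag, value in read_sd_tags(sample).items():
        print(tag, value)
```

Values are returned as text, so numeric tags such as the fitness score need converting with float() before sorting or filtering.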

• To specify a hydrogen bond constraint, click on the Edit Constraints button to bring up the Constraint Editor. Then, select H-Bond Constraint from the list of constraint types.

• Specify the ligand and protein atom numbers as defined in the MOL2 input files (if PDB input is used, use the sequence number):

• The hydrogen bond constraint weighting can be altered within the # FITNESS FUNCTION section of the GOLD parameters file by changing the value of the parameter CONSTRAINT_WT.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68).

8.3.2 Method Used for Protein H Bond Constraints

• A protein hydrogen bond constraint can be used to specify that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom.

• GOLD will be biased towards finding solutions in which the specified protein atoms form hydrogen bonds. The fitness score of a given docking will be penalised by a user specified value c for every protein H-bond constraint that is not satisfied (i.e. for every protein atom that you have specified should form a hydrogen bond but does not).

• GOLD assesses the geometry of each required hydrogen bond on a scale of 0 to 1, with 1 denoting perfect. If this geometry weight for the constrained H-bond falls below the Minimum H-bond geometry weight specified by the user, the interaction will not be considered to be an H-bond and a penalty will be applied to the fitness score for the unfulfilled hydrogen bond. The magnitude of this penalty is equal to the weight specified for the constraint.

• Each trial ligand docking in a genetic algorithm run is generated by a least-squares fit of mapping points (H-bonding or hydrophobic binding points on the protein with complementary points on the ligand). The inclusion of a protein H-bond constraint will ensure that at least one of the specified protein atoms is included as one of the mapping points, i.e. use of the specified points is enforced at the mapping stage of the algorithm.

• If a ligand simply does not contain sufficient complementary hydrogen-bonding atom(s) to satisfy the specified protein H-bond constraints (e.g. you require an H-bond to a protein acceptor but the ligand contains no donors), then GOLD can be set up not to dock ligands when the specified constraint is physically impossible to satisfy (see Section 8.1, page 68).

8.3.3 Setting up Protein H Bond Constraints

• A protein hydrogen bond constraint can be used to specify that a particular protein atom should be hydrogen-bonded to the ligand, but without specifying to which ligand atom.

• To do this, click on the Edit Constraints button to bring up the Constraint Editor. Then, select Protein H-Bond Constraint from the list of constraint types.

• Specify which protein atoms are to form hydrogen bonds by typing their atom numbers, as defined in the MOL2 input file, into the Protein atom required to form H-bond entry box. Note: Either a donatable hydrogen atom (you must give the number of the hydrogen atom, not the O or N atom to which it is attached) or an acceptor can be specified. The protein atom should be available for ligand binding (e.g. solvent accessible). This constraint does not work with metals.

Appendix A: List of Atom and Bond Types

GOLD uses SYBYL atom and bond types as follows:

Atom types:

Hydrogen                                            H
Carbon sp3                                          C.3
Carbon sp2                                          C.2
Carbon sp                                           C.1
Carbon aromatic                                     C.ar
Carbocation (guanidinium)                           C.cat
Nitrogen sp3                                        N.3
Nitrogen sp2                                        N.2
Nitrogen sp                                         N.1
Nitrogen aromatic, e.g. in pyridine                 N.ar
Nitrogen amide                                      N.am
Nitrogen trigonal planar, e.g. in nitro, pyrrole    N.pl3
Nitrogen sp3 positively charged, e.g. in lysine     N.4
Oxygen sp3                                          O.3
Oxygen sp2                                          O.2
Oxygen in carboxylates and phosphates               O.co2
Sulphur sp3                                         S.3
Sulphur sp2                                         S.2
Sulphoxide sulphur                                  S.o
Sulphone sulphur                                    S.o2
Phosphorus sp3                                      P.3
Halogens, metals                                    normal element symbols, e.g. F, Cl, Ca, Zn

Bond types:

single                                              1
double                                              2
triple                                              3
aromatic                                            ar
amide                                               am
delocalised, e.g. in carboxylate, guanidinium       ar


• The Constraint weight is the strength of bias applied to the formation of a specified hydrogen bond in the least squares mapping algorithm within GOLD. The Constraint weight is also the value of the penalty applied to the fitness score for each constrained H bond that is not formed.

• The Minimum H bond geometry weight is a user defined score that determines how good a hydrogen bonding interaction has to be in order for it to be considered a hydrogen bond by GOLD. The Minimum H bond geometry weight takes a range of values from 0 to 1; by default this value is set at 0.005.

• For a given protein H bond constraint more than one protein atom number can be entered in the Protein atom entry box. This will instruct GOLD to use an either-or type of constraint during docking. For example, specifying two protein atoms, acceptor m and acceptor n, separated by a space, will result in the constraint being satisfied if an H bond is formed to either m or n during docking. This is of use when defining constraints involving, for example, carboxylates where it is not important which oxygen atom forms an H bond, provided that one of them does.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible to specify several different protein H bond constraints, with different weights for each constraint.
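The penalty logic for protein H bond constraints described in Sections 8.3.2 and 8.3.3 can be summarised in a few lines. This is a hedged sketch, not GOLD's implementation: the function name and data layout are hypothetical, each constraint is represented by its weight and the geometry weights (0 to 1) of the H bonds found to each listed protein atom, and a single minimum geometry weight is assumed to apply to all constraints.

```python
def protein_hbond_penalty(constraints, min_geometry_weight=0.005):
    """Total penalty for unsatisfied protein H bond constraints.

    constraints: list of (constraint_weight, geometry_weights) pairs, where
    geometry_weights holds one value between 0 and 1 for each protein atom
    listed in that constraint (several atoms give an either-or constraint).
    A constraint counts as satisfied if any listed atom forms an H bond whose
    geometry weight is not below min_geometry_weight.
    """
    penalty = 0.0
    for weight, geometry_weights in constraints:
        satisfied = any(g >= min_geometry_weight for g in geometry_weights)
        if not satisfied:
            penalty += weight  # penalty equals the constraint weight
    return penalty


if __name__ == "__main__":
    # Illustrative values: one single-atom constraint that fails and one
    # either-or constraint (e.g. two carboxylate oxygens) that succeeds.
    example = [(10.0, [0.0]), (5.0, [0.0, 0.4])]
    print(protein_hbond_penalty(example))
```

With the default minimum geometry weight of 0.005, even a rather poor hydrogen bond satisfies the constraint; raising the threshold makes the constraint stricter.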

8.4 Region (Hydrophobic) Constraints

• This constraint can be used to bias the docking towards solutions in which particular regions of the binding site are occupied by specific ligand atoms (or types of ligand atom, e.g. hydrophobic atoms).

8.4.1 Method Used for Region (Hydrophobic) Constraints (see page 77)
8.4.2 Setting Up Region (Hydrophobic) Constraints (see page 77)

8.4.1 Method Used for Region (Hydrophobic) Constraints

• This constraint can be used to bias the docking towards solutions in which particular regions of the binding site are occupied by specific ligand atoms (or types of ligand atom).

• For each region (hydrophobic) constraint specified, a sphere is placed at an explicitly defined position (using x,y,z coordinates) within the binding site. Each sphere is assigned a user-defined radius, so a sphere can be adjusted if required, e.g. to fill an entire pocket in the binding site. The minimum settable radius is 0.5 Å.

• A contribution (determined according to a user-specified weighting) is then added to the score for each specified non-hydrogen ligand atom that lies within the designated sphere. Note: A contribution is added to the score for each atom located within the sphere, (i.e. the total contribution will depend on the number of atoms found in the region of interest and ultimately the ligand-accessible volume of the region).

• The ligand atoms used in the constraint can be specified explicitly from a list of atom numbers (as defined in the MOL2 input file). Alternatively, it is possible to use all hydrophobic ligand atoms, or to use only those hydrophobic atoms in aromatic rings. Atoms considered to be hydrophobic include:

• Carbon atoms bound to at least two H or C atoms.

• Atoms typed C.cat.

• Atoms typed S.3 and bound to two carbons.

• H atoms bound to an sp2, sp3 or aromatic carbon (Note: only heavy atoms found within the sphere will contribute to the score).

• Details of the region (hydrophobic) constraint calculation, including the final contribution to the fitness score, are given in the ligand log file (see Section 14.10, page 118).
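The scoring rule for a region (hydrophobic) constraint, as described above, amounts to counting the selected heavy atoms that fall inside the sphere and multiplying by the user-specified weighting. The sketch below illustrates this under stated assumptions: the function name and input layout are hypothetical, atoms exactly on the sphere surface are counted as inside, and the atom list is taken to be already restricted to the ligand atoms selected for the constraint.

```python
import math


def region_constraint_contribution(atoms, centre, radius, weight):
    """Score contribution of one region (hydrophobic) constraint.

    atoms: iterable of (element, (x, y, z)) for the ligand atoms selected
    for the constraint; centre: (x, y, z) of the sphere; radius in Å.
    Hydrogen atoms are skipped because only heavy atoms contribute.
    """
    if radius < 0.5:
        raise ValueError("the minimum settable radius is 0.5 Å")
    count = 0
    for element, position in atoms:
        if element == "H":
            continue
        if math.dist(position, centre) <= radius:
            count += 1
    return weight * count


if __name__ == "__main__":
    # Illustrative coordinates only.
    ligand_atoms = [
        ("C", (0.0, 0.0, 0.0)),
        ("C", (1.0, 0.0, 0.0)),
        ("H", (0.5, 0.0, 0.0)),
        ("C", (5.0, 0.0, 0.0)),
    ]
    print(region_constraint_contribution(ligand_atoms, (0.0, 0.0, 0.0), 2.0, 1.5))
```

Because every atom inside the sphere adds the same amount, a large sphere or a large weighting can dominate the fitness score, so both are worth choosing with the size of the pocket in mind.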

8.4.2 Setting Up Region (Hydrophobic) Constraints

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Region (Hydrophobic) Constraint from the list of constraint types.


20. Acknowledgments

• GOLD was written by Gareth Jones (University of Sheffield, UK) in a DTI LINK collaboration with GlaxoWellcome and the Cambridge Crystallographic Data Centre (CCDC).

• Funding was provided by the Biotechnology and Biological Sciences Research Council, the Department of Trade and Industry, the Medical Research Council, GlaxoWellcome Ltd and CCDC.

• Peter Willett (University of Sheffield), Robert Glen (Wellcome), Andrew Leach (GlaxoWellcome) and Jacques Barbanton (Lipha Pharmaceuticals) are also thanked for significant contributions to the development of GOLD.

• ChemScore in GOLD was implemented by Astex Technology, Cambridge, UK.

• CCDC staff involved in GOLD are Jason Cole, Simon Bowden and Robin Taylor.

• One of the torsion libraries supplied with GOLD was developed by Gerhard Klebe and Thomas Mietzner (BASF).


19. References

• Molecular Recognition of Receptor Sites Using a Genetic Algorithm with a Description of Desolvation. G. Jones, P. Willett and R. C. Glen, J. Mol. Biol., 245, 43-53, 1995

• Development and Validation of a Genetic Algorithm for Flexible Docking. G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor, J. Mol. Biol., 267, 727-748, 1997

• A New Test Set for Validating Predictions of Protein-Ligand Interactions. J. W. M. Nissink, C. Murray, M. Hartshorn, M. L. Verdonk, J. C. Cole and R. Taylor, Proteins, 49(4), 457-471, 2002

• Life-science Applications of the Cambridge Structural Database. R. Taylor, Acta Cryst., D58, 879-888, 2002

• Improved Protein-Ligand Docking using GOLD. M. L. Verdonk, J. C. Cole, M. J. Hartshorn, C. W. Murray, R. D. Taylor, Proteins, 52, 609-623, 2003

• Virtual Screening Using Protein-Ligand Docking: Avoiding Artificial Enrichment. Marcel L. Verdonk, Valerio Berdini, Michael J. Hartshorn, Wijnand T. M. Mooij, Christopher W. Murray, Richard D. Taylor, and Paul Watson, J. Chem. Inf. Comput. Sci., 44, 793-806, 2004

• Protein-Ligand Docking and Virtual Screening with GOLD. J. C. Cole, J. W. M. Nissink, R. Taylor in Virtual Screening in Drug Discovery (Eds. B. Shoichet, J. Alvarez), Taylor & Francis CRC Press, Boca Raton, Florida, USA (2005).

• Modeling Water Molecules in Protein-Ligand Docking Using GOLD. Marcel L. Verdonk, Gianni Chessari, Jason C. Cole, Michael J. Hartshorn, Christopher W. Murray, J. Willem M. Nissink, Richard D. Taylor, and Robin Taylor, J. Med. Chem., 48, 6504-6515, 2005

• Comparing protein-ligand docking programs is difficult. Jason C. Cole, Christopher W. Murray, J. Willem M. Nissink, Richard D. Taylor, Robin Taylor, Proteins, 60, 325-332, 2005


• Specify the ligand atoms to be used in the constraint by selecting either All hydrophobic ligand atoms, Hydrophobic ligand atoms in aromatic rings, or User-specified list. If User-specified list is selected, then enter the ligand atom numbers (as defined in the MOL2 input file) into the Ligand atoms entry box. Atom numbers should be separated by spaces.

• Specify the position of the centre of the sphere (defined using x,y,z coordinates), and the radius of the sphere (distances are in Å).

• A score contribution must also be specified. This is the value that will be added to the fitness score for each specified non-hydrogen ligand atom found within the sphere region. Note: the total contribution added will therefore depend on the number of atoms located within the sphere.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible to define multiple region (hydrophobic) constraints.
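The bookkeeping behind the region constraint can be illustrated with a short Python sketch. This is not GOLD code: the function name is hypothetical, and treating atoms exactly on the sphere surface as inside is an assumption. It only shows how the total contribution scales with the number of specified atoms found within the sphere.

```python
import math

def region_constraint_bonus(atom_coords, centre, radius, contribution):
    """Total score added for the specified ligand atoms lying inside the sphere.

    atom_coords:  (x, y, z) positions of the specified non-hydrogen ligand atoms
    centre:       (x, y, z) centre of the sphere
    radius:       sphere radius (same length unit as the coordinates)
    contribution: value added per atom found within the sphere
    """
    # Assumption: atoms exactly on the sphere surface count as inside.
    inside = sum(1 for xyz in atom_coords if math.dist(xyz, centre) <= radius)
    return inside * contribution
```

For instance, with three specified atoms of which two fall inside the sphere, the total added is twice the per-atom contribution.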


8.5 Template Similarity Constraints

• This constraint can be used to bias the conformation of docked ligands towards a given solution, or template.

8.5.1 Method Used for Template Similarity Constraints (see page 79)
8.5.2 Setting Up a Template Similarity Constraint (see page 79)

8.5.1 Method Used for Template Similarity Constraints

• This constraint will bias the conformation of docked ligands towards a given solution. This solution, or template, can, for example, be another ligand in a known conformation, a common core (useful when docking ligands of a combinatorial set), or it may just be a large substructure that is expected, or known, to bind in a certain way.

• The template must be supplied as a MOL2 file or PDB file.

• Unlike the distance-based constraints, which reduce the score for ligands that adopt unfavourable orientations, this constraint will add an energy term to the score based on the similarity between the ligand being docked and the template provided. The similarity between the two is evaluated as a Gaussian overlap term.

• The similarity constraint can be applied in three ways that differ in the way that the overlap between ligand and template is calculated. The similarity can be evaluated:

• by using the overlap between all donor atoms in the template and the ligand being docked.

• by using the overlap between all acceptor atoms in the template and the ligand being docked.

• by using the overlap of all atoms of the template (this can be regarded as a ligand-shape constraint).

• The energy term to be added is calculated as similarity times weight (the similarity value is between 0 and 1, where 1 indicates identity of template and ligand).

• Note: If you wish to place a fragment at an exact specified position in the binding site, as opposed to biasing the docking, use the scaffold match constraint (see Section 8.6, page 80).
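The manual does not give the exact functional form of the Gaussian overlap term, so the following Python sketch is only one plausible reading. The normalised overlap used here (cross overlap divided by the geometric mean of the two self overlaps) and the width parameter sigma are assumptions, as are all function names. What it does take from the text is that the similarity lies between 0 and 1 and that the energy term is similarity times weight.

```python
import math

def gaussian_overlap(coords_a, coords_b, sigma=1.0):
    """Sum of pairwise Gaussian overlaps between two sets of atom positions."""
    return sum(
        math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))
        for a in coords_a
        for b in coords_b
    )

def template_similarity(ligand, template, sigma=1.0):
    """Assumed normalisation: the value is 1.0 when ligand and template coincide."""
    cross = gaussian_overlap(ligand, template, sigma)
    norm = math.sqrt(
        gaussian_overlap(ligand, ligand, sigma)
        * gaussian_overlap(template, template, sigma)
    )
    return cross / norm if norm > 0.0 else 0.0

def constraint_energy(ligand, template, weight, sigma=1.0):
    """Energy term as described in the text: similarity times weight."""
    return weight * template_similarity(ligand, template, sigma)
```

To mimic the donor or acceptor variants, only the donor or acceptor atom positions would be passed in; passing all atoms corresponds to the shape variant.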

8.5.2 Setting Up a Template Similarity Constraint

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Template Similarity Constraint from the list of constraint types.

• Fill in the form to specify the similarity type to be used [H-bond donor overlap, H-bond-acceptor overlap, or shape overlap (see Section 8.5.1, page 79)]; the similarity template file; and the weight of the constraint.


• identify_ligand.py can be invoked from the command line. The structure of the command is:

identify_ligand.py <ligand data file> <ligand number>

Note: identify_ligand.py is a Python script and as such requires a working installation of Python (http://www.python.org).


• For example, the table of rms deviations below for nine dockings of a molecule produces the following clustering with the complete linkage method:

18.4 identify_ligand.py

• identify_ligand.py can be used to extract a specific ligand description from PDB, SDFile or MOL2 format input files.

• It requires a filename and a ligand number (n) as arguments and then locates the nth ligand in the file. If any descriptive information, such as the ligand name, is available for that ligand, it is then displayed.
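The idea of locating the nth ligand record can be sketched for the MOL2 case as follows. This is not the identify_ligand.py script itself, whose source is not shown here. The function name is hypothetical, and the sketch relies only on the MOL2 convention that each molecule starts with an @<TRIPOS>MOLECULE record whose next line holds the molecule name.

```python
def nth_mol2_name(lines, n):
    """Return the name of the n-th (1-based) molecule in MOL2 text, or None.

    lines: an iterable of text lines, e.g. an open file object.
    """
    count = 0
    line_iter = iter(lines)
    for line in line_iter:
        if line.strip() == "@<TRIPOS>MOLECULE":
            count += 1
            if count == n:
                # The line after the record header carries the molecule name.
                return next(line_iter, "").strip() or None
    return None
```

A call such as nth_mol2_name(open("ligands.mol2"), 3) would return the name of the third molecule, where ligands.mol2 is a placeholder file name.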

      2    3    4    5    6    7    8    9
1   0.8  1.1  1.0  1.0  1.4  2.3  5.0  4.6
2        0.9  1.1  1.1  1.2  2.3  5.2  4.6
3             0.4  0.8  0.9  2.3  5.0  4.5
4                  0.6  1.1  2.3  4.9  4.5
5                       1.3  2.0  4.9  4.5
6                            1.8  5.1  4.4
7                                 5.3  4.5
8                                      2.4

Step  Distance between clusters being merged  Clusters
1     0.40                                    1 | 2 | 3, 4 | 5 | 6 | 7 | 8 | 9
2     0.84                                    1 | 2 | 3, 4, 5 | 6 | 7 | 8 | 9
3     0.84                                    1, 2 | 3, 4, 5 | 6 | 7 | 8 | 9
4     1.13                                    1, 2, 3, 4, 5 | 6 | 7 | 8 | 9
5     1.42                                    1, 2, 3, 4, 5, 6 | 7 | 8 | 9
6     2.35                                    1, 2, 3, 4, 5, 6, 7 | 8 | 9
7     2.38                                    1, 2, 3, 4, 5, 6, 7 | 8, 9
8     5.28                                    1, 2, 3, 4, 5, 6, 7, 8, 9
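The clustering steps can be reproduced, to the precision of the printed matrix, with a small complete-linkage routine. The Python sketch below is an illustration and not the rms_analysis implementation; the names UPPER, RMS and complete_linkage are hypothetical. Because the matrix is printed to one decimal place, the merge distances come out rounded (for example 0.8 where the table reports 0.84), and the two tied merges at steps 2 and 3 may occur in either order.

```python
from itertools import combinations

# Upper triangle of the rms matrix, row i holding the distances to i+1 .. 9.
UPPER = {
    1: [0.8, 1.1, 1.0, 1.0, 1.4, 2.3, 5.0, 4.6],
    2: [0.9, 1.1, 1.1, 1.2, 2.3, 5.2, 4.6],
    3: [0.4, 0.8, 0.9, 2.3, 5.0, 4.5],
    4: [0.6, 1.1, 2.3, 4.9, 4.5],
    5: [1.3, 2.0, 4.9, 4.5],
    6: [1.8, 5.1, 4.4],
    7: [5.3, 4.5],
    8: [2.4],
}
RMS = {
    frozenset((i, i + offset)): value
    for i, row in UPPER.items()
    for offset, value in enumerate(row, start=1)
}

def complete_linkage(n_items, rms):
    """Return [(merge distance, sorted members of the new cluster), ...]."""

    def linkage(pair):
        # Complete linkage: distance between clusters is the largest pairwise rms.
        a, b = pair
        return max(rms[frozenset((i, j))] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(1, n_items + 1)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((linkage((a, b)), sorted(a | b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

if __name__ == "__main__":
    for step, (distance, members) in enumerate(complete_linkage(9, RMS), start=1):
        print(step, distance, members)
```

Replacing max with min in linkage would give single linkage (the simple method), which generally produces a different merge sequence.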


• The similarity template file should contain the template molecule or fragment in its docked position (i.e. expressed with respect to the same coordinate frame as the protein and with the coordinates required to place it in the correct pose).

• The weight term determines the maximum energy term that would be added to the score in the case of perfect overlap between ligand and template. As an initial value for this term, we suggest a value between 5 and 30.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible to define multiple constraints, e.g. one for donors and one for acceptors.

8.6 Scaffold Match Constraint

• The scaffold match constraint can be used to place a fragment at an exact specified position in the binding site; the geometry of the fragment will not be altered during docking.


8.6.1 Method Used for Scaffold Match Constraint (see page 81)
8.6.2 Setting Up Scaffold Match Constraints (see page 81)

8.6.1 Method Used for Scaffold Match Constraint

• This constraint will attempt to place a ligand onto a given scaffold location. The scaffold can, for example, be a common core or fragment (useful when docking ligands of a combinatorial set), or it may just be a substructure known to adopt a certain binding position.

• The scaffold must be supplied as a MOL2 file. The file should contain the scaffold fragment in its docked position (i.e. expressed in the same coordinate frame as the protein and with the coordinates required to place it in the correct pose). Note: It is important that the Sybyl atom and bond types in the scaffold MOL2 file match those in the scaffold portion of the ligand. The scaffold matching algorithm matches heavy atoms only. However, it is recommended that the scaffold have hydrogens correctly placed on all appropriate atoms other than the unfulfilled valency at the substitution point, which must not be blocked by hydrogen.

• Unlike the template similarity constraint, which will bias the docking by adding an energy term to the score based on the similarity between the ligand being docked and the template provided, this constraint is enforced at the mapping stage in GOLD. Ligand placements are generated using a best least-squares fit with the scaffold heavy atom positions, i.e. this constraint forces all atoms on the matching portion of the ligand to lie very close to, or coincident with, the corresponding scaffold atoms. There is no S(con) contribution to the fitness score to bias dockings.

• How closely ligand atoms fit onto the scaffold is governed by a user specified weight. Setting a higher weight will force the ligand to be placed onto the scaffold locations more strictly. A default weight of 5.0 is used. Note: setting high weightings can have a detrimental effect on the fitness score if the placement results in e.g. bad protein-ligand clashes. If desired, values below 1 can be used to achieve a more lenient overlay.

• Symmetry effects (such as the flipping of a phenyl ring by 180 degrees) are not taken into account during matching of the ligand onto the scaffold. Therefore, a scaffold that will give a unique match should ideally be provided.

• For a given ligand, it is not possible to match multiple scaffolds at the same time. Scaffolds are evaluated in the order supplied by the user and the scaffold that matches the ligand first will be used. This means that it is possible to specify two or more different scaffolds, and GOLD will use the scaffold that matches the ligand first. This can be useful when docking multiple different series of compounds.

8.6.2 Setting Up Scaffold Match Constraints

• Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Scaffold Match Constraint from the list of constraint types.


"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\smartrms_win32.exe" [-hv] conformation_1 conformation_2

• The flags are:

h use heavy atoms only (the calculation easily becomes intractable if Hs are included).
v verbose output.

conformation_1 and conformation_2 are MOL2 files containing the two conformations.

18.3 rms_analysis

• rms_analysis calculates an rms difference matrix for a set of structures (as MOL2 files) and performs hierarchical cluster analysis. A graph isomorphism algorithm is used to determine optimal rms values.

• rms_analysis can be invoked from the command line.

• The structure of the command is dependent on the platform being used:

• UNIX:

$GOLD_DIR/utilities/rms_analysis -method [simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...

Note: this command will only work if users have their GOLD_DIR environment variable correctly set. For example, to carry out a simple cluster analysis for the files file1.mol2 and file2.mol2, the following command would be used:

$GOLD_DIR/utilities/rms_analysis -method simple file1.mol2 file2.mol2

• Windows (via the command prompt):

<install_dir>\gold\d_win32\bin\rms_analysis_win32.exe -method [simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...

where <install_dir> is the GOLD installation directory. If specifying the full path, the command will need to be in inverted commas, e.g.:

"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\rms_analysis_win32.exe" -method [simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...

• Choose simple for single linkage cluster analysis, complete for complete linkage, group_average for group average.
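When many sets of solutions have to be clustered, the UNIX command line can be assembled from a script. The Python helper below is a minimal sketch: the function name is hypothetical, and only the executable location, the -method flag and its three values are taken from the text above. It returns an argument list and does not run anything itself.

```python
import os
import posixpath

METHODS = ("simple", "complete", "group_average")

def rms_analysis_command(method, mol2_files, gold_dir=None):
    """Build the UNIX argument list for rms_analysis.

    gold_dir defaults to the GOLD_DIR environment variable, which must be set.
    """
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}, got {method!r}")
    if gold_dir is None:
        gold_dir = os.environ["GOLD_DIR"]  # raises KeyError if GOLD_DIR is unset
    executable = posixpath.join(gold_dir, "utilities", "rms_analysis")
    return [executable, "-method", method, *mol2_files]
```

The resulting list can be handed to subprocess.run, which avoids shell quoting problems with file names that contain spaces.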


18.2 smart_rms

• smart_rms calculates the rms difference between two conformations of the same structure, while taking account of symmetry effects (such as the flipping of a phenyl ring by 180 degrees). Using a graph isomorphism algorithm, an rms score is calculated for each way of mapping the molecule onto itself.

• smart_rms can be invoked from the command line. The following platform-dependent commands should be used.

• UNIX platforms:

$GOLD_DIR/utilities/smart_rms [-hv] conformation_1 conformation_2

• Windows platforms (at the Windows command prompt):

<install_dir>\gold\d_win32\bin\smartrms_win32.exe [-hv] conformation_1 conformation_2

where <install_dir> is the GOLD installation directory. If specifying the full path, the command will need to be in inverted commas, e.g.:


• The scaffold structure file should contain the scaffold molecule or fragment in its docked position (i.e. within the same coordinate frame as the protein).

• The Scaffold Match Constraint Weight determines how closely ligand atoms fit onto the scaffold. Setting a higher weight will force the ligand to be placed onto the scaffold locations more strictly.

• By default, all heavy atoms in the supplied scaffold structure file will be used for matching. However, it is possible to specify only a subset of those atoms in the scaffold structure (these may include non-heavy atoms). Atoms should be specified using the atom indices as defined in the scaffold structure file (indices should be separated by a single space). Limiting the number of atoms to be matched can be useful for large, rigid scaffolds. In such a case, specifying only a few atoms distributed throughout the scaffold can be sufficient to obtain a good 3D superimposition.

• Click on the Add constraint or Update selected constraint button to add the constraint definition to the Current Constraints (see Section 8.1, page 68).


9. Torsion Angle Distributions

9.1 Basic Use of Torsion Angle Distributions (see page 83)
9.2 Choice of Torsion Angle Distribution Files (see page 83)
9.3 Editing Torsion Angle Distribution Files (see page 84)
9.4 Matching Torsion Angle Distributions at Run Time (see page 88)

9.1 Basic Use of Torsion Angle Distributions

• Torsion angle distributions extracted from the Cambridge Structural Database (CSD) can be input to GOLD. These distributions are used to restrict the ligand conformational space sampled by the genetic algorithm.

• Using torsion angle distributions in this way will not make GOLD go any faster. However, it may improve the chances of GOLD finding the correct answer by biasing the search towards ligand torsion-angle values that are commonly observed in crystal structures. It may also improve convergence and so make GOLD usable with faster settings (see Section 11.3, page 94).

• To enable the use of torsion angle distributions, click on the Fitness & Search Options button in the Fitness Function and Search Settings panel in the GOLD front end, then, in the resulting window, switch on the check box labelled Use torsion angle distributions from the CSD.

9.2 Choice of Torsion Angle Distribution Files

• Three torsion angle distribution files are provided:

• gold.tordist - this is the default file.

• gold.tordist.new - this contains all the torsions in gold.tordist and many more new distributions. However, many of these newer torsions have very few hits in the CSD and no significant improvement was found when using this new file in GOLD.

• mimumba.tordist - this contains all the torsional distributions used in the MIMUMBA program (Klebe and Mietzner, J. Comput.-Aided Mol. Des., 8, 583-606, 1994).

• Click on the Distributions File button in the GOLD front end to pick a torsion angle distribution file. Alternatively, type the required file into the entry box.

• It is possible to customise torsion angle distribution information by editing one of the standard torsion angle distribution files (see Section 9.3, page 84).


18. Utility Programs

• A number of utility programs are supplied to assist in the analysis of GOLD docking results.

• The following utility is available in the sgi_utils directory of the GOLD distribution:

• 18.1 grommitt (see page 142) - used for simple visualisation of dockings, available for SGI users running IRIX only.

• The following utilities are available in the utilities directory of the GOLD distribution:

• 18.2 smart_rms (see page 143) - computes rms deviations between two conformations of the same structure.

• 18.3 rms_analysis (see page 144) - performs cluster analysis on a set of docking solutions.

• 18.4 identify_ligand.py (see page 145) - extracts descriptive information such as ligand name for a specified structure record in a file.

18.1 grommitt

• grommitt is a simple molecular viewer for examining binding modes. It is available for SGI users running IRIX only.

• When GOLD is being run interactively, grommitt can be used to display the current top solution from a genetic algorithm run. To do this, click on the Display/Output Options button in the GOLD front end (see Section 2.2, page 4).

• grommitt can also be opened from the command line, e.g. to display overlays of SYBYL MOL2 files. The structure of the command is:

grommitt [-chp] <files>

The flags are:

c each molecule is coloured differently. Normally, molecules are coloured by atom type.
h only display heavy atoms.
p pretty (but slow) display.

<files> is a list of SYBYL MOL2 and/or PDB files.

• grommitt is useful for visualising a set of GOLD solutions, e.g. to see at a glance if all solutions are identical or whether there are several different binding modes. For example:

%grommitt -h gold_soln*

displays the window:


• Non-parametric tests indicate that GOLD score and activity are not significantly correlated (Spearman rs = -0.564, p = 0.056; Kendall τ = -0.382, p = 0.086).

• There is not a statistically significant relationship between the GOLD score and activity. It is worth noting that the compounds are all structurally similar and all are active.

17. Context-Dependent Help

• Context-dependent help is available in the front end, by clicking the middle mouse button on the item for which information is required. For example, clicking on an item in the front end brings up a help window for that item (screenshots not reproduced).


9.3 Editing Torsion Angle Distribution Files

• To edit the torsion angle distribution file click on the Edit Distributions button in the Fitness Function and Search options window (accessible by clicking on the Fitness & Search Options button in the Fitness Function and Search Settings panel in the GOLD front end).

• If you are using the default torsion angle distribution file, it will be copied to the current directory.

• The format of entries in the torsion angle distribution file is quite strict: incorrect editing of the file may cause GOLD to behave in unexpected ways or even to crash.

9.3.1 Format of Torsion Angle Distribution File Header (see page 84)
9.3.2 Format of Torsion Angle Distributions (see page 85)
9.3.3 Example Torsion Angle Distributions (see page 87)
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database (see page 88)

9.3.1 Format of Torsion Angle Distribution File Header

• The first section of the torsion angle distribution file sets parameters and tells GOLD what to do with the distributions.

• N_BINS is the number of bins used in the torsion histogram.

• REMOVE_HIGH_ENERGY and DELTA_E are parameters that can be used to control the filtering out of high-energy torsion angles.

• If torsion angle distributions are used, GOLD will no longer sample over 360 degrees but will constrain the torsion to values contained in the histogram. However, if a histogram contains a large number of entries, there may be some high-energy torsions within the histogram. GOLD therefore provides a method for filtering out such high-energy torsions: set REMOVE_HIGH_ENERGY = 1 and DELTA_E = E to remove those bars in the histogram that correspond to torsions that are E kcal/mol higher in energy than the most populated state. The ground state of the torsion is assumed to correspond to the maximum peak in the torsional histogram. The energy difference between this ground state and any other peak in the torsion angle histogram is then assumed to be approximately given by the partition function.

• The following table indicates the relationship between the value of DELTA_E and the ratio high/low, where high is the height of the biggest bar in the histogram and low is the height below which bars will be removed from the histogram:


• For example, if REMOVE_HIGH_ENERGY=1 and DELTA_E = 2.5, those bars which are 1/69th or less of the height of the largest bar will be removed from the histogram and torsion angles corresponding to these bars will never be sampled by the genetic algorithm.

• The relationship between DELTA_E and ratio, based on the partition function, is:

ratio = exp (DELTA_E/0.5898)
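The relationship can be checked numerically, and the filtering it implies can be sketched in a few lines of Python. The function names are hypothetical and the code is an illustration of the rule described above, not GOLD's implementation. Bars whose height is 1/ratio of the tallest bar or less are set to zero.

```python
import math

RT = 0.5898  # kcal/mol, the constant appearing in the relationship above

def ratio_from_delta_e(delta_e):
    """Ratio high/low implied by a given DELTA_E (kcal/mol)."""
    return math.exp(delta_e / RT)

def filter_histogram(histogram, delta_e):
    """Zero the bars that are 1/ratio or less of the height of the tallest bar."""
    cutoff = max(histogram) / ratio_from_delta_e(delta_e)
    return [count if count > cutoff else 0 for count in histogram]

if __name__ == "__main__":
    for delta_e in (3.0, 2.5, 2.0):
        print(delta_e, ratio_from_delta_e(delta_e))
```

The computed ratios agree with the tabulated values of 161, 69 and 30 to within rounding.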

9.3.2 Format of Torsion Angle Distributions

• Each torsion angle distribution entry comprises three lines: the first line is the name of the torsion angle; the second line is the definition of the torsion angle; the third line is the histogram.

• The histogram should be a list of space-separated integers. The ith integer should be the number of observations in the torsion-angle range of the ith bin. There should be N_BINS integers in all. The first bin starts at -180 degrees and the last bin ends at +180.

• Torsion angle distributions are defined using Backus-Naur Form (BNF) grammar, as follows (all the symbols in the table are part of the grammar except for ||, which is used to indicate alternative fields):

DELTA_E   ratio
3.0       161
2.5       69
2.0       30

TORSION          NODE | NODE | NODE | NODE | || NODE | NODE | NODE | NODE | DIRECTIVE || NODE | NODE | NODE | NODE | DIRECTIVE | DIRECTIVE
DIRECTIVE        expand <min> <max> || period <min> <max>
NODE             ATOM || ATOM (NEIGHBOURS)
NEIGHBOURS       NEIGHBOUR_NODE || NEIGHBOUR_NODE NEIGHBOURS
NEIGHBOUR_NODE   NODE || HYDROGENS
HYDROGENS        0H || 1H || 2H || 3H
ATOM             ATOM_DEF || ATOM_DEF [FRAGMENT]
FRAGMENT         ribose || adenine || uracil || benzene
ATOM_DEF         TYPE_DEF || LINKAGE<no space>TYPE_DEF


• Non-parametric tests indicate that GOLD score and activity are significantly correlated according to the Kendall test but not according to the Spearman test (Spearman rs = -0.191, p = 0.065; Kendall τ = -0.150, p = 0.033).

• These inhibitors are all extremely hydrophobic, representing a difficult case for GOLD. Note: For this dataset and target, GOLD is not predicting active molecules as inactive. This is advantageous in virtual screening applications (inactives that are predicted as actives are acceptable in this context; the converse is not).

16.2.3 Prediction of Binding Affinity to FKBP12

• GOLD was used to dock a set of 13 FK506BP inhibitors (data from Holt et al., J. Am. Chem. Soc., 1993, 115, 9925). 20 docking runs were performed on each complex and the best fitness score recorded.

• A plot of fitness score against measured Ki is shown below:


• The GOLD scores are a good indicator of activity for this series. It is most unlikely that this level of prediction could have arisen through chance (χ² = 15.27, p < 0.001, 1 degree of freedom).

16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin

• GOLD was used to dock a set of 94 alpha-chymotrypsin inhibitors (data from Stewart et al., T. C. Methods, 1990, 3, 713).

• A plot of fitness score against measured Ki is shown below:

• The graph below omits the two outliers:

                     Predicted active   Predicted inactive
Observed active             14                  1
Observed inactive            5                 14
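The chi-squared statistic for a 2 x 2 table such as this one can be verified with the standard Pearson formula (without continuity correction). The short Python check below is an illustration with a hypothetical function name; it takes the four cell counts in reading order.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

if __name__ == "__main__":
    # Rows: observed active, observed inactive. Columns: predicted active, predicted inactive.
    print(round(chi_squared_2x2(14, 1, 5, 14), 2))
```

With the counts 14, 1, 5 and 14 this gives a value of about 15.27, matching the figure quoted for 1 degree of freedom.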


• This grammar allows torsions to be specified as four fragment nodes. Each node defines an atom type and, optionally, a set of neighbours to which the atom is connected. Each of the neighbours is a node or an exact count of the number of hydrogen atoms to which the atom is bonded. Atom types are defined using SYBYL atom types or elemental atom types. The atom can also be required to be part of a pre-defined fragment.

• Bonding environments can also be specified, using the symbols ~,=,-, which indicate, respectively, that an atom forms an aromatic, double or single bond to its parent node. Note: ~,=, and - should therefore not be used on the first atoms specified, these bond types are specified for substituents only.

• A node is a parent of all its neighbours and a top level node in the torsion definition is a parent of subsequent nodes in the torsion.

• There are currently four fragments available, one of which (the uracil fragment) matches both thymine and uracil. More fragments can easily be added. The Ullman algorithm is used to determine if an atom belongs to a fragment. Fragments are defined through SYBYL atom types and connectivity (exact bond types are not used). Only heavy atoms are considered. Currently, fragments are precompiled, but they could be read in at run-time if required.

• Directives are allowed to take account of special circumstances. There are two directives: expand and period.

• The expand directive has the form expand <min> <max> where <max> - <min> = 180.0 or <min> = 0. This directive is used for torsions where the CSD query has symmetry and torsions are only measured over <min> to <max> degrees. However, although the CSD query may have two-fold symmetry, often the matched structure does not. The expand directive fills out the rest of the histogram with the correct values.

• The period directive takes account of those torsional distributions for which the matched structure has symmetry. This directive has the form period <pmin> <pmax>. The distribution will only be expanded between angles <pmin> and <pmax>.

TYPE_DEF         SYB_TYPE || EL_TYPE
LINKAGE          ~ || = || -
SYB_TYPE         C.3 || C.2 || C.1 || C.ar || C.cat || N.3 || N.2 || N.1 || N.ar || N.am || N.pl3 || N.4 || O.3 || O.2 || O.co2 || S.3 || S.2 || S.o || S.o2 || P.3 || H || F || Cl || Br || I
EL_TYPE          C || N || O || S || P


9.3.3 Example Torsion Angle Distributions

Here are some examples of torsion angle distributions extracted from the Cambridge Structural Database and in the correct format:

acid T1
C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C
41 8 0 0 0 0 0 0 0 1 8 7 2 0 0 0 0 1 1 0 0 0 1 0 4 1 0 1 0 0 0 0 0 2 2 41

acid T2
O.co2 | C.2 (O.co2) | C.3 (2H) | C.3 (2H C)
8 5 1 3 2 1 3 2 3 2 3 3 4 0 3 2 7 11 15 9 1 4 1 0 2 1 4 4 1 3 3 6 0 3 5 7

amide nh T2
C.2 (=O.2 N.am (1H)) | C.3 (1H C.3) | N.am (1H) | C.2 (=O.2)
1 1 14 16 29 25 23 38 35 50 82 156 53 6 1 0 0 0 0 0 0 1 1 14 17 15 4 4 2 1 2 5 2 2 0 0

uracil
O.3 [ribose] | C.3 [ribose] | N.am [uracil] (C.2 (1H)) | C.2 [uracil] (=O.2)
24 73 85 44 59 60 40 14 8 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 7 5 3 0 0 1 4 3 3 5 10 6

benzyl sub
C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 9 27 76 64 15 7 4 2 0 0 0 0
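Because the entry format is strict, it can be useful to check an edited entry before giving it to GOLD. The Python sketch below parses one three-line entry (name, definition, histogram) and validates the histogram length. It is not part of GOLD: the function name and the returned dictionary layout are hypothetical, and the default of 36 bins is an assumption based on the examples in this section, each of which lists 36 integers; the real value is whatever N_BINS is set to in the file header.

```python
def parse_torsion_entry(name_line, definition_line, histogram_line, n_bins=36):
    """Parse one torsion angle distribution entry given as three text lines."""
    fields = [field.strip() for field in definition_line.split("|")]
    fields = [field for field in fields if field]  # tolerate a trailing '|'
    nodes, directives = fields[:4], fields[4:]
    if len(nodes) != 4:
        raise ValueError("a torsion definition needs exactly four nodes")

    histogram = [int(token) for token in histogram_line.split()]
    if len(histogram) != n_bins:
        raise ValueError(f"expected {n_bins} histogram values, got {len(histogram)}")

    # The first bin starts at -180 degrees and the last bin ends at +180.
    width = 360.0 / n_bins
    bin_starts = [-180.0 + i * width for i in range(n_bins)]
    return {
        "name": name_line.strip(),
        "nodes": nodes,
        "directives": directives,
        "histogram": histogram,
        "bin_starts": bin_starts,
    }
```

For the benzyl sub example above, the directives list would hold the single string expand 0.0 180.0 and the four nodes would be returned separately.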


activity. This has varied from a clear relationship for a test set of neuraminidase inhibitors, a discernable relationship for alpha-chymotrypsin inhibitors, but no statistically significant relationship for FK506 inhibitors.

16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase (see page 138)
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin (see page 139)
16.2.3 Prediction of Binding Affinity to FKBP12 (see page 140)

16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase

• GOLD was used to dock a set of 34 neuraminidase inhibitors. 25 docking runs were performed on each complex and the best fitness score recorded.

• A plot of fitness score against measured IC50 (data supplied by GlaxoWellcome) is shown below:

• There are no compounds with low fitness and high activity and there is evidence of a correlation (Spearman rs = -0.649, p < 0.001; Kendall τ = -0.483, p < 0.001).

• Considering 10 µM to be a cutoff for activity, there are 15 actives and 19 inactives. Using a GOLD score of 74 or above as a predictor of activity gives:


• Classified in the validation experiments as a prediction that was wrong (1ICN - oleate docked into a fatty-acid binding protein):

16.2 Correlation between Fitness Function and Biological Activity

• The GOLD fitness function was designed to discriminate between different binding modes of the same molecule. Extra terms are probably required to compare different molecules. For example, a term is probably required to account for the entropic loss associated with freezing rotatable bonds when the ligand binds.

• Nevertheless, some correlation has been observed between GOLD fitness scores and biological


9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database

• The command process_tab (only available on SGI machines) will extract the torsion angle histogram from the .tab file produced by a search of the Cambridge Structural Database, and reformat it so that it can be added into the GOLD torsional distribution file.

9.4 Matching Torsion Angle Distributions at Run Time

• GOLD identifies each rotatable bond in the ligand and attempts to match it to a torsion angle distribution in the torsion angle distribution file. This includes bonds that are identified by GOLD as flippable (e.g., if torsions are switched on then ligand carboxylic acids (O)C-OH will also use a torsion distribution).

• In some cases, a rotatable bond may match more than one torsion angle distribution. If this happens, a score is calculated for each torsion angle distribution and the distribution with the highest score is selected. Note: a weighting scheme is used when matching rotatable bonds in the ligand to a torsion angle distribution such that more specific torsion definitions are taken in preference to more generic ones.

• Each portion of the torsion angle distribution contributes to the score as follows:

Element atom type 1.5

SYBYL atom type 2.0

Fragment 3.0

Hydrogen count 2.0

Bond linkage 0.5
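How these weights make a specific definition outrank a generic one can be illustrated with a small Python sketch. The manual does not state how the contributions are combined, so simply summing one weight per feature in the four-node definition is an assumption, as are the tokenising regular expression and all names used here. Directives are ignored.

```python
import re

WEIGHTS = {"element": 1.5, "sybyl": 2.0, "fragment": 3.0, "hcount": 2.0, "linkage": 0.5}
SYBYL_ONLY = {"H", "F", "Cl", "Br", "I"}  # listed under SYB_TYPE in the grammar

TOKEN = re.compile(
    r"""
    \[(?P<fragment>[a-z]+)\]                  # e.g. [ribose]
  | (?P<hcount>[0-3]H)\b                      # e.g. 2H
  | (?P<linkage>[~=-])(?=[A-Z])               # bond symbol directly before an atom
  | (?P<atom>[A-Z][a-z]?(?:\.[a-z0-9]+)?)     # e.g. C, Cl, C.ar, O.co2
    """,
    re.VERBOSE,
)

def specificity_score(definition):
    """Sum the feature weights over the four nodes of a torsion definition."""
    fields = [field.strip() for field in definition.split("|")]
    nodes = [field for field in fields if field][:4]  # drop any directives
    score = 0.0
    for match in TOKEN.finditer(" ".join(nodes)):
        kind = match.lastgroup
        if kind == "atom":
            text = match.group("atom")
            kind = "sybyl" if "." in text or text in SYBYL_ONLY else "element"
        score += WEIGHTS[kind]
    return score
```

Under this assumed scheme a fully generic definition such as C | C | C | C scores far lower than the benzyl sub example, so the more specific distribution would be chosen when both match a bond.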


10. Genetic Algorithm Parameter Definitions

10.1 Genetic Algorithm Overview (see page 89)
10.2 Population Size (see page 89)
10.3 Selection Pressure (see page 90)
10.4 Number of Operations (see page 90)
10.5 Number of Islands (see page 90)
10.6 Niche Size (see page 91)
10.7 Operator Weights: Migrate, Mutate, Crossover (see page 91)
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)
10.9 Hydrophobic Fitting Points (see page 92)

10.1 Genetic Algorithm Overview

• GOLD optimises the fitness score by using a genetic algorithm.

• A population of potential solutions (i.e. possible docked orientations of the ligand) is set up at random. Each member of the population is encoded as a chromosome, which contains information about the mapping of ligand H-bond atoms onto (complementary) protein H-bond atoms, mapping of hydrophobic points on the ligand onto protein hydrophobic points, and the conformation around flexible ligand bonds and protein OH groups.

• Each chromosome is assigned a fitness score based on its predicted binding affinity and the chromosomes within the population are ranked according to fitness.

• The population of chromosomes is iteratively optimised. At each step, a point mutation may occur in a chromosome, or two chromosomes may mate to give a child. The selection of parent chromosomes is biased towards fitter members of the population, i.e. chromosomes corresponding to ligand dockings with good fitness scores.

• A number of parameters control the precise operation of the genetic algorithm, viz.

• Population Size (see page 89)

• Selection Pressure (see page 90)

• Number of Operations (see page 90)

• Number of Islands (see page 90)

• Niche Size (see page 91)

• Operator Weights: Migrate, Mutate, Crossover (see page 91)

• Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
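The general shape of such a steady-state genetic algorithm can be sketched in Python on a toy problem. This is not GOLD's algorithm: the chromosome here is just a list of numbers, there is a single island with no migration, and the rank-based parent weighting, the 50/50 choice between mutation and crossover, and all names are assumptions made for illustration. The sketch does follow the outline above in that parents are chosen with a bias towards fitter members and each child replaces the worst member of the population.

```python
import random

def evolve(fitness, n_genes, popsize=20, n_ops=200, seed=0):
    """Toy steady-state GA; returns (best chromosome, best fitness after each operation).

    Requires n_genes >= 2 and popsize >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(popsize)]
    best_history = []
    for _ in range(n_ops):
        population.sort(key=fitness, reverse=True)  # rank: fittest first
        # Assumed selection scheme: weight proportional to rank position.
        weights = [popsize - rank for rank in range(popsize)]
        if rng.random() < 0.5:
            # Point mutation of one gene in a copy of a selected parent.
            (parent,) = rng.choices(population, weights=weights, k=1)
            child = list(parent)
            child[rng.randrange(n_genes)] = rng.random()
        else:
            # One-point crossover between two selected parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n_genes)
            child = mother[:cut] + father[cut:]
        population[-1] = child  # the child replaces the worst member
        best_history.append(max(fitness(c) for c in population))
    return max(population, key=fitness), best_history

def toy_fitness(chromosome):
    """Illustrative fitness: highest (zero) when every gene equals 0.5."""
    return -sum((gene - 0.5) ** 2 for gene in chromosome)

if __name__ == "__main__":
    best, history = evolve(toy_fitness, n_genes=4)
    print(best, history[-1])
```

Because the fittest member is never the one replaced, the best fitness in the population cannot decrease from one operation to the next.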

10.2 Population Size

• The genetic algorithm maintains a set of possible solutions to the problem. Each possible solution is known as a chromosome and the set of solutions is termed a population.


16.1.4 Examples of GOLD Dockings

• The plots below show examples of GOLD dockings:

• Classified in the validation experiments as a good prediction (4PHV - a peptide-like ligand docked into HIV protease):

• Classified in the validation experiments as a close prediction (1GLQ - a nitrophenyl-substituted peptide ligand docked into glutathione-S-transferase):

• Classified in the validation experiments as a prediction with significant errors (1EAP - a succinylaminophosphonate ligand docked into an antibody):


• The aspartic protease set contains a high proportion of large ligands with several rotational bonds; these complexes are difficult samples for docking. The lyases are difficult to dock as the set features relatively shallow binding sites and polar ligands that are partly solvent-exposed (examples are 1aco and 2h4n); crystal waters sometimes mediate binding (examples are 1pdz, 1okm).

• However, it is extremely difficult to draw conclusions from data obtained using such small sets. When GOLD solutions are classified as good or wrong using an RMS threshold of 2.0Å, a simple chi-squared based test can be used to decide whether or not the observed result really is different from the success rate obtained for the clean list.

• It does show that the set of aspartic proteases can be regarded as different at a confidence level of P<0.025. The lyase and lectin sets have significantly different results when P=0.10 is allowed, and for the isomerases P<0.25 applies. The results for all other sets may just differ by chance, and are not significantly different from the results obtained for the clean list.

• Alternatively, F statistics can be used to decide whether a subset is really different from the clean list in terms of RMS value. In this case, the F ratio is calculated using the null hypothesis that the average RMS values for the clean list of 224 entries and for each sublist are equal.

• Results for F indicate that only the subsets containing aspartic proteases and isomerases (and possibly the lectin set) are significantly different from the clean list, showing clearly that it is very difficult to draw any meaningful conclusions from the results for such small sets.
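The manual does not state the exact form of the chi-squared test it used. As a minimal sketch, assuming a plain Pearson chi-squared statistic on a 2x2 table (subset versus reference list, good versus wrong at the 2.0 Å threshold) with no continuity correction, the comparison could look like this. The counts in the usage lines are hypothetical and are not taken from the validation results.

```python
def chi_squared_2x2(good_a, wrong_a, good_b, wrong_b):
    """Pearson chi-squared statistic for a 2x2 table (1 degree of freedom).

    Assumes every row and column total is non-zero.
    """
    table = [[good_a, wrong_a], [good_b, wrong_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts for illustration only: a small subset vs. a larger list.
stat = chi_squared_2x2(good_a=8, wrong_a=12, good_b=160, wrong_b=64)
CRITICAL_VALUE_P_0_05 = 3.841  # chi-squared, 1 degree of freedom, P = 0.05
print(stat > CRITICAL_VALUE_P_0_05)
```

With sets as small as those discussed here, expected cell counts can become very low, which is one reason the conclusions have to be treated with caution.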

Influence of Mediating Water Molecules on GOLD Results

• Waters have been removed from all complexes prior to docking. This probably lowers performance of the docking algorithm, as waters can mediate interactions that are essential for ligand binding. To estimate this effect, a subset of structures was identified with at least one strongly-bound water molecule within a 2.9 Å distance of both protein and ligand moieties.

• GOLD success rates for this subset (40 entries) and structures lacking mediating water molecules (55 entries) are reported below. All entries are subsets of the clean list. There seems to be a trend towards lower success rates for structures that contain water-mediated contacts between ligand and protein, although the impact of leaving water molecules out is not so high as might be expected.

• GOLD results for complexes with and without waters that mediate protein-ligand binding:

• The variable Population Size (or popsize) is the number of chromosomes in the population. If n_islands is greater than one (i.e. the genetic algorithm is split over two or more islands), popsize is the population on each island.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).

10.3 Selection Pressure

• Each of the genetic operations (crossover, migration, mutation) (see Section 10.7, page 91) takes information from parent chromosomes and assembles this information in child chromosomes. The child chromosomes then replace the worst members of the population.

• The selection of parent chromosomes is biased towards those of high fitness, i.e. a fit chromosome is more likely to be a parent than an unfit one.

• The selection pressure is defined as the ratio of the probability that the most fit member of the population is selected as a parent to the probability that an average member is selected as a parent. Too high a selection pressure will result in the population converging too early.

• For the GOLD docking algorithm, a selection pressure of 1.1 seems appropriate, although 1.125 may be better for library screening where the aim is faster convergence.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
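The definition of selection pressure above can be illustrated with a small sketch. This section does not describe how GOLD actually assigns selection probabilities, so the linear rank-based scheme below is an assumption, chosen because it reproduces the stated ratio exactly; the function name is hypothetical.

```python
def linear_rank_probabilities(popsize, pressure=1.1):
    """Selection probability per rank (index 0 = least fit, last = most fit).

    Assumed scheme: linear ranking. The most fit member is selected with
    probability pressure / popsize, an average member with 1 / popsize.
    """
    if popsize < 2:
        raise ValueError("popsize must be at least 2")
    if not 1.0 <= pressure <= 2.0:
        raise ValueError("linear ranking needs 1.0 <= pressure <= 2.0")
    probabilities = []
    for rank in range(popsize):
        weight = 2.0 - pressure + 2.0 * (pressure - 1.0) * rank / (popsize - 1)
        probabilities.append(weight / popsize)
    return probabilities


probs = linear_rank_probabilities(popsize=100, pressure=1.1)
average = sum(probs) / len(probs)
print(probs[-1] / average)  # ratio of best to average, i.e. the selection pressure
```

Raising the pressure steepens the bias towards fit parents, which is why a higher value converges faster but risks converging too early.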

10.4 Number of Operations

• The genetic algorithm starts off with a random population (each value in every chromosome is set to a random number). Genetic operations (crossover, migration, mutation) (see Section 10.7, page 91) are then applied iteratively to the population. The parameter Number of Operations (or maxops) is the number of genetic operations that are applied over the course of a GA run.

• It is the key parameter in determining how long a GOLD run will take.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).

10.5 Number of Islands

• Rather than maintaining a single population, the genetic algorithm can maintain a number of populations that are arranged as a ring of islands. Specifically, the algorithm maintains n_islands populations, each of size popsize.

• Individuals can migrate between adjacent islands using the migration operator.

• The effect of n_islands on the efficiency of the genetic algorithm is uncertain.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).

10.6 Niche Size

• Niching is a common technique used in genetic algorithms to preserve diversity within the population.

• In GOLD, two individuals share the same niche if the rmsd between the coordinates of their donor and acceptor atoms is less than 1.0 Å.

• When adding a new individual to the population, a count is made of the number of individuals in the population that inhabit the same niche as the new chromosome. If there are more than NicheSize individuals in the niche, then the new individual replaces the worst member of the niche rather than the worst member of the total population.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
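The niche replacement rule described above can be sketched as follows. This is an illustration only: the dictionary layout, the function names and the assumption that a higher fitness is better are not taken from GOLD itself.

```python
import math

NICHE_RMSD = 1.0  # Å, threshold quoted in the text above


def hb_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of donor/acceptor (x, y, z) tuples."""
    total = sum((a - b) ** 2
                for pa, pb in zip(coords_a, coords_b)
                for a, b in zip(pa, pb))
    return math.sqrt(total / len(coords_a))


def index_to_replace(population, newcomer, niche_size):
    """Return the index of the population member the newcomer should replace."""
    niche = [i for i, member in enumerate(population)
             if hb_rmsd(member["hb_coords"], newcomer["hb_coords"]) < NICHE_RMSD]
    # Over-full niche: replace its worst member, otherwise the worst overall.
    candidates = niche if len(niche) > niche_size else range(len(population))
    return min(candidates, key=lambda i: population[i]["fitness"])


population = [
    {"fitness": 10.0, "hb_coords": [(0.0, 0.0, 0.0)]},
    {"fitness": 20.0, "hb_coords": [(0.0, 0.0, 0.0)]},
    {"fitness": 30.0, "hb_coords": [(0.0, 0.0, 0.0)]},
    {"fitness": 5.0, "hb_coords": [(10.0, 0.0, 0.0)]},
]
newcomer = {"fitness": 25.0, "hb_coords": [(0.1, 0.0, 0.0)]}
print(index_to_replace(population, newcomer, niche_size=2))
```

In this toy population three members share the newcomer's niche, so with a niche size of 2 the weakest of those three is replaced even though a weaker individual exists elsewhere.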

10.7 Operator Weights: Migrate, Mutate, Crossover

• The operator weights are the parameters Mutate, Migrate and Crossover (or pt_cross).

• They govern the relative frequencies of the three types of operations that can occur during a genetic optimisation: point mutation of the chromosome, migration of a population member from one island to another, and crossover (sexual mating) of two chromosomes.

• Each time the genetic algorithm selects an operator, it does so at random. Any bias in this choice is determined by the operator weights. For example, if Mutate is 40 and Crossover is 10 then, on average, four mutations will be applied for every crossover.

• The migrate weight should be zero if there is only one island, otherwise migration should occur about 5% of the time.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
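As an illustration of how operator weights translate into relative frequencies, the sketch below uses weighted random choice from the Python standard library. The function names are hypothetical; the weights in the usage lines are the Set A values quoted elsewhere in this guide (Crossover 95, Mutate 95, Migrate 10), which give a migration frequency of 5%.

```python
import random


def operator_frequencies(weights):
    """Expected fraction of operations of each type, given operator weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_operator(weights, rng=random):
    """Pick one operator at random, biased by the operator weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


set_a = {"crossover": 95, "mutate": 95, "migrate": 10}
print(operator_frequencies(set_a)["migrate"])   # fraction of migrations
print(pick_operator(set_a, random.Random(0)))   # one biased random pick
```

A weight of zero means the operator is never selected, which matches the advice to set the migrate weight to zero when there is only one island.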

10.8 Van der Waals and Hydrogen Bonding Annealing Parameters

• When GoldScore is being used, the annealing parameters, van der Waals and Hydrogen Bonding, allow bad bumps and poor hydrogen bonds to occur at the beginning of a genetic algorithm run, in the expectation that they will evolve to better solutions.

• At the start of a GOLD run, external van der Waals (vdw) energies are cut off when Eij > van der Waals * kij, where kij is the depth of the vdw well between atoms i and j. At the end of the run, the cut-off value is FINISH_VDW_LINEAR_CUTOFF. This allows a few bad bumps to be tolerated at the beginning of the run.

• Similarly, the parameters Hydrogen Bonding and FINAL_VIRTUAL_PT_MATCH_MAX are used to set starting and finishing values of max_distance (the distance between donor hydrogen and fitting point must be less than max_distance for the bond to count towards the fitness score). This allows poor hydrogen bonds to occur at the beginning of a GA run.

• Both the vdw and H-bond annealing must be gradual and the population allowed plenty of time to adapt to changes in the fitness function.

• Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
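The text above only says that the annealing must be gradual. As a rough sketch, assuming a linear change from the starting value to the finishing value over the course of the run (an assumption, not a documented schedule), the cut-off at any point could be computed as below. The finishing value used in the usage lines is hypothetical.

```python
def annealed_value(start, finish, ops_done, maxops):
    """Linearly interpolate an annealing parameter over a GA run (assumed schedule)."""
    fraction = min(max(ops_done / maxops, 0.0), 1.0)
    return start + (finish - start) * fraction


def vdw_is_cut_off(e_ij, k_ij, cutoff):
    """True if the vdw energy Eij exceeds cutoff * kij and is therefore cut off."""
    return e_ij > cutoff * k_ij


# Start value 4.0 is the van der Waals setting of Set A; 2.0 is a made-up finish value.
halfway_cutoff = annealed_value(start=4.0, finish=2.0, ops_done=50000, maxops=100000)
print(halfway_cutoff)
print(vdw_is_cut_off(e_ij=3.5, k_ij=1.0, cutoff=halfway_cutoff))
```

The same interpolation would apply to max_distance for hydrogen bonds, moving from the Hydrogen Bonding value towards FINAL_VIRTUAL_PT_MATCH_MAX.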

GOLD Performance as a Function of Protein Type

• Success rates for GOLD as a function of protein type are given below. Statistical analysis was performed to assess whether the results are really different, or may have arisen by coincidence. This check is essential, as the sets being considered here are very small.

• Performance appears to be above average for the metalloprotease, kinase, isomerase and lectin sets. However, performance seems to be lower than expected for the aspartic protease and lyase sets.

GOLD Performance

• A brief overview of the results obtained for GOLD with the CCDC/Astex test set is given below. Figure 1 shows GOLD success rates as a function of the number of torsion angles in the ligand. Results were obtained using the default settings; the values shown are the average values derived from a set of 50 validation runs. Standard deviations are given. RMS (root-mean-square deviation of atomic coordinates) values of 2.0 Å or less were considered to be good results.

• The following table shows the GOLD results for the clean set; results were calculated using both the default settings and the threefold speed-up settings. As can be seen, there is a tradeoff between speed and reliability. All success rates are average values over 50 validation runs. Standard deviation is given in parentheses.
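The 2.0 Å success criterion used above can be sketched in a few lines. This is a simplified illustration: it assumes the two coordinate lists are in the same atom order and the same frame of reference, and it does not handle ligand symmetry. The function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) tuples."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for pa, pb in zip(coords_a, coords_b)
                for a, b in zip(pa, pb))
    return math.sqrt(total / len(coords_a))


def is_good_prediction(predicted, observed, threshold=2.0):
    """Apply the criterion from the text: RMS of 2.0 Å or less counts as good."""
    return rmsd(predicted, observed) <= threshold


observed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rmsd(predicted, observed), is_good_prediction(predicted, observed))
```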

10.9 Hydrophobic Fitting Points

• GOLD automatically calculates a list of hydrophobic fitting points in the binding site. These are used during the generation of trial docking solutions to map hydrophobic ligand atoms into favourable regions of the binding site.

• GOLD generates its hydrophobic fitting points by placing a fine grid over the binding site. At each grid position, the van der Waals interaction energy between a bare carbon atom and the protein is evaluated. By default, positions at which the interaction energy is below -2.5 kcal/mole are added to the list of fitting points.

Note: the potential and threshold for selecting fitting points can be changed by editing the gold.params file and changing the values of INTERNAL_POTENTIAL_FITPTS and E_FITPT_THRESHOLD.

• In this way, a map is constructed that contains positions onto which the placement of a hydrophobic ligand atom should be favourable.

• The ligand fitting points are used for the matching of hydrophobic regions.

• By default only carbon atoms in the ligand are considered when identifying fitting points. The selection of suitable ligand atoms can be extended to include carbon, halogen and non-polar sulfur atoms by uncommenting the following line in the gold.params file:

#LIGAND_FITPTS_SELECTION EXTENDED_HAL_S

• During docking, GOLD selects a list of lipophilic ligand atoms and matches them onto a subset of the hydrophobic fitting points.

• It is possible to use customised hydrophobic fitting points. This might be appropriate if GOLD is not giving good results on a particular protein and you suspect that the fault may lie in the placement of hydrophobic ligand groups.

• Customised fitting points must be supplied in a MOL2 format file that contains a list of dummy atoms at the desired fitting-point locations. The supplied fitting points should sample all regions of interest in the cavity, so that the docking algorithm has sufficient alternatives for placement of hydrophobic ligand atoms within the cavity. GOLD uses gridded points that are spaced by 0.25 Å; for a speed-up in calculation, higher values could be used.

• To make GOLD use a customised fitting-point file, click on the Fitness & Search options button in the GOLD front end, then switch on the Read hydrophobic fitting points check box in the Fitness Function and Search Options window. Finally, hit the Fit point file... button to open a file selection window from which your customised file can be located.

• Customised fitting points can, for example, be generated by the CCDC program SuperStar, which offers the possibility of writing out a file of GOLD fitting points in the appropriate format (see SuperStar manual sections on SAVE_GOLD_FITTING_POINTS and GOLD_MIN_PROPENSITY).
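The grid-based generation of hydrophobic fitting points described earlier in this section can be sketched as follows. This is an illustration only: energy_fn stands in for the van der Waals interaction energy between a carbon probe and the protein, which is not reproduced here, and the function name and cubic grid layout are assumptions.

```python
import itertools


def hydrophobic_fitting_points(origin, n_steps, spacing, energy_fn, threshold=-2.5):
    """Return grid positions whose probe energy (kcal/mole) is below the threshold."""
    points = []
    for i, j, k in itertools.product(range(n_steps), repeat=3):
        position = (origin[0] + i * spacing,
                    origin[1] + j * spacing,
                    origin[2] + k * spacing)
        if energy_fn(position) < threshold:
            points.append(position)
    return points


def toy_energy(position):
    """Made-up energy surface: favourable only where x is below 0.5 Å."""
    return -3.0 if position[0] < 0.5 else 0.0


# 0.25 Å is the grid spacing quoted in the text above.
fit_points = hydrophobic_fitting_points((0.0, 0.0, 0.0), 3, 0.25, toy_energy)
print(len(fit_points))
```

A coarser spacing gives fewer points to evaluate, which is the speed-up mentioned above for customised fitting-point files.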

11. Balancing Reliability and Speed

11.1 Number of Dockings (see page 93)
11.2 Early Termination (see page 93)
11.3 Controlling Reliability and Speed with GA Parameters (see page 94)

11.1 Number of Dockings

• GOLD will dock each ligand several times starting each time from a different random population of ligand orientations. The results of the different docking runs are ranked by fitness score.

• The number of dockings to be performed on each ligand is set when the ligand file is defined (see Section 4.5, page 32).

• By default the number of dockings to be performed on each ligand is 10.

• The total time spent docking a ligand obviously depends on the number of docking runs, so you can make GOLD go faster by reducing this number. However, it is useful to perform at least a few docking runs on each ligand. This increases the chances of getting the right answer. Also, if the same answer is found in several different docking runs, it is usually a strong indicator that the answer is correct.

• The early termination option (see Section 11.2, page 93) can be used to prevent GOLD wasting time performing multiple docking runs on easy ligands.

11.2 Early Termination

• The early termination option instructs GOLD to terminate docking runs on a given ligand as soon as a specified number of runs have given essentially the same answer. In this situation, it is probable that the answer is correct, and GOLD will just be wasting time if it performs more docking runs on that ligand.

• To switch early termination on, click on the Allow early termination check box in the GOLD front end (i.e. so that the box is coloured red). Then specify the early termination criterion. In the example below, GOLD has been instructed to stop docking a ligand if it reaches a state in which the best three solutions found so far are all within 1.5 Å rmsd of each other:

• The rms deviation takes account of any ligand symmetry.

• Early termination does not always save as much time as you might think, because it tends to be invoked for easy (i.e. relatively rigid) ligands, which are quick to dock anyway.
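The early termination criterion can be sketched as a check applied after each docking run. The data layout and function names below are hypothetical, and rmsd_fn is supplied by the caller because a real implementation would need an rmsd that takes ligand symmetry into account.

```python
import itertools


def should_terminate(solutions, rmsd_fn, n_top=3, tolerance=1.5):
    """True if the n_top best-scoring solutions are all within tolerance of each other."""
    if len(solutions) < n_top:
        return False
    top = sorted(solutions, key=lambda s: s["fitness"], reverse=True)[:n_top]
    return all(rmsd_fn(a["pose"], b["pose"]) <= tolerance
               for a, b in itertools.combinations(top, 2))


# Toy one-dimensional "poses" so that the distance function stays trivial.
toy_rmsd = lambda a, b: abs(a - b)
runs = [
    {"fitness": 60.0, "pose": 0.0},
    {"fitness": 58.0, "pose": 1.0},
    {"fitness": 40.0, "pose": 9.0},
    {"fitness": 57.0, "pose": 0.4},
]
print(should_terminate(runs, toy_rmsd))
```

Here the three best-scoring runs agree to within 1.5 Å, so no further docking runs would be started for this ligand.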

TABLE I. Optimal sets (clean lists) with different resolution thresholds of none, 2.5 Å, and 2.0 Å

Full set (305 entries)
1a07 1a0q 1a1b 1a1e 1a28 1a42 1a4g 1a4k 1a4q 1a6w
1a9u 1aaq 1abe 1abf 1acj 1acl 1acm 1aco 1aec 1aha
1ai5 1aj7 1ake 1aoe 1apt 1apu 1aqw 1ase 1atl 1azm
1b58 1b59 1b6n 1b9v 1baf 1bbp 1bgo 1bl7 1blh 1bma
1bmq 1byb 1byg 1c12 1c1e 1c2t 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cf8 1cil 1cin 1ckp 1cle 1com 1coy 1cps
1cqp 1ctr 1ctt 1cvu 1cx2 1d0l 1d3h 1d4p 1dbb 1dbj
1dbm 1dd7 1dg5 1dhf 1did 1die 1dmp 1dog 1dr1 1dwb
1dwc 1dwd 1dy9 1eap 1ebg 1eed 1ei1 1ejn 1ela 1elb
1elc 1eld 1ele 1eoc 1epb 1epo 1eta 1etr 1ets 1ett
1etz 1f0r 1f0s 1f3d 1fax 1fbl 1fen 1fgi 1fig 1fkg
1fki 1fl3 1flr 1frp 1ghb 1glp 1glq 1gpy 1hak 1hdc
1hdy 1hef 1hfc 1hiv 1hos 1hpv 1hri 1hsb 1hsl 1htf
1hti 1hvr 1hyt 1ibg 1icn 1ida 1igj 1imb 1ivb 1ivc
1ivd 1ive 1ivq 1jao 1jap 1kel 1kno 1lah 1lcp 1ldm
1lic 1lkk 1lmo 1lna 1lpm 1lst 1lyb 1lyl 1mbi 1mcq
1mcr 1mdr 1ml1 1mld 1mmb 1mmq 1mnc 1mrg 1mrk 1mts
1mtw 1mup 1nco 1ngp 1nis 1nsd 1okl 1okm 1pbd 1pdz
1pgp 1pha 1phd 1phf 1phg 1poc 1ppc 1pph 1ppi 1ppl
1pso 1ptv 1qbr 1qbt 1qbu 1qcf 1qh7 1ql7 1qpe 1qpq
1rbp 1rds 1rne 1rnt 1rob 1rt2 1sln 1slt 1snc 1srf
1srg 1srh 1srj 1stp 1tdb 1tka 1tlp 1tmn 1tng 1tnh
1tni 1tnl 1tph 1tpp 1trk 1tyl 1ukz 1ulb 1uvs 1uvt
1vgc 1vrh 1wap 1xid 1xie 1xkb 1ydr 1yds 1ydt 1yee
25c8 2aad 2ack 2ada 2ak3 2cgr 2cht 2cmd 2cpp 2ctc
2dbl 2er7 2fox 2gbp 2h4n 2ifb 2lgs 2mcp 2mip 2pcp
2phh 2pk4 2plv 2qwk 2r04 2r07 2sim 2tmn 2tsc 2yhx
2ypi 3cla 3cpa 3erd 3ert 3gch 3gpb 3hvt 3mth 3nos
3pgh 3ptb 3tpi 4aah 4cox 4cts 4dfr 4er2 4est 4fab
4fbp 4lbd 4phv 4tpi 5abp 5cpp 5er1 5p2p 6abp 6cpa
6rnt 6rsa 7cpa 7tim 8gch

Clean list (224 entries)
1a28 1a42 1a4g 1a4q 1a6w 1a9u 1aaq 1abe 1abf 1acj
1acl 1acm 1aco 1aec 1ai5 1aoe 1apt 1apu 1aqw 1ase
1atl 1azm 1b58 1b59 1b9v 1baf 1bbp 1bgo 1bl7 1blh
1bma 1bmq 1byb 1byg 1c12 1c1e 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cil 1ckp 1cle 1com 1coy 1cps 1cqp 1cvu
1cx2 1d0l 1d3h 1d4p 1dbb 1dbj 1dd7 1dg5 1dhf 1did
1dmp 1dog 1dr1 1dwb 1dwc 1dwd 1dy9 1eap 1ebg 1eed
1ei1 1ejn 1eoc 1epb 1epo 1eta 1etr 1ets 1ett 1f0r
1f0s 1f3d 1fax 1fen 1fgi 1fkg 1fki 1fl3 1flr 1frp
1glp 1glq 1hak 1hdc 1hfc 1hiv 1hos 1hpv 1hri 1hsb
1hsl 1htf 1hvr 1hyt 1ibg 1ida 1imb 1ivb 1ivq 1jap
1kel 1lah 1lcp 1ldm 1lic 1lna 1lpm 1lst 1lyb 1lyl
1mbi 1mcq 1mdr 1mld 1mmq 1mrg 1mrk 1mts 1mup 1nco
1ngp 1nis 1okl 1okm 1pbd 1pdz 1phd 1phg 1poc 1ppc
1pph 1ppi 1pso 1ptv 1qbr 1qbu 1qcf 1qpe 1qpq 1rds
1rne 1rnt 1rob 1rt2 1slt 1snc 1srj 1tdb 1tlp 1tmn
1tng 1tnh 1tni 1tnl 1tpp 1trk 1tyl 1ukz 1ulb 1uvs
1uvt 1vgc 1wap 1xid 1xie 1ydr 1ydt 1yee 25c8 2aad
2ack 2ada 2ak3 2cht 2cmd 2cpp 2ctc 2dbl 2fox 2gbp
2h4n 2ifb 2lgs 2mcp 2pcp 2phh 2pk4 2qwk 2r07 2tmn
2tsc 2yhx 2ypi 3cla 3cpa 3erd 3ert 3gpb 3hvt 3tpi
4aah 4cox 4cts 4dfr 4est 4fbp 4lbd 4phv 5abp 5cpp
5er1 6rnt 6rsa 7tim

Clean list, resolution threshold 2.0 Å (92 entries)
1a28 1a4q 1a6w 1abe 1abf 1aec 1aoe 1apt 1apu 1aqw
1atl 1b58 1b59 1bma 1byb 1c1e 1c5c 1c5x 1c83 1cbs
1cil 1coy 1d0l 1d3h 1ejn 1eta 1f3d 1fen 1flr 1glp
1glq 1hfc 1hpv 1hsb 1hsl 1hvr 1hyt 1ida 1jap 1kel
1lcp 1lic 1lna 1lst 1mld 1mmq 1mrg 1mrk 1mts 1nco
1phd 1phg 1ppc 1pph 1qbr 1qbu 1rds 1rnt 1rob 1slt
1snc 1srj 1tmn 1tng 1tnh 1tni 1tnl 1tpp 1tyl 1ukz
1vgc 1wap 1xid 1xie 2ak3 2cmd 2cpp 2ctc 2fox 2gbp
2h4n 2qwk 2tmn 2tsc 3cla 3ert 3tpi 4dfr 4est 5abp
6rnt 7tim

16.1.3 Validation using the CCDC/Astex Test Set

CCDC/Astex Validation Overview (see page 131)
GOLD Performance (see page 133)
GOLD Performance as a Function of Protein Type (see page 134)
Influence of Mediating Water Molecules on GOLD Results (see page 135)

CCDC/Astex Validation Overview

• The CCDC/Astex test set of protein-ligand complexes was used to determine the GOLD success rates (see http://www.ccdc.cam.ac.uk/products/life_sciences/validate/). The set consists of 305 protein-ligand complexes. All complexes have had their protonation states set manually, and have been checked extensively. It is a considerably extended version of the original GOLD validation test set.

• From this set, a set of 224 reliable complexes was selected. This clean set excluded all complexes that might be unreliable. Complexes were considered to be unsuitable if they did not pass the following checks:

• Involvement of crystallographically-related protein units in ligand binding.

• Identification of bad clashes between protein side chains and the ligand.

• Presence of structural errors, and/or inconsistency of ligand placement with crystal structure electron density.

• Limiting the clean list to resolutions better than 2.0Å left 92 entries, for which results will also be shown.

• In addition, the set has been pruned to assure diversity in terms of protein-ligand structures.

• The full list of 305 entries, the clean list of 224 entries, and the limited clean list of 92 entries are shown in Table I.

11.3 Controlling Reliability and Speed with GA Parameters

11.3.1 Relationship between GA Parameters and Speed (see page 94)
11.3.2 Using Automatic GA Parameter Settings (see page 94)
11.3.3 Using Pre-Defined GA Parameter Settings (see page 96)
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings (see page 97)
11.3.5 GA Parameter Settings for Virtual Screening (see page 98)

11.3.1 Relationship between GA Parameters and Speed

• The time taken by GOLD to dock ligands can be controlled by altering the values of the genetic algorithm (GA) parameters (see Section 10., page 89).

• GOLD runs for a fixed number of genetic operations (crossover, migration, mutation). The easiest way to make GOLD go faster is to reduce the number of GA operations performed in the course of a run. This is done through the Number of Operations variable (this parameter is called maxops in the configuration file).

• A reduction in Number of Operations is likely to change the optimum values of several other GA parameters, particularly popsize, van der Waals and Hydrogen Bonding.

• GOLD manipulates a pool of chromosomes of size popsize * Number of Islands. The size of this pool should be such that the optimisation converges within the specified maximum number of operations, Number of Operations. If the pool size is too small for a given value of Number of Operations, the algorithm will converge prematurely. Conversely, if the pool size is too large the algorithm will terminate before it has converged.

• The annealing parameters van der Waals and Hydrogen Bonding allow bad bumps and poor hydrogen bonds to occur at the beginning of a genetic algorithm run, in the expectation that they will evolve to better solutions. Both the vdw and H-bond annealing must be gradual and the population allowed plenty of time to adapt to changes in the fitness function.

• Because of these factors, it is difficult to set GA parameters by hand and you are recommended to use automatic (ligand dependent) GA parameter settings (see Section 11.3.2, page 94), or one of the default parameter sets offered in the GOLD front end (see Section 11.3.3, page 96).

11.3.2 Using Automatic GA Parameter Settings

• The number of genetic operations performed (crossover, migration, mutation) is the key parameter in determining how long a GOLD run will take (i.e. this parameter controls the coverage of the search space).

• GOLD can automatically calculate an optimal number of operations for a given ligand, thereby making the most efficient use of search time, e.g. small ligands containing only one or two rotatable bonds will generally require fewer genetic operations than larger highly flexible ligands.

• The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible ring corners, flippable nitrogens, etc. (see Section 7., page 64), the volume of the protein binding site, and the number of water molecules considered during docking (see Section 3.4, page 16).

• The exact numbers of GA operations contributed, e.g. by each rotatable bond in the ligand, are defined in the gold.params file (see Section 6.3, page 48).

• To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the Settings selector window, click on Use automatic settings:

• GOLD runs for a fixed number of genetic operations. Limiting this number increases docking speed, but the search space will be less well explored (see Section 11.3, page 94). The Search efficiency can be used to control the speed of docking and the predictive accuracy (i.e. the reliability) of the results. With the Search efficiency set at 100%, GOLD will attempt to apply optimal settings for each ligand. For a ligand with five rotatable bonds this will be around 30,000 GA operations. If the Search efficiency is set to 50%, GOLD will perform around 15,000 operations, thereby speeding up the docking by a factor of two. Similarly, by setting a Search efficiency greater than 100%, it is possible to make the search more exhaustive (but slower).
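The Search efficiency arithmetic above amounts to a simple scaling of the optimal operation count. The sketch below is an illustration with hypothetical names; the optional bounds mimic a user-specified minimum or maximum number of operations and do not reflect how GOLD computes its own defaults.

```python
def operations_for_efficiency(optimal_ops, efficiency_percent, min_ops=0, max_ops=None):
    """Scale the optimal number of GA operations by the search efficiency."""
    ops = round(optimal_ops * efficiency_percent / 100.0)
    ops = max(ops, min_ops)
    if max_ops is not None:
        ops = min(ops, max_ops)
    return ops


# Figures from the text: about 30,000 operations at 100% for five rotatable bonds.
print(operations_for_efficiency(30000, 100))
print(operations_for_efficiency(30000, 50))
print(operations_for_efficiency(30000, 200))
```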

• The Minimum number of operations in run will be updated automatically according to the Search efficiency that is set. The automatic preset can be overridden to ensure that every ligand is subjected to at least a user-specified number of operations. Similarly, the Maximum number of

genetic algorithm parameters.

Results:

• GOLD failed to produce an answer for 1ACL because the ligand contains no hydrogen-bonding atoms (this problem has since been fixed). The subsequent analysis was therefore based on results for 99 complexes.

• In summarising the results, the GOLD prediction is defined as the best of the 20 dockings according to the GOLD fitness score and not the docking that is closest to the experimental result.

• Each GOLD prediction was assigned to one of 4 subjective categories: good, close, errors or wrong. Each prediction was also ranked by its rms with respect to the observed ligand position.

• GOLD achieved a 71% rate of successful predictions (good or close).

• 3D plots of individual predictions are available on the CCDC web page.

• Detailed tabulations of the predictions are in Appendix C: GOLD Predictions in First Series of Validation Tests (see page 153).
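The distinction made above between the GOLD prediction (the docking with the best fitness score) and the docking closest to the experimental result can be illustrated as follows. The function name, data layout and numbers are hypothetical and are not validation data.

```python
def summarise_dockings(dockings):
    """Return (top_ranked, closest) from a list of docking results for one complex."""
    top_ranked = max(dockings, key=lambda d: d["fitness"])  # the GOLD prediction
    closest = min(dockings, key=lambda d: d["rmsd"])        # best in hindsight
    return top_ranked, closest


# Made-up results for three of the dockings of a single complex.
dockings = [
    {"run": 1, "fitness": 61.2, "rmsd": 2.8},
    {"run": 2, "fitness": 58.9, "rmsd": 0.9},
    {"run": 3, "fitness": 47.5, "rmsd": 5.1},
]
prediction, closest = summarise_dockings(dockings)
print(prediction["run"], closest["run"])
```

Only the top-ranked docking can be identified without knowing the experimental structure, which is why it is the one used when summarising success rates.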

16.1.2 Follow-Up Validation of Docking Results

• The GOLD algorithm was improved in various ways following the first set of validation tests. A second set of tests was then performed on 34 additional complexes in order to ensure that GOLD had not been over-trained on the original set. The method used was the same as in the first set of validation tests.

Results:

• GOLD achieved a 74% rate of successful predictions (good or close).

• 3D plots of individual predictions are available on the CCDC web page.

• Detailed tabulations of the predictions are in Appendix D: GOLD Predictions in Second Series of Validation Tests (see page 160).

16. Accuracy of Predictions

16.1 Correlation between Predicted and Observed Ligand Positions (see page 129)
16.2 Correlation between Fitness Function and Biological Activity (see page 137)

16.1 Correlation between Predicted and Observed Ligand Positions

• NOTE: This section and Appendix B summarise validation tests done when GOLD was first developed using the GoldScore fitness function. Recently (2001-2), we have significantly expanded the size of the test set and done comparisons between GoldScore and ChemScore. The new validations do not change the basic conclusions outlined below in any major way and give preliminary indications that GoldScore and ChemScore have comparable overall success rates.

• A simple test of the effectiveness of a docking program is to take a protein-ligand complex from the Protein Data Bank and extract the ligand. The docking program can then be used to predict the binding mode of the ligand and a comparison made with the crystallographically observed position. This methodology has been used to validate GOLD. Tests were done in two phases: first, on a test set of 100 complexes; later, on an additional 34 complexes as a check against over-training.

16.1.1 Initial Validation of Docking Results (see page 129)
16.1.2 Follow-Up Validation of Docking Results (see page 130)
16.1.3 Validation using the CCDC/Astex Test Set (see page 131)
16.1.4 Examples of GOLD Dockings (see page 136)

16.1.1 Initial Validation of Docking Results

• The method used for each test calculation was as follows:

• 100 protein-ligand complexes were selected from the Protein Data Bank.

• Parts of the protein remote from the binding site were deleted. Enough of the protein was retained to ensure that all residues were present that might reasonably interact with the ligand.

• The ligand was extracted from the protein binding site.

• Hydrogen atoms were placed on both the protein and the ligand in order to ensure that ionisation and tautomeric states were defined unambiguously. This involved making hypotheses about the protonation states of residues such as His, Glu and Asp.

• The ligand was minimised into a low-energy conformation.

• The atom types of both the protein and ligand were checked for accuracy.

• In almost all test runs, all water molecules were deleted from the protein structure. This is not strictly defensible since water molecules often mediate protein-ligand binding. However, if more careful judgements were made on which waters to remove, the effect would be to improve the accuracy of the GOLD predictions. Hence, the deletion of all waters is a conservative strategy which will make GOLD look less reliable than it really is, rather than more reliable.

• 20 docking runs were performed on each test complex, using the slowest default setting of the

operations in run can be set manually.

• When using automatic GA parameter settings, the parameters controlling the precise operation of the genetic algorithm (population size, selection pressure, Niche size, etc.) will be set to auto in the Genetic Algorithm Parameters panel. The actual GA settings used will be reported in the ligand log file (see Section 14.10, page 118).

11.3.3 Using Pre-Defined GA Parameter Settings

• To use one of the pre-defined GA parameter settings click on the Select GA Presets and Automatic Settings button in the Genetic Algorithm Parameters panel, or hit Settings in the Control panel, to open the Settings selector window:

• Select Choose presets and choose from one of the pre-defined GA parameter settings listed.

• The Default settings deliver high predictive accuracy but are relatively slow. Default settings are recommended for use with large highly-flexible ligands, or for research applications where speed of docking is not an issue and optimal accuracy is required.

• The 2 times speed-up or 3 times speed-up settings are progressively quicker (predictive reliability will fall off, but quite slowly). These settings are recommended for use with compounds containing up to six flexible bonds and/or ring corners (see Section 7.1, page 64).

• The 7-8 times speed-up settings will give comparable predictive accuracy to the slow, Default settings when docking small ligands. These settings are recommended for use with ligands containing one or two rotatable torsions and for virtual screening work (see Section 11.3.5, page 98).

• It is possible to create your own default GA settings. To do this, you must edit the file gold_preferences (see Section 15.4, page 127).

• Individual GA parameter settings can be specified in the GOLD front end by typing directly into the input boxes in the Genetic Algorithm Parameters panel (see Section 2.4, page 7). However, it is recommended that you use one of the pre-defined GA parameter settings as opposed to altering individual GA parameters, because the optimum values of the parameters are highly correlated.

11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings

• We have performed a great many experiments with different genetic algorithm (GA) settings. Three such settings are summarised below:

• We used GOLD with each of these settings to dock 100 ligands into their binding sites, using a test set of 100 protein-ligand complexes selected from the PDB. 20 docking runs were done on each ligand with each GA set. The rms deviations were computed between the experimental result and the GOLD solution ranked top by fitness function. Root mean square deviations (rmsd) were also calculated between the experimental result and the closest of the 20 dockings (i.e. not necessarily the top-ranked solution). Results were:

GA Parameter            Set A     Set B     Set C
Number of Operations    100000    10000     1000
Population Size         100       100       50
Selection Pressure      1.1       1.1       1.125
Number of Islands       5         1         1
Crossover               95        100       100
Mutate                  95        100       100
Migrate                 10        0         0
Niche Size              2         2         2
Hydrogen Bonding        2.5       2.0       5.0
van der Waals           4.0       10.0      10.0

• Edit into the file a line such as:

default_ga_setting /home/golduser/configfiles/myconfig.conf my protein

and create a configuration file (called /home/golduser/configfiles/myconfig.conf in the above case) containing the desired GA settings.

• The settings will appear in the Settings Selector window next time GOLD is opened:

15.2 Customising Fitness Function Parameters

• GOLD parameters are stored in the gold.params file in the GOLD distribution directory. It can be customised by copying it, editing the copy, and instructing GOLD to use the edited file.

• Parameters specific to GoldScore are stored in files of the type goldscore.p450_<csd|pdb>.params (see Section 6.3, page 48).

• The ChemScore fitness-function parameters are stored in the ChemScore file, which can also be customised (see Section 6.5, page 58).

15.3 Customising the Torsion Angle Distribution File

• It is possible to customise torsion distribution information by copying one of the standard torsion distribution files, editing it, and instructing GOLD to use the edited file (see Section 9.3, page 84).

15.4 Creating Customised Default Genetic Algorithm Parameter Settings

• A number of pre-defined genetic algorithm (GA) settings are offered when GOLD is opened:

• It is possible to add your own default GA settings to this window.

• To do this, you must edit the file .gold_preferences in your home directory. This file will be created the first time you run GOLD, and will look something like this:

• rmsd < nÅ = number of predictions out of the 100 within nÅ rmsd of observed result.

• In the GOLD front-end, the GA parameter set called Default settings corresponds to Set A above; 7-8 times speed-up corresponds to Set B; and library screening settings corresponds to Set C.

• For careful work, we recommend the slow standard setting A, which typically finds correct solutions in 70-80% of cases. Set C, which is fast enough for virtual library-screening, is inevitably less accurate, but still finds the correct solution 60-70% of the time.

11.3.5 GA Parameter Settings for Virtual Screening

• Existing GOLD users may have library screening settings available as one of the default preferences. However, due to general advances in processor speed, we now recommend using 7-8 times speed-up for virtual screening work in order to take advantage of the associated improvement in accuracy (see Section 11.3.1, page 94). Note: If library screening settings are not available as a default preference, you can re-enable them by editing the .gold_preferences file (see Section 15.4, page 127).

         top-ranked;   top-ranked;   closest;      closest;
         rmsd < 2Å     rmsd < 3Å     rmsd < 2Å     rmsd < 3Å
Set A    70            79            83            88
Set B    64            77            79            89
Set C    62            68            72            86


12. Running GOLD

12.1 Required Input Files (see page 99)
12.2 Starting GOLD (see page 99)
12.3 Running Interactively; Interactive Diagnostics (see page 100)
12.4 Submitting a GOLD job to the Background from the Front End (see page 100)
12.5 Running GOLD from the Command Line (see page 100)
12.6 Running in Parallel (see page 101)

12.1 Required Input Files

• The following files must be available before a GOLD job can be run:

• One or more files containing the ligand(s) to be docked, in MOL2, MOL, SD or PDB format (but PDB format is not recommended for ligand files) (see Section 4., page 30).

• A file containing the protein (or the part of a protein) into which the ligand is to be docked. This may be in PDB or MOL2 format (see Section 3., page 9).

• GOLD also needs a configuration file, which contains the names of the protein and ligand files, and all the user-defined parameters such as genetic algorithm parameter settings, fitness flags, etc. The configuration file can be created manually, but it is usually easier and preferable to create it with the GOLD graphical front end (the file is written automatically when the Run, Save & Exit or Submit & Exit buttons are hit) (see Section 2.1, page 3).

• In addition, GOLD uses a parameter file (see Section 6.3, page 48) and (optionally) a torsion distribution file (see Section 9., page 83). If the ChemScore fitness function is selected, it will also use a ChemScore file (see Section 6.5, page 58). All these files are supplied in the GOLD distribution and, by default, will be found automatically by the program. If required, any of the files can be copied to a user’s directory and edited, and GOLD can then be directed to use the edited file.

12.2 Starting GOLD

• GOLD opens output log files so each GOLD run should be performed in a separate directory. Create a directory in which to run GOLD and copy the protein and ligand files into it.

• You can also write each set of ligand output files to its own sub-directory.

• GOLD can be run from the command line or via the graphical front end. The easiest way to get started is to use the front end (see Section 2., page 3).

• From the front end, you can run a GOLD job interactively (see Section 12.3, page 100), submit it to the background (see Section 12.4, page 100), or save the configuration file so that GOLD may be started from the command line (see Section 12.5, page 100).


15. Saving and Reusing Program Settings

15.1 Saving and Re-using Program Settings in Configuration Files (see page 126)
15.2 Customising Fitness Function Parameters (see page 127)
15.3 Customising the Torsion Angle Distribution File (see page 127)
15.4 Creating Customised Default Genetic Algorithm Parameter Settings (see page 127)

15.1 Saving and Re-using Program Settings in Configuration Files

• The configuration file is a text file which specifies the GOLD calculation that is to be run, including details of the ligand, the protein binding site, the fitness-function parameter file to be used, the torsion distribution file to be used, and the genetic algorithm parameters. Although the file can be generated with a standard text editor, the easiest way to create it is to use the GOLD front end (see Section 2.1, page 3).

• Any settings that have been defined in the GOLD front end can be saved as a configuration file by selecting the button Save & Exit. Alternatively, the file will be saved automatically if you start a GOLD job from the front end with the Submit & Exit or Run buttons.

• By default, the configuration file will be saved in the directory from which GOLD was opened and will be called gold.conf. Use the entry box next to the Configuration File button to change the file name and/or directory (any file name can be used).

• Once a configuration file has been created, it can be re-used, either as a quick way of reading program settings into the GOLD front end or to run GOLD from the command line (see Section 12., page 99).

• To load a previously created configuration file into the front end, enter the file name into the box next to the Configuration File button and hit return. The parameters read in from the configuration file will overwrite any parameters that have already been set in the GOLD front end.

• If you have a valid configuration file (i.e. one that completely specifies a GOLD job), you can run GOLD from the command line by using a simple command available in $GOLD_DIR/bin. For example, if the configuration file is gold.conf, the command is:

% gold_auto gold.conf &

• If you find yourself using a configuration file over and over again, you may want to add it to the options listed in the GOLD start-up window (the Settings Selector window). This is done by editing the file .gold_preferences in your home directory (see Section 15.4, page 127).


• Note: SGI users running IRIX will also be given the option to use grommitt for simple visualisation of docking results (see Section 18.1, page 142).

14.14 Exporting Fitness-Function Data to SILVER

• It is possible to write additional information to docked solution files.

• This information includes the values of the individual fitness-function components and is written to SD file tags; for MOL2 files, these tags are written to comment blocks (see Section 14.2, page 111).

• This information can be utilised by SILVER (supplied with GOLD). SILVER allows you to define and calculate a wide variety of descriptors (parameters that describe dockings) which may be used to analyse the results of a docking run. For further information, refer to the SILVER User Guide.


12.3 Running Interactively; Interactive Diagnostics

• GOLD can be run interactively by hitting the Run button in the front end. However, since docking often takes several minutes or even hours, it is usually better to run the job in the background.

• If GOLD is run interactively, output that is written to the log files is also displayed in a window:

• The parallel version only gives a summary as it is not possible to track multiple files.

• You can use the Interrupt GA button to interrupt GOLD and terminate the docking run.

• If any error conditions are encountered, they will be displayed in another window. Note that only fatal errors are reported for the parallel version.

• When GOLD is being run interactively, SILVER can be used to display the current top solution from a genetic algorithm run (see Section 18.1, page 142). To do this, click on the Display options button in the GOLD front end.

12.4 Submitting a GOLD job to the Background from the Front End

• You can submit a GOLD job to the background by using the Submit & Exit button in the front end, having first specified all the required information, such as protein and ligand file names, parameter settings, etc.

12.5 Running GOLD from the Command Line

• Unix platforms:

• GOLD can be run directly in the background by using a simple command available in $GOLD_DIR/bin:


% gold_auto gold.conf &

where gold.conf is the name of a configuration file.
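For scripted use, the command above can be wrapped so that each job is started in its own directory, as recommended in Section 12.2. The following is a minimal Python sketch and not part of the GOLD distribution: the function names are hypothetical, and it assumes that $GOLD_DIR is set and that gold_auto takes the configuration file name as its only argument, as in the command shown.

```python
import os
import subprocess
from pathlib import Path


def build_gold_command(conf_name="gold.conf", gold_dir=None):
    """Return the argument list for gold_auto, located in $GOLD_DIR/bin."""
    # Raises KeyError if GOLD_DIR is not set and no directory is passed in.
    gold_dir = gold_dir or os.environ["GOLD_DIR"]
    return [f"{gold_dir}/bin/gold_auto", conf_name]


def start_gold_job(run_dir, conf_name="gold.conf", gold_dir=None):
    """Start GOLD in run_dir without waiting, like the trailing '&' in the shell."""
    run_dir = Path(run_dir)
    if not (run_dir / conf_name).is_file():
        raise FileNotFoundError(f"no {conf_name} in {run_dir}")
    # cwd=run_dir keeps the log files of each job in its own directory.
    return subprocess.Popen(build_gold_command(conf_name, gold_dir), cwd=run_dir)
```

Calling start_gold_job once per prepared directory keeps the output of separate runs apart; the returned process object can be polled later to see whether the job has finished.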

• Windows:

• GOLD can be run on Windows by starting a command prompt, navigating to the directory containing the gold.conf file and running the following command:

"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe"

• The above command assumes that GOLD is installed in the default installation directory and that the configuration file is called gold.conf. If another name has been used for the configuration file (e.g. new_conf_filename.conf), this will have to be specified:

"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe" new_conf_filename.conf

12.6 Running in Parallel

12.6.1 Parallel Virtual Machine (PVM) (see page 101)
12.6.2 Using the PVM Console (see page 102)
12.6.3 Diagnosis of PVM Problems (see page 103)
12.6.4 Selecting and Deselecting Machines (see page 104)
12.6.5 Setting the Maximum Number of Processes (see page 105)
12.6.6 Using GOLD with your own PVM Installation (see page 105)

12.6.1 Parallel Virtual Machine (PVM)

• The parallel version of GOLD uses PVM (Parallel Virtual Machine) in its operation. PVM is a 3rd party public-domain library of routines that allows a program to schedule and harvest results across a network of machines and/or processors.

• PVM is supplied with GOLD for UNIX-based platforms only (parallel versions can only be run on Windows with third party applications) and allows users to distribute jobs over their network, across a virtual cluster of machines in order to harness the processing power of multiple machines concurrently.

• If PVM is not installed, GOLD disables the parallel version. There is also an option, -np, which allows you to disable the parallel version, if required:

• UNIX: $GOLD_DIR/bin/gold -np

• Windows: <InstallDir>/bin/gold -np


Cluster 1 : bestranking structure is gold_soln_ligand_m1_8.mol2
Cluster 2 : bestranking structure is gold_soln_ligand_m1_10.mol2
Cluster 3 : bestranking structure is gold_soln_ligand_m1_4.mol2
Cluster 4 : bestranking structure is gold_soln_ligand_m1_9.mol2

14.11 File Containing Error Messages

• The file gold.err lists any errors found by the program. These are generally fatal and cause the program to stop. It is a good idea to check gold.err if something goes wrong.

• Errors found by the atom-type checker are written to gold.err. If you are unsure about your atom typing you should therefore check this file. For example:

• In the parallel version, warning messages are logged in individual error files - one for each process. They are not sent back to the central parallel scheduling process.

• gold.err is line buffered so errors are logged immediately. If you are running GOLD interactively, the contents of gold.err will appear in a separate window.

14.12 Process File

• The file gold.pid records the user, host and process number of the GOLD job. It is deleted when GOLD exits. Its purpose is to stop the user running two GOLD jobs in the same directory.

• If the machine goes down, or GOLD crashes or is killed with signal 9, you will need to remove gold.pid before you can run another GOLD job in the same directory.
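A wrapper script can check for a leftover gold.pid before starting a new job in the same directory. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and the script only detects the file. It does not read its contents, because the internal format of gold.pid is not described here.

```python
from pathlib import Path


def stale_pid_file(run_dir):
    """Return the path to gold.pid if one is present in run_dir, else None."""
    pid_file = Path(run_dir) / "gold.pid"
    return pid_file if pid_file.exists() else None
```

If a path is returned, confirm that no GOLD job is still running in that directory before deleting the file by hand.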

14.13 Viewing Docked Solutions in SILVER

• To visualise docked solutions in SILVER click on Display options, then select either Show in SILVER to view all results after a docking run has completed, or Show in SILVER now in order to visualise current results immediately.


• In the above example, at a clustering distance of 0.75 Å, there are four different clusters of solutions:

0.90 | 1 2 3 5 | 4 7 | 6 9 10 | 8 | files (d= 0.75 Å)

Note: Clusters are separated by the ’|’ symbol and rankings are used rather than run numbers (see Section 14.5, page 112).

• The first cluster contains four solutions, ranked 1, 2, 3 and 5. The best-ranking structure in this cluster is ranked_structure_m#_1.mol2, which corresponds to the docked solution gold_soln_ligand_m1_8.mol2. Likewise, the second cluster contains two solutions, ranked 4 and 7. The best-ranking structure in this cluster is ranked_structure_m#_4.mol2, which corresponds to the docked solution gold_soln_ligand_m1_10.mol2, and so on for the third and fourth clusters.
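A clustering-report line such as the one above can be split into its clusters with a few lines of Python. This is a minimal sketch, not a GOLD utility: the function name is hypothetical, and it assumes the line layout shown in the example (a distance, then clusters of rankings separated by '|', optionally followed by descriptive text).

```python
def parse_cluster_line(line):
    """Split a clustering-report line into (distance, list of clusters of rankings)."""
    fields = line.split("|")
    distance = float(fields[0])
    clusters = []
    for field in fields[1:]:
        tokens = field.split()
        # Keep only fields made up entirely of integer rankings; trailing
        # text such as "files (d= 0.75 Å)" is ignored.
        if tokens and all(tok.isdigit() for tok in tokens):
            clusters.append([int(tok) for tok in tokens])
    return distance, clusters


line = "0.90 | 1 2 3 5 | 4 7 | 6 9 10 | 8 | files (d= 0.75 Å)"
distance, clusters = parse_cluster_line(line)
# Clusters are listed in ranked order, so the first member is the best-ranking one.
best_per_cluster = [cluster[0] for cluster in clusters]
```

Here best_per_cluster holds the rankings 1, 4, 6 and 8, one representative pose for each distinct binding mode.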

• Symbolic links will be generated in the output directory which will link to the top-ranked solution in each cluster:


• Parallel GOLD dockings are distributed over a PVM at the ligand level such that each ligand is assigned to a particular node within the PVM and then docked. Results are returned to the PVM Master machine whilst new ligands are distributed amongst idle machines within the PVM until the GOLD job is completed.

• PVM works by using daemons. When you start PVM, a daemon will be created on the machine you are using (we will call this machine the master). You can then add further computers (which we will call slaves) to the virtual machine (see Section 12.6.4, page 104). Adding each new machine will start a slave daemon on that machine.

• You can only use each host as a member of one virtual machine. This is because a user can only have one daemon running on a given machine.

• When using GOLD with PVM, it is strongly recommended that you pick one machine as master and always use that machine for setting up and starting GOLD jobs.

• To run parallel GOLD using PVM, passwordless shell access (either RSH or SSH) must be set up between all of the machines that you wish to use in your PVM cluster. Your systems administrator should be able to set this up for you. To get PVM to work with SSH you need to set a global environment variable $PVM_RSH to ssh on all systems that you intend to use in the PVM cluster.

• PVM user manual pages can be found in $PVM_ROOT/man. For more information, see the PVM home page at http://www.netlib.org/pvm3.

12.6.2 Using the PVM Console

• The PVM software provides a command line console. Once you have set the environment variable $PVM_ROOT you can start it by typing:

$PVM_ROOT/lib/pvm

at the command line.

The setenv command in the console will generate a listing of the local environment set in PVM.

The conf command will tell you which hosts are currently present within your virtual machine.

• The PVM console allows you to add machines to and delete machines from your virtual machine using add and delete, as well as view details about PVM. If there are problems with a specific node, or machine, try the command:

add <node-name>


and see if it generates any useful information as to why there may be a problem.

• GOLD provides a simple interface to PVM that allows you to add machines (see Section 12.6.4, page 104); however you should use the console to remove them. If you delete them in the GOLD interface, they are just flagged as do not use. The reason for this is that we cannot guarantee that a user is not using PVM for other purposes.

• Adding a machine will not affect any other software, but deleting a machine might.

12.6.3 Diagnosis of PVM Problems

• If you are having difficulty getting PVM running correctly on your system, in the first instance please check the following:

1. Check that the environment variable

$PVM_ROOT

is set correctly and globally on all machines within the PVM cluster.

2. Check that your system temporary area is not full. We have occasionally heard of cases where PVM could not start correctly because /tmp on the user's machine was full.

• Once you have performed these checks, you can begin diagnosing the root cause of the problem. The UNIX GOLD distribution includes a PVM diagnostics script called test_pvm.sh. To run this script, please execute the command:

$GOLD_DIR/bin/test_pvm.sh

and follow the on-screen instructions. If you are unable to interpret the information generated by this script, please send the entire output by email to [email protected] and we will diagnose any PVM problems you may have.

• Additional diagnostic information can be obtained from various files that can be found on the machines within your PVM cluster. In particular, the PVM log files are often very useful. Each daemon generates its own log file. They take the form:

/tmp/pvml.<user id>

and are generated on both the PVM master machine and the PVM slave machines. They can contain relevant information (or sometimes lack expected lines) that indicates the source of the problem.

• For example, if PVM is configured correctly you should expect to see the text line

Running on <platform type>


14.10.3 Identification of Different Binding Modes (Clustering of Ligand Poses)

• GOLD clusters docked solutions according to how similar the poses are in terms of their RMSd (see Section 14.10.2, page 120). A link can be generated to the top ranked solution from each distinct cluster. This can be useful in identifying different ligand binding modes. Considering solutions from different clusters is often more relevant than taking the top n ranked poses since these will often be very similar (i.e. all from the same cluster of solutions).

• Open the Output Preferences window by hitting the Output button in the GOLD front end. Then, switch on the Create links for different binding modes check-box, and specify an RMSd clustering distance (this determines how similar the poses are in each cluster of solutions). By default the clustering distance is 0.75 Å:

• A clustering report is given at the end of the ligand log file. The clusters themselves and the individual solutions within each cluster are in ranked order (i.e. the first member of the first cluster is always the top-ranked solution). For example, output from a run of 10 GA dockings may look like:


• In this case, solution number 4 had the largest fitness score (this solution will be in gold_soln_ligand_m#_4.mol2, which will be symbolically linked to ranked_ligand_m#_1.mol2), while solution number 3 had the worst fitness.

• The numbers in the matrix of rms deviations refer to the rankings, not the run numbers (e.g. row 1 of the above matrix refers to the solution with the best fitness score, contained in ranked_ligand_m#_1.mol2).

• Finally, the rms deviations are used as input to a hierarchical cluster analysis, using the complete linkage algorithm. Each line shows one iteration of the clustering algorithm, the distance between the clusters that were merged at that step, and the contents of the current set of clusters.

• Clusters are separated by the ’|’ symbol and rankings are used rather than run numbers. For example, the solutions ranked_ligand_m#_2.mol2 and ranked_ligand_m#_4.mol2 were merged in the first step of the following cluster analysis:

Final Ranking 4 2 5 1 3
_______________________________
RMSD Matrix of RANKED solutions
         2      3      4      5
1:     4.8    4.7    5.1   10.1
2:            4.0    3.1   10.9
3:                   4.1   10.4
4:                         11.0

Clustering using complete linkage.
Structure ids are RANKING
Dist Clusters...
 3.14 | 4 2 | 3 | 5 | 1 |
 4.06 | 4 2 3 | 1 | 5 |
 5.07 | 4 2 3 1 | 5 |
10.95 | 4 2 3 1 5 |
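The clustering trace above can be reproduced from the RMSD matrix with a short complete-linkage routine. The following Python sketch is an illustration only and is not GOLD's implementation: the names are hypothetical, and it uses the rounded matrix values printed above, so the merge distances agree with the trace only to one decimal place.

```python
from itertools import combinations

# Rounded RMSD values between ranked solutions, copied from the matrix above.
RMSD = {
    (1, 2): 4.8, (1, 3): 4.7, (1, 4): 5.1, (1, 5): 10.1,
    (2, 3): 4.0, (2, 4): 3.1, (2, 5): 10.9,
    (3, 4): 4.1, (3, 5): 10.4,
    (4, 5): 11.0,
}


def rmsd(a, b):
    """Look up the RMSD between two rankings in the upper-triangular table."""
    return RMSD[(min(a, b), max(a, b))]


def complete_linkage(items):
    """Yield (merge distance, clusters) after each agglomeration step."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the largest
        # pairwise RMSD between their members; merge the closest pair.
        dist, i, j = min(
            (max(rmsd(a, b) for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters[i] + clusters[j]
        clusters = [merged] + [c for k, c in enumerate(clusters) if k not in (i, j)]
        yield dist, [sorted(c) for c in clusters]


steps = list(complete_linkage([1, 2, 3, 4, 5]))
```

The four steps merge at 3.1, 4.1, 5.1 and 11.0 Å, matching the 3.14, 4.06, 5.07 and 10.95 entries of the trace after rounding, with rankings 2 and 4 joined first.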


somewhere in the PVM log file on the PVM master machine.
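Checking a PVM log for the expected line can be automated. This is a minimal Python sketch with hypothetical function names; it assumes that <user id> in the log file name is the numeric UNIX user id, and that matching the literal text "Running on" is sufficient.

```python
import os
from pathlib import Path


def pvm_log_path(user_id=None, tmp_dir="/tmp"):
    """Build the PVM daemon log path /tmp/pvml.<user id> described in the text."""
    if user_id is None:
        user_id = os.getuid()  # assumption: <user id> is the numeric UNIX uid
    return Path(tmp_dir) / f"pvml.{user_id}"


def has_running_on_line(log_text):
    """Return True if any line of the log contains 'Running on'."""
    return any("Running on" in line for line in log_text.splitlines())
```

On the master machine, has_running_on_line(pvm_log_path().read_text()) returning False would suggest that PVM is not configured correctly.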

• For further information, please consult the PVM troubleshooting guide:

http://www.netlib.org/pvm3/book/node1.html

12.6.4 Selecting and Deselecting Machines

• Click on Choose machines in the GOLD front end to launch the parallel process scheduling window:

• The scheduling window allows you to select a set of machines, across which a parallel GOLD job will be distributed. A GOLD job may be distributed across multiple processors on a single machine, or across several single-processor machines, or across several multiple-processor machines.

• The process scheduler allows you to add suitable hosts into the schedule for use in docking a ligand.

• By clicking on the Add button you can add new machines to your schedule:

• Type in a host name (your administrator must install GOLD so that it knows the names of available host machines).

• For each host chosen, you need to specify a value for Number of Processes. This tells GOLD how many separate docking runs to start on that machine. For single processor machines, Number of Processes should usually be set to 1; on machines with more than one processor, it should usually be greater than one, depending on how many of the machine’s processors you wish to use.

• The Host file name button allows you to read a file that contains a host configuration previously created when using parallel GOLD. If you click on this button, GOLD then prompts you for a file to read. It will read hosts and numbers of processes from this file, and attempt to add these hosts to your configuration.

12.6.5 Setting the Maximum Number of Processes

• The entry box labelled Maximum number of distributed processes allows specification of the maximum number of GOLD processes that can run simultaneously. This should normally be set equal to the number of processors available for the GOLD job to run on.

• Note: If the maximum number of distributed processes is greater than the total number of processes listed for the individual hosts in the PVM configuration, GOLD will spawn more jobs than specified on each machine until the number set in Maximum number of distributed processes is reached. In other words, a discrepancy between the number of processes listed per host and the maximum number of processes can lead to more or fewer processes than intended being run on each machine.
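The consistency check described in this note can be made before a job is submitted. The following Python sketch is hypothetical and not part of GOLD: it takes a mapping of host names to their Number of Processes values and compares the total with the chosen maximum.

```python
def check_process_limit(processes_per_host, max_distributed):
    """Compare the per-host process total with the global maximum."""
    total = sum(processes_per_host.values())
    if max_distributed > total:
        return f"above host total ({max_distributed} > {total}): hosts may run extra processes"
    if max_distributed < total:
        return f"below host total ({max_distributed} < {total}): fewer processes than listed will run"
    return "matches host total"
```

For example, two hosts listed with 2 and 4 processes give a total of 6, so a maximum of 8 would be reported as above the host total.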

12.6.6 Using GOLD with your own PVM Installation

• In some circumstances, users may prefer to run parallel GOLD using a pre-existing installation of PVM rather than the version packaged within the UNIX GOLD installer. However, this can cause difficulties since the parallel components of GOLD are compiled against the version of PVM packaged with GOLD using specific compiler flags.

• If the user’s version of PVM is significantly different, parallel GOLD may not function correctly in its default configuration. The solution is for the user to recompile the PVM parts of GOLD on their system. For this reason, the UNIX GOLD distribution is packaged with a tar-gzip patch file for recompiling the PVM part of GOLD on the user’s system. The patch also recompiles the front end and the PVM shared object used in the main GOLD process.

• If you would like to try recompiling the parallel components of GOLD on your own system, you will find the required patch file here:

$GOLD_DIR/gold_pvm_patch.tar.gz

Please unpack this file and consult the ReadMe file for further details.


14.10.2 Comparison of Docking Solutions

• Following the completion of all docking runs on a ligand, the results from the different runs are compared in the ligand log file.

• The file will include a matrix of rms deviations between the various docked ligand positions. The rms deviation algorithm takes account of symmetry effects, using a graph isomorphism algorithm. For example:


• The progress of each docking run (see Section 14.10.1, page 119).

• A comparison of the various docking solutions found (see Section 14.10.2, page 120).

• Clustering of ligand poses, for identification of solutions with different binding modes (see Section 14.10.3, page 122).

• You can choose not to save ligand log files if you prefer (see Section 14.1, page 109).

14.10.1 Information on the Progress of Docking Runs

• As each docking run is performed on a ligand, the progress of the genetic algorithm is recorded in the ligand log file.

• The best (most fit) individual at any time is listed. The total fitness and its component terms are also displayed.

• For GoldScore, the internal vdw energy includes the ligand torsional energy. The external vdw energy is normally scaled by a factor of 1.375 and summed with the other components to give the total fitness (this is to encourage hydrophobic contact between the protein and ligand).
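The weighting described above amounts to a one-line calculation. This is a minimal Python sketch, not GOLD code: the function and argument names are hypothetical, the other components are passed in as a plain list, and the only value taken from the text is the 1.375 scaling factor.

```python
EXTERNAL_VDW_SCALE = 1.375  # scaling factor quoted in the text


def goldscore_total(external_vdw, other_components):
    """Scale the external vdW term and add the remaining components."""
    return EXTERNAL_VDW_SCALE * external_vdw + sum(other_components)
```

With an external vdw term of 40.0 and other components of 10.0 and -5.0, the total is 1.375 × 40.0 + 5.0 = 60.0.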

• During a docking run, the fitness score may appear to get worse as the docking proceeds. This is due to the fact that the effects of poor H-bond geometry and close nonbonded contacts are artificially down-weighted at early stages of the docking (annealing). Only the final fitness score (i.e. from the completed docking) has any meaning.

• The message Reordering... refers to a re-ranking of the GA populations caused by the annealing process.

• At the end of the GA run, the solution is output and summarised.

• Here is an example output:


13. Rescoring

• Different scoring functions may perform better for selected cases. You may find, for example, that ChemScore outperforms GoldScore in ranking actives for one protein class, whereas the reverse will apply for other classes.

• Therefore, when screening large numbers of compounds, rescoring docking poses with alternative scoring functions and considering the best results from each (consensus scoring) can have a favourable impact on the overall rank ordering of ligands.

13.1 Rescoring Overview (see page 106)
13.2 Setting Up a Rescoring Run (see page 106)

13.1 Rescoring Overview

• It is possible to rescore a single ligand or a set of ligands in one or more files.

• Typically, a user will rescore GOLD solution files with an alternative scoring function. However, it is also possible to score a known ligand pose from an alternative source (for example, from a known crystal structure or a solution from another docking program). Note: when rescoring a pose from a source other than a GOLD solution file, it will not be possible to use the optimised positions of polar protein hydrogen atoms (see Section 13.2, page 106).

• Rescoring, like docking, requires a prepared protein input file and a fully defined binding site (preferably the same definition that was used for the original docking). The ligand file, scoring function and output preferences must also all be specified (see Section 13.2, page 106).

• GOLD can perform a local optimisation of the ligand conformation that is to be rescored. This is important because if the pose is tweaked only slightly (via a simple minimization in an appropriate force field) one finds that the fitness score can greatly increase.

• When rescoring a GOLD solution file, it is possible to use the positions of the rotatable protein hydrogens that were generated during the original docking as a starting point for the minimisation. If these are not available, the default hydrogen atom positions specified in the protein input file will be used.

• Rescored solution files can be written out that will contain the new scoring function terms and can be used with SILVER (see Section 13.2, page 106).

• It is not possible to use the rescore feature if GOLD is being run in parallel (see Section 12.6, page 101).

13.2 Setting Up a Rescoring Run

• Rescoring requires essentially the same information as a normal docking run. You will therefore need to:

• Provide a prepared protein input file (see Section 3.10, page 29).

• Define the binding site (preferably the same definition that was used for the original docking), i.e. you must specify the approximate centre and extent of the binding site (see Section 13.1, page 106).

• Use the ligand selection dialog to specify the ligand file you wish to rescore. Note: When the Rescore check-box is switched on, the ligand selection dialog will contain an additional option. Hit the Add all solutions in directory button to automatically add all GOLD solution files (i.e. all files named gold_soln_*) in the specified directory to the Current Ligand File Selection.
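When a rescoring run is prepared outside the front end, the same selection of solution files can be made from a script. This is a minimal Python sketch with a hypothetical function name; it relies only on the gold_soln_* naming pattern stated above and does not inspect file contents.

```python
from pathlib import Path


def find_solution_files(directory):
    """Return all files named gold_soln_* in directory, sorted by name."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("gold_soln_*") if p.is_file())
```

The returned paths can then be listed as the ligand files to rescore.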

• Specify the fitness function to be used for the rescoring (see Section 6.1, page 46).

• Switch on the Rescore check-box in the Fitness Function Settings section of the GOLD front-end. To specify the settings to be used for the rescoring run hit the Options button. This will open the Rescoring Settings dialog:

• The following Calculation Options are available:

• Perform local optimisation (simplexing)
Enable this check-box to minimise the docked ligand pose before rescoring. Simplexing is important if you are to obtain meaningful scores. Due to the nature of scoring functions, one finds that small changes in location or conformation of the pose can have large effects on the calculated score. Note: simplexing can also affect rotatable protein hydrogen atoms (see Section 14.6, page 115).

• Retrieve rotatable H positions from file if available
When rescoring a GOLD solution file it is possible to use the optimised positions of the polar protein hydrogen atoms that were generated during the original docking (see Section 14.6, page 115). If this option is not switched on (or no rotatable H positions are available) then the default hydrogen atom positions specified in the protein input file will be used.


For example, if concatenated_output = Myfile.mol2 the log file will be named Myfile.rescore.log.
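The naming rule can be expressed as a small helper. This Python sketch is hypothetical and inferred from the single example above: it assumes the file extension of concatenated_output is replaced by .rescore.log, and that the default name rescore.log applies when the setting is absent.

```python
from pathlib import Path


def rescore_log_name(concatenated_output=None):
    """Derive the rescore log file name from the concatenated_output setting."""
    if concatenated_output is None:
        return "rescore.log"  # default name when no alternative is specified
    # Assumption: the extension (e.g. .mol2) is replaced by .rescore.log.
    return Path(concatenated_output).stem + ".rescore.log"
```

For Myfile.mol2 this gives Myfile.rescore.log, in agreement with the example.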

• For each rescored ligand a total fitness score and the component scoring terms are listed.

• Status gives an indication of whether or not there were any errors during the rescoring run.

• Simplex indicates whether or not a locally optimised ligand pose was used for the rescoring. “1” indicates that the minimised pose was used, “0” indicates that the minimised pose was not used and “-” indicates that simplexing was not switched on (see Section 13.2, page 106). Note: When Perform local optimisation (simplexing) is switched on, the minimised conformation will only be used for the rescoring if this results in an improvement to the fitness score.

• When a minimised ligand pose is used for the rescoring an RMSd measure is given of the final minimised orientation with respect to the input ligand conformation.

• The example file below was generated by rescoring the best solution found (m2) for the second ligand in the solution file results.mol2:

14.9 Protein Log File

• The protein log file gold_protein.log details the parameterisation of the protein and the determination of the binding site.

• The cavity volume, as determined by the cavity detection algorithm, can also be output to the gold_protein.log file (see Section 3.8, page 24).

• The file is line buffered, so you can see how the algorithm is progressing even when GOLD is run in the background.

14.10 Ligand Log File

• The progress of each genetic algorithm run is listed in the ligand log file gold_<ligand_file_name>_m#.log. Here, m# is an index to the number of the ligand in the input file, e.g. m3 indicates that the log file refers to the third ligand in the input ligand file (remember that an input file may contain more than one ligand).

• The log files are line buffered, so you can see how the algorithm is progressing even when GOLD is run in the background.

• The parallel version of GOLD creates several temporary log files for each ligand, named gold_soln_<ligand_file_name>_m#_<N>.log where <N> is a docking-run number. Once all the docking runs for the ligand have been completed, these files are concatenated together into the single log file gold_soln_<ligand_file_name>_m#.log.

• The ligand log file contains information on:


14.8 Files Containing the Results of Rescoring

GOLD writes two types of file which contain the results of a rescoring run:

• A structure file containing the docked ligand pose after rescoring (see Section 14.8.1, page 117)

• A log file containing the scoring function terms obtained for the rescoring run (see Section 14.8.2, page 117)

14.8.1 Rescore Solution File

• A file containing the docked ligand solution(s) after rescoring can be written. You can control whether or not this file is written from within the Rescoring Settings window (see Section 13.2, page 106).

• If specified, solutions will be written with the default filename rescore.mol2 (MOL2 or SD output can be selected (see Section 14.5, page 112)). To specify an alternative filename (for both the rescore solution and log files), add the following line to the gold.conf file:

concatenated_output = <filename.mol2>

For example, if concatenated_output = Myfile.mol2 the rescore mol2 file will be named Myfile.mol2.

• Solution files will contain the new scoring function terms and the positions of rotatable protein hydrogen atoms generated during rescoring (see Section 13.2, page 106).

• A full description of the additional tags written to solution output files is available in Appendix B: Additional Tags in Output Files (see page 151).
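
As a rough illustration of the gold.conf edit described above, the sketch below sets concatenated_output in the text of a configuration file. It assumes only that gold.conf holds one `key = value` setting per line, as the line quoted above suggests; the helper name set_conf_option and the placeholder option in the example are hypothetical and are not part of GOLD.

```python
def set_conf_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set, replacing an existing line if present."""
    new_line = f"{key} = {value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the text before the first '=' so spacing differences do not matter.
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# 'some_existing_option' is a made-up placeholder, not a real GOLD setting.
example_conf = "some_existing_option = 1\n"
updated_conf = set_conf_option(example_conf, "concatenated_output", "Myfile.mol2")
print(updated_conf)
```

Replacing an existing line rather than always appending keeps the file free of duplicate settings if the helper is run more than once.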

14.8.2 Rescore Log File

• The rescore log file rescore.log summarises the outcome of the rescoring run. To specify an alternative filename (for both the rescore solution and log files), add the following line to the gold.conf file:

concatenated_output = <filename.mol2>


• The following Output options are available:

• Write structures to file for SILVER
Enable this check-box to write out docked ligand solutions after rescoring. Solutions will be written to the file rescore.mol2; to specify an alternative filename, see Section 14.8.1, page 117. MOL2 or SD output can be specified (see Section 14.5, page 112). Solution files will contain the new scoring function terms and can be used with SILVER.
Note: If writing of this file is switched off, only the rescore.log file will be written (see Section 14.8, page 117).

• Replace relevant tags in file
When rescoring a GOLD solution file, enable this check-box to overwrite the list of active residues and the rotated protein hydrogen atom positions generated during the original docking with those resulting from the rescoring run. If you select not to replace relevant tags then rescore.mol2 will contain both the binding site definition of the original docking and that of the subsequent rescoring run.

• Hit Done to close the Rescoring Settings dialog and start the GOLD job in the usual way (see Section 12., page 99).

• Output that is written to the rescore.log file is also displayed in the GOLD Output window. Note: To specify an alternative rescore log filename, see Section 14.8.2, page 117.


14. Output Options

14.1 Controlling the Amount of Output (see page 109)
14.2 Controlling the Information Written to Output Files (see page 111)
14.3 Specifying Directories for Output Files (see page 112)
14.4 Files Containing the Initialised Protein and Ligand (see page 112)
14.5 Files Containing the Docked Ligand(s) (see page 112)
14.6 Files Containing Protein Binding-Site Geometry (see page 115)
14.7 Files Containing Fitness Function Rankings (see page 115)
14.8 Files Containing the Results of Rescoring (see page 117)
14.9 Protein Log File (see page 118)
14.10 Ligand Log File (see page 118)
14.11 File Containing Error Messages (see page 124)
14.12 Process File (see page 124)
14.13 Viewing Docked Solutions in SILVER (see page 124)
14.14 Exporting Fitness-Function Data to SILVER (see page 125)

14.1 Controlling the Amount of Output

• GOLD can produce a lot of output and you may wish to cut it down.

• To do this, hit the Output... button in the GOLD front end to open the Output Preferences window.

• Use the File and Format Options to specify whether you want files listing fitness-function rankings (see Section 14.7, page 115), ligand log files (see Section 14.10, page 118), and/or links for different binding modes (see Section 14.10.3, page 122). For example, the settings below will produce log files but not ranking files or links for different binding modes:

• Use the Selecting Docked Solutions options to specify whether you want to save:

• All docking solutions:


gold_soln_ligand_file_m5_8.mol2, which is symbolically linked to ranked_ligand_file_m5_2.mol2, since it is the second best of the docking attempts for this molecule:

• You can choose not to save ligand rnk files if you prefer (see Section 14.1, page 109).

14.7.2 File Containing Ranked Fitness Scores for a Set of Ligands

• A file called bestranking.lst is written for batch jobs on multiple ligands. This gives a continuous summary of the best solution that has been obtained for each completed ligand.

• To specify an alternative filename, add the following line to the gold.conf file:

bestranking_list_name = <filename.lst>

• The file gives total fitness scores and a breakdown of the fitness into its constituent energy terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand), an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).
Note: by default the file will contain a single internal energy term S(int) which is the sum of the internal torsion and internal vdw terms (see Section 6.2, page 46).

• The example file below was generated from a ligand input file containing 5 ligands. The listed file names correspond to the names of the files containing the best solution found for each ligand, e.g. gold_soln_ligs_m1_3.mol2 contains the best answer found for the first ligand in the input file.
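
The GoldScore breakdown described above can be sketched as a simple sum. This is only an illustration under stated assumptions: it assumes each listed term is already signed as a contribution to the fitness, so that the terms add directly, and that only the external vdw term carries the 1.375 scale factor. The function name and the numerical values are made up for the example and are not taken from a real GOLD run.

```python
EXTERNAL_VDW_SCALE = 1.375  # empirical scale factor quoted in the text


def goldscore_fitness(hb_ext: float, vdw_ext: float, hb_int: float,
                      vdw_int: float, tors: float) -> float:
    """Combine GoldScore terms into a total fitness (assumed additive form)."""
    # S(int) is the sum of the internal vdw and internal torsion terms.
    s_int = vdw_int + tors
    return hb_ext + EXTERNAL_VDW_SCALE * vdw_ext + hb_int + s_int


# Illustrative values only.
total_fitness = goldscore_fitness(hb_ext=10.0, vdw_ext=40.0, hb_int=0.0,
                                  vdw_int=-3.0, tors=-2.0)
print(total_fitness)
```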


N-phosphonacetyl-L-aspartate

the line SET_UNIQUE_SOLN_TITLES = 0 in the gold.params file should be changed to read SET_UNIQUE_SOLN_TITLES = 1.

• A description of the various other tags available can be found in Appendix B: Additional Tags in Output Files (see page 151).

14.6 Files Containing Protein Binding-Site Geometry

• During docking, GOLD will keep the protein geometry fixed except that it will optimise hydrogen-bond geometries by rotating groups such as serine OH and lysine NH3. This means that the coordinates of polar hydrogen atoms such as these will change.

• Files can be written out that contain the conformation of the cavity residues around the docked ligand (and, specifically, the optimised positions of the protein H-bonding hydrogen atoms) for each docking. To do this, you need to edit the gold.params file and add the command SAVE_CAVITY = 1.

• The optimised positions of polar protein hydrogen atoms that are generated during docking can also be written to the docked solution file. This information can be written to SD file tags; for MOL2 files, these tags are written to comment blocks (see Section 14.2, page 111).

14.7 Files Containing Fitness Function Rankings

• GOLD writes two types of file which summarise the fitness-function scores of docked ligands:

• One pertains to an individual ligand (see Section 14.7.1, page 115).

• The other pertains to a set of ligands (see Section 14.7.2, page 116).

14.7.1 File Containing Ranked Fitness Scores for an Individual Ligand

• A file called <ligand_file_name>_m#.rnk is written for each ligand (m# refers to the position of the ligand in the input file - remember that a given ligand input file may contain more than one ligand). This file contains a summary of the fitness scores for all the docking attempts on that ligand. The docking attempts are listed in decreasing order of fitness score, so the best solution is placed first.

• The file gives total fitness scores and a breakdown of the fitness into its constituent energy terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand), an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).

• The example file below corresponds to the fifth ligand in the input file ligand_file.mol2 and is therefore called ligand_file_m5.rnk. The solution Mol No 8 corresponds to the file


• or just the n best solutions for each ligand, where n is a user-specified number (e.g. n = 5 in the screenshot below):

• or just the top solution, and for only those m ligands with the best fitness scores, where m is user specified (e.g. m = 100 in the example below):

• In addition, you can filter out all solutions with fitness scores lower than a specified value by switching on the button labelled Reject solutions with fitness lower than and typing in the required value. For example, the settings below will save a maximum of 3 solutions for each ligand and will not keep any solution with a fitness lower than 50:
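
The combined effect of keeping the n best solutions and rejecting those below a fitness threshold can be sketched as follows. This is a minimal illustration of the selection logic as described in the text, not GOLD's actual implementation; the function name, parameter names and scores are hypothetical.

```python
from typing import List, Optional


def select_solutions(fitness_scores: List[float],
                     n_best: Optional[int] = None,
                     min_fitness: Optional[float] = None) -> List[float]:
    """Return the scores that would be kept, best first."""
    kept = sorted(fitness_scores, reverse=True)
    if min_fitness is not None:
        # Reject solutions with fitness lower than the threshold.
        kept = [score for score in kept if score >= min_fitness]
    if n_best is not None:
        # Keep at most the n best of the remaining solutions.
        kept = kept[:n_best]
    return kept


# Maximum of 3 solutions per ligand, none with fitness lower than 50.
kept_scores = select_solutions([60.0, 40.0, 75.0, 55.0, 52.0],
                               n_best=3, min_fitness=50.0)
print(kept_scores)
```

Because the scores are sorted in decreasing order first, applying the threshold before or after the n-best cut gives the same result.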


14.2 Controlling the Information Written to Output Files

• It is possible to write additional information to docked solution files. This information is written to SD file tags; for MOL2 files, these tags are written to comment blocks.

• For post-processing docking results with SILVER it is particularly important that the scoring function terms and the rotated protein hydrogen atom positions are saved.

• Hit the Output... button in the GOLD front end to open the Output Preferences window. Use the Information in File options to control what information is written to docked ligand files (see Section 14.5, page 112).

• The following options are available:

• Save lone pairs in files
Some 3rd-party programs have difficulty reading files which contain lone pairs. You can stop GOLD including lone pairs when it writes docked solution files by switching off this check-box.

• Save rotated hydrogens in file
SILVER uses the optimised positions of polar protein hydrogen atoms that are generated during docking (these will usually be different for each docked ligand pose). Enable this check-box to save the positions of rotated protein hydrogen atoms to docked solution files.

• Save score in output file
Enable this check-box if you want the docked solution files to include the docking-score terms, i.e. the total GoldScore or ChemScore value for each docking, and its components such as protein-ligand H-bond energy, internal ligand strain energy, etc.

• Output weighted SF terms
Certain docking scoring function terms are the product of a term dependent on the magnitude


• Output files for the docked ligand(s) may also contain additional information such as the scoring function terms and the rotated protein hydrogen atom positions specific to that solution.

• This information can be written to SD file tags; for MOL2 files, these tags are written to comment blocks. It is possible to control the information written to solution files from the Output Preferences window (see Section 14.2, page 111).

• Solution file title strings take the form

<file_basename>|<p>|[cov<r>|]dock<q>

where

• <file_basename> is the base name of the ligand input file

• <p> is the molecule number in the file

• <q> is the number of the docking

• <r> is the covalent attachment atom. This part is only printed for covalent dockings.

• For example (mol2 file):

ligand|mol2|1|dock4

where the ligand filename is ligand.mol2, the structure is number 1 in the molecule input file, and the solution is from the fourth docking (dock4). The format for the output of the equivalent sd input file would be the following:

ligand|sd|1|dock4
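
A title string of the form shown above can be split back into its parts with a short parser. The sketch below reads the fields from the right, because the examples (ligand|mol2|1|dock4) show that the file part may itself contain a "|". The class and function names are hypothetical, and the covalent example in the test data is an assumed illustration of the optional cov<r> field rather than output from a real run.

```python
from typing import NamedTuple, Optional


class SolutionTitle(NamedTuple):
    file_part: str                # everything before the molecule number
    molecule_number: int          # <p>
    docking_number: int           # <q>
    covalent_atom: Optional[str]  # <r>, present only for covalent dockings


def parse_solution_title(title: str) -> SolutionTitle:
    """Parse '<file_basename>|<p>|[cov<r>|]dock<q>' working from the right."""
    fields = title.split("|")
    if len(fields) < 3 or not fields[-1].startswith("dock"):
        raise ValueError(f"not a solution title: {title!r}")
    docking_number = int(fields.pop()[len("dock"):])
    covalent_atom = None
    if fields[-1].startswith("cov"):
        covalent_atom = fields.pop()[len("cov"):]
    molecule_number = int(fields.pop())
    return SolutionTitle("|".join(fields), molecule_number,
                         docking_number, covalent_atom)


parsed = parse_solution_title("ligand|mol2|1|dock4")
print(parsed)
```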

• To revert to the historic output i.e. to output only the structure name e.g.


• Each ligand will normally be docked several times, so a given input ligand will produce a set of files, each containing the results of a separate docking attempt.

• Suppose that the original ligand file is structure.mol2 (this can contain more than one ligand, in which case each will be docked). As the GOLD job progresses, the result of each docking attempt is written out as gold_soln_structure_m#_n.mol2, where n is the solution number 1, 2, 3 ... and m# is the number of the ligand, i.e. m1 for the first ligand, m2 for the second, etc.

• Note that the file gold_soln_structure_m1_1.mol2 is not the best GOLD prediction; it is just the solution found in the first docking attempt. However, as GOLD proceeds, symbolic links are created: ranked_structure_m#_1.mol2 will always point to the current top-ranked solution, ranked_structure_m#_2.mol2 will point to the second-best solution, and so on.
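
The naming convention for solution files and ranked links described above can be captured in a few lines, for example when post-processing a results directory. This is a sketch based only on the file-name patterns quoted in the text; the helper names are hypothetical and nothing here is part of GOLD itself.

```python
import re
from typing import Optional, Tuple


def solution_filename(basename: str, ligand_index: int,
                      solution_number: int, ext: str = "mol2") -> str:
    """File written for one docking attempt, e.g. gold_soln_structure_m1_1.mol2."""
    return f"gold_soln_{basename}_m{ligand_index}_{solution_number}.{ext}"


def ranked_filename(basename: str, ligand_index: int,
                    rank: int, ext: str = "mol2") -> str:
    """Symbolic link pointing at the solution currently holding this rank."""
    return f"ranked_{basename}_m{ligand_index}_{rank}.{ext}"


_SOLUTION_RE = re.compile(r"^gold_soln_(.+)_m(\d+)_(\d+)\.(\w+)$")


def parse_solution_filename(name: str) -> Optional[Tuple[str, int, int, str]]:
    """Return (basename, ligand_index, solution_number, ext), or None if no match."""
    match = _SOLUTION_RE.match(name)
    if match is None:
        return None
    basename, ligand, solution, ext = match.groups()
    return basename, int(ligand), int(solution), ext


print(solution_filename("structure", 1, 1))
print(ranked_filename("structure", 1, 2))
print(parse_solution_filename("gold_soln_ligand_file_m5_8.mol2"))
```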

• Alternatively, you can specify that all saved docking solutions for all ligands are to be concatenated and written to a single file. To do this, open the Output Preferences dialogue by hitting the Output... button in the GOLD front end. Then, switch on the Save solutions to one file check-box, hit the Solutions file name button, and specify the required file name in the resulting pop-up, e.g.


of a particular physical contribution (e.g. hydrogen bonding) and a scale factor determined e.g. by a regression coefficient. The docking scoring function terms included in the output file can therefore consist of weighted terms, non-weighted terms or both. To include weighted terms enable this check-box.

• Output non-weighted SF terms
Enable this check-box to include non-weighted scoring function terms in the output file.

• No SD-style tags in mol2 files
Enable this check-box to prevent SD-style tags being written to comment blocks in MOL2 solution files.

14.3 Specifying Directories for Output Files

• Hit the Output... button in the GOLD front end to open the Output Preferences window.

• Use the Output directory... entry box to specify the directory to which output files will be written.

• When more than one ligand is being docked, switch on the Create output sub-directories check box if you want results for each ligand to be written to a separate sub-directory.

14.4 Files Containing the Initialised Protein and Ligand

• GOLD produces the following output files:

• gold_ligand.mol2 is the original ligand datafile with lone pairs added and the sets DONOR_HYDROGENS and LONE_PAIRS defined.

• gold_protein.mol2 is the original protein datafile with lone pairs added to binding site atoms and the sets DONOR_HYDROGENS and LONE_PAIRS defined. The binding site is defined in the set CAVITY_ATOMS.

• Note: these set-definitions in the gold_protein.mol2 file are only accessible (i.e. visible) through SYBYL.

14.5 Files Containing the Docked Ligand(s)

• By default, docked ligands will be written out in the same format as was used for input. To change this, hit the Output... button in the GOLD front end to open the Output Preferences window. Then use the File and Format Options to specify the required output format. For example: