Tautomers - Advanced Databases for in-silico Screening?
Frank Oellien, 18th CIC Workshop 2004, Boppard
Overview
• Motivation for tautomers in screening data sets
• Tautomer enumeration approach
• Workflow
• Examples for ‘enhanced’ Database Search
• Results for ‘enhanced’ Database Search
• Summary and Future Tasks
Biological Relevance of Tautomers
Targets: Bacteria, Arthropods, and Parasites
Variance for physiological pH expected – not static values!
Ambiguities in Proteins and Ligands
Proteins (X-ray structures)
• Flexibility (e.g. Gln, Asn)
• Ionisation states (e.g. Glu, Asp)
• Tautomerism (e.g. His)
Ligands (Compound Libraries)
• Conformations
• Ionization states
• Stereo centers
• Tautomers
Software: Virtual Screening
Types of Virtual Screening Software
• High-throughput docking of ligands into protein X-ray structures (Gold, FlexX)
• DB for pharmacophore search (Catalyst, Unity)
Current VS software applications address:
• Conformations
• Ionization states
• Stereo centers
• Tautomers
X (exception FlexX 2004)
X
Biological Relevance of Tautomers
Tautomeric states of ligands can be relevant for biological interactions
• Derivatives of tetrazole, triazole, thiazole, pyrazole, iminopyrimidine, …
• Brandstetter et al., MMP-8 Inhibitors, J. Biol. Chem. 276, 2001, 17405-12.
• Pospisil et al., Ligands of herpesviral thymidine kinases, Helv. Chim. Acta 85, 2002, 3237-50.
Software: Tautomer Generation
Tautomer Generation Applications
• Agent 2.0 (ETH Zurich, Switzerland)
• QUACPAC 1.1 (OpenEye)
• StereoPlex (Tripos)
Limitations:
• no extension by means of user-defined rules
• no tautomer-sensitive duplicate check
Aim: Easily extensible and scriptable software that allows the integration and automation of tautomer generation in our existing screening workflow.
CACTVS: Chemical data management system
Tautomer enumeration
• C core library, Tcl command layer
• Main command:
ens transform $eh $tlist <direction> <reactionmode> <flags> <overlapmode> <excludelist> <maxtautomers> <timeout>
tlist: Transformation definition - SMIRKS line notation
[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]
• preferred tautomer forms
• tautomer-sensitive duplicate check
• 21 pre-defined rules (up to 1,11-H-shifts)
• user-defined tautomer sets
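The idea behind a tautomer-sensitive duplicate check can be illustrated with a minimal Python sketch. This is not the CACTVS implementation or its API. It assumes that each compound has already been expanded into its complete tautomer set and that every tautomer is written as a canonical SMILES string, so the lexicographically smallest string can serve as a key for the whole set. The function names and the compound IDs are hypothetical.

```python
from collections import defaultdict


def tautomer_key(tautomers):
    """Return one representative string for a set of tautomers.

    Assumption: all strings are canonical SMILES and the set is complete,
    so the smallest string identifies the whole tautomer family.
    """
    return min(tautomers)


def find_tautomeric_duplicates(library):
    """Group compound IDs whose tautomer sets map to the same key."""
    groups = defaultdict(list)
    for compound_id, tautomers in library.items():
        groups[tautomer_key(tautomers)].append(compound_id)
    # Only groups with more than one member are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical toy library: 2-hydroxypyridine and 2-pyridone were
# registered as separate compounds but are tautomers of each other.
library = {
    "cpd-001": {"Oc1ccccn1", "O=c1cccc[nH]1"},
    "cpd-002": {"O=c1cccc[nH]1", "Oc1ccccn1"},
    "cpd-003": {"c1ccccc1"},
}

print(find_tautomeric_duplicates(library))
```

A plain structure comparison would treat cpd-001 and cpd-002 as different compounds; comparing keys derived from the full tautomer sets groups them together.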
Examples
Only 2 transformation rules are needed (1,3 and 1,5-H-shift)
[Figure: structure drawings of 15 tautomeric forms, numbered 1 to 15]
Database Expansion
[Figure: bar chart of compound counts (axis 0 to 1,400,000) per supplier database, without and with tautomers. Databases: Maybridge, Specs, TimTec, VitasM, Asinex Platinum, Asinex Gold, ChemDiv. Expansion factors labelled on the chart: x2.7, x3.5, x3.2, x3.6, x3.0, x3.6, x3.4]
Tautomer Enumeration - Benchmark
Platform: SGI Fuel R14000 / 600 MHz, 1 GB RAM
Performance depends on
• nature of the compounds
• number of tautomers

Supplier DB               Compounds/min   Multiplier
Maybridge Screening       > 150           2.5
Asinex Platinum           > 250           2.9
VitasM (in-house stock)   > 560           3
Tripos Leadscreen         > 1400          2.2
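The benchmark figures can be turned into a rough planning estimate. The Python sketch below assumes that "Compounds/min" counts input compounds and that the multiplier is the ratio of structures after enumeration to structures before; both readings are assumptions, as are the function names and the library size of 100,000 compounds.

```python
# Throughput lower bounds and multipliers as listed in the benchmark table.
BENCHMARK = {
    "Maybridge Screening": (150, 2.5),
    "Asinex Platinum": (250, 2.9),
    "VitasM (in-house stock)": (560, 3.0),
    "Tripos Leadscreen": (1400, 2.2),
}


def expanded_size(n_compounds, multiplier):
    """Estimated number of structures after tautomer enumeration."""
    return round(n_compounds * multiplier)


def max_minutes(n_compounds, compounds_per_min):
    """Upper bound on run time, since the table gives minimum rates."""
    return n_compounds / compounds_per_min


n = 100_000  # hypothetical library size
for name, (rate, multiplier) in BENCHMARK.items():
    hours = max_minutes(n, rate) / 60
    print(f"{name}: about {expanded_size(n, multiplier)} structures, "
          f"at most {hours:.1f} h")
```

Because the rates are lower bounds, the computed times are worst-case values for the stated platform.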
Tautomeric Fingerprints
a) Asinex (Platinum Collection)
[Figure: tautomer frequency (axis 0 to 100) against number of tautomer forms (1 to 115), with a second plot (axis 0 to 50) covering 1 to 21 forms]
b) Specs (Screening Collection)
[Figure: tautomer frequency (axis 0 to 100) against number of tautomer forms (1 to 742), with a second plot (axis 0 to 50) covering 1 to 21 forms]
Virtual Screening Workflow @ Intervet
[Figure: workflow diagram with the following elements]
• 2D / 3D Structure DB (MDL)
• Specific 3D Databases (Catalyst, Unity)
• Tautomer Generation
• Pre-Processing
• Tautomer-sensitive Duplicate Check
• Data Analysis
• Virtual Screening
Example I - MMP-8 (PDB Entry Code: 1JJ9)
Matrix Metalloproteinase Inhibitor: 8-Barbiturate
H. Brandstetter et al., MMP-8 Inhibitors, J. Biol. Chem. 276, 2001, pp. 17405-12.
http://home.t-online.de/home/kubinyi/dd-18.pdf
Example I – MMP-8 Pharmacophore from X-RAY
Matrix Metalloproteinase Inhibitor: 8-Barbiturate
H-Bond Donor: green
H-Bond Acceptor: magenta
Hydrophobic aliphatic: blue
Ring Aromatic: brown
Example I – MMP-8 Testdatabase
993 molecules selected from NCI 2000 Database (Catalyst Version)
[Figure: test database preparation workflow]
• 2D / 3D Structure DB (NCI 2000)
• Diverse Compound Selection (Cerius2 4.9)
• Tautomer Generation (CACTVS)
• 3D Database (Catalyst, no tautomers): 994 compounds including 8-Barbiturate (X-ray) *
• 3D Database (Catalyst, with tautomers): 1536 compounds including 5 8-Barbiturate tautomers
* Knowledge of stable tautomeric form is needed as prerequisite.
Default conditions for Catalyst database building – exception: max conformers 150
Example I – MMP-8 Search on Testdatabase
Questions:
• Coverage of the suggested tautomeric states for the 8-barbiturate by our workflow?
• Does the conformer generation of Catalyst resemble the X-ray structure?
• Is the fit value significant for the X-ray structure and also for the conformers?
• Signal-to-noise relationship – fit values of other hits?
Exotic luxury or useful effort?
Example I – MMP-8 Results
Best Search – no tautomers:
• 29 compounds found
• 8-barbiturate found in database, but scores less significantly (BestFit 3.2)
• 8-barbiturate tautomer form (X-ray): BestFit 7.2 (AV = 3.0, SD = 1.7)
• second-best-scoring hit shows BestFit 6.2
• only 3 hits score higher than BestFit 4
• best non-X-ray conformer scores BestFit 6.3
Example I – MMP-8 Results
Best Search – with tautomers:
• 30 compounds found (22 unique, 8 tautomeric duplicates)
• 8-barbiturate: BestFit 7.2 (AV = 2.0, SD = 1.5)
• second-best-scoring hit shows BestFit 5.5
• only 3 hits score higher than BestFit 4
• non-tautomeric 8-barbiturate scores BestFit 3.2
• 60 % overlap between both search results (all top-scoring hits in common)
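Separating unique hits from tautomeric duplicates, as in the count of 22 unique compounds among 30 hits, can be sketched in Python as follows. The sketch assumes every hit record carries the ID of the parent compound its tautomer was generated from; the record layout, the IDs, and most fit values are hypothetical and do not reproduce the actual hit list.

```python
def best_hit_per_parent(hits):
    """Keep the highest-scoring tautomer for each parent compound.

    hits: iterable of (parent_id, tautomer_id, fit_value) tuples.
    Returns a dict mapping parent_id to (tautomer_id, fit_value).
    """
    best = {}
    for parent_id, tautomer_id, fit in hits:
        if parent_id not in best or fit > best[parent_id][1]:
            best[parent_id] = (tautomer_id, fit)
    return best


# Hypothetical hit list from a search on a tautomer-expanded database.
hits = [
    ("barbiturate", "taut-1", 3.2),
    ("barbiturate", "taut-3", 7.2),
    ("cpd-17", "taut-1", 5.5),
    ("cpd-42", "taut-1", 3.9),
    ("cpd-42", "taut-2", 4.1),
]

unique = best_hit_per_parent(hits)
duplicates = len(hits) - len(unique)
print(f"{len(hits)} hits, {len(unique)} unique, {duplicates} tautomeric duplicates")
```

Ranking by the best-scoring tautomer per parent keeps the hit list comparable in size to a search on the database without tautomers.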
Example I – MMP-8 Summary
• Significantly better fit value and hit separation in case of a database search including tautomers
• X-ray structure closely resembled
• Number of unique hits reduced
• Significantly more structures have to be converted – time-consuming aspect
Critical aspect:
• Hit rate (unique hits) is lower for database including tautomers (?)
• X-ray structure or known physiological conditions in the protein appear to be important for sensitive pharmacophore searches
Example II - CDK of Eimeria tenella
Cyclin Dependent Kinase – Homology Model based on Sequence Analysis
C. Beyer et al., Oral & Poster Presentation at the 18th Darmstädter Molecular Modelling Workshop 2004
J.H. Kinnaird et al., International Journal for Parasitology 34, 2004, 683–692
Example II - CDK of Eimeria tenella - hiphop
Qualitative pharmacophore model derived from the best selective human CDK2 inhibitors – prefilter for docking libraries
H-Bond Donor: green
H-Bond Acceptor: magenta
Hydrophobic: blue
Ring Aromatic: brown
Feature mapping of pharmacophore hypothesis with CDK2 selective molecules
Example II - CDK of Eimeria tenella - database
993 molecules selected from NCI 2000 Database (Catalyst Version) plus 123 known human CDK1/2 inhibitors. 93 ligands show activity against CDK2.
• 3D Database (Catalyst, no tautomers): 1116 compounds including 123 known CDK1/2 inhibitors
• 3D Database (Catalyst, with tautomers): 2368 compounds including 837 CDK1/2 inhibitor tautomers *
* 733 CDK2 inhibitor tautomers
Example II - CDK of Eimeria tenella - Results
A) Search results without tautomers:
• best hypothesis finds 55.5 % of the known CDK2 inhibitors (AV 39.8 %, SD 9.5 %)
• Best selective inhibitors (selectivity higher than 4) included
• Number of hits: 81
B) Search results with tautomers:
• best hypothesis finds 72.2 % of the known CDK2 inhibitors (AV 66.4 %, SD 10.0 %)
• Best selective inhibitors (selectivity higher than 4) included
• Number of unique hits: 61
• Overlap of best hypothesis search in A) with results in B): 93 %
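How an overlap percentage between two hit lists might be computed is sketched below in Python. The slides do not define the measure, so this sketch assumes it is the share of hits from the reference search that also appear in the other search. The function name and compound IDs are hypothetical.

```python
def overlap_percent(reference_hits, other_hits):
    """Percentage of reference hits that also occur in the other hit list."""
    reference = set(reference_hits)
    if not reference:
        raise ValueError("reference hit list is empty")
    shared = reference & set(other_hits)
    return 100.0 * len(shared) / len(reference)


# Hypothetical unique hit IDs from two searches.
without_tautomers = ["cpd-01", "cpd-02", "cpd-03", "cpd-04", "cpd-05"]
with_tautomers = ["cpd-01", "cpd-02", "cpd-03", "cpd-99"]

print(overlap_percent(without_tautomers, with_tautomers))
```

The measure is not symmetric: swapping the two arguments changes the denominator, so the direction of the comparison should be stated with any reported value.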
Example II - CDK of Eimeria tenella - Results
A) Search results without tautomers:
• GH score 0.54 for best pharmacophore model
B) Search results with tautomers:
• GH score 0.77 for best pharmacophore model of A)
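Assuming "GH score" refers to the Güner-Henry (goodness of hit) score commonly used to rate pharmacophore database searches, it can be computed as in the Python sketch below. The slides do not state the exact input counts behind 0.54 and 0.77, so the numbers in the example are hypothetical and are not meant to reproduce those values.

```python
def gh_score(total, actives, hits, active_hits):
    """Güner-Henry score for a database search.

    total:       number of compounds in the database (D)
    actives:     number of known actives in the database (A)
    hits:        number of compounds retrieved by the search (Ht)
    active_hits: number of retrieved compounds that are actives (Ha)
    """
    if not 0 < actives < total:
        raise ValueError("need 0 < actives < total")
    if hits <= 0 or not 0 <= active_hits <= min(hits, actives):
        raise ValueError("inconsistent hit counts")
    # Weighted combination of yield (Ha/Ht) and recall (Ha/A).
    enrichment = active_hits * (3 * actives + hits) / (4 * hits * actives)
    # Penalty for retrieving inactive compounds.
    penalty = 1 - (hits - active_hits) / (total - actives)
    return enrichment * penalty


# Hypothetical search: 1000 compounds, 100 actives, 50 hits, 40 of them active.
print(round(gh_score(1000, 100, 50, 40), 3))
```

The score ranges from 0 to 1 and reaches 1 only when the hit list contains every active and nothing else.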
Example II - CDK of Eimeria tenella - Summary
• Number of true hits better in case of tautomers
• High overlap among top-scoring ligands for both searches
Remark: SBF models under way
Critical aspect:
• Number of unique hits is reduced by using tautomer databases
• All kinds of tautomeric states are considered.
Future Tasks
• Modification of tautomeric rules
• Automation of database building workflow
• Implementation of defined ionisations
• Further investigations with examples
Acknowledgements
Dr. J. Cramer, C. Bayer, Dr. J. Schröder, PD Dr. P. Selzer (Intervet Innovation GmbH)
Dr. W.-D. Ihlenfeldt (Χemistry GmbH)
Dr. O. Sacher (Molecular Networks GmbH)
Dr. T. Hidaka (Takeda Pharmaceutical Ltd.)
Who we are ...
Paul Selzer
Richard Marhöfer
Jörg Schröder
Jörg Cramer
Andreas Rohwer
BIOCHEMINFORMATICS
Carsten Beyer
Andreas Krasky
Anette Klinger
Frank Oellien
Kristin Engels
Hon Tran