A Novel Drug for Uncomplicated Malaria:
Chemoinformatics Methods for Compound Selection in a High Throughput Screening
Program
Neil Berry
University of Liverpool
14th July 2010
Outline
• Introduction
Why do we need a drug against malaria?
Novel target : PfNDH2
• Chemoinformatics compound selection for HTS
Methods for compound selection
“Scoring” compounds
Filter compounds
Diversity selection
• Summary
• Acknowledgements
Introduction
Why do we need a drug against malaria?
• Global health problem
• Best estimates put the number of clinical episodes of malaria at 0.5 billion
• >400 million drug treatments/year
• Current therapies are failing
• Parasite resistance remains a major threat (artemisinins)
• Very few truly novel drug targets
• The mitochondrion is a proven drug target, e.g. atovaquone in Malarone™
• Pf type II NADH:quinone oxidoreductase (PfNDH2) is an outstanding chemotherapeutic target
• Enzyme “choke point” in electron transport chain
Inhibition leads to mitochondrial dysfunction and parasite death
• PfNDH2 not found in human host
• Inhibition of function predicted to be lethal to BOTH
asexual stage (alleviating clinical symptoms) and
sexual stages (reducing transmission)
• Resistance to PfNDH2 predicted to be difficult
• Only one selective inhibitor known (HDQ)
• Aim : produce candidate drug targeting PfNDH2 for clinical development
• Candidate profile
Drug like properties
Acceptable costs
Potential use in non-severe disease either alone or in combination
• Exploit PfNDH2 as an antimalarial target due to
(i) Essentiality
(ii) Selectivity
(iii) Tractability of target
(iv) Amenability to HTS
• HTS against PfNDH2 target outsourced to BioFocus DPI - four phases
Phase 1
Transfer and optimisation of NADH oxidation assay from LSTM to BioFocus DPI
Phase 2
Substructure search for quinolin-4-one in BioFocus compound library
Screening compounds
Phase 3
Selection of 16000 compounds from BioFocus compound library - chemoinformatics methods
Phase 4
Screening 16000 compounds at single concentration
Active compounds retested in duplicate
Confirmed hits - duplicate 10 point dose-response curves
Screening workflow
Phase 2: 750,000 compounds (BioFocus DPI library) → substructure search → compounds → potency → actives
Phase 3: 750,000 compounds (BioFocus DPI library) → chemoinformatics selection → 16,000 compounds → HTS (primary screen) → compounds
Phase 4: hit confirmation → compounds → potency → actives → hits → medicinal chemistry template selection
Chemoinformatics
Goal
Identify several novel molecular scaffolds which inhibit PfNDH2
Identification achieved via HTS with compounds selected using chemoinformatics
Scaffolds identified
⇒ Medicinal chemistry programme
Approach
i) Screen small library of compounds (~1000)
ii) Chemoinformatics to select further compounds
iii) Screen larger library of compounds (~16000)
Initial Screen
HDQ → molecular "core": 1H-quinolin-4-one
BioFocus DPI Compound Collection (~750000 cpds) → 1175 compounds
1175 compounds
Dose-response determination (5-point, 20 mM -> 7.8 nM)
54 new active compounds (~5% hit rate)
Hit Expansion Strategy
Expand activity data and establish initial SAR by exploring the chemical space around the active compounds in a variety of ways
Issue
Only a single active chemotype (quinolin-4-one derivatives)
Approach
Apply a range of different chemoinformatic approaches to
1) Explore the chemical space around the active hits (similarity searching)
2) Identify new active chemotypes (scaffold-hopping)
⇒ Select 16000 compounds for HTS
Chemoinformatics Methods
Searches
Fingerprint similarity
Turbo similarity
Substructure
Bayesian classification model
PCA model
Scoring compounds - Lead and Drug likeness bias
Lipinski and Veber filters
Diversity selection
⇒ 16000 "leadlike", diverse compounds selected via rational chemoinformatic approach for HTS
i) Hit Expansion: Fingerprint Similarity Searches
Query: 54 active compounds + HDQ
Fingerprints: MDL MACCS keys, ECFP2, FCFP2
Results:
8784 compounds (Tanimoto cutoff > 0.8)
333 compounds (Tanimoto cutoff > 0.6)
435 compounds (Tanimoto cutoff > 0.75)
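A fingerprint similarity search of this kind can be sketched as below. This is a minimal illustration, assuming fingerprints are represented as sets of on-bit positions. The toy bit sets and the compound names are hypothetical and are not real MACCS, ECFP2 or FCFP2 output.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def similarity_search(queries, library, cutoff):
    """IDs of library compounds whose best similarity to any query exceeds the cutoff."""
    hits = []
    for cid, fp in library.items():
        if max(tanimoto(q, fp) for q in queries) > cutoff:
            hits.append(cid)
    return hits


# Toy data (hypothetical bit positions and compound IDs).
queries = [{1, 2, 3, 4}]
library = {"cpd_A": {1, 2, 3, 4, 5}, "cpd_B": {1, 2, 9, 10}, "cpd_C": {7, 8}}
print(similarity_search(queries, library, cutoff=0.75))  # ['cpd_A']
```

Only cpd_A passes here: it shares 4 of 5 distinct bits with the query (Tanimoto 0.8). The cutoff is the only parameter that changes between the three searches listed on the slide.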
ii) Hit Expansion: TurboSimilarity Searches
• TurboSimilarity searches improve virtual screening when there is very little information
Active compound → nearest neighbour → nearest neighbour of nearest neighbour
TurboSimilarity ECFP4
Top 250 for each search
Active compounds + HDQ → 4891 compounds
Hert J et al, J. Med. Chem. 2005, 48, 7049
Willett P, Drug Discovery Today 2006, 11, 1046
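The idea of searching again from the nearest neighbours of an active can be sketched as follows. This is a simplified illustration, not the exact protocol used in the programme: fingerprints are sets of on-bits, the fusion rule (keep each compound's best score over all reference structures) is an assumption, and the toy compounds and parameter values are hypothetical.

```python
def tanimoto(a, b):
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def rank(query, library):
    """Library IDs sorted by decreasing Tanimoto similarity to the query."""
    return sorted(library, key=lambda cid: tanimoto(query, library[cid]), reverse=True)


def turbo_search(active, library, n_neighbours=1, top_n=3):
    """Search with the active, then search again with its nearest neighbours and fuse."""
    neighbours = rank(active, library)[:n_neighbours]
    references = [active] + [library[cid] for cid in neighbours]
    # MAX fusion (assumed): each compound keeps its best score over all references.
    fused = {cid: max(tanimoto(ref, fp) for ref in references)
             for cid, fp in library.items()}
    return sorted(fused, key=fused.get, reverse=True)[:top_n]


# Toy data (hypothetical).
active = {1, 2, 3, 4}
library = {"A": {1, 2, 3, 5}, "B": {1, 2, 5, 6}, "C": {5, 6, 7, 8}, "D": {9, 10}}
print(turbo_search(active, library))  # ['A', 'B', 'C']
```

In this toy set, compound C shares no bits with the active and would score 0 in a plain search. It is retrieved only because it resembles the nearest neighbour A.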
iii) Hit Expansion: Substructure Searches
38 scaffold isosteres of quinolin-4-one
Searched against BioFocus DPI Compound Collection (~750000 cpds) → 137693 compounds
Scaffolds < 1000 hits: all kept
Scaffolds > 1000 hits: maximum diverse sampling
Result: 5247 compounds
• Isosteres derived from 2D topological pharmacophore searches (atom pair, fragment, pharmacophore fingerprints, Burden numbers, Tanimoto/Euclidean measures)
• Medicinal chemists
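The per-scaffold rule (keep everything for small scaffolds, sample diversely from large ones) might look like the sketch below. The slide does not say which diversity algorithm or sample size was used, so the greedy MaxMin picker, the `n_pick` parameter and the handling of a scaffold with exactly 1000 hits are all assumptions made for illustration.

```python
def tanimoto_distance(a, b):
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return 1.0 - (shared / union if union else 0.0)


def maxmin_pick(fps, n_pick):
    """Greedy MaxMin: repeatedly add the compound farthest from those already picked."""
    ids = list(fps)
    picked = [ids[0]]  # arbitrary seed: the first compound
    while len(picked) < min(n_pick, len(ids)):
        remaining = [i for i in ids if i not in picked]
        best = max(remaining,
                   key=lambda i: min(tanimoto_distance(fps[i], fps[p]) for p in picked))
        picked.append(best)
    return picked


def select_for_scaffold(fps, threshold=1000, n_pick=1000):
    """Keep every hit for a small scaffold; diverse-sample a large one."""
    if len(fps) < threshold:
        return list(fps)
    return maxmin_pick(fps, n_pick)


# Toy scaffold hit list (hypothetical fingerprints), with a tiny threshold for display.
hits = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}, "d": {1, 2, 3, 4}}
print(select_for_scaffold(hits, threshold=3, n_pick=2))  # ['a', 'c']
print(select_for_scaffold(hits))                         # ['a', 'b', 'c', 'd']
```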
iv) Hit Expansion: Bayesian Classification Searches
• Bayesian classification uses the probability distribution of molecular descriptors (fingerprints and molecular properties like logP, MW, …) to classify molecules into some set of categories (e.g. active and inactive)
• Model accuracy - ‘Excellent’ (0.924)
Bayesian classifier
Training set: active and inactive compounds
Descriptors: ALogP, MW, #HBDon, #HBAcc, #RotBonds, PSA and ECFP2 fingerprints
Bayesian model applied to BioFocus DPI Compound Collection (~750000 cpds) → 4891 compounds
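As an illustration of the classification idea, the sketch below trains a plain Bernoulli naive Bayes model on fingerprint bits only. The slide does not state the software or the exact estimator, so this is a stand-in under simplifying assumptions. It leaves out the continuous properties (ALogP, MW, etc.), and the training data are hypothetical.

```python
import math


def train(fingerprints, labels, n_bits):
    """Estimate a prior and smoothed per-bit 'on' probabilities for each class."""
    model = {}
    for cls in set(labels):
        members = [fp for fp, lab in zip(fingerprints, labels) if lab == cls]
        prior = len(members) / len(labels)
        # Laplace smoothing (+1 / +2) avoids zero probabilities.
        p_on = [(sum(bit in fp for fp in members) + 1) / (len(members) + 2)
                for bit in range(n_bits)]
        model[cls] = (prior, p_on)
    return model


def log_posterior(model, fp, cls):
    """Unnormalised log posterior of class cls for a fingerprint."""
    prior, p_on = model[cls]
    score = math.log(prior)
    for bit, p in enumerate(p_on):
        score += math.log(p if bit in fp else 1.0 - p)
    return score


def classify(model, fp):
    return max(model, key=lambda cls: log_posterior(model, fp, cls))


# Toy training set (hypothetical 4-bit fingerprints).
fps = [{0, 1}, {0, 1, 2}, {2, 3}, {3}]
labels = ["active", "active", "inactive", "inactive"]
model = train(fps, labels, n_bits=4)
print(classify(model, {0, 1}))  # active
print(classify(model, {3}))     # inactive
```

To prioritise a large collection, compounds would be ranked by the difference between the active and inactive log posteriors rather than simply labelled.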
v) Hit Expansion: PCA Model Searches
Active compounds → 20 descriptors → Principal Component Analysis
Principal Component Model – PC1-3 explained 88.5% of overall variance
Euclidean distance: closest 5000 compounds to an active compound from BioFocus DPI Compound Collection (~750000 cpds)
Result: 5000 compounds
• Descriptors: ALogP; Molecular_Weight; Num_Atoms; Num_Bonds; Num_Hydrogens; Num_PositiveAtoms; Num_NegativeAtoms; Num_RingBonds; Num_RotatableBonds; Num_AromaticBonds; Num_BridgeBonds; Num_Rings; Num_AromaticRings; Num_RingAssemblies; Num_Chains; Num_ChainAssemblies; Num_Fragments; Num_H_Acceptors; Num_H_Donors; Molecular_Solubility
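The PCA-based search can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the model is fitted on the actives' descriptors after standardisation, principal axes are found by power iteration with deflation (a stand-in for whatever PCA routine was actually used), and the three-descriptor toy data are hypothetical.

```python
import math


def fit_scaler(rows):
    """Column means and sample standard deviations (zero spread falls back to 1.0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def scale(row, means, sds):
    return [(x - m) / s for x, m, s in zip(row, means, sds)]


def matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def fit_pca(rows, k, iters=200):
    """Top-k principal axes of standardised rows and the variance fraction they explain."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]
    total = sum(cov[i][i] for i in range(d))
    axes, explained = [], 0.0
    for _ in range(k):
        norm = math.sqrt(sum((i + 1) ** 2 for i in range(d)))
        v = [(i + 1) / norm for i in range(d)]  # fixed asymmetric start vector
        for _step in range(iters):  # power iteration
            w = matvec(cov, v)
            size = math.sqrt(sum(x * x for x in w))
            if size < 1e-12:
                break
            v = [x / size for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
        axes.append(v)
        explained += eigenvalue
        # Deflation removes the axis just found so the next pass finds the following one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes, explained / total


def closest_to_actives(actives, library, k=2, n_keep=2):
    """IDs of the n_keep library compounds nearest (in PC space) to any active."""
    means, sds = fit_scaler(actives)
    scaled = [scale(r, means, sds) for r in actives]
    axes, explained = fit_pca(scaled, k)

    def project(row):
        return [sum(a * b for a, b in zip(row, axis)) for axis in axes]

    active_points = [project(r) for r in scaled]

    def distance_to_nearest_active(cid):
        point = project(scale(library[cid], means, sds))
        return min(math.dist(point, a) for a in active_points)

    return sorted(library, key=distance_to_nearest_active)[:n_keep], explained


# Toy descriptor rows (hypothetical values for three descriptors).
actives = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6], [4.0, 8.0, 0.5]]
library = {"near": [2.5, 5.0, 0.5], "far": [40.0, 80.0, 5.0], "mid": [6.0, 12.0, 0.9]}
selected, explained = closest_to_actives(actives, library)
print(selected)  # ['near', 'mid']
```

With the real data, `k` would be 3 and `n_keep` 5000, matching the figures on the slide.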
vi) Hit Expansion: Combining Searches with “Drug likeness” and “Lead likeness”
• Combining results from similarity approaches - 34356 compounds
• Compounds were then scored and ranked using the following function:
Score = 4 * (Sum of normalised scores / 7) + 1 * f(LogS) + 1 * f(ALogP) + 2 * f(MW)
LogS
f(LogS) = 1 if LogS > -5.0
f(LogS) = 0 if LogS < -6.5
f(LogS) varies linearly in between
ALogP
f(ALogP) = 1 if ALogP > -1.0 and ALogP < 4.0
f(ALogP) = 0 if ALogP < -2.5 or ALogP > 5.5
f(ALogP) varies linearly from 0 to 1 between -2.5 < ALogP < -1.0
f(ALogP) varies linearly from 1 to 0 between 4.0 < ALogP < 5.5
MW
f(MW) = 1 if MW < 400
f(MW) = 0 if MW > 550
f(MW) varies linearly in between
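The scoring function can be written out directly from the definitions on this slide. In the sketch below, the helper names are illustrative. The assumption that the seven normalised scores each lie between 0 and 1 (presumably one per search method) is not stated in the source.

```python
def ramp(x, x0, x1):
    """Linear ramp that is 0 at x0 and 1 at x1 (x0 may exceed x1), clamped to [0, 1]."""
    t = (x - x0) / (x1 - x0)
    return max(0.0, min(1.0, t))


def f_logs(logs):
    return ramp(logs, -6.5, -5.0)


def f_alogp(alogp):
    # Rises from 0 to 1 over [-2.5, -1.0] and falls from 1 to 0 over [4.0, 5.5].
    return min(ramp(alogp, -2.5, -1.0), ramp(alogp, 5.5, 4.0))


def f_mw(mw):
    return ramp(mw, 550.0, 400.0)


def score(normalised_scores, logs, alogp, mw):
    """Score = 4 * (sum of normalised scores / 7) + f(LogS) + f(ALogP) + 2 * f(MW)."""
    similarity = sum(normalised_scores) / 7
    return 4 * similarity + 1 * f_logs(logs) + 1 * f_alogp(alogp) + 2 * f_mw(mw)


# A compound found by every search with ideal properties reaches the maximum of 8.
print(score([1.0] * 7, logs=-4.0, alogp=2.0, mw=300.0))  # 8.0
# Halfway along each property ramp, with no search support.
print(score([0.0] * 7, logs=-5.75, alogp=4.75, mw=475.0))  # 2.0
```

The weights give half of the maximum score to the search results and half to the property terms, with molecular weight counting twice as much as solubility or lipophilicity.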
vii) Hit Expansion: Final stages
34356 compounds
→ Veber & Lipinski guidelines (relaxed): MW < 600; LogP < 6; #Hacc < 11; #Hdon < 6; #Rotbonds < 14; PSA < 150 Å²
→ 32727 “good” compounds (< 2 failed rules)
→ Maximally diverse selection (BCUT descriptors)
→ 16000 compounds
• Diversity assessment
16000 compounds (~2% of BioFocus DPI Compound Collection) cover ~16% of diversity within entire BioFocus DPI Compound Collection (~750000 cpds)
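The relaxed rule filter amounts to counting failed limits and keeping compounds with fewer than two failures. A minimal sketch follows, with hypothetical property values for the example compounds. The dictionary keys mirror the labels on the slide.

```python
# Relaxed Veber & Lipinski limits as listed on the slide (value must be below the limit).
RELAXED_LIMITS = {"MW": 600, "LogP": 6, "Hacc": 11, "Hdon": 6, "Rotbonds": 14, "PSA": 150}


def failed_rules(props):
    """Names of the rules a compound fails."""
    return [name for name, limit in RELAXED_LIMITS.items() if not props[name] < limit]


def is_good(props, max_failures=1):
    """A 'good' compound fails fewer than two rules."""
    return len(failed_rules(props)) <= max_failures


# Hypothetical compounds.
clean = {"MW": 350, "LogP": 3.2, "Hacc": 5, "Hdon": 2, "Rotbonds": 6, "PSA": 90}
heavy = dict(clean, MW=620)
heavy_greasy = dict(clean, MW=620, LogP=6.5)

print(is_good(clean), is_good(heavy), is_good(heavy_greasy))  # True True False
```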
vii) Hit Expansion: Assessment
• Score assessment
32727 compounds → diversity selection → 16000 compounds
Slight bias towards higher scoring compounds
Summary
Phase 2: 750,000 compounds (BioFocus DPI library) → substructure search → 1,178 compounds → potency → 17 actives
Phase 3: 750,000 compounds (BioFocus DPI library) → chemoinformatics selection → 16,000 compounds → HTS (primary screen) → 469 compounds
Phase 4: hit confirmation → 150 compounds → potency → 33 actives
50 hits (IC50 < 10 µM), hit rate 0.29%
Medicinal chemistry template selection
• ~1000 molecules screened with 1H-quinolin-4-one core
• 16000 "leadlike", diverse compounds selected via rational chemoinformatic approach for HTS
• Only 3 of the 50 hits identified by more than one chemoinformatic method
Method          Number of Hits
HDQ Core        17
MACCS           1
FCFP2           2
ECFP2           0
Bioisosteres    14
Bayesian        8
Turbo           9
PCA             2
• 50 hits with IC50 < 10 µM
• New scaffolds identified
• Scaffolds entering medicinal chemistry programme
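The per-method hit counts and the quoted hit rate can be cross-checked with a few lines of arithmetic. The counts are taken from the summary slides. The denominator of the hit rate (16,000 HTS compounds plus the 1,178 from the substructure search) is an assumption, chosen because it reproduces the 0.29% figure.

```python
hits_by_method = {
    "HDQ Core": 17, "MACCS": 1, "FCFP2": 2, "ECFP2": 0,
    "Bioisosteres": 14, "Bayesian": 8, "Turbo": 9, "PCA": 2,
}
unique_hits = 50

# Each hit is counted once per method that found it.
total_assignments = sum(hits_by_method.values())
print(total_assignments)                # 53
print(total_assignments - unique_hits)  # 3 extra assignments

# Assumed denominator: both screened sets combined.
screened = 16000 + 1178
hit_rate = 100 * unique_hits / screened
print(round(hit_rate, 2))               # 0.29
```

Three extra assignments is what one would expect if each of the 3 hits found by more than one method was found by exactly two.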
Acknowledgements
University of Liverpool
Alex Lawrenson – poster 13
Raman Sharma – poster 34
Paul O’Neill
Liverpool School of Tropical Medicine
Steve Ward
Giancarlo Biagini
BioFocus DPI
Serge Parel
Grant