
A Novel Drug for Uncomplicated Malaria:

Chemoinformatics Methods for Compound Selection in a High Throughput Screening Program

Neil Berry

University of Liverpool

14th July 2010

Outline

• Introduction

Why do we need a drug against malaria?

Novel target : PfNDH2

• Chemoinformatics compound selection for HTS

Methods for compound selection

“Scoring” compounds

Filter compounds

Diversity selection

• Summary

• Acknowledgements

Introduction

Why do we need a drug against malaria?

• Global health problem

• Best estimates put the number of clinical episodes of malaria at 0.5 billion

• >400 million drug treatments/year

• Current therapies are failing

• Parasite resistance remains a major threat (artemisinins)

• Very few truly novel drug targets

• The mitochondrion is a proven drug target, e.g. atovaquone in Malarone™

• Pf type II NADH:quinone oxidoreductase (PfNDH2) outstanding chemotherapeutic target

• Enzyme “choke point” in electron transport chain

Inhibition leads to mitochondrial dysfunction and parasite death

• PfNDH2 not found in human host

• Inhibition of function predicted to be lethal to BOTH asexual stage (alleviating clinical symptoms) and sexual stages (reducing transmission)

• Resistance to PfNDH2 predicted to be difficult

• Only one selective inhibitor known (HDQ)

[Figure: chemical structure]

• Aim : produce candidate drug targeting PfNDH2 for clinical development

• Candidate profile

Drug like properties

Acceptable costs

Potential use in non-severe disease either alone or in combination

• Exploit PfNDH2 as an antimalarial target due to

(i) Essentiality

(ii) Selectivity

(iii) Tractability of target

(iv) Amenability to HTS


• HTS against the PfNDH2 target outsourced to BioFocus DPI in four phases

Phase 1

Transfer and optimisation of NADH oxidation assay from LSTM to BioFocus DPI

Phase 2

Substructure search for quinolin-4-one in the BioFocus compound library

Screening compounds

Phase 3

Selection of 16000 compounds from the BioFocus compound library using chemoinformatics methods

Phase 4

Screening 16000 compounds at a single concentration

Active compounds retested in duplicate

Confirmed hits: duplicate 10-point dose-response curves

Screening workflow

Phase 2: 750,000 compounds (BioFocus DPI library) → substructure search → compounds → potency → actives

Phase 3: 750,000 compounds → chemoinformatics selection → 16,000 compounds → HTS (primary screen) → compounds

Phase 4: hit confirmation → compounds → potency → actives → hits → medicinal chemistry template selection

Chemoinformatics

Goal

Identify several novel molecular scaffolds which inhibit PfNDH2

Identification achieved via HTS with compounds selected using chemoinformatics

Scaffolds identified ⇒ medicinal chemistry programme

Approach

i) Screen small library of compounds (~1000)

ii) Chemoinformatics to select further compounds

iii) Screen larger library of compounds (~16000)

Initial Screen

[Figure: HDQ and its molecular "core", 1H-quinolin-4-one]

BioFocus DPI Compound Collection (~750000 cpds) → 1175 compounds (1H-quinolin-4-one core)

1175 compounds

Dose-response determination (5-point, 20 mM -> 7.8 nM)

54 new active compounds (~5% hit rate)

Hit Expansion Strategy

Expand activity data and establish initial SAR by exploring the chemical space around the active compounds in a variety of ways

Issue

Essentially only a single active chemotype (quinolin-4-one derivatives)

Approach

Apply a range of different chemoinformatic approaches to

1) Explore the chemical space around the active hits (similarity searching)

2) Identify new active chemotypes (scaffold-hopping)

⇒ Select 16000 compounds for HTS


Chemoinformatics Methods

Searches

Fingerprint similarity

Turbo similarity

Substructure


Bayesian classification model

PCA model

Scoring compounds - Lead and Drug likeness bias

Lipinski and Veber filters

Diversity selection

⇒ 16000 "leadlike", diverse compounds selected via rational chemoinformatic approach for HTS

i) Hit Expansion: Fingerprint Similarity Searches

Query: 54 active compounds + HDQ

Fingerprints: MDL MACCS keys, ECFP2, FCFP2

Tanimoto cutoff > 0.8

Retrieved: 8784 compounds; 333 compounds
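A fingerprint similarity search with a Tanimoto cutoff can be sketched as below. This is a minimal illustration, not the software used in the screen: fingerprints are represented as plain sets of on-bit indices, the compound ids and bit sets are invented toy data, and a compound is kept when its best similarity to any query exceeds the cutoff (the exact combination rule is an assumption).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def similarity_search(queries, library, cutoff=0.8):
    """Return ids of library compounds whose best similarity to any query exceeds the cutoff."""
    hits = []
    for compound_id, fp in library.items():
        if max(tanimoto(fp, query) for query in queries) > cutoff:
            hits.append(compound_id)
    return hits


# Toy data: hypothetical on-bit sets, not real MACCS/ECFP2/FCFP2 fingerprints.
queries = [{1, 2, 3, 4, 5}]
library = {
    "cpd_A": {1, 2, 3, 4, 5, 6},  # 5 shared bits / 6 bits in union = 0.833
    "cpd_B": {1, 2, 3, 9},        # 3 shared bits / 6 bits in union = 0.5
}
print(similarity_search(queries, library))  # ['cpd_A']
```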

ii) Hit Expansion:TurboSimilarity Searches

• TurboSimilarity searches improve virtual screening when there is very little information

Active compound → nearest neighbour → nearest neighbour of nearest neighbour

Active compounds + HDQ → TurboSimilarity (ECFP4), top 250 for each search → 4891 compounds

Hert J et al, J. Med. Chem. 2005, 48, 7049
Willett P, Drug Discovery Today 2006, 11, 1046
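The idea behind a turbo similarity search can be sketched as follows: the nearest neighbours of the query are treated as additional reference structures, and the scores from all references are fused. The sketch assumes fingerprints as sets of on-bit indices, MAX fusion of the scores, and invented toy compounds; the parameter names (`n_neighbours`, `top`) are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def turbo_search(query, library, n_neighbours=1, top=2):
    """Rank library compounds using the query plus its nearest neighbours as references."""
    # Step 1: conventional search to find the query's nearest neighbours.
    ranked = sorted(library, key=lambda cid: tanimoto(query, library[cid]), reverse=True)
    references = [query] + [library[cid] for cid in ranked[:n_neighbours]]
    # Step 2: score every compound against every reference and keep the best score (MAX fusion).
    fused = {
        cid: max(tanimoto(ref, fp) for ref in references)
        for cid, fp in library.items()
    }
    return sorted(fused, key=fused.get, reverse=True)[:top]


# Toy data (hypothetical bit sets).
query = {1, 2, 3, 4}
library = {
    "A": {1, 2, 3, 5},  # 0.6 to the query: its nearest neighbour
    "B": {1, 2, 5, 6},  # 0.33 to the query, but 0.6 to A
    "C": {7, 8, 9},     # unrelated
}
print(turbo_search(query, library))  # ['A', 'B']
```

Compound B is only weakly similar to the query itself, but it is promoted because it resembles the query's nearest neighbour.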

iii) Hit Expansion: Substructure Searches

BioFocus DPI Compound Collection (~750000 cpds)

38 scaffold isosteres of quinolin-4-one → 137693 compounds

Scaffolds < 1000 hits: all kept

Scaffolds > 1000 hits: maximum diverse sampling

→ 5247 compounds

• Isosteres derived from 2D topological pharmacophore searches (atom pair, fragment, pharmacophore fingerprints, Burden numbers, Tanimoto/Euclidean measures)

• Medicinal chemists

iv) Hit Expansion: Bayesian Classification Searches

• Bayesian classification uses the probability distribution of molecular descriptors

(fingerprints and molecular properties like logP, MW,…) to classify molecules

into some set of categories (e.g. active and inactive)

• Model accuracy - ‘Excellent’ (0.924)

Active and inactive compounds → Bayesian classifier (ALogP, MW, #HBDon, #HBAcc, #RotBonds, PSA and ECFP2 fingerprints) → Bayesian model

Bayesian model applied to BioFocus DPI Compound Collection (~750000 cpds) → 4891 compounds
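How a Bayesian classifier separates actives from inactives can be illustrated with a small Bernoulli naive Bayes model over fingerprint bits. This is a generic textbook sketch with add-one smoothing, not necessarily the classifier used in the screen; the training sets are invented, and continuous descriptors such as ALogP or MW would first have to be binned into features (not shown).

```python
import math


def train(actives, inactives, n_bits):
    """Estimate per-bit log-likelihood ratios with add-one (Laplace) smoothing."""
    weights = []
    for bit in range(n_bits):
        p_act = (sum(bit in fp for fp in actives) + 1) / (len(actives) + 2)
        p_inact = (sum(bit in fp for fp in inactives) + 1) / (len(inactives) + 2)
        on = math.log(p_act / p_inact)               # contribution when the bit is set
        off = math.log((1 - p_act) / (1 - p_inact))  # contribution when the bit is absent
        weights.append((on, off))
    prior = math.log(len(actives) / len(inactives))
    return prior, weights


def score(fp, model):
    """Log-odds that a compound is active; positive values favour the active class."""
    prior, weights = model
    return prior + sum(on if bit in fp else off for bit, (on, off) in enumerate(weights))


# Toy training data: fingerprints as sets of on-bit indices (hypothetical).
actives = [{0, 1}, {0, 2}]
inactives = [{3}, {2, 3}]
model = train(actives, inactives, n_bits=4)
print(score({0, 1}, model) > 0)  # True: resembles the actives
print(score({3}, model) < 0)     # True: resembles the inactives
```

Ranking a library by this score and keeping the top-scoring compounds gives a selection analogous to the one described above.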

v) Hit Expansion: PCA Model Searches

Active compounds → 20 descriptors → Principal Component Analysis → principal component model (PC1-3 explained 88.5% of overall variance)

Euclidean distance: closest 5000 compounds to an active compound from the BioFocus DPI Compound Collection (~750000 cpds) → 5000 compounds

• Descriptors: ALogP; Molecular_Weight; Num_Atoms; Num_Bonds; Num_Hydrogens; Num_PositiveAtoms; Num_NegativeAtoms; Num_RingBonds; Num_RotatableBonds; Num_AromaticBonds; Num_BridgeBonds; Num_Rings; Num_AromaticRings; Num_RingAssemblies; Num_Chains; Num_ChainAssemblies; Num_Fragments; Num_H_Acceptors; Num_H_Donors; Molecular_Solubility

vi) Hit Expansion: Combining Searches with “Drug likeness” and “Lead likeness”

• Combining results from similarity approaches - 34356 compounds

• Compounds were then scored and ranked using the following function:

Score = 4 * (sum of normalised scores / 7) + 1 * f(LogS) + 1 * f(ALogP) + 2 * f(MW)

LogS: f(LogS) = 1 if LogS > -5.0; f(LogS) = 0 if LogS < -6.5; f(LogS) varies linearly in between.

ALogP: f(ALogP) = 1 if -1.0 < ALogP < 4.0; f(ALogP) = 0 if ALogP < -2.5 or ALogP > 5.5; f(ALogP) varies linearly from 0 to 1 for -2.5 < ALogP < -1.0 and from 1 to 0 for 4.0 < ALogP < 5.5.

MW: f(MW) = 1 if MW < 400; f(MW) = 0 if MW > 550; f(MW) varies linearly in between.
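The scoring function above translates directly into code. The sketch below assumes that the seven normalised scores each lie between 0 and 1 (one per search method) and uses hypothetical function and argument names.

```python
def ramp(x, x0, x1):
    """Linear ramp that is 0 at x0 and 1 at x1, clamped to [0, 1] (x0 may exceed x1)."""
    t = (x - x0) / (x1 - x0)
    return max(0.0, min(1.0, t))


def f_logs(logs):
    return ramp(logs, -6.5, -5.0)


def f_alogp(alogp):
    # Rising edge between -2.5 and -1.0, falling edge between 4.0 and 5.5.
    return min(ramp(alogp, -2.5, -1.0), ramp(alogp, 5.5, 4.0))


def f_mw(mw):
    return ramp(mw, 550.0, 400.0)


def score(normalised_scores, logs, alogp, mw):
    """Weighted sum: 4 for the search scores, 1 each for LogS and ALogP, 2 for MW."""
    if len(normalised_scores) != 7:
        raise ValueError("expected seven normalised scores")
    return 4 * sum(normalised_scores) / 7 + f_logs(logs) + f_alogp(alogp) + 2 * f_mw(mw)


# A compound found by every method with ideal properties reaches the maximum of 8.
print(score([1.0] * 7, logs=-4.0, alogp=2.0, mw=350.0))    # 8.0
# Every term at its halfway point gives half the maximum.
print(score([0.5] * 7, logs=-5.75, alogp=4.75, mw=475.0))  # 4.0
```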

vii) Hit Expansion: Final stages

34356 compounds

→ Veber & Lipinski guidelines (relaxed): MW < 600, LogP < 6, #Hacc < 11, #Hdon < 6, #Rotbonds < 14, PSA < 150 Å²

→ 32727 “good” compounds (< 2 failed rules)

→ Maximally diverse selection (BCUT descriptors)

→ 16000 compounds
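The relaxed rule filter can be sketched as a count of failed rules per compound, keeping compounds that fail fewer than two. The property names and the example values are hypothetical; only the thresholds come from the slide.

```python
# Upper limits from the relaxed Veber & Lipinski guidelines (a value must be below the limit).
RULES = {"MW": 600, "LogP": 6, "Hacc": 11, "Hdon": 6, "Rotbonds": 14, "PSA": 150}


def failed_rules(props):
    """Return the names of the rules a compound violates."""
    return [name for name, limit in RULES.items() if props[name] >= limit]


def is_good(props):
    """A compound is 'good' if it fails fewer than two rules."""
    return len(failed_rules(props)) < 2


# Hypothetical compounds for illustration.
cpd_1 = {"MW": 620, "LogP": 4.2, "Hacc": 6, "Hdon": 2, "Rotbonds": 7, "PSA": 95}
cpd_2 = {"MW": 640, "LogP": 6.5, "Hacc": 6, "Hdon": 2, "Rotbonds": 7, "PSA": 95}
print(failed_rules(cpd_1), is_good(cpd_1))  # ['MW'] True
print(failed_rules(cpd_2), is_good(cpd_2))  # ['MW', 'LogP'] False
```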

• Diversity assessment

16000 compounds (~2% of the BioFocus DPI Compound Collection) cover ~16% of the diversity within the entire BioFocus DPI Compound Collection (~750000 cpds)
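The slides do not say which algorithm performed the maximally diverse selection. One common choice, shown here purely as an assumption, is greedy MaxMin picking: each new compound is the one farthest from everything already selected. The 2D points stand in for BCUT descriptor vectors and are invented.

```python
import math


def maxmin_select(points, n_select, seed_index=0):
    """Greedy MaxMin selection: repeatedly pick the point farthest from the selected set."""
    selected = [seed_index]
    # Distance from every point to its nearest selected point.
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < min(n_select, len(points)):
        candidates = (i for i in range(len(points)) if i not in selected)
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return selected


# Two tight clusters plus one outlier (hypothetical descriptor vectors).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 0.1), (2.5, 4.0)]
print(maxmin_select(points, 3))  # [0, 3, 4]: one point per region of descriptor space
```

The selection skips near-duplicates within a cluster, which is how a small subset can cover a disproportionate share of a collection's diversity.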

vii) Hit Expansion: Assessment

• Score assessment

Slight bias towards higher scoring compounds

32727 compounds → diversity selection → 16000 compounds

Summary

Phase 2: 750,000 compounds → substructure search → 1,178 compounds → potency → 17 actives

Phase 3: 750,000 compounds → chemoinformatics selection → 16,000 compounds → HTS (primary screen) → 469 compounds

Phase 4: hit confirmation → 150 compounds → potency → 33 actives

50 hits (IC50 < 10 µM), hit rate 0.29%

→ Medicinal chemistry template selection
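The slide does not state how the 0.29% hit rate was derived. One reading that is consistent with the numbers shown is that the 50 hits (17 + 33 actives) are divided by all compounds screened in both campaigns (1,178 + 16,000); the check below works through that assumption.

```python
# Assumption: hits and screened compounds are pooled across Phase 2 and Phases 3-4.
hits = 17 + 33            # actives from the substructure screen and from the HTS
screened = 1178 + 16000   # compounds screened in the two campaigns
hit_rate = 100 * hits / screened
print(f"{hits} hits / {screened} compounds = {hit_rate:.2f}%")  # 50 hits / 17178 compounds = 0.29%
```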

Summary

• ~1000 Molecules screened with 1H-Quinolin-4-one core

• 16000 "leadlike", diverse compounds selected via rational chemoinformatic approach for HTS

• Only 3 of the 50 hits identified by more than one chemoinformatic method

Method          Number of Hits
HDQ Core        17
MACCS           1
FCFP2           2
ECFP2           0
Bioisosteres    14
Bayesian        8
Turbo           9
PCA             2

• 50 hits with IC50 < 10 µM

• New scaffolds identified

• Scaffolds entering medicinal chemistry programme

Acknowledgements

University of Liverpool

Alex Lawrenson – poster 13

Raman Sharma – poster 34

Paul O’Neill

Liverpool School of Tropical Medicine

Steve Ward

Giancarlo Biagini

Biofocus DPI

Serge Parel

Grant
