diversity analysis and library design -...

50
Diversity Analysis and Library Design Val Gillet Department of Information Studies, University of Sheffield, UK

Upload: votram

Post on 22-Jul-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Diversity Analysis and Library Design

Val GilletDepartment of Information Studies,University of Sheffield, UK

Page 2: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Outline• Diversity Analysis

– Measuring diversity– Selecting diverse subsets– Computational filtering

• Combinatorial Library Design– Designing libraries optimised on multiple properties

• Reduced Graphs as Molecular Descriptors

Page 3: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

High-Throughput Technologies• High-throughput screening has massively increased the

rate at which compounds can be tested for activity

• Combinatorial synthesis allows parallel synthesis of large numbers of compounds

• Have these increased numbers resulted in more hits?

Page 4: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

The Combinatorial Explosion• The Available Chemicals Directory (www.mdli.com)

contains the following:– 85000 carboxylic acids– 44000 primary alkyl amines– 12000 aldehydes– 2000 fmoc amino acids

• So we could make…– 3.7×109 monoamides (acid + amine)– 5.3×108 secondary amines (reductive amination)– 7.5×1012 diamides (fmoc amino acid + amine + acid)

A.R. Leach GSK

Page 5: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

10120 virtual compounds available via combinatorial chemistry: reagents + chemistry → libraries

The Chemical Universe

Number of seconds since big bang = ca. 1017

WDI100K

Corporate Database

1m

Compound Suppliers

10m

Chemical Abstracts35m

Page 6: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Increasing Throughput is not Enough!

• Chemical space is huge!

• Success rates of large libraries have been low– Insoluble; high molecular weight; too flexible; etc; etc– Costs are high: $1 per compound = $1 million for library of 106

compounds

• Assay does not always permit HTS

• Despite initial enthusiasm for large numbers the current trend is towards smaller carefully designed libraries

Page 7: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Virtual Screening• Virtual screening

– In-silico prioritisation of compounds

• Virtual screening can be used to – Select compounds for screening from in-house databases– Choose compounds to purchase from external suppliers – Design combinatorial libraries

• The technique applied depends on the aim and on the knowledge available, for example, about the particular disease target– Usually there are multiple criteria to consider

Page 8: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Virtual Screening: Focused Libraries

• Targeted/focused libraries– Selection of compounds that are similar to a lead compound or

that fit a QSAR – Selection of compounds focused on a single therapeutic target

using structure-based drug design– Selection of compounds focused on a family of related targets

• Even with focused libraries still need to have some diversity

Page 9: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Virtual Screening: Diversity• Lead generation

– Selection of compounds when little is known about a particular target

– Selection of compounds for screening against several targets

• Compound acquisition – Selection of compounds to purchase to augment an existing

collection

Page 10: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Similarity and Diversity

• Similarity– If we have a known active

(literature, competitor compound etc) then compounds that are similar to it are likely to show similar activity

Property P1

Pro

perty

P2

• Similar Property Principle– Structurally similar compounds tend to exhibit similar properties

Similarity is the property of a pair of compoundsDiversity is the property of a library of compounds

Property P1

Pro

perty

P2

• Diversity– A diverse subset of compounds

should maximise the coverage of biological activity and minimiseredundancy

Page 11: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Measuring the Diversity of a Compound Library: 1

• Calculating (dis)similarities– Dissimilarity = 1 – Similarity– Euclidean distance

• Requires molecular descriptors and similarity coefficient– Whole molecule properties; 2D fingerprints;

3D pharmacophores– Tanimoto coefficient

• Example diversity measures– Sum of pairwise (dis)similarities/distances– Average NN dissimilarity/distance

Des

crip

tor d

2

Descriptor d1

Page 12: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Measuring the Diversity of a Compound Library: 2

• Coverage of a pre-defined chemistry-space

• Requires definition of a low (1D to 6D) dimensional chemistry-space– Physicochemical properties – BCUTs– pharmacophore coverage– scaffold coverage (number of unique

scaffolds)Descriptor A

Des

crip

tor B

Page 13: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Selecting a Diverse Subset of Compounds

• Selecting a subset of size n from dataset of size Nrequires evaluation of subsets

There are ~2x1013 ways of selecting 10 cmpds from 100!

• Computationally efficient methods are required– Distance-based methods

• Dissimilarity-based compound selection; sphere exclusion; clustering

– Coverage-methods• Partitioning-schemes; optimisation-methods (maximise or minimise

diversity measure)

)!(!!

nNnN−

Page 14: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

DBCS: General Algorithm1. Select a compound and place in subset (random, centroid, most

diverse)2. Calculate dissimilarity between each remaining compound and

compounds in subset3. Choose next compound that is most dissimilar to compounds in subset

(MaxMin, MaxSum)4. If less than n compounds in subset, return to 2

Des

crip

tor d

2

1

23

4

5

Descriptor d1

• Characteristics– fast enough to be applied to large

dataset – based on (dis)similarities therefore

can be used with high dimensionality data e.g.fingerprints

– have a tendency to select outliers

Page 15: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Clustering• Group molecules

– molecules within a cluster are similar– molecules from different clusters are

dissimilar

• Choose one or more from each cluster

• Characteristics– good for high dimensionality data

(fingerprints) – reveals natural clustering in dataset– limited to small(ish) datasets– difficult to add new compounds

Page 16: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Sphere Exclusion1. Define a threshold similarity t2. Select a compound and place in subset 3. Remove all compounds with dissimilarity < t4. If compounds left in data set, return to 2

Pro

perty

P2

Property P1

t

Page 17: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

MW

logP

0 250 500 >750-3

0

3

>7

• Characteristics– fast but restricted to low dimensional

descriptors – diversity voids are easily identified– easy to add new compounds – cell boundaries are arbitrary

Partitioning/Cell-Based• Define a low dimensional space

– e.g. physicochemical properties (logP, MW,…); BCUT descriptors

– assign each compound to a cell– choose one or more from each cell

Page 18: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

NH2

A Diverse Set of Compounds!

OH

O

NHO

O

O

NH

OH

O OH

S

NH2

ONH

O

NH

S

O

NH

S

OOH

O

O

NH

O

O

O

O

OH

OH

O

NH

O

S NH

ONH

NHO

O

O

NH

NH

O

O

O

OH

NH

ONH2

SS

NH2 O

NH O

OH

O

OH

A.R. Leach GSK

Page 19: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Drug-likeness• The early approaches to designing HTS experiments

based on large diverse sets of compounds gave disappointing hit rates– low numbers of hits– hits unattractive for lead optimisation:

• Poor ADME properties - insoluble; lipophilic; too flexible; high molecular weight

• Whether designing diverse or focused libraries molecules should also be constrained to have "drug-like" physicochemical properties

Page 20: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Computational Filters• Set of computational

techniques to eliminate molecules that have inappropriate characteristics

• Reduce the number of compounds that need to perform calculations on

• “Badlist” of reactive or toxic substructures (cf “goodlist” of “privileged substructures”)

reactive or toxic fgs

molecular weight

lipophilicity

flexibility

Page 21: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Drug-like and Lead-like• Drug-likeness: eliminate compounds with non-drug-like

physicochemical properties– Lipinski “Rule Of Five”, in which a molecule is assumed unlikely

to be orally absorbed if at least two of the following conditions are met

• MW>500, ClogP>5, HBD>5, HBA>10

• Lead-likeness– “leads” tend to be smaller and less complex than “drugs”

Page 22: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Combinatorial Library Design

Page 23: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Building Block Selection• Compromise: Select reagents for the “real” library that

satisfy the layout but whose products perform well in the virtual screen

A1 B1 C1 D1 E1 F1

A2

B2

C2

D2

E2

F2

A1 B1 C1 D1

A2

C2

E2

F2

Virtual Library

“Actual” Library

MonomerSelection

C1 D1 E1 F1A2B2D2F2

Alternative Library -fewer “virtual hits”

A.R. Leach GSK

Page 24: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Combinatorial chemistry imposes an important constraint on building block selection

N

MCherry pick forbest subset

Select m × n for bestcombinatorial subset

There are 4 × 1054 ways to select a 36 × 36 library from 100 × 100 possible building blocks (NCn. MCm)

The optimal library cannot usually be derivedby considering the reagents alone: product-based library designGillet et al J. Chem. Inf. Comput. Sci. (1997), 37(4), 731-740

A.R. Leach GSK

Page 25: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Combinatorial Library Design• Library design is a multi-objective optimisation problem

– Diverse or focused or both – Cheap, small, combinatorially efficient – Drug-like, good ADME properties,

• Many in-silico methods exist for calculating the various properties

• Applying computational filters sequentially can lead to sub-optimal designs

• How do we find a good balance in the objectives?

Page 26: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Weighted-Sum Approach

• Limitations– Setting of weights is difficult

especially for different types of objectives

– The objectives are often in competition

– A single compromise solution is found when usually a family of alternative solutions exist that are all equivalent (trade-offs)

MWwdiversitywnf Δ+= ..)( 21

0.57

0.58

0.59

0.58 0.6 0.62

ΔMW

0.64

Div

ersi

tyw1=10; w2=1.0

R1 NH

H+ R2

O

OH R2

O

NH

R1

......cos..)( ++++= 21 4321 propertywpropertywtwdiversitywnf

w1=1.0; w2=1.0

w1=1.0; w2=0.5

Page 27: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Multiobjective Optimisation using a MOEA

• Multiple objectives and handled independently • Pareto optimality is used to explore the search space• Multiple equivalent solutions are explored in parallel

(exploiting the population nature of an EA)• MOEA for combinatorial library design

– Combinatorial subsets selected from virtual library of possible products

– The objectives can include any property that can be calculated for a library of compounds,

• Chemical properties: e.g. diversity; drug-like profiles; in-silico ADME properties

• Physical properties: size, configuration, number of subsets, cost

Page 28: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Pareto Ranking

• Each objective is optimised independently

• One solution dominates another if it is better in both objectives

• Solutions are ranked according to dominance value

• Solutions where no other solutions are greater in all objectives are non-dominated and form the Pareto frontier

f2

f1best

42

1

0

0

0

00

A

B

C

best

Page 29: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Combinatorial Library Design:Focused Libraries

• 100 × 100 virtual library– 104 potential products

• MOEA used to design 10 × 10 subsets

• Objectives– Similarity to a target

•Sum of similarities using Daylight fingerprints

– Predicted bioavailability•Each compound rated from 1 to 4•Sum of ratings

– Hydrogen bond donor profile– Rotatable bond profile

R1 NH

H+ R2

O

OH R2

O

NH

R1

00.10.20.30.40.50.60.7

1-SIM 1-BIO ΔHBD ΔRB

Page 30: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Aminothiazole Library

Combinatorial Library Design:Diverse Libraries

400

0

100

200

300

0 1000 2000Library Size

Occ

upie

dC

ells

MOGA

1395 products

364 cells

Virtual library of 12850 products170 thioureas × 74 α-bromoketones

Occupies 364 of 1134 cells (Cerius2 topological and physicochemical descriptors followed by PCA)

R1 N NH2

S

R2

Br

R3

O

R4 N

SN

R1

R2

R4

R3

+

Exploring the trade-off in diversity and library size

Page 31: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Incorporating Constraints• Practical limits are often imposed on library design so

that restricted regions of the search space are of interest• Constraints can be applied to restrict the search to

pertinent regions– Library size– Combinatorial efficiency

• Number of reactants required to generate given number of products• 20×20; 40×10; 80×5; etc

– Plate coverage

Page 32: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Combinatorial Efficiency

Constrained

15

20

25

30

15 20 25 30Number Thioureas

Num

ber α

-bro

mok

eton

es

Unconstrained

Size constraint: 400 to 600 productsCombinatorial efficiency constraint: 20 <= α-bromoketones >= 25

20 <= thioureas >= 25

0

100

200

300

400

200 300 400 500 600 700 800 900Library Size

Num

ber o

f Occ

upie

d C

ells

UnconstrainedConstrained

Page 33: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Selecting Multiple Combinatorial Subsets

A1 A2 A3 A4 A5 A6 A1 A2 A3 A4 A5 A6

B1 B1

B2 B2

B3 B3

B4 B4

B5 B5

B6 B6

B7 B7

B8 B8

B9 B9

B10 B10

24 products constructed from one 4 × 6 subset

24 products constructed fromtwo subsets: 2 × 6 and 4 × 3

Page 34: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Design of Test Cases• A cost value of 1 is assigned to various subsets of

compounds in the virtual library, all other compounds assigned a cost of 0

• Design criteria– Identify libraries with maximum sum of cost and minimum size– Ideal solutions: sum of cost = library size

0

100

200

300

400

0 100 200 300 400Library Size

Cost

Page 35: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Test Case 1

0

100

200

0 100 200Library Size

Cos

t

Subset 1 (R1 × R2) Size Subset 2 (R1 × R2) Size Total Size Cost

2 6 12 10 4 40 52100200

8 5 40 10 6 6052100

10 10 100 10 10 100 200

R2

R1

400 products assigned cost = 1

as two 10×20 subsets

Page 36: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Test Cases 2 and 3

0

100

200

300

0 100 200 300Library Size

Cos

t

R2

R1 Test Case 2

300 products in three 10×10 subsets

0200400600800

100012001400

0 200 400 600 800 1000 1200 1400

Library Size

Cos

t

R2

R1 Test Case 3

1190 products in one 17×70 subset

Page 37: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Aminothiazole Library

0

50

100

150

200

250

300

350

400

0 200 400 600 800 1000 1200 1400

Library Size

Num

ber o

f Occ

upie

d C

ells

# Subsets # Compounds for Max Cells (364)

1 1395

2 1056

4 938

R1 N NH2

S

R2

Br

R3

O

R4 N

SN

R1

R2

R4

R3

+

Page 38: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Using Pareto Ranking to Profile Compounds Synthesised in Lead

Optimisation

Property 1

Property 2

Property 3

Property 4

Property 5

Property 6

Property 7

Property 8

Property 9

Total Score

Pareto Rank

Page 39: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Profiles of High Scoring Compounds

Property 1

Property 2

Property 3

Property 4

Property 5

Property 6

Property 7

Property 8

Property 9

Total Score

Pareto Rank

Good solutions can be missed if property filters are applied sequentially

Page 40: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Reduced Graphs as Molecular Descriptors

Page 41: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Reduced Graphs• Emphasise functional over structural similarity

Morphine Codeine Heroin

Daylight similarity: 0.99 0.95

Methadone

0.20

• Applications– Identify compounds with similar activities but different skeletons– Cluster representation– Analysis of HTS to identify SAR

Page 42: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Fitness Evaluation

– Recall(R)=TA/(TA+FI)• Proportion of the total number of dataset actives retrieved

– Precision(P)=TA/(TA+FA)• Proportion of retrieved molecules that are actually active

– F-measure=2PR/(P+R)• Equally weighted “average” of precision and recall

A = Actives I = Inactives F = False T = True

• For each query: search through a dataset containing active and inactive molecules represented as RGs:

Page 43: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

hERG Dataset• IC50 data (5727 mols) for developability assay• Continuous to binary data conversion

– 3 activity cut-offs• low (>4.3) 1684 mols, med (>5) 937 mols, high (>6) 269 mols

– “Inactives” all those below activity cut-off!• E.g. Medium “inactives” incorporate 747 low “actives”

• Train, test and validation sets, with 5 EA runs

Low Activity Cut-off Query

Medium Activity Cut-off Query

High Activity Cut-off Query

= Aromatic +ve ionisable= Aliphatic +ve ionisable= Acyclic +ve ionisable

= Aromatic donor= Aliphatic donor= Acyclic donor

= Aromatic ring (no feature)= Aromatic donor/acceptor= Aromatic acceptor

Page 44: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Medium Activity Query Performance

ActiveInactive Active

Inactive

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Perc

enta

ge o

f Act

ual C

lass

Predicted Class Actual Class

Precision = 52% F-measure = 61%Recall = 75% Enrichment = 2.9

Page 45: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Evolving Multiple Queries• Combine results from two or more queries

– RG retrieved if found by query A or query B …– Potential increases in recall, precision and diversity

• Several different types of RG can exist within an activity class due to high structural variability or different binding modes

Active

Active

Inactive

Different queries

Page 46: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Evolving Multiple Queries• Queries must be complementary

– Uniqueness score•Higher uniqueness for queries retrieving actives found by few (or no) other queries

• Queries must be specific– To prevent large numbers of

false actives resulting from the combination of several queries

– Typically increased specificity necessitates lower recall• Pareto ranking used to evolve queries that maximise

recall, precision and uniqueness

aq

QUniquenessai

i⎟⎟⎠

⎞⎜⎜⎝

⎛= ∑

=

=0

1)(

U~0.9U=1

U~0.75U=0.33

Page 47: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Extracting Multiple SARs5HT1A agonists

Page 48: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Summary: Library DesignAmount of Focus and/or Diversity Needed

is Knowledge-Based

ProteinX-ray

Pharma-cophore

Known Ligands

ZeroKnowledge

Focused Diverse

Targ

et K

now

ledg

e

Structure-basedDesign

Pharmacophore-basedDesign

Focused Sets

1° LibrariesDiverse Sets

Need for diversity isrelated to the inverse of the information that you have

Still Want Some Diversity

Here!

A.R. Leach GSK

Page 49: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

Summary: II• There are more compounds available for testing than ever before

and therefore there is a great need to design screening sets andcombinatorial libraries carefully

• HTS means that obtaining information about the activities of compounds is easier than it has ever been but it is still time consuming and expensive if done on a very large scale

• Virtual screening and computational filters provide a variety of ways of prioritising compounds

• Pareto ranking provides a useful way of exploring trade-offs in different properties to be optimised

• A wide range of molecular descriptors have been devised for similarity searching and diversity analysis

• Careful selection of descriptors and method is required

Page 50: Diversity Analysis and Library Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Gillet.pdf · Diversity Analysis and Library Design ... • The Available Chemicals Directory

References• Combinatorial Library Design

Gillet et al. J. Chem. Inf. Comput. Sci. 37, 1997, 731-740Gillet et al. Designing Focused Libraries Using MoSELECT. J. Mol. Graphics

Model. 20, 2002, 491-498.Gillet et al. Combinatorial Library Design Using a Multiobjective Genetic Algorithm.

J. Chem. Inf. Comp. Sci. 42, 2002, 375-385.Wright et al. J. Chem. Inf. Comp. Sci. 43, 2003, 381-390.

• Reduced GraphsGillet et al. Similarity Searching Using Reduced Graphs. J. Chem. Inf. Comp. Sci.

43, 2003, 338-345.Barker et al. Scaffold Hopping Using Clique Detection Applied to Reduced Graphs J.

Chem. Inf. Model. 46, 2006, 503-511.Gardiner et al. Cluster Representation Using Reduced Graphs. J. Chem. Inf. Model.

47, 2007, 354-366.Birchall et al. Evolving Interpretable Structure-Activity Relationship Models. Part 1:

Reduced Graph Queries J. Chem. Inf. Model In PressBirchall et al. Evolving Interpretable Structure-Activity Relationship Models. Part 2:

Using Multiobjective Optimisation to Derive Multiple Models J. Chem. Inf. Model In Press