


Mechanistic Free-Wilson SAR Analysis of Potency Data

Existing computational approaches can be very useful for identifying analogs with optimized ADME and safety profiles. However, potency cannot be taken into consideration without a dedicated predictor for the particular target affinity endpoint. Since potency measurements are typically performed for combinatorial sets of compounds with a common scaffold and varying substituents, Free-Wilson-type SAR analysis [1] is the method of choice for producing such models. Notably, modeling should not focus on pursuing the best possible statistical performance. In lead optimization projects, straightforward mechanistic interpretation of the models is a much more powerful feature. It has been shown that physicochemical trends are evident even for complex protein-ligand interactions, for example in P-gp substrate specificity [2] or hERG channel inhibition. In this situation, an interpretable model that successfully captures such tendencies can still be highly valuable even if it is not highly predictive, as it may help direct research efforts towards more promising candidates. The current approach consists of two stages:

1. Modified Free-Wilson-type fragmentation is employed to produce a data matrix for statistical analysis. First, a common scaffold is identified among the molecules comprising the data set. The substituents, however, are represented by their contributions to major physicochemical properties rather than by particular structural fragments. The matrix is then populated with R-group contributions to molecular size, logP, and hydrogen bonding potential. This minimizes the number of variables in the model and ensures a generalized mechanistic interpretation in terms of these properties, rather than simply pointing out the effects of introducing a specific substituent.

2. A class-specific model is built relating the compounds’ potency to the physicochemical characteristics of the substituents using the Gradient Boosting Machine (GBM) statistical technique [3]. GBM methodology relies on stepwise optimization of an ensemble of weak predictors (decision trees) that may account for non-linear effects in the property variation. For this reason it is preferred over traditional additive approaches such as MLR or PLS. To avoid over-fitting, 5-fold cross-validation is performed, and only those model iterations where the cross-validated R² is within 20% of the training R² are considered.
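The second stage can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in the software: the weak learners are single-split stumps rather than full decision trees, the loss is squared error, the data are synthetic (a saturating dependence on one descriptor), and "within 20%" is read here as `train_R2 - cv_R2 <= 0.2 * train_R2`. All function names are hypothetical.

```python
import math
import random


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def fit_stump(X, residuals):
    """Find the single-split regression stump with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every descriptor is constant: fall back to the mean
        m = sum(residuals) / len(residuals)
        return (0, math.inf, m, m)
    return best[1:]


def boost(X, y, n_iter=50, rate=0.1):
    """Stepwise boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_iter):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + rate * (lm if row[j] <= t else rm) for row, p in zip(X, pred)]
    return base, stumps


def predict(model, X, n_stumps, rate=0.1):
    """Predict using only the first n_stumps boosting iterations."""
    base, stumps = model
    out = []
    for row in X:
        p = base
        for j, t, lm, rm in stumps[:n_stumps]:
            p += rate * (lm if row[j] <= t else rm)
        out.append(p)
    return out


def acceptable_iterations(X, y, n_iter=50, rate=0.1, k=5, tol=0.2, seed=0):
    """Return (iteration, train R2, CV R2) for iterations passing the 20% rule."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    full = boost(X, y, n_iter, rate)
    fold_models = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        fold_models.append(boost([X[i] for i in train], [y[i] for i in train], n_iter, rate))
    accepted = []
    for m in range(1, n_iter + 1):
        oof = [0.0] * len(y)  # out-of-fold predictions pooled over the k folds
        for fold, model in zip(folds, fold_models):
            for i, p in zip(fold, predict(model, [X[i] for i in fold], m, rate)):
                oof[i] = p
        train_r2, cv_r2 = r2(y, predict(full, X, m, rate)), r2(y, oof)
        if train_r2 > 0 and (train_r2 - cv_r2) <= tol * train_r2:
            accepted.append((m, train_r2, cv_r2))
    return accepted


# Synthetic example: potency rises with the first descriptor and then plateaus.
rng = random.Random(1)
X = [[rng.uniform(0, 2), rng.uniform(0, 2)] for _ in range(40)]
y = [5 + 2 * (1 - math.exp(-3 * r1)) + rng.gauss(0, 0.1) for r1, _ in X]
accepted = acceptable_iterations(X, y)
print(accepted[-1] if accepted else "no iteration met the criterion")
```

A production workflow would more likely rely on an established gradient boosting library; the hand-written version is used here only to keep the stepwise fitting and the iteration filter visible.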

The proposed modeling methodology is planned for integration into the Structure Design software built on the ACD/Percepta platform [5] (interface shown in Figure 4). The software currently offers automated analog generation coupled with full-featured ADME/Tox profiling and ranking capabilities. Support for Auto-SAR analysis of data supplied by the user would complete the workflow outlined above, and would greatly enhance user experience by eliminating the need for separate tools to account for potency in computational lead optimization.


Case Study: Cannabinoid CB2 Receptor Agonists

In this section we present a real-world scenario illustrating how the described mechanistic Free-Wilson SAR analysis could be applied to model target affinity for a small class-specific data set, and what insight could be gained from the results obtained.

FIGURE 3. Example compounds illustrating the significance of ‘local’ vs. whole-molecule physicochemical effects.

Pranas Japertas,1,2 Andrius Sazonovas,1,2 Kiril Lanevskij,1,2 Remigijus Didziapetris1,2

1 Advanced Chemistry Development, Inc., 8 King Street East, Toronto, Ontario, M5C 1B5, Canada

2 VsI “Aukstieji algoritmai”, A.Mickeviciaus g. 29, LT-08117 Vilnius, Lithuania

Employing Potency Data in Computational Lead Optimization by Automated Free-Wilson Analysis

Introduction

Lead optimization efforts are guided by a combination of factors, among which the lead’s potency and its ADME/Tox properties play major roles. Each drug discovery project aims at optimizing activity against a specific target; however, computational models for the multitude of target affinity endpoints are not readily available. Consequently, conventional in silico lead optimization techniques can only be used for ADME/Tox profiling, while potency is neglected. In this work we present an Auto-SAR approach that overcomes this issue by incorporating user-defined potency data in analog profiling. The approach is based on automatic Free-Wilson-type SAR analysis of a series of known compounds with a common scaffold and varying substituents, which evaluates the influence of substituents in different positions on the considered property. The substituents are represented by their contributions to major physicochemical properties such as size, lipophilicity, ionization, and hydrogen bonding. Exploring physicochemical dependencies makes it feasible to obtain mechanistically interpretable class-specific SAR models from small data sets (several tens of compounds with measured potency data). Modeling involves special statistical methods to capture the nonlinearity in the relationship between the dependent property and the descriptors used. The resulting class-specific models can be used to gain a better understanding of substituent effects, evaluate target activities of new compounds of the same class, and guide lead optimization efforts towards the most promising candidates. Finally, we present several case studies based on published lead optimization articles, where the structural analogs suggested by the software are compared to those proposed by the authors of the original studies.

FIGURE 4. Prototype of automated Free-Wilson analysis in the Structure Design user interface.

SCHEME 2. A proposed workflow of in silico lead optimization involving ADME/Tox profiling combined with Auto-SAR utilizing available potency data.

References

1. Kubinyi H. QSAR: Hansch Analysis and Related Approaches. Wiley-VCH, 1993, 240 p.

2. Didziapetris R et al. J Drug Target. 2003, 11, 391.

3. Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. IMS 1999 Reitz Lecture.

4. van der Stelt M et al. J Med Chem. 2011, 54, 7350.

5. Structure Design on the ACD/Percepta platform, www.acdlabs.com/leadop

SCHEME 1. An outline of mechanistic Free-Wilson model development workflow.

Application of Potency Data in a Lead Optimization Workflow

A natural next step building upon the described concept would be to integrate potency data into the computational lead optimization pipeline and make it available for compound ranking along with ADME/Tox profiles. A small data set of measured potency values for 20+ compounds, with substituents varied in at least two sites, would suffice for automatic derivation of simple Free-Wilson-type QSAR models describing the contributions of substituent physicochemical properties to the compound’s overall potency. The resulting in silico lead optimization workflow would be as shown in Scheme 2. This approach would address the issue that candidates suggested by the software solely on the basis of their ADME/Tox profiles may fail potency requirements.
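One simple way to combine the two sources of information is sketched below in Python. Everything in it is a hypothetical illustration rather than the ranking scheme used by the software: the `Analog` fields, the single `admet_score` desirability value, the potency cut-off, and the example numbers are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Analog:
    name: str
    predicted_pec50: float  # prediction of the class-specific Auto-SAR model
    admet_score: float      # hypothetical 0..1 desirability, higher is better


def rank_analogs(analogs, min_pec50):
    """Drop analogs predicted to miss the potency requirement, then rank the
    rest by ADME/Tox desirability (ties broken by predicted potency)."""
    potent = [a for a in analogs if a.predicted_pec50 >= min_pec50]
    return sorted(potent, key=lambda a: (a.admet_score, a.predicted_pec50), reverse=True)


# Made-up candidates: analog-B has the best ADME/Tox score but is not potent.
candidates = [
    Analog("analog-A", 7.8, 0.62),
    Analog("analog-B", 5.1, 0.95),
    Analog("analog-C", 7.2, 0.81),
]

for analog in rank_analogs(candidates, min_pec50=6.5):
    print(analog.name, analog.predicted_pec50, analog.admet_score)
```

Ranking on ADME/Tox alone would have placed analog-B first; the potency filter removes it, which is the failure mode the workflow is meant to address.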

[Scheme 1 graphic: common scaffold with substituent positions R1, R2, and R3, annotated "[+c]".]

pEC50   Substituents          log Po/w              Vx
        R1    R2    R3        R1    R2    R3        R1     R2     R3
2.82    -Me   -Cl   -Cl       +0.2  +0.6  +0.8      +0.14  +0.11  +0.11
3.20    -Me   -Cl   -OCH3     +0.2  +0.6  -0.2      +0.14  +0.11  +0.20
0.86    -Me   -Br   -CF3      +0.2  +0.7  +0.9      +0.14  +0.17  +0.18
4.13    -Me   -Br   -OH       +0.2  +0.7  -0.4      +0.14  +0.17  +0.06
2.19    -Et   -OH   -CH2OH    +0.5  -0.4  -0.5      +0.28  +0.06  +0.20
3.10    -Et   -OH   -Br       +0.5  -0.4  +0.8      +0.28  +0.06  +0.17
1.85    -Et   -Ph   -Ph       +0.5  +0.3  +1.9      +0.28  +0.60  +0.60
2.93    -Et   -Ph   -CH2Ph    +0.5  +0.3  +1.7      +0.28  +0.60  +0.74
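A data matrix of the kind tabulated above could be assembled along the following lines. This is a minimal Python sketch: the `CONTRIB` lookup, the function names, and the dictionary layout are illustrative assumptions, and the numbers are simply the first three rows of the schematic table, not real measurements.

```python
# Position-specific substituent contributions, copied from the schematic table.
CONTRIB = {
    ("R1", "Me"): {"logP": 0.2, "Vx": 0.14},
    ("R2", "Cl"): {"logP": 0.6, "Vx": 0.11},
    ("R2", "Br"): {"logP": 0.7, "Vx": 0.17},
    ("R3", "Cl"): {"logP": 0.8, "Vx": 0.11},
    ("R3", "OCH3"): {"logP": -0.2, "Vx": 0.20},
    ("R3", "CF3"): {"logP": 0.9, "Vx": 0.18},
}

POSITIONS = ("R1", "R2", "R3")
PROPERTIES = ("logP", "Vx")

compounds = [
    {"pEC50": 2.82, "R1": "Me", "R2": "Cl", "R3": "Cl"},
    {"pEC50": 3.20, "R1": "Me", "R2": "Cl", "R3": "OCH3"},
    {"pEC50": 0.86, "R1": "Me", "R2": "Br", "R3": "CF3"},
]


def descriptor_row(compound):
    """One matrix row: a column per (property, position) pair."""
    return [CONTRIB[(pos, compound[pos])][prop]
            for prop in PROPERTIES for pos in POSITIONS]


def build_matrix(compounds):
    """Return column names, descriptor matrix X, and potency vector y."""
    columns = [f"{prop}({pos})" for prop in PROPERTIES for pos in POSITIONS]
    X = [descriptor_row(c) for c in compounds]
    y = [c["pEC50"] for c in compounds]
    return columns, X, y


columns, X, y = build_matrix(compounds)
print(columns)
for row, potency in zip(X, y):
    print(potency, row)
```

With this layout the number of columns depends only on the number of positions and properties, not on how many distinct substituents appear in the series, which is the reduction in variables that the property-based representation is meant to provide.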

[Scheme 1 graphic: scaffold with LogP and Vx descriptors at positions R1, R2, and R3 (stage 1); plot of predicted vs. observed pEC50 (stage 2).]

[Scheme 2 graphic. Box labels: Data set with a common scaffold; Measure potency values for the compound series; Perform automated Free-Wilson SAR analysis; Transfer to chemical spreadsheet; Generate analogs and rank according to ADME/Tox profile and Auto-SAR model predictions; Optimized lead.]

Example data table shown in Scheme 2:

Exp   Position      log Po/w      Vx
      R1    R2      R1    R2      R1    R2
5.5   -H    -Cl     0     +0.6    0     +0.1
6.0   -Me   -H      +0.2  0       +0.2  0
5.9   -H    -Br     0     +0.7    0     +0.2
6.2   -Me   -OH     +0.5  -0.4    +0.2  +0.1
4.5   -H    -H      0     0       0     0
6.5   -Me   -Ph     +0.5  +0.3    +0.2  +0.6

Head Office: +1 416 368-3435

Email: [email protected]

www.acdlabs.com


FIGURE 1. A common scaffold of the considered CB2 agonists, with substituent positions R1, R2, and R3.

The presented case study leads to several key conclusions:

1) The GBM method used in modeling successfully describes non-linear effects in property variation.

2) Because the explored dependences are position-specific, the approach not only shows which property changes are necessary, but also captures local effects indicating where these changes have to be applied to achieve the desired effect.

The authors of the original study pay considerable attention to basic affinity-lipophilicity relationships, which are also a major focus of our analysis. However, apart from lipophilicity (octanol/water logP) we also investigated the influence of ionization (pKa), molecular size (McGowan volume, Vx), and hydrogen bonding potential (Abraham’s A and B) of the substituents by including their contributions to the respective properties of the molecules as descriptors in the GBM model. The key results of our physicochemical Free-Wilson analysis are briefly described below:

pEC50 for CB2 receptor: the most significant physicochemical determinant in the GBM model was the lipophilicity of the substituent at the R1 position. pEC50 rises quickly with increasing logPo/w and ultimately reaches a plateau (Figure 2, A), while no such trend was evident at the R3 position (not shown).

pIC50 for hERG inhibition:

1. The major determinant of hERG blocking propensity was the basic pKa of the R3 substituent, with a steady increase in pIC50 consistent with the ionized fraction at physiological conditions (Figure 2, B).

2. The logPo/w dependence at R3 follows a distinct pattern, with pIC50 increasing up to a certain “optimum” logPo/w and then rapidly falling as lipophilicity increases further (Figure 2, C).

3. A much weaker logPo/w dependence was observed at R1, indicating that this part of the molecule probably does not play a major role in hERG binding (not shown).
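The ionized fraction referred to in point 1 follows from the Henderson-Hasselbalch relationship for a monoprotic base, f = 1 / (1 + 10^(pH - pKa)). The short Python sketch below tabulates it over the pKa range covered by the analysis; taking pH 7.4 as "physiological conditions" is an assumption, since the poster does not state the value, and the function name is illustrative.

```python
def ionized_fraction_base(pka, ph=7.4):
    """Fraction of a monoprotic base that is protonated at the given pH
    (Henderson-Hasselbalch). pH 7.4 is an assumed physiological value."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Tabulate over basic pKa values from 3.0 to 9.0.
for pka in (3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0):
    print(f"pKa {pka:.1f}: ionized fraction {ionized_fraction_base(pka):.4f}")
```

The fraction is negligible for weak bases, reaches one half when pKa equals the pH, and approaches unity for stronger bases, which matches the direction of the pIC50 trend described above.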

[Figure 2 plots: (A) pEC50 (CB2) vs. log Po/w (R1), axis range 0.0 to 2.0; (B) pIC50 (hERG) vs. base pKa (R3), axis range 3.0 to 9.0; (C) pIC50 (hERG) vs. log Po/w (R3), axis range 0.0 to 2.0.]

This test case is based on a recent J. Med. Chem. publication dealing with the discovery of novel CB2 receptor agonists [4]. The core 1-(4-(pyridin-2-yl)benzyl)-imidazolidine-2,4-dione scaffold, with varying substituents introduced in three different positions, is depicted in Figure 1. Here, R2 was only varied between hydrogen and fluorine atoms in order to modulate the pKa of the basic amine in R3. This position is therefore outside the scope of the current study, and we focus only on positions R1 and R3.

Whole-molecule vs. ‘local’ property value approach

An important aspect of this work is that the ‘local’ position-specific analysis of physicochemical property dependences employed here may provide valuable insight that could not be obtained from whole-molecule property values alone. Consider, for example, the two molecules from the studied data set shown in Figure 3, compounds 44 and 20. They have similar lipophilicity, but compound 44 is significantly more potent and does not block hERG. In fact, compound 44 was selected as the best of the series. The reason is evident from the results discussed above: CB2 affinity is more sensitive to changes in substituent R1 (cyclopropyl is better than methyl), while making R3 more hydrophilic has little effect on pEC50 but helps further attenuate hERG inhibition.

FIGURE 2. Key physicochemical dependences observed in the GBM model (panels A, B, and C); higher affinities are colored green in the case of CB2, and red in the case of hERG.

Compound 44: log Do/w = 1.0; pEC50 (CB2) = 8.0; hERG inhibition @ 100 μM ≈ 0%

Compound 20: log Do/w = 1.3; pEC50 (CB2) = 7.4; hERG inhibition @ 100 μM = 64%

Visit ACD/Labs at Booth# 1112
