
Bioorganic & Medicinal Chemistry xxx (2014) xxx–xxx

Journal homepage: www.elsevier.com/locate/bmc

hERG blocking potential of acids and zwitterions characterized by three thresholds for acidity, size and reactivity

http://dx.doi.org/10.1016/j.bmc.2014.09.007
0968-0896/© 2014 Elsevier Ltd. All rights reserved.

Abbreviations: AZA, acids and zwitterionic ampholytes; $D^E$, donor (electrophilic) superdelocalizability (a.u./eV); $D^E_{\mathrm{MaxN}}$, the maximum of $D^E$ for all nitrogen atoms in a conformer (a.u./eV); DiamEff, conformer effective cross-sectional diameter (Å); hERG, human ether-a-go-go-related gene; Max $D^E_{\mathrm{MaxN}}$, the maximum of $D^E_{\mathrm{MaxN}}$ for all generated conformers of a structure (a.u./eV); MaxDiamEff, the maximum of DiamEff for all generated conformers of a structure (Å); MOPAC, molecular orbital package; QSAR, quantitative structure–activity relationship; TGLOBAL, the training set of 1374 chemicals of diverse ionization classes; TAZA, the AZA from TGLOBAL (153 chemicals); V1, validation set 1 (344 chemicals); V1AZA, the AZA from V1 (35 chemicals); V2, validation set 2 (225 chemicals, negative threshold = 10 µM); V2AZA, the AZA from V2 (48 chemicals).

⇑ Corresponding author.

E-mail address: [email protected] (N.G. Nikolov).

Please cite this article in press as: Nikolov, N. G.; et al. Bioorg. Med. Chem. (2014), http://dx.doi.org/10.1016/j.bmc.2014.09.007

Nikolai G. Nikolov ⇑, Marianne Dybdahl, Svava Ó. Jónsdóttir, Eva B. Wedebye

Division of Toxicology and Risk Assessment, National Food Institute, Technical University of Denmark, Mørkhøj Bygade 26, 2860 Søborg, Denmark

Article info

Article history:
Received 8 June 2014
Revised 26 August 2014
Accepted 5 September 2014
Available online xxxx

Keywords: hERG; Ionization; Acid; Zwitterion; Decision tree; QSAR

Abstract

Ionization is a key factor in hERG K+ channel blocking, and acids and zwitterions are known to be less probable hERG blockers than bases and neutral compounds. However, a considerable number of acidic compounds block hERG, and the physico-chemical attributes which discriminate acidic blockers from acidic non-blockers have not been fully elucidated. We propose a rule for prediction of hERG blocking by acids and zwitterionic ampholytes based on thresholds for only three descriptors related to acidity, size and reactivity. The training set of 153 acids and zwitterionic ampholytes was predicted with a concordance of 91% by a decision tree based on the rule. Two external validations were performed with sets of 35 and 48 observations, respectively, both showing concordances of 91%. In addition, a global QSAR model of hERG blocking was constructed based on a large diverse training set of 1374 chemicals covering all ionization classes, externally validated showing high predictivity and compared to the decision tree. The decision tree was found to be superior for the acids and zwitterionic ampholytes classes.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Potassium ion channels are structurally and functionally diverse families of potassium selective channel proteins playing a central role in the electrical activity of excitable cells [1]. They have central importance in regulating a number of key cell functions, for example in the brain, heart, pancreas, prostate, kidney, gastro-intestinal tract, small intestine and peripheral blood leukocytes, placenta, lungs, spleen, colon, thymus, testes and ovaries, epithelia and inner ear organs. Humans have over 70 genes encoding potassium channel subtypes [2] with a great diversity with regard to both structure and function.

The human ether-a-go-go-related gene (hERG) encodes the pore-forming alpha subunit of the hERG potassium ion channel (also called Kv11.1 or KCNH2) [3], which plays a crucial role in repolarization of the heart and mediates the repolarizing $I_{\mathrm{Kr}}$ current in the cardiac action potential. Inhibition or blocking of the channel is associated with QT interval prolongation (long QT syndrome), which in turn may cause torsades de pointes (TdP), a potentially fatal arrhythmia [4,5]. Although hERG blocking does not necessarily imply arrhythmogenic potential (some drugs can prolong the QT interval without causing TdP, e.g., phenobarbital, ranolazine, alfuzosin and verapamil), it is one of the most important markers for cardiac risk. Besides some anti-arrhythmic drugs whose target is hERG, there are a large number of non-cardiac drugs causing blocking of KCNH2 and potentially sudden death [6]. Therefore hERG blocking is an important property that is in most cases undesirable in drug candidates, and the need for assessment of QT prolongation liability of drugs under development was recognized in topic E14 of the International Conference on Harmonization in 2005. However, hERG is a promiscuous antitarget which has been shown to interact with chemicals of highly varied structure.

Blockade of hERG has been extensively investigated in the recent decade, including mechanistic studies, a number of in silico approaches (reviews are available e.g., in Refs. 7–10), and new in vitro assays proposed as alternatives to the traditional cost-intensive patch-clamp method. In silico models for hERG inhibition can assist the elimination of possibly cardiotoxic drug candidates at an early stage in drug design. As the crystal structure of the hERG channel is not yet known, most in silico models are ligand-based traditional SAR and QSAR models using 2D and 3D molecular descriptors and fragments, although some studies use homology modeling based on bacterial potassium channels [11].

Different ionization classes have been shown to exhibit substantial differences in hERG blocking properties [12]. In particular, it has been reported that hERG affinity of acids and zwitterions is generally less than that of either bases or neutrals [12]. It has been found that the addition of an acidic ionogenic group to a potential drug candidate could reduce hERG binding. The best known example of reducing hERG affinity by converting a basic compound to a zwitterion is the pair terfenadine, withdrawn from the market in 1997 due to QT prolongation risk, and fexofenadine, an antihistamine with no significant hERG activity [13]; the structures differ only in the presence of a carboxylate group in fexofenadine. Therefore acidic and zwitterionic compounds can be particularly important in avoiding cardiotoxicity and hERG binding. Still, a considerable number of acidic compounds have been shown to block hERG, and the physico-chemical attributes which discriminate acidic blockers from acidic non-blockers have not been described in detail. Thus, there is a specific need for prediction and analysis of hERG blocking in acidic and zwitterionic chemicals.

This work proposes a characterization of hERG blocking by acids and zwitterionic ampholytes. We have identified three descriptors related to acidity, size and reactivity and shown that these descriptors are able to predict hERG blocking for these classes of chemicals in a simple and transparent decision tree with high predictive performance. Furthermore, a global QSAR model of hERG blocking based on a large diverse training set covering all ionization classes was developed and compared to the decision tree.

2. Materials and methods

This section describes the data preparation and the development and validation of the hERG decision tree and model.

2.1. Data preparation

We have compiled a data set of 1718 structurally diverse substances with experimental data for hERG channel blocking IC50 from the literature [14–17] and The Binding Database (http://www.bindingdb.org) [18]. The large majority of those were electrophysiological patch-clamp tests on mammalian cell lines (HEK293, CHO, COS) and a small number of tests on Xenopus laevis oocytes; the latter were only included if mammalian cell data were not available. As discussed in Ref. 14, although ideally the data used should be from the same assay, it can be appropriate to collect data from different assays in order to compose a larger and more diverse training set, especially when developing a binary (categorical) model, as is also the case in the present work. A small number of chemicals in the training set were tested by competitive radioligand binding: 18 positive chemicals from Ref. 17, based on a reported correlation between IC50 estimated by patch-clamp methods and the Ki of the astemizole radioligand assay, and 24 chemicals from Ref. 19, based on a similar correlation for the dofetilide assay [20].

Based on threshold data used in the available literature sources, a threshold IC50 value of 10 µM was used as an upper limit for the actives and 40 µM as a lower limit for the inactives. These values were chosen in order to maximize the number of chemicals that could be used to train and validate a binary classification model, because many of the published test data did not include a precise value of IC50 but only an upper or a lower bound. Introducing an intermediate area between 10 and 40 µM was also beneficial to the quality of the data set in view of the previously observed varying level of inter-laboratory reproducibility of hERG tests (e.g., Ref. 14).
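The two thresholds can be sketched as a small labelling function. This is only an illustration of the stated cutoffs: the function name `herg_class` and the constant names are hypothetical, and the handling of censored values (data reported only as an upper or lower bound) is not covered.

```python
from typing import Optional

ACTIVE_MAX_UM = 10.0    # IC50 below this value: hERG blocker (active)
INACTIVE_MIN_UM = 40.0  # IC50 at or above this value: non-blocker (inactive)


def herg_class(ic50_um: float) -> Optional[str]:
    """Label an IC50 value in µM; None marks the excluded 10-40 µM zone."""
    if ic50_um < ACTIVE_MAX_UM:
        return "blocker"
    if ic50_um >= INACTIVE_MIN_UM:
        return "non-blocker"
    return None  # intermediate area, left out of the data set
```

A compound with a measured IC50 of 25 µM would return `None` here and would therefore not enter either class.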


In several cases, data from The Binding Database [18] were incorrectly reproduced from the original publication. For example, tests from Ref. 21 had −pIC50 reported as IC50. Furthermore, in four other cases [22–25], IC50 in µM was reported in The Binding Database as IC50 in nM; the respective data points were corrected. The data from Ref. 18 referring to Ref. 21 were ignored as the latter publication was also used in Ref. 14.

The structures were imported into an OASIS Database Manager 1.7.3 database (http://oasis-lmc.org) [26], where canonical SMILES codes were generated. Duplicate structures and stereoisomers were identified using the concept of parent 2D structure. The parent 2D structure was taken to be the original 2D structure without any stereo information; for salts, the parent structure was then generated by removing the relevant counterions (some small organic ones assumed not to affect toxicity and all inorganic ions). For every set of two or more structures sharing the same parent 2D structure, if all structures from the set belonged to the same activity class (either IC50 <10 µM or IC50 ≥40 µM), only one structure was kept and the rest were removed from the data set. If the structures from the set had different activity classes, the whole set was removed. Mixtures (structures having two or more disjoint organic components in their parent 2D structure) and chemicals with inorganic parent 2D structure were removed. The structure used in the subsequent work was the parent 2D structure with stereo information retained.
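The duplicate-handling rule above can be sketched as a grouping step. This is a minimal illustration, not the OASIS Database Manager procedure: the record layout `(parent_key, structure_id, activity_class)` and the function name `deduplicate` are assumptions, and the parent key is taken as already computed (for example a stereo-free canonical SMILES with counterions removed).

```python
from collections import defaultdict


def deduplicate(records):
    """Keep one record per parent 2D structure when all members agree.

    records: iterable of (parent_key, structure_id, activity_class).
    Groups whose members fall in different activity classes are dropped whole.
    """
    groups = defaultdict(list)
    for parent_key, structure_id, activity in records:
        groups[parent_key].append((structure_id, activity))

    kept = []
    for parent_key, members in groups.items():
        classes = {activity for _, activity in members}
        if len(classes) == 1:
            # Consistent group: keep the first member only.
            structure_id, activity = members[0]
            kept.append((parent_key, structure_id, activity))
    return kept
```

Which member of a consistent group is kept is not specified in the text; keeping the first one is an arbitrary choice made for the sketch.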

2.2. Training and validation sets

The resulting data set consisted of 1718 experimental observations, of which 1215 were hERG blockers (IC50 <10 µM) and 503 non-blockers (IC50 ≥40 µM). Before any modeling was performed, it was randomly split into a training set TGLOBAL (1374 chemicals, or 80% of the data set) and a validation set V1 with the remaining 20% (344 chemicals). The ratio of hERG blockers to hERG non-blockers was maintained in the random selection for both the training and the validation sets.
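A class-preserving random split of this kind can be sketched as follows. The function name `stratified_split`, the seed and the rounding rule are assumptions made for illustration; the exact selection procedure used by the authors is not described, so the sketch is not expected to reproduce the reported set sizes to the last chemical.

```python
import random


def stratified_split(items, labels, train_fraction=0.8, seed=0):
    """Randomly split items so that each class keeps the same train fraction."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        members = [x for x, y in zip(items, labels) if y == cls]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))  # per-class training size
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid
```

Splitting each class separately is what keeps the blocker to non-blocker ratio the same in both parts.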

Next, acid and base pKa constants were calculated using the default algorithm in ACD Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/) as well as the other available algorithm in the same system, marked as pKa ACD/Labs. The macrodissociation constants were predicted for standard conditions (25 °C and zero ionic strength) in aqueous solutions by a proprietary algorithm that uses microconstant predictions at the corresponding protonation sites. The algorithm is based on an internal training set of 17,593 compounds (http://www.acdlabs.com/products/admet/tox/). For every structure, the values of the pKa calculated by the default algorithm were compared to the pKa calculated by the alternative algorithm in ACD/ToxSuite 2.95, pKa ACD/Labs; in case both algorithms found an acidic ionogenic group, the difference between the pKa values according to the two versions was required not to exceed 8, otherwise the pKa value was not used (these cases were very rare and were interpreted as artifacts of the algorithms). Using the calculated constants, the acids and zwitterionic ampholytes (AZA) in the data set were identified (see also Appendix A for a definition of zwitterionic ampholytes).

A subset of chemicals TAZA was selected from the training set TGLOBAL. The subset TAZA consisted of all chemicals from TGLOBAL with at least one acidic ionogenic group and either no basic ionogenic groups at all or pKa(acidic) < pKa(basic) (i.e., the AZA part of TGLOBAL). TAZA included 153 experimental data points, of which 35 were hERG blockers (IC50 <10 µM) and 118 were non-blockers (IC50 ≥40 µM). Similarly, a validation set V1AZA was defined consisting of the acids and zwitterionic ampholytes of the set V1 (35 experimental data points, of which 8 hERG blockers and 27 non-blockers).
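The AZA selection criterion, together with the agreement check between the two pKa algorithms, can be sketched as below. The function names are hypothetical, `None` stands for "no ionogenic group found", and reducing each structure to a single acidic and a single basic pKa is a simplifying assumption (the text does not say how several groups of one kind are treated).

```python
from typing import Optional

MAX_PKA_DISAGREEMENT = 8.0  # largest accepted gap between the two algorithms


def reconcile_acidic_pka(default_pka: Optional[float],
                         alternative_pka: Optional[float]) -> Optional[float]:
    """Return the default acidic pKa, or None if the two estimates disagree."""
    if default_pka is None:
        return None
    if (alternative_pka is not None
            and abs(default_pka - alternative_pka) > MAX_PKA_DISAGREEMENT):
        return None  # treated as an artifact, value not used
    return default_pka


def is_aza(acidic_pka: Optional[float], basic_pka: Optional[float]) -> bool:
    """Acid or zwitterionic ampholyte: an acidic group is present and either
    there is no basic group or pKa(acidic) < pKa(basic)."""
    if acidic_pka is None:
        return False
    return basic_pka is None or acidic_pka < basic_pka
```

For example, a carboxylic acid with pKa(acidic) = 4.2 and an amine with pKa(basic) = 9.1 would be counted as AZA under this reading.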


A second validation set was compiled from the training chemicals of the predictive model for hERG blocking included in ACD/Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/) [27]. Salts and mixtures were identified and contradictory experimental results and duplicates were removed in the same way as for the main data set. The parent structure for each of these chemicals was compared to all parent structures in TGLOBAL and V1; if any match was found, the structure was ignored. A set V2 was thus constructed, having no structures in common with either TGLOBAL or V1; moreover, no structures from the latter two sets were stereoisomers of, or salts of the same parent structure as, any of the structures in V2. The set V2 contained 225 chemicals, 112 of them actives (hERG IC50 <10 µM) and 113 inactives (hERG IC50 ≥10 µM; note the different inactivity threshold compared to the training and the first validation sets, which is due to the fact that no IC50 values were available for the set V2 but just information about the IC50 being over or under 10 µM). A subset of chemicals V2AZA was selected from the validation set V2. The subset V2AZA consisted of the acids and zwitterionic ampholytes of the set V2: 48 chemicals, of which 11 with hERG IC50 <10 µM and 37 with hERG IC50 ≥10 µM.

Table 1 presents an overview of the data sets used in this study.

2.3. Generation of conformers

The three-dimensional structure of a ligand is commonly used for modelling of binding or biological activity since three-dimensional structure often holds important information regarding the properties of the modelled interaction. The three-dimensional conformation of most molecules varies in aqueous solution and often there are multiple stable conformers of the same ligand, that is, conformers with free energies close to that of the conformer with the lowest energy. This means that multiple conformers can potentially be energetically favourable for binding [28]. For binding to occur, the given ligand has to fit into the cavity of the protein target and form a favourable interaction with the amino acid residues therein.

All structures from the data set were processed with the GAS algorithm [29] and as a result, a set of conformers was generated for each 2D structure. The GAS algorithm is a method for coverage of the conformational space of highly flexible chemicals by a limited number of conformers. The algorithm employs a genetic algorithm to minimize 3D similarity among the generated conformers and is based on maximization of the root mean square distance between conformers together with a measure of evenness of conformer distribution across conformational space. A procedure is included for automated determination of the number of conformers needed for an appropriate coverage of conformational space [29]. Geometry optimization of conformers is further completed by a semiempirical quantum-chemical method. MOPAC 93 [30,31] is employed by making use of the AM1 Hamiltonian [32,33]. The conformers are then screened to eliminate those whose enthalpy of formation is greater than the enthalpy of formation of the conformer with absolute energy minimum by more than a specified threshold (we used the default value in OASIS Database Manager equal to 20 kcal/mol).
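The final energy screening step can be sketched in a few lines. This covers only the enthalpy window, not the GAS conformer generation or the AM1 optimization; the function name `screen_conformers` and the list-of-floats input are assumptions.

```python
def screen_conformers(heats_of_formation, window_kcal_mol=20.0):
    """Return indices of conformers within the enthalpy window of the minimum.

    heats_of_formation: enthalpies of formation in kcal/mol, one per conformer.
    Conformers above the lowest value by more than the window are dropped.
    """
    lowest = min(heats_of_formation)
    return [i for i, h in enumerate(heats_of_formation)
            if h - lowest <= window_kcal_mol]
```

A conformer exactly 20 kcal/mol above the minimum is kept, since the text eliminates only those exceeding the minimum by more than the threshold.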

The set of conformers generated for a given 2D structure with the GAS algorithm can be used as an approximation of the entire conformational variety of the structure and used to formulate and test hypotheses about it [29].

Table 1
Overview of the data sets

Data set   Blockers   Non-blockers   Total   Non-blockers cutoff (µM)
TGLOBAL    971        403            1374    40
TAZA       35         118            153     40
V1         244        100            344     40
V1AZA      8          27             35      40
V2         112        113            225     10
V2AZA      11         37             48      10

2.4. Descriptor calculation

We used two groups of descriptors: initial descriptors calculated by ready-made software as well as derived descriptors calculated by us on the basis of the initial ones.

2.4.1. Initial descriptors

Structural descriptors (describing the overall structure and not a specific conformation) included lipophilicity, a topological index (InfoWiener), counts of the numbers of atoms, bonds and rings, as well as acidic and basic pKa. Conformational descriptors (describing a specific conformer) included volume and surface descriptors, frontier molecular orbital energies, geometric indices such as effective cross-sectional diameter, maximum diameter, planarity index, polarizability, electronegativity, heat of formation, geometric topological indices etc. Atomic descriptors (calculated for each atom of each conformer) included donor and acceptor superdelocalizabilities, atomic self-polarizability, atomic charge and others. The list of initial descriptors with their names as defined by OASIS Database Manager is shown in Appendix B. ACD/ToxBoxes 2.95 by ACD/Labs, add-in for pKa and ion fractions 2.95, was used to calculate acidic and basic dissociation constants. All other descriptors were calculated using OASIS Database Manager 1.7.3 (http://oasis-lmc.org) [26].

2.4.2. Derived descriptors

We derived generalized structural descriptors from some of the atomic and conformational descriptors. In particular, we used the set of conformers to study maximum and minimum values of conformational parameters.

The following procedure was performed for all atomic descriptors from Appendix B. Let d be any atomic descriptor (e.g., charge, donor superdelocalizability etc.). Taking the maximum of the atomic descriptor d on all atoms of a given conformer, we defined a conformational (non-atomic) descriptor

$$d_{\mathrm{Max}} = \max_{A} d(A)$$

where the maximum was taken on all atoms A of the conformer. As a result, from all atomic descriptors, for example, Q (atomic charge) and $D^E$ (donor superdelocalizability†), we calculated conformational descriptors, for example, $Q_{\mathrm{Max}}$ (the maximum of the atomic charge on all atoms of a given conformer), $D^E_{\mathrm{Max}}$ (the maximum of the donor superdelocalizability on all atoms of a given conformer), etc.

The procedure was repeated taking the maxima only on specific atoms (O, N, C). As a result, we calculated conformational descriptors $Q_{\mathrm{MaxO}}$ (the maximum of the atomic charge calculated on all oxygen atoms of a given conformer), $D^E_{\mathrm{MaxN}}$ (the maximum of the donor superdelocalizability calculated on all nitrogen atoms of a given conformer), etc.

Furthermore, taking the maximum of a conformational descriptor d on all conformers of a given structure, we defined structural descriptors

$$\mathrm{Max}\ d = \max_{c} d(c), \qquad \mathrm{Min}\ d = \min_{c} d(c)$$

where c ranges over all conformers of a given structure (approximated by all generated conformers of the structure). This procedure was carried out for all conformational parameters, both for the initial ones from Table 1 and for the derived ones defined by calculating maxima of atomic parameters. Note that the descriptors derived through the second equation above are structural ones, although they have been derived from conformational information (and possibly also from atomic information). Thus, we defined the structural descriptors MaxDiamEff (the maximum of DiamEff, the effective cross-sectional diameter of a conformer, over all conformers), Max $D^E_{\mathrm{MaxN}}$ and Min $D^E_{\mathrm{MaxN}}$, the maximum and minimum (on all conformers) of the maximum donor superdelocalizability taken on all nitrogen atoms of a conformer, etc. The initial structural descriptors together with the derived ones were used in the development of the hERG decision tree.

† Denoted by DONOR_DLC in the list of OASIS Database Manager descriptors (Appendix B). See also Appendix A for more information.
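The two aggregation steps (over atoms, then over conformers) can be sketched as follows. The data layout is an assumption made for illustration: each conformer is a list of atom dictionaries with an `"element"` key and one key per atomic descriptor; the function names and the `"DE"` key are hypothetical and do not correspond to OASIS Database Manager identifiers.

```python
def max_over_atoms(conformer, descriptor, element=None):
    """Maximum of an atomic descriptor over the atoms of one conformer,
    optionally restricted to one element; None if no atom qualifies."""
    values = [atom[descriptor] for atom in conformer
              if element is None or atom["element"] == element]
    return max(values) if values else None


def max_min_over_conformers(conformers, descriptor, element=None):
    """(Max d, Min d) over all conformers, where d is the per-conformer
    maximum of the atomic descriptor."""
    per_conformer = [max_over_atoms(c, descriptor, element) for c in conformers]
    per_conformer = [v for v in per_conformer if v is not None]
    if not per_conformer:
        return None, None  # e.g. no nitrogen atoms in the structure
    return max(per_conformer), min(per_conformer)
```

With `descriptor="DE"` and `element="N"`, the first value of the returned pair plays the role of Max $D^E_{\mathrm{MaxN}}$ and the second that of Min $D^E_{\mathrm{MaxN}}$.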

2.5. Construction and validation of a decision tree for hERG activity of acids and zwitterions

All 153 chemicals from TAZA were submitted to the See5 decision tree system by RuleQuest Research (http://www.rulequest.com, see also Appendix A). In See5, a confidence value of 5% was used in order to enhance the reliability of the derived decision tree.

In order to reduce randomness in the choice of descriptors, the decision tree was required to have a sufficiently large minimum leaf size. A series of decision tree models was produced with different settings for this parameter. The number of actives in TAZA was 35, therefore using more than 35 chemicals as the minimum leaf size resulted in no parameters being selected and the trivial classifier being built (classifying all structures as positive). Using precisely 35 chemicals as the minimum leaf size resulted in two parameters being selected: MaxDiamEff (the maximum effective cross-sectional diameter) and Max $D^E_{\mathrm{MaxN}}$ (the maximum donor superdelocalizability calculated at nitrogen atoms); see also Appendix A for information on these two descriptors. Exactly the same result was obtained when the minimum block size was set to any value between 20 and 35. Using values of the minimum block size of less than 20 resulted in less stability of selection: different parameters were selected at the different settings.

Next, ionization (acidic pKa), already used in the construction of TAZA, was added manually to the two selected most significant descriptors. The See5 system was used to generate thresholds for the descriptors and the final positive rule was chosen based on its performance on the training set. The decision tree was defined as a combination of the rule and a definition of inconclusive predictions. The internal performance of the decision tree was estimated on the training set of observations and after the model was finalized, the external performance was estimated on two independent external validation sets, V1AZA and V2AZA, defined in 2.2. The decision tree and the relevant results are presented in the Results section.

The performance of the decision tree on both the training and the two validation sets was estimated by means of Cooper statistics. The statistics are calculated in all cases as follows. By a true positive we mean a chemical that is both experimentally tested active and predicted positive by a predictive model (e.g., a decision tree or a QSAR model); similarly, a true negative denotes a chemical that is both experimentally tested inactive and predicted negative by a predictive model. Sensitivity is the ratio of true positives to all chemicals with positive experimental data, specificity is the ratio of true negatives to all chemicals with negative experimental data, and concordance is the ratio of true predictions to the total number of structures predicted positive or negative. Coverage is defined as the ratio of predictions falling in the applicability domain of the predictive model to the total number of chemicals in the test set submitted to prediction.
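The four Cooper statistics can be sketched directly from these definitions. The function name `cooper_statistics` is hypothetical, and the sketch assumes that sensitivity and specificity are computed over in-domain chemicals only. The usage example reads the V1 column of Table 5 under the assumption that "positive/negative" there are counts of predicted positives and negatives, which gives 13 false positives (169 − 156) and 29 false negatives (92 − 63).

```python
def cooper_statistics(tp, tn, fp, fn, n_submitted):
    """Cooper statistics from in-domain confusion counts.

    tp, tn, fp, fn: true/false positives/negatives among in-domain predictions.
    n_submitted: number of chemicals submitted to prediction.
    """
    in_domain = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),       # true positives / experimental actives
        "specificity": tn / (tn + fp),       # true negatives / experimental inactives
        "concordance": (tp + tn) / in_domain,
        "coverage": in_domain / n_submitted,
    }


# V1 column of Table 5, under the reading described above.
stats_v1 = cooper_statistics(tp=156, tn=63, fp=13, fn=29, n_submitted=344)
percentages = {name: round(100 * value) for name, value in stats_v1.items()}
```

Under that reading the rounded percentages agree with the values listed for V1 in Table 5 (sensitivity 84, specificity 83, concordance 84, coverage 76). The sketch does not guard against empty classes, where the ratios are undefined.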

2.6. Development of a QSAR model and comparison to the decision tree

Using Leadscope Predictive Data Miner by Leadscope Inc. (Refs. 34,35; http://www.leadscope.com; see also Appendix A), a global binary QSAR model of hERG IC50 <10 µM versus hERG IC50 ≥40 µM based on the complete training set TGLOBAL was constructed.

The molecular structures from training set TGLOBAL were converted into SD format using OASIS Database Manager 1.7.3 (http://www.oasis-lmc.org) [26]. The structures were then imported into Leadscope Predictive Data Miner. A classification predictive model was created using partial logistic regression (PLR). Using the default mode recommended by Leadscope for the case of unbalanced training sets, three separate sub-models were developed based on three balanced training subsets, including the same set of negatives and randomly selected disjoint subsets of positives. The sub-models were then combined into an overall composite model with the 'assemble average model' option in Leadscope, where the sub-models were assigned equal weights.
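The sub-model scheme can be illustrated with a short sketch. It is not Leadscope's implementation: the function names are hypothetical, and dealing the shuffled positives into equally sized disjoint chunks is an assumption about how the subsets are sized.

```python
import random


def balanced_subsets(positives, negatives, n_submodels=3, seed=0):
    """Pair the full negative set with disjoint random subsets of positives."""
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    # Deal the shuffled positives into n disjoint chunks of near-equal size.
    chunks = [shuffled[i::n_submodels] for i in range(n_submodels)]
    return [(chunk, list(negatives)) for chunk in chunks]


def composite_probability(submodel_probabilities):
    """Equal-weight average of the sub-model positive-class probabilities."""
    return sum(submodel_probabilities) / len(submodel_probabilities)
```

Each `(positives, negatives)` pair would then be used to train one sub-model, and a query compound's final score would be the average of the three sub-model outputs.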

The applicability domain for the model required that a compound had at least 30% Tanimoto structural similarity with a training set compound in at least one sub-model (a standard setting with Leadscope Predictive Data Miner). The Tanimoto similarity (initially proposed by Jaccard [36]) was calculated based on fingerprints of the Leadscope features used for each of the sub-models. In addition, predictions were required to have a positive prediction probability of over 0.7 for positives and less than or equal to 0.3 for negatives, rendering predictions with probabilities between 0.3 and 0.7 out of the domain.
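The two domain conditions can be sketched with fingerprints represented as sets of feature identifiers. This representation, the function names, and the `submodels` layout (one query fingerprint plus the training fingerprints per sub-model) are assumptions for illustration and not the Leadscope data model.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_domain(submodels, probability, min_similarity=0.3):
    """Check both applicability-domain conditions for one query compound.

    submodels: list of (query_fingerprint, training_fingerprints) pairs,
               one pair per sub-model.
    probability: predicted probability of the positive class.
    """
    similar = any(
        tanimoto(query_fp, train_fp) >= min_similarity
        for query_fp, training_fps in submodels
        for train_fp in training_fps
    )
    confident = probability > 0.7 or probability <= 0.3
    return similar and confident
```

A prediction with probability 0.5 is rejected even for a compound that is structurally close to the training set, which is how the probability window reduces coverage.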

Each of the three sub-models has its own set of descriptors. All sub-models use the following eight continuous descriptors: molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, Lipinski score (a measure of the drug-likeness of a molecule), AlogP, atom count and polar surface area. In addition, each sub-model uses a number (291 for sub-model 1, 370 for sub-model 2 and 389 for sub-model 3) of fragment descriptors where the presence of a fragment is represented by a value of 1 and the absence of a fragment is represented by a value of 0. The fragments are derived from the training set molecules and the ones over-represented in the active or the inactive class are selected as model descriptors (see Appendix A). When generating a prediction, the test molecule is similarly checked for presence or absence of relevant fragments, each contributing a certain value to the final prediction.

The QSAR model was evaluated by external validation using validation sets V1 and V2 (for the global model). For comparison purposes, the Leadscope model performance was also checked specifically on V1AZA and V2AZA (this exercise should not, strictly speaking, be treated as validation, because the structures in V1AZA and V2AZA are only AZA and therefore do not represent the structural diversity of the training set of the QSAR model). In addition, the model was evaluated by our own implementation of 10 times leave-50%-out cross-validation, strictly not allowing for any use of information from the overall data set during either the variable selection or the construction of any of the 10 sub-models necessary in such a validation. In all predictions, both for the purposes of cross-validation and for the external validations, only predictions inside the applicability domain of the global model were taken into consideration. The results are shown in Tables 4–6.
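A repeated leave-50%-out scheme of this kind can be sketched as below. This is not the authors' implementation; the names are hypothetical, and `fit` stands for any training routine that performs its own descriptor selection on the training half only and returns a predictor.

```python
import random

def leave_half_out_splits(n_items, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated random 50/50 splits."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    half = n_items // 2
    for _ in range(n_repeats):
        rng.shuffle(indices)
        yield sorted(indices[:half]), sorted(indices[half:])

def cross_validate(xs, ys, fit, n_repeats=10, seed=0):
    """Collect (observed, predicted) pairs from the held-out halves."""
    results = []
    for train_idx, test_idx in leave_half_out_splits(len(xs), n_repeats, seed):
        # The model sees only the training half, including for variable selection.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        results.extend((ys[i], model(xs[i])) for i in test_idx)
    return results
```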

3. Results

3.1. Decision tree definition

The following rule was derived for hERG IC50 <10 µM of acids and zwitterionic ampholytes:

MaxDiamEff > 10.36 (Å) AND Max DE_MaxN > 0.278 (a.u./eV) AND pKa(acidic) > 5


Table 5. External validation of the global model with validation sets V1 and V2

| | V1 | V2 |
|---|---|---|
| Total tested/active/inactive | 344:244:100 | 225:112:113 |
| Total in domain/positive/negative | 261:169:92 | 153:94:59 |
| Correct predictions/true positives/true negatives | 219:156:63 | 118:70:48 |
| Sensitivity (%) | 84 | 86 |
| Specificity (%) | 83 | 67 |
| Concordance (%) | 84 | 77 |
| Coverage (%) | 76 | 68 |

The non-blockers in V2 are defined as IC50 ≥10 µM.

Table 6. Cooper statistics of the performance of the global QSAR model on the AZA validation subsets

| | V1AZA | V2AZA |
|---|---|---|
| Total tested/active/inactive | 35:8:27 | 48:11:37 |
| Total in domain/positive/negative | 31:7:24 | 32:9:23 |
| Correct predictions/true positives/true negatives | 27:5:22 | 25:4:21 |
| Sensitivity (%) | 71 | 67 |
| Specificity (%) | 92 | 81 |
| Concordance (%) | 87 | 78 |
| Coverage (%) | 89 | 65 |

Table 3. Validation statistics of the decision tree for V1AZA and V2AZA

| | V1AZA | V2AZA |
|---|---|---|
| Total tested/active/inactive | 35:8:27 | 48:11:37 |
| Total predicted/positive/negative | 34:9:25 | 44:8:36 |
| Correct predictions/true positives/true negatives | 31:7:24 | 40:6:34 |
| Sensitivity (%) | 88 | 75 |
| Specificity (%) | 92 | 94 |
| Concordance (%) | 91 | 91 |

The non-blockers in V2AZA are defined as IC50 ≥10 µM.

Table 2. Performance of the decision tree for training set TAZA

| Measure | Value |
|---|---|
| Total tested/active/inactive | 153:35:118 |
| Total predicted conclusive/positive/negative | 148:27:121 |
| Correct predictions/true positives/true negatives | 134:24:110 |
| Sensitivity (%) | 69 |
| Specificity (%) | 97 |
| Concordance (%) | 91 |

Table 4. Cross-validation Cooper statistics of the global QSAR model

| Measure | Value |
|---|---|
| Sensitivity (%) | 90 |
| Specificity (%) | 83 |
| Concordance (%) | 88 |


The rule is based on thresholds for three descriptors: pKa(acidic), MaxDiamEff (the maximum of the effective cross-sectional diameter), and Max DE_MaxN (the maximum donor superdelocalizability DE calculated at nitrogen atoms; see also Appendix A for information on the two latter descriptors).

The maximum values were taken over all generated conformers of a given structure; the rule can therefore be interpreted as:

A structure is positive if pKa(acidic) > 5 and there exists a conformer such that DiamEff > 10.36 (Å) AND DE_MaxN > 0.278 (a.u./eV).†

The rule above will be used as the only positive rule in a decision tree. In order to define negative rules, we notice that failure to match any of the three conditions should lead to a negative prediction. In particular, either of the two conditions MaxDiamEff below 10.36 (Å) or pKa(acidic) below 5 is enough to warrant a negative prediction. The delocalizability condition, however, requires at least one nitrogen atom in the molecule in order for the descriptor to be defined, in which case Max DE_MaxN below 0.278 (a.u./eV) will be enough to warrant a negative prediction. However, structures lacking a nitrogen atom but having MaxDiamEff > 10.36 (Å) and pKa(acidic) > 5 cannot be predicted either positive (because the superdelocalizability condition is not fulfilled) or negative (no negative condition is fulfilled); we therefore define the prediction in this case as inconclusive.

† There is another interpretation of the rule condition: a structure is positive if pKa(acidic) > 5 and there exists at least one conformer with DiamEff > 10.36 (Å) and there exists at least one conformer (possibly a different one) with Max DE_MaxN > 0.278 (a.u./eV). The performance of the rule was even higher under this relaxed interpretation, but we chose the stricter interpretation above due to its more transparent mechanistic meaning.

The decision tree is presented as a flowchart in Figure 1.
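The positive rule, the negative rules and the inconclusive case can be written as a short function. This is a sketch under stated assumptions, not the authors' code: the function and constant names are hypothetical, the inputs are the per-structure maxima over conformers (so the same-conformer requirement of the stricter interpretation is not modelled), and a value exactly equal to a threshold is treated as not exceeding it.

```python
DIAM_EFF_THRESHOLD = 10.36   # Å
DE_MAXN_THRESHOLD = 0.278    # a.u./eV
PKA_ACIDIC_THRESHOLD = 5.0

def predict_herg_aza(pka_acidic, max_diam_eff, max_de_maxn):
    """Classify an acid or zwitterionic ampholyte.

    max_de_maxn is None when the molecule has no nitrogen atom, in which
    case the superdelocalizability descriptor is undefined.
    """
    # Negative rules that do not depend on nitrogen.
    if pka_acidic <= PKA_ACIDIC_THRESHOLD:
        return "negative"
    if max_diam_eff <= DIAM_EFF_THRESHOLD:
        return "negative"
    # Large, weakly acidic structure without nitrogen: no rule applies.
    if max_de_maxn is None:
        return "inconclusive"
    if max_de_maxn <= DE_MAXN_THRESHOLD:
        return "negative"
    return "positive"
```

For example, `predict_herg_aza(6.0, 12.0, None)` returns `"inconclusive"`, matching the case of a non-nitrogen structure with MaxDiamEff > 10.36 (Å) and pKa(acidic) > 5.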

3.2. Internal performance

The internal performance of the decision tree (how well the decision tree predicts its own training set) is presented as Cooper statistics in Table 2.
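As a worked check, the Cooper statistics can be recomputed from the counts in Table 2. The false positive and false negative counts below are not listed in the table; they are derived here from it (27 predicted positives of which 24 are correct gives 3 false positives; 121 predicted negatives of which 110 are correct gives 11 false negatives). The function name is hypothetical.

```python
def cooper_statistics(tp, fn, tn, fp):
    """Sensitivity, specificity and concordance in percent."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "concordance": 100.0 * (tp + tn) / (tp + fn + tn + fp),
    }

# Counts for the training set TAZA, derived from Table 2 (conclusive predictions only).
training = cooper_statistics(tp=24, fn=11, tn=110, fp=3)
print({name: round(value) for name, value in training.items()})
```

Rounded to whole percentages, these counts reproduce the sensitivity, specificity and concordance reported for the training set.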

3.3. Validation results

The Cooper statistics of the performance of the rule-based model for the two validation sets V1AZA and V2AZA are presented in Table 3.

3.4. Validation of the QSAR model and comparison to the decision tree

Table 5 presents the external validation performance of the global model. The model has overall good performance for both validation sets. This model covers structures of diverse acidity/basicity (acidic, zwitterionic, basic, neutral). Although the performance is understandably somewhat lower than for the specific decision tree for acids and zwitterionic ampholytes, it presents a good all-purpose model for discriminating hERG blockers from non-blockers.

The cross-validation performance of the global model is presented in Table 4.

The external validation performance of the model is presented in Table 5.

The statistics for the AZA subsets of V1 and V2 are presented in Table 6 (the training set contains chemicals with diverse ionization properties while V1AZA and V2AZA contain AZA only).

4. Discussion

This work proposes a characterization of the hERG blocking by acids and zwitterionic ampholytes. For this class of compounds, we propose a combination of three descriptor thresholds to discriminate between hERG blockers (IC50 <10 µM) and non-blockers (IC50 ≥40 µM). A decision tree consisting of this combination of thresholds and an additional rule for inconclusives predicted the hERG blocking of the 153 acids and zwitterionic ampholytes in the training set with internal concordance of 91% and specificity of 97%. The predictivity was confirmed using two external validation sets of experimental data on hERG of 35 and 48 chemicals, respectively, both showing concordance of 91% and specificity of at least 92%. The sensitivity (69% in the training set and 88% and 75% in the two respective validation sets) shows that these three descriptor thresholds seem to be able to correctly predict the majority of the experimental observations on hERG positive acids and zwitterionic ampholytes. However, it also indicates that the decision tree is not covering around 12–25% of the positives in the validation sets (31% in the training set), and that these may work by a mechanism not identified by the rules. Due to insufficient data to train the model on, certain classes of structures, such as non-nitrogen acids and zwitterionic ampholytes with high pKa and high effective cross-sectional diameter, are out of the scope of this work.

Figure 1. The decision tree for prediction of hERG blocking for AZA.

We consider the small number of descriptors and the transparent modeling methodology (decision tree) advantages of our approach compared to existing hERG models. To our knowledge, no other predictive model of hERG uses so few descriptors while modeling a large number of chemicals with high predictivity (91%) (for the number of descriptors, the only exceptions are (a) the predictive model in Ref. 37 which uses two descriptors, logD and maximum interatomic distance, and is based on a limited training set of 19 chemicals and validated on 81 chemicals, and (b) a decision tree using three descriptors, namely logP, molar refractivity and pKa of the most basic nitrogen, presented in a conference poster38 that was not available to the authors, so we report this publication after Refs. 7,8). Other things being equal, a lower number of descriptors compared to the number of observations in the training set of a model is generally desirable, both because it facilitates the interpretation of the model and because it indicates that the model is not overfitted (too specific) and therefore is robust. A rule of thumb in cheminformatics is that this ratio should not exceed 1/6 to 1/3;39 in our decision tree, this ratio is 1/51 (three descriptors for 153 training chemicals), one of the lowest among the available hERG models.

These advantages of our decision tree come at the expense of limiting the applicability domain to chemicals with certain ionization properties. Different ionization classes were previously reported12 to have significantly different hERG profiles. Based on this, we suggest that developing hERG models of chemicals with the same or similar acidity/basicity may improve model predictivity or transparency and contribute to mechanistic understanding of hERG blocking. In particular, we propose specific molecular descriptors to elucidate hERG blockers among acids and zwitterionic ampholytes.

The decision tree is based on three descriptors, one related to size/shape (effective cross-sectional conformer diameter), one local electronic descriptor (maximum electrophilic superdelocalizability calculated at nitrogen atoms), and a physicochemical descriptor (acidic pKa). The first two descriptors have been occasionally used in the available literature (although not in combination) to model other endpoints than hERG, for example, a model of androgen receptor binding with donor superdelocalizability.40


While pKa has been previously used in QSAR modeling, a recent review41 reports that hERG studies specifically using pKa values to assess the risk of hERG channel inhibition have yet to emerge. In the same review, the authors report that no specific hERG relationship to pKa has been shown but Wager42 analysed in-house data to conclude that the risk of hERG channel blockade was greater when the molecule had a basic centre with a pKa above 8.4. In fact, Ref. 38 reports a model with pKa, and different studies use logD and therefore use lipophilicity corrected for acidity.

A reduced version of the rule without pKa was also tested during the development of the rule. It resulted in training set performance as follows: sensitivity of 77%, specificity of 92% and concordance of 88%. The reduced rule (without pKa) predicted 27 true and 9 false positives. We were aware that the model will miss some positives, such as non-nitrogen compounds (because one of the descriptors requires nitrogen atoms for a positive prediction) and possibly other hERG positives. For this reason we aimed at optimizing specificity, that is, minimizing false positive predictions. It was found that 6 out of the 9 chemicals predicted false positive by the reduced rule contained a carboxylic moiety while only 2 out of the 27 true positives had that property. As this work explicitly uses pKa to define the training set of acids and zwitterionic ampholytes and the role of acidity in attenuating hERG is well known, this was the motivation to extend the rule with pKa, thereby removing most of the false positives at the expense of losing a small number of true positives. When we performed the validation, the same pattern was confirmed: donor superdelocalizability and effective diameter were able to pick the large part of hERG actives, and the inclusion of pKa prevented a fair amount of false positives (Fig. 2, right: the upper right corner shows the positive predictions. If pKa had not been used (left) but the rule had been based only on MaxDiamEff and Max DE_MaxN, we could expect more false positives).

Figure 2. Left: MaxDiamEff versus Max DE_MaxN for the nitrogen-containing chemicals of V1AZA and V2AZA. Black: hERG IC50 ≥40 µM. Red: hERG IC50 <10 µM. Right: the same for the nitrogen-containing chemicals of V1AZA and V2AZA with pKa >5. Rule conditions for a positive prediction: MaxDiamEff >10.36 (Å), Max DE_MaxN >0.278 and pKa >5.

Two of the three descriptors (Max DE_MaxN and pKa) are related to the ligand’s reactivity and therefore potential for binding to the amino acid residues in the cavity of the hERG channel. The third descriptor (effective cross-sectional diameter) correlates with the ligand’s potential for blocking the current through the hERG channel (due to the bulk which correlates with the diameter). On the other hand, high values of MaxDiamEff may indicate too bulky molecules whose potential to enter the cavity and block the hERG channel may be low. The distance between two opposite hERG channel monomers is estimated to be 22 (Å)43 which may be the upper threshold for the diameter of the molecules able to enter the hERG cavity. We observed also that the topological descriptor InfoWiener44 seems appropriate for filtering bulky, round-shaped molecules with high DiamEff. InfoWiener was calculated using OASIS Database Manager 1.7.3 for all structures by the formula

$$\mathrm{InfoW} = -\sum g_i \, \frac{d_{i,j}}{W} \, \log_2\!\left(\frac{d_{i,j}}{W}\right)$$

where $g_i$ = (number of $d_{i,j}$ in distance matrix)/2 and $d_{i,j}$ are the topological distances between nodes i and j; W is the Wiener index of the structure (the sum of the lengths of the shortest paths between all pairs of vertices in the chemical graph representing the non-hydrogen atoms in the molecule).45 The Wiener index has been used as a descriptor in a hERG model and its role as an indicator of branching and molecular shape is discussed.46 The entire data set of 1718 structures contained 16 structures with a value of InfoWiener over 12.1, 15 of which were hERG negative. These 16 molecules were almost all ‘round-shaped’ and rigid, thus supporting the hypothesis that it is their size and shape that prevents them from entering the hERG cavity. While the conditions using InfoWiener did not add to the predictive performance of the decision tree (the structures predicted negative by the InfoWiener criteria were also predicted negative by the decision tree using DiamEff, DE_MaxN, or pKa), this may show that values of InfoWiener exceeding 12.1 can be used to indicate this type of hERG negatives.
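The Wiener index and the information index can be computed from the heavy-atom graph as sketched below. This is not the OASIS implementation. It assumes one reading of the formula, in which the sum runs over the distinct distance values and g counts the unordered atom pairs at each value; the function names and the adjacency-dictionary input are hypothetical. A molecule with a single heavy atom has W = 0 and is not handled.

```python
from collections import Counter, deque
from math import log2

def topological_distances(adjacency):
    """Shortest-path lengths (in bonds) for all unordered atom pairs, via BFS."""
    distances = []
    for start in sorted(adjacency):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        # Keep each unordered pair once.
        distances.extend(d for node, d in seen.items() if node > start)
    return distances

def wiener_index(adjacency):
    """Sum of shortest-path lengths over all pairs of heavy atoms."""
    return sum(topological_distances(adjacency))

def info_wiener(adjacency):
    """-sum over distance values d of g * (d / W) * log2(d / W)."""
    distances = topological_distances(adjacency)
    w = sum(distances)
    counts = Counter(distances)
    return -sum(g * (d / w) * log2(d / w) for d, g in counts.items())

# Propane carbon skeleton C0-C1-C2: distances 1, 1 and 2, so W = 4.
propane = {0: [1], 1: [0, 2], 2: [1]}
```

For the propane skeleton the pair distances are 1, 1 and 2, giving W = 4 and an index of 2·(1/4)·2 + (1/2)·1 = 1.5 under this reading.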

It is interesting to note that lipophilicity was not selected among the most important parameters for hERG blocking affinity of AZA. In our data set, all observations with logP <1 had an IC50 ≥40 µM; high values of logP, however, were not sufficient for hERG activity to be present. This may show that logP, while certainly highly correlating with hERG as reported in many studies, is a cruder estimate of the necessary properties for hERG blocking for the case of acids and zwitterionic ampholytes, compared to the descriptors in our decision tree. The performance of the rule on the training set was also studied with the condition logD (at pH = 7.4) > −0.51 instead of pKa >5, where the descriptor was calculated using ACD/Tox Suite 2.95, pKa and ion channels add-in 2.95 and the threshold for it was derived using the See5 system for the training set TAZA. Sensitivity was marginally higher while specificity and concordance were marginally lower (129 correct predictions, of which 26 positive and 103 negative; the data for the original rule are presented in Table 2) than for the version of the rule using pKa. Similarly to the finding for logP, all chemicals with logD ≤ −0.51 were tested inactive but high values of logD did not always mean hERG activity.

MaxDiamEff was found to be the most discriminating descriptor when selecting descriptors for the total training set TGLOBAL with See5. A model consisting of MaxDiamEff alone predicted hERG for all ionization classes with a concordance of 66%. This supports the view that MaxDiamEff can be regarded as an indicator of the necessary size for a molecule to block the hERG current (a related descriptor, maximum interatomic distance, was found37 to correlate positively with hERG potency), which should be relevant for all hERG blockers irrespective of their ionization class, and that this descriptor is not related to reactivity. In contrast, Max DE_MaxN and pKa were specifically important for hERG modeling of AZA.

A recent review9 has described that hERG ion channel pharmacophore models from a number of different studies agree that charged nitrogen (hydrogen bond acceptor) and aromatic rings (hydrophobic feature) were important features to consider in hERG binding. Chemicals containing basic nitrogen atoms have been found in many cases to have a higher affinity than corresponding chemicals without such a group. The reason possibly is that nitrogen is involved in inner cavity binding site interactions. In the AZA training set applied for the decision tree as well as for the two AZA validation sets there were a considerable number of cases where compounds contained basic nitrogen and were tested negative for hERG activity. With the proposed decision tree we define, in addition to acidity (pKa) and considerations of size (effective diameter), specific conditions for hERG affinity requiring a threshold for reactivity (donor superdelocalizability) related to nitrogen atoms. We interpret the role of donor superdelocalizability as the potential of the ligand to create interactions between the nitrogen and an appropriate residue of the hERG channel, such as, for example, Tyr652 or Phe656, considered to form interactions to basic nitrogen.47

A global QSAR model of hERG blocking was also developed on the general training set using Leadscope Predictive Data Miner. The global model was evaluated by external validation and by robust cross-validation and showed very good performance. The proposed decision tree demonstrated superior predictivity for the specific AZA part of the training set over the global QSAR model in terms of sensitivity, specificity, and concordance in two external validations.

5. Conclusion

Different ionization classes have been shown to exhibit substantial differences in hERG blocking properties.12 Based on the reported observations that hERG blocking properties of a chemical depend strongly on its ionization state, we assumed that specific models of hERG blocking for chemicals with the same acidity/basicity properties may contribute towards improved predictivity and/or better transparency and therefore mechanistic understanding of hERG blocking.

In particular, it has been reported that hERG affinity of acids and zwitterions is generally lower than that of either bases or neutrals.12 Still, a considerable number of acidic compounds have been shown to block hERG, and the physical–chemical attributes which discriminate acidic blockers from acidic non-blockers have not been described in detail.

This work proposes a characterization of hERG blocking by acids and zwitterionic ampholytes.

We identified only three descriptors to be sufficient to model the majority of the collected data for these structural classes, and proposed a rule based on thresholds for the three descriptors for prediction of hERG positives (IC50 <10 µM) for AZA. Used in a decision tree, these thresholds predicted correctly the hERG blocking of these ionization classes for the total of 153 diverse training chemicals, with concordance of 91%. The high predictivity was confirmed using two independent validation sets of experimental data on hERG (concordance of 91% for both validation sets and sensitivity and specificity of at least 75%). Due to insufficient data to train the model on, certain classes of structures, such as non-nitrogen acids and zwitterionic ampholytes with high diameter and pKa, are not covered by the model.

The three descriptors are: one related to size/shape (maximum effective cross-sectional conformer diameter), an electronic descriptor related to reactivity (maximum electrophilic superdelocalizability calculated at nitrogen atoms), and a physicochemical descriptor (acidic pKa).

A global QSAR model of hERG blocking was also constructed, based on a large and structurally diverse training set of 1374 chemicals, using Leadscope Predictive Data Miner. The QSAR model showed very good performance with two external validations of 344 and 225 chemicals, yet the proposed decision tree demonstrated superior predictivity for the specific groups of acids and zwitterionic ampholytes in terms of sensitivity, specificity, and concordance in the external validations.

Acknowledgment

The authors acknowledge financing by the Danish Environment Protection Agency.

Appendix A. Additional information

Conformer effective cross-sectional diameter: This is a conformational descriptor defined as the diameter of the least-diameter cylinder containing the conformer. More formally, let I be a set of points in R³ so that each atom of the conformer is represented by exactly one element of I which has the same three-dimensional coordinates as the atom. Then DiamEff is defined by:

$$\mathrm{DiamEff} = \min_{l \text{ is a line in } \mathbb{R}^3} \; \max \{\, d(l, c) \mid c \in I \,\}$$

where R³ is the three-dimensional Euclidean space of real numbers and d(x, y) denotes the Euclidean distance between a line x and a point y in R³ (the definition of a smallest encompassing cylinder can be found, e.g., in Ref. 48). We use the implementation of this descriptor in OASIS Database Manager 1.7.3 (http://oasis-lmc.org; Ref. 26).
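The inner part of this definition, the largest distance from any atom to a given candidate line, can be sketched as below. The function names are hypothetical and this is not the OASIS implementation. The outer minimization over all lines is a separate optimization problem and is not shown; a numerical optimizer would call `max_distance_to_line` as its objective.

```python
from math import sqrt

def distance_point_to_line(point, line_point, direction):
    """Distance d(l, c) between point c and the line through line_point along direction."""
    vx, vy, vz = (point[k] - line_point[k] for k in range(3))
    ux, uy, uz = direction
    # |v x u| / |u| is the perpendicular distance from the point to the line.
    cx = vy * uz - vz * uy
    cy = vz * ux - vx * uz
    cz = vx * uy - vy * ux
    return sqrt(cx * cx + cy * cy + cz * cz) / sqrt(ux * ux + uy * uy + uz * uz)

def max_distance_to_line(points, line_point, direction):
    """max{ d(l, c) | c in I } for one candidate line l."""
    return max(distance_point_to_line(p, line_point, direction) for p in points)

# Three atom positions (Å) and the z-axis as candidate line.
atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 5.0)]
```

With the z-axis as the candidate line, the three atoms lie at distances 1, 2 and 0 from it, so the objective evaluates to 2.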

Donor (electrophilic) superdelocalizability DE: The atomic descriptor donor (electrophilic) superdelocalizability is a variant of reactivity indices in the Hückel molecular orbital scheme and was originally defined by Ref. 49 and implemented into MOPAC (http://openmopac.net/manual/super.html; Refs. 30,31) and used in the OASIS Database Manager system (http://oasis-lmc.org). The descriptor is related to the capability of atoms to create bonds by donation of electrons. The donor (electrophilic) superdelocalizabilities are calculated according to the method described in Refs. 50,51.


DE(r), the donor (electrophilic) delocalizabilities of a reactant’s centre r according to Ref. 49 can be defined within all-valence electron schemes as described in Ref. 50 and http://openmopac.net/manual/super.html, according to the following formula:

$$D^{E}(r) = 2 \sum_{i}^{\mathrm{occ}} \sum_{\mu}^{(r)} \frac{c_{\mu i}^{2}}{\varepsilon_i - \alpha}$$

In this formula50 the outer sum goes over all occupied (‘occ’) molecular orbitals i of the molecule in the self-consistent field (SCF) ground state, and the inner sum puts together the contributions of all atomic orbitals μ belonging to the center r of interest. In particular, $c_{\mu i}$ is the linear combination of atomic orbitals to molecular orbitals (LCAO-MO) coefficient of atomic orbital μ, at center r in the molecular orbital i, $\varepsilon_i$ is the energy of the i-th molecular orbital and α is defined as the average of the HOMO and LUMO energies, that is,

$$\alpha = \tfrac{1}{2}\left(\varepsilon_{\mathrm{HOMO}} + \varepsilon_{\mathrm{LUMO}}\right).$$
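The double sum can be evaluated directly once the orbital energies and the coefficients at the centre of interest are available. The sketch below is not the MOPAC or OASIS code; the function name and input layout are hypothetical, and it assumes that the energies are sorted in ascending order with the occupied orbitals first. It follows the formula as written above, so each occupied orbital lying below α contributes a negative term.

```python
def donor_superdelocalizability(coefficients, energies, n_occupied):
    """Evaluate 2 * sum_i^occ sum_mu c_mu_i^2 / (e_i - alpha) for one centre r.

    coefficients[i] lists the LCAO-MO coefficients of the atomic orbitals
    of centre r in molecular orbital i; energies[i] is the energy of orbital i.
    """
    e_homo = energies[n_occupied - 1]
    e_lumo = energies[n_occupied]
    alpha = 0.5 * (e_homo + e_lumo)
    total = 0.0
    for i in range(n_occupied):          # outer sum: occupied orbitals
        for c in coefficients[i]:        # inner sum: atomic orbitals at centre r
            total += c * c / (energies[i] - alpha)
    return 2.0 * total

# Toy example with two occupied orbitals and one virtual orbital (energies in eV).
example = donor_superdelocalizability(
    coefficients=[[0.5], [0.5], [0.7]],
    energies=[-12.0, -10.0, 2.0],
    n_occupied=2,
)
```

In the toy example α = −4 eV, and the two occupied orbitals contribute 0.25/(−8) and 0.25/(−6) before the factor of 2 is applied.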

Leadscope Predictive Data Miner (http://www.leadscope.com; Refs. 34,35) is a software system for development and use of QSAR models containing also other related (e.g., data mining and graphical) tools. The descriptors include approximately 27,000 structural features stored in a template library and the structural features chosen for analysis are motivated by those typically found in small molecules: aromatics, heterocycles, spacer groups, simple substituents. Additionally, the system can generate (mine) training set-dependent structural features (scaffolds) that are over-represented either in the negative or in the positive training subset, and it also estimates molecular descriptors for each structure: the octanol/water partition coefficient (A logP), hydrogen bond acceptors, hydrogen bond donors, Lipinski score, atom count, parent compound molecular weight, polar surface area and rotatable bonds. The model building process in Leadscope includes an automated procedure of structural feature and numeric descriptor selection using t- and Yates’ χ² statistic metrics (manual options for descriptor selection are also available). The Leadscope algorithm for building QSAR models is based on structural features and numeric descriptors using partial logistic regression for a binary response variable.

See5: The See5 algorithm for construction of decision trees (http://www.rulequest.com/see5-info.html) is an improvement of the ID3 and C4.5 algorithms developed by the same author.52

Algorithms for construction of binary decision tree classifiers split the training set in two so that the content of actives and inactives is different enough between the two partitions, according to predefined fitness criteria. If the split is not sufficient to achieve the desired classification accuracy, one or both parts are in turn partitioned, until a stop condition is met. Non-leaf nodes contain tests that divide up the training cases. The ideal binary test is one that splits completely the two classes. This test would lead to perfect classification accuracy. However, such tests are hard to find, and for many domains they may not exist at all. See5 and its predecessors use formulas based on information theory to evaluate the ‘goodness’ of a test; in particular, they choose the test that extracts the maximum amount of information from a set of cases using entropy criteria.
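The entropy-based choice of a threshold test can be illustrated with plain information gain. This is a simplified sketch and not the See5 code, whose exact criterion is not described here; the function names are hypothetical, and candidate thresholds are taken as midpoints between consecutive distinct descriptor values.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of boolean class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def best_threshold(values, labels):
    """Midpoint threshold with the highest information gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

For descriptor values 1, 2, 3, 4 with the two highest labelled active, the split at 2.5 separates the classes completely and removes the full one bit of entropy.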

Decision trees have been used to construct global models of hERG (e.g., Ref. 53), including C4.5.54

Zwitterionic ampholyte: An amphoteric compound (zwitterionic compound) with both acidic and basic ionogenic groups and wherein the pKa of the acidic group (pKa(acidic)) is less than pKa of the basic group (pKa(basic)), thus pKa(acidic) < pKa(basic).


Appendix B. Initial descriptors

The list of descriptor names from OASIS Database Manager 1.7.3 used for the derivation of the model (http://www.oasis-lmc.org; Refs. 30,31; http://openmopac.net/manual/super.html).

| Descriptor type | List of descriptors |
|---|---|
| Atomic | ACCEPT_DLC, BOND_ORDER, DONOR_DLC, POLAR, POP_HOMO, POP_LUMO, Q, VWACWN, VWACWP, VWPNSA, VWPPSA |
| Conformational | A_alpha_C, A_max, A_max_Benzene, Atom_dist_ratio, Bond_Order_Hlg, CALC._HEAT_FORM., D_max, DiamEff, DiamMax, DiamMin, DIPOLE_MOMENT, E_GAP, E_HOMO, ELECTRONEGATIVITY, Electrophilicity, E_LUMO, GEOM._INFO_WIENER, GEOM._WIENER, PLANARITY, PLANARITY_conjugate, Q_Aldehyde_O, RNCG, RPCG, SASurf_FNSA1, SASurf_FNSA2, SASurf_FNSA3, SASurf_FPSA1, SASurf_FPSA2, SASurf_FPSA3, SASurf_RNCS, SASurf_RPCS, SASurf_WNSA1, SASurf_WNSA2, SASurf_WNSA3, SASurf_WPSA1, SASurf_WPSA2, SASurf_WPSA3, SVWNPSA, SVWPPSA, VAN_D._WAALS_SUR., VAN_D._WAALS_VOL., VdWSurf_DPSA1, VdWSurf_DPSA2, VdWSurf_DPSA3, VdWSurf_FNSA1, VdWSurf_FNSA2, VdWSurf_FNSA3, VdWSurf_FPSA1, VdWSurf_FPSA2, VdWSurf_FPSA3, VdWSurf_PNSA1, VdWSurf_PNSA2, VdWSurf_PNSA3, VdWSurf_PPSA1, VdWSurf_PPSA2, VdWSurf_PPSA3, VdWSurf_RNCS, VdWSurf_RPCS, VdWSurf_WNSA1, VdWSurf_WNSA2, VdWSurf_WNSA3, VdWSurf_WPSA1, VdWSurf_WPSA2, VdWSurf_WPSA3, VOLUME_POLARIZAB |
| 2D | Log (Kow), Info_Wiener, pKa(acidic), pKa(basic), N_AromaticBonds, N_CycleBonds, N_HEAVY_ATOMS |

Patent application

Parts of this research (the decision tree for prediction of hERG for AZA) are included in a patent application.55 However, the authors would like to encourage the free use of the results in the present paper by others.

Supplementary data

Supplementary data (the training sets TGLOBAL and T1AZA and validation sets V1 and V1AZA) associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.bmc.2014.09.007.

References and notes

1. Singh, J. N.; Sharma, S. S. hERG Potassium Channels in Drug Discovery and Development. In Ion Channels and Their Inhibitors; Gupta, S., Ed.; Springer, 2011.
2. Jentsch Nat. Rev. Neurosci. 2000, 1, 21.
3. Sanguinetti, M. C.; Jiang, C.; Curran, M. E.; Keating, M. T. Cell 1995, 81, 299.
4. Mitcheson, J. S.; Chen, J.; Lin, M.; Culberson, C.; Sanguinetti, M. C. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 12329.
5. Sanguinetti, M. C.; Tristani-Firouzi, M. Nature 2006, 440, 463.
6. Crumb, W.; Cavero, I. Pharm. Sci. Technol. Today 1999, 2, 270.
7. Aronov, A. M. Drug Discovery Today 2005, 10, 149.
8. Schiesaro, A.; Ecker, G. F. Prediction of hERG Channel Inhibition using in silico Techniques. In Ion Channels and their Inhibitors; Gupta, S., Ed.; Springer, 2011. http://dx.doi.org/10.1007/978-3-642-19922-6_7.
9. Taboureau, O.; Jørgensen, F. S. Comb. Chem. High Throughput Screen. 2011, 14, 375.
10. Wang, S.; Li, Y.; Xu, L.; Li, D.; Hou, T. Curr. Top. Med. Chem. 2013, 13.
11. Su, B. H.; Shen, M. Y.; Esposito, E. X.; Hopfinger, A. J.; Tseng, Y. J. J. Chem. Inf. Model. 2010, 50, 1304.
12. Waring, M. J.; Johnstone, C. Bioorg. Med. Chem. Lett. 2007, 17, 1759.
13. Enzyme Inhibition in Drug Discovery and Development: The Good and the Bad; Lu, C., Li, A. P., Eds.; Wiley, 2010.
14. Li, Q.; Jørgensen, F. S.; Oprea, T.; Brunak, S.; Taboureau, O. Mol. Pharm. 2008, 5, 117.
15. Obiol-Pardo, C.; Gomis-Tena, J.; Sanz, F.; Saiz, J.; Pastor, M. J. Chem. Inf. Model. 2011, 51, 483.
16. Polak, S.; Wisniowska, B.; Brandys, J. J. Appl. Toxicol. 2009, 29, 183. http://dx.doi.org/10.1002/jat.1395.
17. Doddareddy, M. R.; Klaasse, E. C.; Shagufta, X.; IJzerman, A. P.; Bender, A. ChemMedChem 2010, 5, 716.


18. Liu, T.; Lin, Y.; Wen, X.; Jorrisen, R. N.; Gilson, M. K. Nucl. Acids Res. 2007, 35,D198.

19. Murphy, S. T. et al Bioorg. Med. Chem. Lett. 2007, 17, 2150.20. Diaz, G. J. et al J. Pharmacol. Toxicol. Methods 2004, 50, 187.21. Keserü, G. Bioorg. Med. Chem. Lett. 2003, 13, 2773.22. Brugel, T. A.; Smith, R. W.; Balestra, M.; Becker, C.; Daniels, T.; Hoerter, T. N.;

Koether, G. M.; Throner, S. R.; Panko, L. M.; Folmer, J. J.; Cacciola, J.; Hunter, A.

M.; Liu, R.; Edwards, P. D.; Brown, D. G.; Gordon, J.; Ledonne, N. C.; Pietras, M.;Schroeder, P.; Sygowski, L. A.; Hirata, L. T.; Zacco, A.; Peters, M. F. Bioorg. Med.Chem. Lett. 2010, 20, 5847.

23. Haga, Y.; Mizutani, S.; Naya, A.; Kishino, H.; Iwaasa, H.; Ito, M.; Ito, J.; Moriya,M.; Sato, N.; Takenaga, N.; Ishihara, A.; Tokita, S.; Kanatani, A.; Ohtake, N.Bioorg. Med. Chem. 2011, 19, 883.

24. Marquis, R. W.; Lago, A. M.; Callahan, J. F.; Rahman, A.; Dong, X.; Stroup, G. B.;Hoffman, S.; Gowen, M.; DelMar, E. G.; van Wagenen, B. C.; Logan, S.; Shimizu,S.; Fox, J.; Nemeth, E. F.; Roethke, T.; Smith, B. R.; Ward, K. W.; Bhatnagar, P. J.Med. Chem. 2009, 52, 6599.

25. Shaw, S. J.; Chen, Y.; Zheng, H.; Fu, H.; Burlingame, M. A.; Marquez, S.; Li, Y.;Claypool, M.; Carreras, C. W.; Crumb, W.; Hardy, D. J.; Myles, D. C.; Liu, Y. J. Med.Chem. 2009, 52, 6851.

26. Nikolov, N. G.; Grancharov, V.; Stoyanova, G.; Pavlov, T.; Mekenyan, O. J. Chem.Inf. Model. 2006, 46, 2537.

27. Juska, L.; Didziapetris, R.; Japertas, P. Abstr./Toxicol. Lett. 2008, 180S, S32.

28. Mekenyan, O. G.; Dimitrov, D.; Nikolova, N.; Karabunarliev, S. J. Chem. Inf. Comput. Sci. 1999, 39, 997.

29. Mekenyan, O. G.; Pavlov, T.; Grancharov, V.; Todorov, M.; Schmieder, P.; Veith, G. J. Chem. Inf. Model. 2005, 45, 283.

30. Stewart, J. J. P. J. Comput.-Aided Mol. Des. 1990, 4, 1.

31. Stewart, J. J. P. MOPAC 93; Fujitsu Limited: Chiba 261, Japan; Stewart Computational Chemistry: Colorado Springs, CO, 1993.

32. Clark, T. In Recent experimental and computational advances in molecular spectroscopy. NATO-ASI series C; Fausto, R., Ed.; Kluwer: Dordrecht, 1993; Vol. 406, p 369.

33. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.

34. Cross, K. P.; Myatt, G.; Yang, C.; Fligner, M. A.; Verducci, J. S.; Blower, P. E. J. Med. Chem. 2003, 46, 4770.

35. Valerio, L. G.; Yang, C.; Arvidson, K. B.; Kruhlak, N. L. Expert Opin. Drug Metab. Toxicol. 2010, 6, 505.

36. Jaccard, P. Bulletin de la Société Vaudoise des Sciences Naturelles 1901, 37, 241 (in French).

37. Aptula, A. O.; Cronin, M. T. SAR QSAR Environ. Res. 2004, 15, 399.

38. Buyck, C.; Tollenaere, J.; Engels, M. et al., The 14th European Symposium on QSARs. Bournemouth, UK, 2002.

39. Esposito, E. X.; Hopfinger, A. J.; Madura, J. D. Methods for Applying the Quantitative Structure–Activity Relationship Paradigm. In Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, Methods in Molecular Biology; Bajorath, J., Ed.; Humana Press, 2004.

40. Todorov, M.; Mombelli, E.; Aït-Aïssa, S.; Mekenyan, O. SAR QSAR Environ. Res. 2011, 22, 265.

41. Manallack, D. T.; Prankerd, R. J.; Yuriev, E.; Oprea, T. I.; Chalmers, D. K. Chem. Soc. Rev. 2013, 42, 485.

42. Wager, T. T.; Hou, X.; Verhoest, P. R.; Villalobos, A. ACS Chem. Neurosci. 2010, 1, 435.

43. Zachariae, U.; Giordanetto, F.; Leach, A. G. J. Med. Chem. 2009, 52, 4266.

44. Graph Theoretical Approach to Chemical Reactivity; Bonchev, D., Mekenyan, O., Eds.; Kluwer Academic Publishers: Dordrecht, 1994.


45. Wiener, H. J. Am. Chem. Soc. 1948, 69, 17.

46. Ekins, S.; Balakin, K. V.; Savchuk, N.; Ivanenkov, Y. J. Med. Chem. 2006, 49, 5059.

47. Sanguinetti, M. C.; Mitcheson, J. S. Trends Pharmacol. Sci. 2005, 26, 119.

48. Schömer, E. Algorithmica 2000, 27, 170.

49. Fukui, K.; Kato, H.; Yonezawa, T. Bull. Chem. Soc. Jpn. 1961, 27, 423.

50. Schüürmann, G. Env. Tox. Chem. 1990, 9, 417.

51. Schüürmann, G. Quant. Struct.-Act. Relat. 1990, 9, 326.

52. Quinlan, J. R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Mateo, CA, 1993.

53. Gepp, M. M.; Hutter, M. C. Bioorg. Med. Chem. 2006, 14, 5325.

54. Wang, M.; Yang, X. G.; Xue, Y. QSAR Comb. Sci. 2008, 27, 1028. http://dx.doi.org/10.1002/qsar.200810015.

55. Nikolov, N. G.; Wedebye, E. B. A method for prediction of hERG potassium channel inhibition in acidic and zwitterionic compounds, European Patent Application 13194399.5–1408, 2013.


Web references

1. http://oasis-lmc.org.

2. http://www.bindingdb.org.

3. http://www.acdlabs.com/products/admet/tox/.

4. http://www.rulequest.com.

5. http://www.rulequest.com/see5-info.html.

6. http://www.leadscope.com.

7. http://apps.echa.europa.eu/preregistered/prsDownload.aspx.

8. http://opentox.org/data/documents/development/opentoxreports/opentoxreportd34.

9. http://openmopac.net/manual/super.html.

Please cite this article in press as: Nikolov, N. G.; et al. Bioorg. Med. Chem. (2014), http://dx.doi.org/10.1016/j.bmc.2014.09.007