Post on 19-Dec-2015

16th Online World Conference on Soft Computing in Industrial Applications

Special Session: Soft Computing Methods in Pharmaceutical and Medical Sciences

In silico binary classification model for hERG liability screening

Barbara Wiśniowska, Aleksander Mendyk

Unit of Pharmacoepidemiology and Pharmacoeconomics

Jagiellonian University Medical College

drug withdrawals

[Figure: bar chart of the number of drug withdrawals by toxicity type: cardiotoxicity (21), hepatotoxicity (15), hematotoxicity (4), other (7)]

Over the last two decades, a number of blockbuster drugs have been withdrawn and others have received new “black box” warnings due to adverse cardiac effects. In addition, lead compounds or drug candidates are frequently terminated at late stages of drug development due to cardiac safety concerns. Both factors can significantly increase the overall cost of drug discovery; consequently, there is growing interest in assessing the cardiac liability of drugs under development as early as possible. Moreover, cardiotoxicity assessment is a compulsory element of the drug development process required by regulatory agencies.

LQTS, TdP

[Figure: cardiac action potential (membrane potential vs. time) with the contributing ionic currents: INa, Ito, ICa, IKs (KCNQ1 channel), IKr (hERG channel)]

mechanism
• inhibition of the rapid delayed rectifier potassium current IKr (channel encoded by the hERG gene)

LQTS – long QT syndrome; TdP – Torsades de Pointes

Proarrhythmic risk, defined as the possibility of TdP occurrence, can be induced by perturbation of the ionic balance of the heart. Because TdP is rare but potentially lethal, surrogate markers are needed for risk estimation. It is well accepted that delayed ventricular repolarization (QT prolongation on the ECG) is a surrogate biomarker linked with enhanced proarrhythmic risk. The main cause underlying QT prolongation is blockade of the hERG potassium channels, which are responsible for the rapid delayed rectifier potassium current (IKr) in cardiomyocytes. Thus hERG channel blocking potency, namely the drug concentration that produces half-maximal channel inhibition (IC50), is accepted, recommended and widely used as one of the cardiac liability markers for TdP risk screening.

proarrhythmic liability assessment in vitro

• rubidium flux
• radioactive ligand binding
• fluorescence assessment
• electrophysiological methods (HEK, CHO, XO cell lines): patch clamp

[Figure: concentration-response curve, inhibition (%) vs. log concentration (µM)]

[Figure: patch clamp scheme: micropipette, cell membrane, connection 50 MΩ, underpressure, connection 10 - 100 GΩ]

The hERG blocking potency, expressed as IC50 (the half-maximal inhibitory concentration of the tested compound), is the most widely used predictor of cardiac liability at the earliest stages of drug development. There are several in vitro approaches used to quantify drug-hERG interactions, including rubidium-flux and radioligand binding assays, in vitro electrophysiology measurements and fluorescence-based assays. Among these methods, electrophysiological assays, especially the manual patch clamp technique in the whole-cell mode, are considered the ‘gold standard’. The patch clamp method is based on applying voltage stimuli to cells and directly measuring the current through ion channels within the membrane patch. The amplitude of the tail current is measured in the absence and presence of different concentrations of the tested compound in the bath solution. Once the data are collected, the concentration producing half-maximal block of the hERG potassium current (IC50) can be calculated by fitting the data with the Hill equation to obtain a concentration-response curve. This technique is labor-intensive and low-throughput, but it provides accurate, in-depth information on the effect of drugs on ion channels.
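The Hill fit mentioned above can be sketched as follows. This is only an illustration, not the fitting routine used in the cited experiments: it assumes inhibition is expressed as a fraction between 0 and 1, uses the linearised Hill equation with ordinary least squares, and runs on synthetic data; the names `hill_inhibition` and `fit_hill` are hypothetical.

```python
import math

def hill_inhibition(conc, ic50, hill):
    """Fractional block (0..1) predicted by the Hill equation."""
    return 1.0 / (1.0 + (ic50 / conc) ** hill)

def fit_hill(concs, fractions):
    """Estimate (IC50, Hill coefficient) from fractional inhibition data.

    Uses the linearised form log(f / (1 - f)) = h*log(C) - h*log(IC50),
    so only points with 0 < f < 1 can be used.
    """
    pts = [(math.log(c), math.log(f / (1.0 - f)))
           for c, f in zip(concs, fractions) if 0.0 < f < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    hill = sxy / sxx                      # slope of the regression line
    intercept = mean_y - hill * mean_x    # equals -h * log(IC50)
    ic50 = math.exp(-intercept / hill)
    return ic50, hill

# Synthetic, noise-free example (assumed values, not measured data).
concentrations_um = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
measured = [hill_inhibition(c, ic50=2.0, hill=1.2) for c in concentrations_um]
ic50_est, hill_est = fit_hill(concentrations_um, measured)
```

With real, noisy recordings a non-linear least-squares fit is usually preferred, because the logit transform amplifies errors near 0% and 100% inhibition.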

Compound        IC50   channel'  model'  source           temp'  K+ [mM] bath solution  t1 pulse [s]  holding pot.  depol. pulse to [mV]  measurement
Amiodarone      3.84   4         4       Kamiya 2001a     2      5.4                    0.2           -50           30                    -50
Amiodarone      1.2    4         4       Zankov 2005      2      5.4                    2             -50           30                    -50
Amiodarone      1.74   4         4       Zankov 2005      2      5.4                    0.5           -50           30                    -50
AZD7009         193    1         2       Persson 2005     1      5.4                    6             -80           40                    40
Azimilide       1.4    1         2       Busch 1998       1      5.4                    6             -80           40                    40
Azimilide       2.6    3         1       Busch 1998       1      2                      15            -80           -10                   -10
Azimilide       3.1    3         1       Persson 2005     1      10                     15            -80           -10                   -10
Bepridil        6.2    4         4       Wang 1999a       2      5.4                    1             -40           60                    -40
Canrenoic_acid  22.3   1         2       Gomez 2005       1      4                      2             -80           40                    -40
Chromanol_293B  26.9   2         1       Loussouarn 1997  1      2                      3             -80           40                    40
Chromanol_293B  6.9    1         1       Lerche 2007      1      2                      12            -80           40                    40

proarrhythmic liability in silico

Compound         IC50 [µM]  model  source              transf  temp [°C]  tech                       K+ [mM]  t1;t2 pulse [s]  hold [mV]  depol [mV]  measure [mV]  protocol
4-aminopyridine  4400       HEK    Ridley 2003         stably  37         whole cell PC              4        1;1.5            -40        30          -40           step
Acehytisine      465.95     HEK    Huang 2008          stably  23         whole cell PC              4        4;4              -80        20          20            step
Ajmaline         1.04       HEK    Kiesecker 2004      stably  21         whole cell PC              5        0.4;0.4          -80        40          -120          step
Ajmaline         42.3       XO     Kiesecker 2004      -       21         voltage-clamp 2-electrode  5        0.4;0.4          -80        40          -60           step
Ambasilide       16.1       CHO    Walker 2000         stably  22         whole cell PC              4.8      3.9;5            -80        -30         -60           step
Ambasilide       3.6        CHO    Walker 2000         stably  22         whole cell PC              4.8      3.9;5            -80        30          -60           step
Amiodarone       0.048      CHO    Guo 2005            stably  35         whole cell PC              4        1;0.5            -80        20          -40           step
Amiodarone       37.9       XO     Kamiya 2001a        -       23         voltage-clamp 2-electrode  5.4      2;0              -90        40          -70           step
Amiodarone       9.8        XO     Kiehn 1999          -       20         voltage-clamp 2-electrode  5        0.4;0.4          -80        30          -60           step
Amitriptyline    4.78       XO     Jo 2000             -       21.5       voltage-clamp 2-electrode  4        0.5;0.5          -70        20          -60           step
Amitriptyline    4.66       XO     Jo 2000             -       21.5       voltage-clamp 2-electrode  2        4;4              -70        30          -60           step
Amitriptyline    10         CHO    Tie 2000/Tie 2002   stably  21         whole cell PC              4.8      3.9;5            -80        30          -60           step

The first step of the work included bibliographic analysis of the available literature sources and the collection of quantitative data describing the hERG channel blocking phenomenon. As a result, the data set of experimentally measured IC50 values (the concentration which produces half-maximal block of the hERG channel) with the relevant information regarding factors influencing the laboratory measurement results (electrophysiological settings) was created. Every single paper and data record was carefully checked to meet the inclusion criteria to assure the high quality of the data.

data base

> 300 papers; 447 IC50 values; 175 compounds; 3 in vitro models

transf – transfection type
K+ – potassium ion concentration in bath solution
t1 – duration of depolarization pulse
hold – holding potential
depol – depolarization level
measure – measurement voltage

The final data set draws on over 300 publicly available papers.

The final data set consisted of 447 IC50 values (half-maximal inhibitory concentration) for 175 different compounds accompanied by a description of the in vitro experimental settings.

In all experiments three models were used (HEK, CHO, XO).

The database is freely available after registration on the Tox-Comp project website www.tox-portal.net

proarrhythmic liability in silico: extrapolation factors

Temperature extrapolation factor: HEK physiological temperature / HEK room temperature

Inter-system extrapolation factor: inter-system ratios

[Figure: scheme of inter-system (HEK, CHO, XO) and inter-temperature (room, physiological) extrapolation]

Literature data analysis shows that hERG interaction experiments carried out under different conditions and in different in vitro systems can yield different IC50 values for the same substance. We therefore proposed (Wiśniowska, Polak 2009, Toxicol Mech Meth) extrapolation factors to unify IC50 values across systems (HEK, CHO, XO) and temperatures (room and physiological).

The original IC50 values derived from in vitro experiments were scaled with the use of extrapolating factors.
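A minimal sketch of such scaling is given below. The factor values are hypothetical placeholders, not the published extrapolation factors. The choice of HEK cells at physiological temperature as the common reference and the multiplicative combination of the two factors are also assumptions made for illustration.

```python
# Hypothetical placeholder factors; the published values are NOT reproduced here.
# Assumed reference condition: HEK cells at physiological temperature (factor 1.0).
INTER_SYSTEM_FACTOR = {"HEK": 1.0, "CHO": 1.5, "XO": 0.2}
TEMPERATURE_FACTOR = {"phys": 1.0, "room": 0.8}

def unify_ic50(ic50_um, cell_model, temperature):
    """Scale a measured IC50 [µM] to the assumed reference condition.

    Assumption: the inter-system and inter-temperature factors
    combine multiplicatively.
    """
    return (ic50_um
            * INTER_SYSTEM_FACTOR[cell_model]
            * TEMPERATURE_FACTOR[temperature])

# A value measured in XO at room temperature, scaled to the reference.
scaled = unify_ic50(10.0, "XO", "room")
# A value already measured under the reference condition is unchanged.
unchanged = unify_ic50(10.0, "HEK", "phys")
```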

proarrhythmic liability in silico: methods & algorithms

• OUTPUT: binary (unsafe/safe) classification
• INPUT: phys-chem descriptors + in vitro experimental settings
• ALGORITHMS: ANN, decision trees, BayesNet
• VALIDATION: internal (10-fold CV) & external (dataset)

The task for the model was to classify compounds according to their cardiac safety. Therefore a flag, “safe” or “unsafe” (encoded as 0 and 1, respectively), was assigned to each data record. The safety threshold was set at an IC50 of 1 µM on the basis of literature analysis. This cut-off between hERG blockers and non-blockers resulted in 250 records classified as “unsafe” and 197 as “safe”. The subsequent model development process included the selection of the input vector components and of a classification algorithm. In addition to the parameters defining the laboratory setting, over 100 descriptors of physico-chemical properties were generated for each compound using ChemAxon software (Marvin Beans).
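The labelling rule can be sketched as below. The 1 µM threshold and the 0/1 encoding come from the text; treating a value exactly at the threshold as “safe” is an assumption, and the function name `safety_flag` is hypothetical. The example IC50 values are taken from the data table shown earlier and are used unscaled, purely for illustration.

```python
SAFETY_THRESHOLD_UM = 1.0  # cut-off stated in the text

def safety_flag(ic50_um, threshold_um=SAFETY_THRESHOLD_UM):
    """Return 1 ("unsafe") for potent hERG blockers, 0 ("safe") otherwise.

    Assumption: a record exactly at the threshold is labelled "safe".
    """
    return 1 if ic50_um < threshold_um else 0

# (compound, IC50 [µM]) pairs copied from the data table, unscaled.
records = [("Amiodarone", 0.048), ("Ajmaline", 1.04), ("4-aminopyridine", 4400.0)]
flags = {name: safety_flag(ic50) for name, ic50 in records}
```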

proarrhythmic liability in silico: methods & algorithms

• INPUT: in vitro experimental settings (derived from the available literature) + phys-chem descriptors

- cell model: XO / CHO / HEK
- temperature: room / phys
- K+ bath concentration [mM]
- t1 pulse [s]
- t2 pulse [s]
- holding potential [mV]
- depolarization level [mV]
- measurement potential [mV]

proarrhythmic liability in silico: methods & algorithms

• INPUT: in vitro experimental settings + phys-chem descriptors (calculated in Marvin Beans package)

- sdf files either derived from PubChem or drawn in MarvinSketch
- 41 plugins
- 107 numeric inputs natively
- 38 parameters after the sensitivity analysis

Sensitivity analysis was performed in order to reduce model complexity and to evaluate the impact of reducing the number of descriptors on the model’s generalization ability. 38 key variables were identified. The chosen descriptors include those describing experimental settings as well as the physico-chemical and geometric properties indicated in the literature as crucial for drug-hERG channel binding.

proarrhythmic potency assessment: methods & algorithms

• INPUT: in vitro experimental settings + phys-chem descriptors (calculated in Marvin Beans package)

- Log P
- Log D
- …
- rotatable bond count
- chiral center count
- stereoisomer count
- …
- Largest_ring_size
- Minimal_projection_radius/area
- Maximal_projection_radius/area

Artificial neural networks and recursive partitioning methods were used for predictive model development. A purpose-written ANN simulator, Nets2010, and the WEKA software (Waikato Environment for Knowledge Analysis) were used to develop the models. In this work classical multi-layer perceptrons containing 1 to 6 hidden layers were applied. All ANNs were trained by a back propagation algorithm with momentum and jog-of-weights modifications. The number of training iterations varied from 10 000 to 5 000 000. The epoch size was set to 1. The training of ANNs involved random data record presentation with or without additional noise in the data (‘rand’ or ‘orig’ in the input data files description, respectively). The data were scaled linearly from 0.2 to 0.8 or from −0.8 to 0.8 (‘scale’ in the input data files description). Decision trees and random forests were developed with the use of the WEKA package. Different tree-constructing algorithms with learning parameter modifications were tested. In addition to the single-algorithm-based models, modular systems (so-called ‘expert committees’) combining several ANNs or an ANN with WEKA algorithms were tested.
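The linear scaling and the optional noise can be illustrated with the sketch below. It is not taken from Nets2010: the min-max form of the scaling, the Gaussian noise with a fixed standard deviation, and the names `scale_linear` and `add_noise` are all assumptions.

```python
import random

def scale_linear(values, lo=0.2, hi=0.8):
    """Map values linearly so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: place every value in the middle of the range.
        return [(lo + hi) / 2.0 for _ in values]
    span = v_max - v_min
    return [lo + (v - v_min) * (hi - lo) / span for v in values]

def add_noise(values, sigma=0.01, rng=None):
    """Add Gaussian noise to each value (assumed noise model)."""
    rng = rng or random.Random(0)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Example: bath temperatures [°C] scaled to the 0.2..0.8 range.
temperatures = [20.0, 21.0, 23.0, 35.0, 37.0]
scaled = scale_linear(temperatures)
symmetric = scale_linear(temperatures, lo=-0.8, hi=0.8)
noisy = add_noise(scaled)
```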

To ensure the highest possible reliability of the final model, three validation modes were applied to assess model performance: the standard 10-fold cross-validation procedure (10-fold CV), an enhanced 10-fold cross-validation, and validation on an external test set of 55 records. The external set contained both structures already present in the native dataset (but tested in different in vitro models) and structures absent from it. The enhanced procedure is a proposed modification of standard 10-fold CV in which all records describing a particular drug are excluded from the training data whenever that drug appears in the test set. This keeps the training and test sets separate, as each compound can belong exclusively to either the training or the test instances.
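A compound-wise fold assignment of this kind might look like the sketch below. The round-robin assignment over sorted compound names is an assumption chosen for reproducibility; the original procedure may have assigned compounds to folds differently. The names `compound_folds` and `split` are hypothetical.

```python
def compound_folds(compounds, n_folds=10):
    """Give every record a fold index so that all records of one
    compound share the same fold (round-robin over sorted names)."""
    unique = sorted(set(compounds))
    fold_of = {name: i % n_folds for i, name in enumerate(unique)}
    return [fold_of[name] for name in compounds]

def split(folds, test_fold):
    """Return (train_indices, test_indices) for one cross-validation round."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test

# One entry per data record; compounds can repeat (names from the data table).
record_compounds = ["Amiodarone", "Amiodarone", "Ajmaline", "Ajmaline",
                    "Ambasilide", "Amitriptyline", "Amitriptyline", "Acehytisine"]
folds = compound_folds(record_compounds, n_folds=3)
train_idx, test_idx = split(folds, test_fold=0)
train_names = {record_compounds[i] for i in train_idx}
test_names = {record_compounds[i] for i in test_idx}
```

In every round, `train_names` and `test_names` are disjoint, which is the property the enhanced procedure is meant to guarantee.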

proarrhythmic liability in silico: results – 10 CV

Architecture                 Iterations  All%   SE    SP    PPV   NPV
15_7_5_tanh_rand_scale       1 000 000   78.91  0.84  0.73  0.78  0.80
3_2_tanh_orig_scale          1 000 000   78.30  0.79  0.77  0.80  0.77
5_3_2_tanh_rand_scale        500 000     78.26  0.79  0.77  0.80  0.77
15_7_5_fsr_rand_scale        50 000      78.07  0.80  0.76  0.79  0.77
5_3_2_sigma_rand             200 000     78.07  0.81  0.75  0.79  0.78
…
RandomForest_I10_K0_S1_orig  -           75.96  0.74  0.79  0.79  0.73

All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value

Around 800 classification models were developed and evaluated. The table presents the results of the enhanced 10-CV procedure for the 5 best neural network models obtained, together with a random forest model. The optimal model found during the numerical experimentation was an artificial neural network with three hidden layers and a hyperbolic tangent activation function. Its performance, estimated in the enhanced 10-fold cross-validation procedure, was 79% correct classifications.

proarrhythmic liability in silico: results – external validation

Architecture                 Iterations  All%        SE    SP    PPV   NPV
15_7_5_tanh_rand_scale       1 000 000   87 (48/55)  0.93  0.81  0.84  0.91
RandomForest_I10_K0_S1_orig  -           87 (48/55)  0.90  0.85  0.87  0.88
3_2_tanh_orig_scale          1 000 000   86 (47/55)  0.90  0.81  0.84  0.88
5_3_2_tanh_rand_scale        500 000     82 (45/55)  0.86  0.77  0.81  0.83
5_3_2_sigma_rand             200 000     53 (29/55)  1.00  0.00  0.53  -

All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value

The best architectures obtained were then retrained with identical settings, using the whole dataset (447 records) as the learning set, and their performance was automatically verified on the external dataset. The overall classification rate in this procedure for the best algorithm was 87%.

results – modular system

proarrhythmic liability in silico

Expert committees were prepared and tested with the aim of further improving model quality. The final classification model for hERG channel blockers was based on 2 and 3 neural networks with the highest prediction accuracy for hERG-active and non-active compounds, respectively, and the best WEKA algorithm.
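How the committee members were combined is not stated here, so the sketch below simply assumes majority voting over binary member outputs, with ties resolved towards “unsafe”. Both the rule and the name `committee_vote` are illustrative assumptions rather than the implemented method.

```python
def committee_vote(predictions):
    """Combine binary member predictions (1 = unsafe, 0 = safe).

    Assumed rule: majority vote, with a tie counted as "unsafe"
    so that doubtful compounds are flagged rather than passed.
    """
    if not predictions:
        raise ValueError("at least one member prediction is required")
    unsafe_votes = sum(predictions)
    return 1 if 2 * unsafe_votes >= len(predictions) else 0

# Hypothetical outputs of three committee members for two compounds.
decision_a = committee_vote([1, 0, 1])  # two of three vote "unsafe"
decision_b = committee_vote([0, 0, 1])  # two of three vote "safe"
```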

generalization ability

                     All%          SE    SP    PPV   NPV
10-CV                82 (45/55)    0.82  0.82  0.83  0.81
External validation  87 (48/55)    0.97  0.77  0.82  0.95

generalization ability – in vitro settings & chemical structures

                     All%          SE    SP    PPV   NPV
KNOWN                100 (25/25)   1.00  1.00  1.00  1.00
NEW                  77 (23/30)    0.91  0.68  0.63  0.93

All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value

The overall classification accuracy of the expert committee was 82% when estimated in the 10-CV procedure and 87% when tested on the external test set. Combining different algorithms can therefore benefit prediction accuracy, although no substantial improvement in the overall rate was observed. Nevertheless, the negative predictive value of the expert committee is considerably better than the NPV of the best neural network. The high NPV (0.93) is beneficial, as the intended function of the model is to eliminate potentially unsafe compounds as early as possible.

The external validation set was composed of both new structures (subset “NEW”, 30 records) and molecules known to the system but assessed under different experimental conditions (subset “KNOWN”, 25 records). Analysis of the incorrect decisions shows that all of them concerned molecules absent from the native dataset. This indicates very good predictive value of the model as far as the experimental settings are concerned. Extrapolation to new chemical structures is less reliable. It should be noted that six of the seven misclassifications are false positives, meaning that a safe compound was classified as a hERG blocker. This is probably preferable to false-negative predictions, because the risk of advancing a cardiotoxic compound that later fails due to cardiac liability remains low.
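The reported measures follow from a 2×2 confusion matrix with “unsafe” as the positive class. The sketch below applies them to the “NEW” subset. The counts are inferred rather than quoted: 23 of 30 correct with six false positives leaves one false negative, and TP = 10, TN = 13 is the split of the 23 correct decisions that matches the reported SE, SP, PPV and NPV.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the measures used in the result tables.

    tp/fn: unsafe compounds predicted unsafe/safe;
    fp/tn: safe compounds predicted unsafe/safe.
    """
    total = tp + fn + fp + tn
    return {
        "All%": 100.0 * (tp + tn) / total,  # overall classification rate
        "SE": tp / (tp + fn),               # sensitivity
        "SP": tn / (tn + fp),               # specificity
        "PPV": tp / (tp + fp),              # positive predictive value
        "NPV": tn / (tn + fn),              # negative predictive value
    }

# Inferred counts for the "NEW" subset (30 records, 7 misclassified).
new_subset = classification_metrics(tp=10, fn=1, fp=6, tn=13)
```

To two decimal places these values agree with the “NEW” row of the table (77%, 0.91, 0.68, 0.63, 0.93).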

Tox-Comp platform

proarrhythmic liability in silico

The developed binary classification model for hERG inhibition is an element of the Tox-Comp.net platform.

The Tox-Comp.net platform is a flexible, modular system for the early assessment of the cardiotoxic potency of chemical entities.

It is freely available after registration from the Tox-Comp project website.

http://www.tox-portal.net

acknowledgements

team

Sebastian Polak PhD
Miłosz Polak
Kamil Fijorek
Anna Glinka
Małgorzata Kozłowska

project financed by the Polish National Center for Research and Development, LIDER project number LIDER/02/187/L-1/09

THANK YOU

Unit of Pharmacoepidemiology and Pharmacoeconomics

Jagiellonian University Medical College