Post on 19-Dec-2015
16th Online World Conference on Soft Computing in Industrial Applications
Special Session: Soft Computing Methods in Pharmaceutical and Medical Sciences
In silico binary classification model for hERG liability screening
Barbara Wiśniowska, Aleksander Mendyk
Unit of Pharmacoepidemiology and Pharmacoeconomics
Jagiellonian University Medical College
drug withdrawals
[Figure: bar chart of drug withdrawals by cause (cardiotoxicity, hepatotoxicity, hematotoxicity, other); y-axis: drug withdrawals, 0 to 20; bar labels as extracted: 21, 15, 4, 7]
Over the last two decades, a number of blockbuster drugs have been withdrawn and others have received new “black box” warnings due to adverse cardiac effects. In addition, lead compounds or drug candidates are frequently terminated at late stages of drug development due to cardiac safety concerns. Both of these factors can significantly increase the overall cost of drug discovery; consequently, there is growing interest in assessing the cardiac liability of drugs early in development. What is more, cardiotoxicity assessment is a compulsory element of the drug development process required by drug agencies.
LQTS
TdP
[Figure: cardiac action potential (membrane potential vs. time) with the contributing ionic currents: INa, Ito, ICa, IKs (KCNQ1 channel), IKr (hERG channel)]
mechanism
• inhibition of the rapid delayed rectifier potassium current IKr (channel encoded by the hERG gene)
TdP – Torsades de Pointes
Proarrhythmic risk, defined as the possibility of TdP occurrence, can be induced by perturbation of the ionic balance of the heart. Because TdP is rare and potentially lethal, surrogate markers are needed for risk estimation. It is well accepted that delayed ventricular repolarization (QT prolongation on the ECG) is a surrogate biomarker linked with enhanced proarrhythmic risk. The main cause of QT prolongation is blockade of the hERG potassium channels, which carry the rapid delayed rectifier potassium current (IKr) in cardiomyocytes. Thus hERG channel blocking potency, namely the drug concentration that produces half-maximal channel inhibition (IC50), is accepted, recommended and widely used as one of the cardiac liability markers for TdP risk screening.
proarrhythmic liability assessment in vitro
• rubidium flux
• radioactive ligand binding
• fluorescence assessment
• electrophysiological methods (HEK, CHO, XO cell lines) - patch clamp
[Figure: concentration-response curve; x-axis: log concentration (µM); y-axis: inhibition (%), -100 to 60]
[Figure: patch clamp schematic with micropipette and cell membrane; connection 50 MΩ; under pressure, connection 10 - 100 GΩ]
The hERG blocking potency, expressed as IC50 (the half-maximal inhibitory concentration of the tested compound), is the most widely used predictor of cardiac liability at the earliest stages of drug development. There are several in vitro approaches used to quantify drug-hERG interactions, including rubidium-flux and radioligand binding assays, in vitro electrophysiology measurements and fluorescence-based assays. Among these methods, electrophysiological assays, especially the manual patch clamp technique in the whole-cell mode, are considered the ‘gold standard’. The patch clamp method is based on applying voltage stimuli to cells and directly measuring the current flowing through ion channels within the membrane patch. The amplitude of the tail current is measured in the absence and presence of different concentrations of the tested compound in the bath solution. Once the data are collected, the concentration producing half-maximal block of the hERG potassium current (IC50) can be calculated by fitting the data with the Hill equation to obtain a concentration-response curve. This technique is labor-intensive and low-throughput, but it provides accurate, in-depth information on the effect of drugs on ion channels.
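The Hill-equation fit mentioned above can be illustrated with a minimal sketch. It assumes fractional inhibition values strictly between 0 and 1 and uses the linearized Hill plot with ordinary least squares, which is a simplification; laboratories typically use nonlinear regression. The function names and the data points are hypothetical and synthetic, not measurements from the data set.

```python
import math

def hill_inhibition(conc, ic50, n):
    """Fractional inhibition (0..1) predicted by the Hill equation."""
    return 1.0 / (1.0 + (ic50 / conc) ** n)

def fit_hill(concs, inhibitions):
    """Estimate (IC50, Hill coefficient) from the linearized Hill plot:
    log10(I / (1 - I)) = n * log10(C) - n * log10(IC50).
    """
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(i / (1.0 - i)) for i in inhibitions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of y on x.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ic50 = 10 ** (-intercept / slope)
    return ic50, slope

# Synthetic, noise-free example (concentrations in µM).
concs = [0.1, 0.3, 1.0, 3.0, 10.0]
inhib = [hill_inhibition(c, ic50=1.5, n=1.2) for c in concs]
fitted_ic50, fitted_n = fit_hill(concs, inhib)
```

With noise-free input the fit recovers the generating parameters; with real tail-current data, points close to 0% or 100% inhibition would distort the linearized fit and are better handled by nonlinear regression.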
Compound | IC50 | channel' | model' | source | temp' | K+ [mM] bath solution | t1 pulse [s] | holding pot. | depol pulse to [mV] | measurement
Amiodarone | 3.84 | 4 | 4 | Kamiya 2001a | 2 | 5.4 | 0.2 | -50 | 30 | -50
Amiodarone | 1.2 | 4 | 4 | Zankov 2005 | 2 | 5.4 | 2 | -50 | 30 | -50
Amiodarone | 1.74 | 4 | 4 | Zankov 2005 | 2 | 5.4 | 0.5 | -50 | 30 | -50
AZD7009 | 193 | 1 | 2 | Persson 2005 | 1 | 5.4 | 6 | -80 | 40 | 40
Azimilide | 1.4 | 1 | 2 | Busch 1998 | 1 | 5.4 | 6 | -80 | 40 | 40
Azimilide | 2.6 | 3 | 1 | Busch 1998 | 1 | 2 | 15 | -80 | -10 | -10
Azimilide | 3.1 | 3 | 1 | Persson 2005 | 1 | 10 | 15 | -80 | -10 | -10
Bepridil | 6.2 | 4 | 4 | Wang 1999a | 2 | 5.4 | 1 | -40 | 60 | -40
Canrenoic_acid | 22.3 | 1 | 2 | Gomez 2005 | 1 | 4 | 2 | -80 | 40 | -40
Chromanol_293B | 26.9 | 2 | 1 | Loussouarn 1997 | 1 | 2 | 3 | -80 | 40 | 40
Chromanol_293B | 6.9 | 1 | 1 | Lerche 2007 | 1 | 2 | 12 | -80 | 40 | 40
proarrhythmic liability in silico
Compound | IC50 [µM] | model | source | transf | temp [°C] | tech | K+ [mM] | t1;t2 pulse [s] | hold [mV] | depol [mV] | measure [mV] | protocol
4-aminopyridine | 4400 | HEK | Ridley 2003 | stably | 37 | whole cell PC | 4 | 1;1.5 | -40 | 30 | -40 | step
Acehytisine | 465.95 | HEK | Huang 2008 | stably | 23 | whole cell PC | 4 | 4;4 | -80 | 20 | 20 | step
Ajmaline | 1.04 | HEK | Kiesecker 2004 | stably | 21 | whole cell PC | 5 | 0.4;0.4 | -80 | 40 | -120 | step
Ajmaline | 42.3 | XO | Kiesecker 2004 | - | 21 | voltage-clamp 2-electrode | 5 | 0.4;0.4 | -80 | 40 | -60 | step
Ambasilide | 16.1 | CHO | Walker 2000 | stably | 22 | whole cell PC | 4.8 | 3.9;5 | -80 | -30 | -60 | step
Ambasilide | 3.6 | CHO | Walker 2000 | stably | 22 | whole cell PC | 4.8 | 3.9;5 | -80 | 30 | -60 | step
Amiodarone | 0.048 | CHO | Guo 2005 | stably | 35 | whole cell PC | 4 | 1;0.5 | -80 | 20 | -40 | step
Amiodarone | 37.9 | XO | Kamiya 2001a | - | 23 | voltage-clamp 2-electrode | 5.4 | 2;0 | -90 | 40 | -70 | step
Amiodarone | 9.8 | XO | Kiehn 1999 | - | 20 | voltage-clamp 2-electrode | 5 | 0.4;0.4 | -80 | 30 | -60 | step
Amitriptyline | 4.78 | XO | Jo 2000 | - | 21.5 | voltage-clamp 2-electrode | 4 | 0.5;0.5 | -70 | 20 | -60 | step
Amitriptyline | 4.66 | XO | Jo 2000 | - | 21.5 | voltage-clamp 2-electrode | 2 | 4;4 | -70 | 30 | -60 | step
Amitriptyline | 10 | CHO | Tie 2000/Tie 2002 | stably | 21 | whole cell PC | 4.8 | 3.9;5 | -80 | 30 | -60 | step
The first step of the work included bibliographic analysis of the available literature sources and the collection of quantitative data describing the hERG channel blocking phenomenon. As a result, the data set of experimentally measured IC50 values (the concentration which produces half-maximal block of the hERG channel) with the relevant information regarding factors influencing the laboratory measurement results (electrophysiological settings) was created. Every single paper and data record was carefully checked to meet the inclusion criteria to assure the high quality of the data.
transf – transfection type
K+ – potassium ion concentration in bath solution
t1 – duration of depolarization pulse
hold – holding potential
depol – depolarization level
measure – measurement voltage
data base
> 300 papers, 447 IC50 values, 175 compounds, 3 in vitro models
The final data set draws on over 300 publicly available papers.
It consists of 447 IC50 values (half-maximal inhibitory concentrations) for 175 different compounds, each accompanied by a description of the in vitro experimental settings.
Three in vitro models were used across the experiments (HEK, CHO, XO).
The database is freely available after registration on the Tox-Comp project website www.tox-portal.net
proarrhythmic liability in silico
extrapolation factors
Temperature extrapolation factor: HEK physiological temperature / HEK room temperature
Inter-system extrapolation factor: inter-system ratios
[Diagram: inter-system ratios between XO, CHO and HEK, at room and physiological (phys) temperature]
Literature data analysis shows that hERG interaction experiments carried out for the same substance under different conditions, with different in vitro systems, can yield different IC50 values. We therefore proposed (Wiśniowska, Polak 2009, ToxicolMechMeth) extrapolation factors for unifying IC50 values across systems (HEK, CHO, XO) and temperatures (room and physiological).
The original IC50 values derived from in vitro experiments were scaled with the use of these extrapolation factors.
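How such scaling might look in code is sketched below. The numeric factors are placeholders chosen only for illustration, NOT the published values, and the choice of HEK at physiological temperature as the common reference is an assumption; the function and constant names are hypothetical.

```python
# Hypothetical multipliers mapping an IC50 measured in a given cell model
# onto the HEK scale. Placeholder values, not the published factors.
INTER_SYSTEM_FACTOR = {"HEK": 1.0, "CHO": 1.0, "XO": 0.2}

# Hypothetical ratio IC50(HEK, physiological) / IC50(HEK, room temperature).
TEMPERATURE_FACTOR = 0.8

def scale_ic50(ic50_um, model, temperature):
    """Express an IC50 [µM] on an assumed common HEK / physiological scale."""
    if model not in INTER_SYSTEM_FACTOR:
        raise ValueError(f"unknown cell model: {model}")
    if temperature not in ("room", "phys"):
        raise ValueError(f"unknown temperature class: {temperature}")
    scaled = ic50_um * INTER_SYSTEM_FACTOR[model]
    if temperature == "room":
        # Room-temperature measurements are moved to physiological temperature.
        scaled *= TEMPERATURE_FACTOR
    return scaled

# A record measured in XO at room temperature, mapped to the reference scale.
unified = scale_ic50(10.0, model="XO", temperature="room")
```

A record already measured in the reference conditions passes through unchanged, which makes the scaling step safe to apply uniformly to every record.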
methods & algorithms
proarrhythmic liability in silico
• OUTPUT: binary (unsafe/safe) - classification
• INPUT: phys-chem descriptors + in vitro experimental settings
• ALGORITHMS:
ANN
Decision trees
BayesNet
• VALIDATION: internal (10-fold CV) & external (dataset)
The task for the model was to classify compounds according to their cardiac safety. Therefore a flag, “safe” or “unsafe” (encoded as 0 and 1, respectively), was assigned to each data record. The safety threshold was set at an IC50 of 1 µM on the basis of literature analysis. This cut-off between hERG blockers and non-blockers resulted in 250 records classified as “unsafe” and 197 as “safe”. The subsequent model development process included the selection of the input vector components and a classification algorithm. In addition to the parameters defining the laboratory setting, over 100 descriptors of physico-chemical properties were generated for each compound using ChemAxon software (Marvin Beans).
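The labeling rule can be written down in a few lines. The sketch assumes that records with an IC50 below the threshold are the "unsafe" ones (a lower IC50 means a more potent blocker) and that a value exactly at the threshold counts as "safe"; the boundary handling is not stated in the text, and the function name is hypothetical.

```python
SAFETY_THRESHOLD_UM = 1.0  # cut-off stated in the text, in µM

def safety_flag(ic50_um, threshold_um=SAFETY_THRESHOLD_UM):
    """Return 1 ("unsafe", potent hERG blocker) or 0 ("safe").

    Assumption: values exactly at the threshold are treated as safe.
    """
    return 1 if ic50_um < threshold_um else 0

# IC50 values as listed in the data table (before any extrapolation scaling).
flags = {
    "Amiodarone (CHO, Guo 2005)": safety_flag(0.048),
    "4-aminopyridine (HEK, Ridley 2003)": safety_flag(4400),
}
```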
• INPUT: in vitro experimental settings + phys-chem descriptors
in vitro experimental settings derived from the available literature
- cell model - XO / CHO / HEK
- temperature – room/phys
- K+ bath concentration [mM]
- t1 pulse [s]
- t2 pulse [s]
- holding potential [mV]
- depolarization level [mV]
- measurement potential [mV]
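One way the experimental settings listed above could be turned into numeric model inputs is sketched here. The encoding (one-hot cell model, binary temperature flag) and all key names are assumptions made for illustration; the source does not describe how the settings were encoded.

```python
CELL_MODELS = ("XO", "CHO", "HEK")
NUMERIC_KEYS = ("k_mM", "t1_s", "t2_s", "hold_mV", "depol_mV", "measure_mV")

def encode_settings(record):
    """Flatten one record's experimental settings into a list of floats."""
    # One-hot encoding of the cell model (hypothetical choice).
    one_hot = [1.0 if record["model"] == m else 0.0 for m in CELL_MODELS]
    # Temperature class: 1.0 for physiological, 0.0 for room.
    temperature = [1.0 if record["temperature"] == "phys" else 0.0]
    numeric = [float(record[key]) for key in NUMERIC_KEYS]
    return one_hot + temperature + numeric

# Settings of the 4-aminopyridine record from the data table
# (37 °C is treated here as the physiological temperature class).
example = {
    "model": "HEK", "temperature": "phys", "k_mM": 4,
    "t1_s": 1, "t2_s": 1.5, "hold_mV": -40, "depol_mV": 30, "measure_mV": -40,
}
vector = encode_settings(example)
```

The physico-chemical descriptors would be appended to this vector to form the full input.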
proarrhythmic liability in silico
methods & algorithms
proarrhythmic liability in silico
• INPUT: in vitro experimental settings + phys-chem descriptors
calculated in Marvin Beans package
- sdf files either derived from PubChem or drawn in MarvinSketch
- 41 plugins
- 107 numeric inputs natively
- 38 parameters after the sensitivity analysis
methods & algorithms
Sensitivity analysis was performed in order to reduce model complexity and to evaluate the impact of reducing the number of descriptors on the model’s generalization ability. 38 key variables were identified. The chosen descriptors include those describing experimental settings as well as the physico-chemical and geometric properties indicated in the literature as crucial for drug-hERG channel binding.
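The source does not say which sensitivity analysis procedure was used. As a generic illustration only, the sketch below ranks inputs by how much a model's output changes when one input at a time is shifted by a small amount; the toy model, the data rows and the function names are all hypothetical.

```python
def sensitivity_ranking(model, rows, delta=0.05):
    """Rank input indices by mean absolute output change under a small shift.

    `model` is any callable mapping a list of inputs to a number.
    Returns (indices sorted from most to least influential, raw scores).
    """
    n_inputs = len(rows[0])
    scores = []
    for j in range(n_inputs):
        total = 0.0
        for row in rows:
            shifted = list(row)
            shifted[j] += delta  # perturb a single input
            total += abs(model(shifted) - model(row))
        scores.append(total / len(rows))
    ranking = sorted(range(n_inputs), key=lambda j: scores[j], reverse=True)
    return ranking, scores

def toy_model(x):
    # Stand-in for a trained classifier: input 1 has no influence at all.
    return 2.0 * x[0] + 0.0 * x[1] - 0.5 * x[2]

rows = [[0.2, 0.5, 0.8], [0.4, 0.3, 0.6], [0.7, 0.7, 0.2]]
ranking, scores = sensitivity_ranking(toy_model, rows)
```

Inputs at the bottom of such a ranking are candidates for removal, after which the reduced model would be re-validated to check that generalization has not suffered.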
• INPUT: in vitro experimental settings + phys-chem descriptors
calculated in Marvin Beans package
- Log P
- Log D
- …
- rotatable bond count
- chiral center count
- stereoisomer count
- …
- Largest_ring_size
- Minimal_projection_radius/area
- Maximal_projection_radius/area
proarrhythmic potency assessment
methods & algorithms
Artificial neural networks and recursive partitioning methods were used for predictive model development. A specially written ANN simulator, Nets2010, and the WEKA software (Waikato Environment for Knowledge Analysis) were used for model development. In this work classical multi-layer perceptrons containing 1 to 6 hidden layers were applied. All ANNs were trained by a back-propagation algorithm with momentum and jog-of-weights modifications. The number of training iterations varied from 10 000 to 5 000 000. The epoch size was set to 1. The training of ANNs involved random data record presentation with or without additional noise in the data (‘rand’ or ‘orig’ in the input data file description, respectively). The data were scaled linearly from 0.2 to 0.8 or from −0.8 to 0.8 (‘scale’ in the input data file description). Decision trees and random forests were developed with the use of the WEKA package. Different tree-constructing algorithms with modified learning parameters were tested. In addition to the single-algorithm-based models, modular systems (so-called ‘expert committees’) combining several ANNs, or an ANN with WEKA algorithms, were tested.
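The linear scaling step described above can be sketched as a min-max transformation of each input column. This is an illustration rather than the Nets2010 implementation; the function name is hypothetical, and mapping a constant column to the middle of the range is an assumption.

```python
def scale_linear(values, lo=0.2, hi=0.8):
    """Linearly map a column of values onto the interval [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant column is mapped to the middle of the range.
        return [(lo + hi) / 2.0 for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

# Holding potentials [mV] of the kind found in the data table.
hold_mv = [-80, -40, -90, -70]
scaled_02_08 = scale_linear(hold_mv)             # range 0.2 to 0.8
scaled_pm_08 = scale_linear(hold_mv, -0.8, 0.8)  # range -0.8 to 0.8
```

Keeping the targets away from 0 and 1 (or -1 and 1) avoids the flat tails of sigmoid and hyperbolic tangent activations, which is a common motivation for ranges such as these.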
To ensure the highest possible reliability of the final model, three validation modes were applied to assess model performance: the standard 10-fold cross-validation procedure (10-fold CV), an enhanced 10-fold cross-validation, and validation on an external test set of 55 records. The external set contained both structures already present in the native dataset (measured in different in vitro models) and structures absent from it. The enhanced procedure is a modification of the standard 10-fold CV: all records describing a particular drug are kept out of the training set whenever that drug appears in the test set. This ensures that the training and test sets are truly separated, as each compound can belong exclusively to either the training or the test instances.
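The enhanced cross-validation can be sketched as a compound-wise split: records are grouped by compound name and whole groups are dealt into folds. This is a minimal illustration under the assumption that fold assignment is otherwise random; the function name and the round-robin dealing are hypothetical choices.

```python
import random
from collections import defaultdict

def compound_wise_folds(compounds, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) with all records of a compound in one fold."""
    by_compound = defaultdict(list)
    for idx, name in enumerate(compounds):
        by_compound[name].append(idx)
    names = sorted(by_compound)
    random.Random(seed).shuffle(names)
    folds = [[] for _ in range(n_folds)]
    for k, name in enumerate(names):
        # Deal whole compounds round-robin into the folds.
        folds[k % n_folds].extend(by_compound[name])
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test

# One compound name per data record; several records may share a compound.
compounds = ["Amiodarone", "Amiodarone", "Ajmaline", "Ajmaline",
             "Amitriptyline", "Ambasilide"]
splits = list(compound_wise_folds(compounds, n_folds=3))
```

In a standard 10-fold CV the two Amiodarone records could land on opposite sides of a split, letting the model "recognize" the compound; the grouped split rules that out.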
results – 10 CV
proarrhythmic liability in silico
Architecture | Iterations | All% | SE | SP | PPV | NPV
15_7_5_tanh_rand_scale | 1 000 000 | 78.91 | 0.84 | 0.73 | 0.78 | 0.80
3_2_tanh_orig_scale | 1 000 000 | 78.30 | 0.79 | 0.77 | 0.80 | 0.77
5_3_2_tanh_rand_scale | 500 000 | 78.26 | 0.79 | 0.77 | 0.80 | 0.77
15_7_5_fsr_rand_scale | 50 000 | 78.07 | 0.80 | 0.76 | 0.79 | 0.77
5_3_2_sigma_rand | 200 000 | 78.07 | 0.81 | 0.75 | 0.79 | 0.78
…
RandomForest_I10_K0_S1_orig | - | 75.96 | 0.74 | 0.79 | 0.79 | 0.73
All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value
Around 800 classification models were developed and evaluated. The table presents the results of the enhanced 10-CV procedure for the five best models. The optimal model found during the numerical experimentation was an artificial neural network with three hidden layers and a hyperbolic tangent activation function. Its performance, estimated in the enhanced 10-fold cross-validation procedure, was 79% correct classifications overall.
results – external validation
proarrhythmic liability in silico
Architecture | Iterations | All% | SE | SP | PPV | NPV
15_7_5_tanh_rand_scale | 1 000 000 | 87 (48/55) | 0.93 | 0.81 | 0.84 | 0.91
RandomForest_I10_K0_S1_orig | - | 87 (48/55) | 0.90 | 0.85 | 0.87 | 0.88
3_2_tanh_orig_scale | 1 000 000 | 86 (47/55) | 0.90 | 0.81 | 0.84 | 0.88
5_3_2_tanh_rand_scale | 500 000 | 82 (45/55) | 0.86 | 0.77 | 0.81 | 0.83
5_3_2_sigma_rand | 200 000 | 53 (29/55) | 1.00 | 0.00 | 0.53 | -
All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value
The best architectures, with their settings unchanged, were then trained again using the whole dataset (447 records) as the learning set, and their performance was automatically verified on the external dataset. The overall classification rate in this procedure for the best algorithm was 87%.
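The reported measures follow directly from a confusion matrix, as the sketch below shows. The counts used in the example are not given in the source: they are one set of integers inferred to be consistent with the table rows, assuming the external set holds 29 "unsafe" and 26 "safe" records (which is what the 53% (29/55) row with SE 1.00 and SP 0.00 implies). The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute All%, SE, SP, PPV and NPV; None where a ratio is undefined."""
    def ratio(num, den):
        return num / den if den else None
    return {
        "All%": 100.0 * (tp + tn) / (tp + fn + tn + fp),
        "SE": ratio(tp, tp + fn),   # sensitivity
        "SP": ratio(tn, tn + fp),   # specificity
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }

# Inferred counts matching the 15_7_5_tanh_rand_scale row (48/55 correct).
best_ann = classification_metrics(tp=27, fn=2, tn=21, fp=5)

# A model that labels every record "unsafe" reproduces the 5_3_2_sigma_rand row;
# NPV is undefined because no record was predicted "safe".
all_unsafe = classification_metrics(tp=29, fn=0, tn=0, fp=26)
```

The degenerate row is a reminder that sensitivity alone can look perfect for a model that has learned nothing, which is why the table reports all five measures.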
results – modular system
proarrhythmic liability in silico
Expert committees were prepared and tested in order to improve model quality further. The final classification model for hERG channel blockers combined the 2 and 3 neural networks with the highest prediction accuracy for hERG-active and non-active compounds, respectively, and the best WEKA algorithm.
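The source does not state how the committee members' outputs were combined. A simple majority vote is one common option and is sketched below purely as an illustration; resolving ties towards "unsafe" is an assumption made here to err on the side of caution, and the function name is hypothetical.

```python
def committee_vote(predictions):
    """Combine binary flags (1 = unsafe, 0 = safe) by majority vote.

    Assumption: a tie is resolved as "unsafe" (the conservative choice).
    """
    if not predictions:
        raise ValueError("the committee needs at least one member")
    unsafe_votes = sum(predictions)
    return 1 if 2 * unsafe_votes >= len(predictions) else 0

# Hypothetical outputs of three committee members for one compound.
decision = committee_vote([1, 0, 1])
```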
results – modular system
proarrhythmic liability in silico
Validation | All% | SE | SP | PPV | NPV
10-CV | 82 (45/55) | 0.82 | 0.82 | 0.83 | 0.81
External validation | 87 (48/55) | 0.97 | 0.77 | 0.82 | 0.95
All – overall classification rate; SE – sensitivity; SP – specificity; PPV – positive predictive value; NPV – negative predictive value
Subset | All% | SE | SP | PPV | NPV
KNOWN | 100 (25/25) | 1.00 | 1.00 | 1.00 | 1.00
NEW | 77 (23/30) | 0.91 | 0.68 | 0.63 | 0.93
- generalization ability
- generalization ability – in vitro settings & chemical structures
The overall classification accuracy of the expert committee was 82% when estimated in the 10-CV procedure and 87% when tested on the external test set. This suggests that combining different algorithms can benefit prediction accuracy, although no substantial improvement in the overall rate was observed. Nevertheless, the negative predictive value of the expert committee is considerably better than the NPV of the best neural network. The high NPV (0.93) is beneficial, as the intended function of the model is to eliminate potentially unsafe compounds as early as possible.
The external validation set was composed of both new structures (subset “NEW”, 30 records) and molecules known to the system but assessed under different experimental conditions (subset “KNOWN”, 25 records). Analysis of the model's incorrect decisions shows that all of them concerned molecules absent from the native dataset. This indicates very good predictive ability as far as the experimental settings are concerned; the ability to extrapolate is weaker with respect to the chemical structure of the compounds. It should be noted that six of the seven misclassifications are false positives, which means that a safe compound was classified as a hERG blocker. This is probably preferable to false-negative predictions, because it keeps low the risk that a cardiotoxic compound proceeds and then fails at later stages due to cardiac liability.
Tox-Comp platform
proarrhythmic liability in silico
The developed binary classification model for hERG inhibition is an element of the Tox-Comp.net platform.
The Tox-Comp.net platform is a flexible, modular system for the early assessment of the cardiotoxic potency of chemical entities.
It is freely available after registration from the CompTox project website.
http://www.tox-portal.net
acknowledgements
team
Sebastian Polak PhD, Miłosz Polak, Kamil Fijorek, Anna Glinka, Małgorzata Kozłowska
project financed by the Polish National Center for Research and Development, LIDER project number LIDER/02/187/L-1/09