Using AI to Extend QSAR Models
Chaoyang (Joe) Zhang
School of Computing Sciences and Computer Engineering
University of Southern Mississippi
Conflict of Interest Statement
I have no perceived conflicts of interest with the research described in this presentation.
Outline/Objectives
• To introduce SAR-based chemical toxicity prediction
• To develop machine learning approaches for QSAR modeling
• To extend QSAR models using deep learning
• To address the challenges and identify future efforts for predictive toxicity analysis
In vitro: studies on cell lines
In vivo: studies on animal subjects
In silico: computational experiments
Challenges
• Ethical (inhumane)
• Economic (time-consuming and expensive)
• Use AI approaches for Structure Activity Relationship (SAR) modeling
Toxicology Study
Chemicals disrupt normal cell functions by binding and altering:
• Proteins
• DNA
• Lipids
Or by reacting with oxygen to form free radicals, which can damage cells
[Figure: molecular descriptors (a binary fingerprint matrix) are mapped by a function f to a prediction: Active or Inactive]
SAR-Based Predictive Toxicology
• Supervised (inductive) learning
  – Given: training data + desired outputs (labels)
• Unsupervised learning
  – Given: training data (without desired outputs)
• Semi-supervised learning
  – Given: training data + a few desired outputs
• Reinforcement learning
  – Rewards from a sequence of actions: what actions should an agent take in a particular situation?
Types of Machine Learning
Supervised Learning Framework and Steps
• Gather a training set
  – Data type, size, characteristics
• Determine the input feature representation
  – Curse of dimensionality
  – Feature engineering (selection and extraction)
• Choose learning algorithms
  – RF, SVM, Bayesian, Deep Learning
• Complete the design
  – Optimization, cross-validation
• Evaluate the accuracy
  – Which evaluation metrics?
Tox21 Data Challenges
• The Toxicology in the 21st Century (Tox21) program is a federal collaboration involving NIH, EPA, and FDA.
  – The goal of the challenge is to "crowdsource" data analysis by independent researchers to reveal how well they can predict compounds' interference in biochemical pathways using only chemical structure data.
To determine which environmental chemicals and drugs are of the greatest potential concern to human health.
https://tripod.nih.gov/tox21/challenge/
Data Statistics–Highly Imbalanced Data
Imbalance ratio (IR) refers to the ratio of the number of instances in the majority class to the number of instances in the minority class.
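As a minimal sketch of the IR definition above, the ratio can be computed directly from a list of class labels. The function name `imbalance_ratio` and the label counts are hypothetical and chosen only for illustration; they are not Tox21 statistics.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute IR")
    return max(counts.values()) / min(counts.values())


# Hypothetical assay: 950 inactive and 50 active compounds.
labels = ["inactive"] * 950 + ["active"] * 50
print(imbalance_ratio(labels))  # 19.0
```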
Machine Learning Approaches and Workflow
Imbalance Handling Techniques
• Random undersampling
• Synthetic minority over-sampling technique (SMOTE)
• SMOTEENN (a combination of the SMOTE and Edited Nearest Neighbor (ENN) algorithms)
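Of the techniques listed above, random undersampling is the simplest to sketch. The version below uses only the standard library and is an illustration, not the implementation used in the study; `random_undersample` and the toy data are hypothetical. The imbalanced-learn package provides `RandomUnderSampler`, `SMOTE`, and `SMOTEENN` classes that could be used instead; this section does not state which implementation was used.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop rows so that every class has as many rows as the minority class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumption): 10 majority rows and 2 minority rows.
X = [[i] for i in range(12)]
y = [0] * 10 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 2 2
```

Undersampling discards majority-class rows, whereas SMOTE synthesizes new minority-class rows, so the two trade information loss against the risk of fitting synthetic points.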
Classification Methods
Four Classification Models
• RF: RF without imbalance handling
• RUS: RF with random undersampling
• SMO: RF with SMOTE
• SMN: RF with SMOTEENN
Random forest (RF) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees.
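The voting step in the RF description above can be sketched as follows. The helper `majority_vote` and the example votes are hypothetical. Note that some implementations, scikit-learn's `RandomForestClassifier` among them, average per-tree class probabilities instead of counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class among the per-tree predictions for one compound."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single compound.
votes = ["inactive", "active", "active", "inactive", "active"]
print(majority_vote(votes))  # active
```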
Model Evaluation Metrics
Precision = $\frac{TP}{TP + FP}$
Recall = $\frac{TP}{TP + FN}$
F1-score = $2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Specificity = $\frac{TN}{TN + FP}$
Balanced Accuracy (BA) = $\frac{\mathrm{Recall} + \mathrm{Specificity}}{2}$
MCC = $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
AUROC: Area under the receiver operating characteristic (ROC) curve
AUPRC: Area under Precision-Recall curve
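The threshold-based metrics above (precision, recall, F1-score, specificity, BA, and MCC) can be computed from the four confusion-matrix counts, as in this sketch. Returning 0.0 when a denominator is zero is an assumed convention, and the counts in the example are hypothetical.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute the threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    balanced_accuracy = (recall + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 50 actives (30 found), 950 inactives (20 false alarms).
for name, value in confusion_metrics(tp=30, fp=20, tn=930, fn=20).items():
    print(f"{name}: {value:.3f}")
```

With these counts, plain accuracy would be 0.96 even though only 60% of the actives are recovered, which is why the metrics above are reported instead.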
We are more interested in the minority class of active compounds.
AUROC is not a good metric for evaluating performance on imbalanced classification problems.
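One way to see the point about AUROC: a ROC curve is built from the true positive rate (TPR) and false positive rate (FPR), neither of which depends on the class ratio, while precision for the minority class does. With IR defined as negatives per positive, precision = TPR / (TPR + FPR · IR). The sketch below evaluates this for a hypothetical classifier with TPR = 0.9 and FPR = 0.1; the rates and IR values are assumptions chosen for illustration.

```python
def precision_from_rates(tpr, fpr, ir):
    """Minority-class precision given TPR, FPR, and IR = negatives / positives."""
    return tpr / (tpr + fpr * ir)


# Same operating point on the ROC curve, increasingly imbalanced data.
for ir in (1, 10, 40):
    print(ir, round(precision_from_rates(0.9, 0.1, ir), 3))
# 1 0.9
# 10 0.474
# 40 0.184
```

The ROC operating point is identical in all three cases, yet at IR = 40 fewer than one in five predicted actives is truly active.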
Performance Comparison
Average Friedman ranks for the four classification methods based on the F1-score, AUPRC, AUROC, MCC, and BA metrics
p-values for multiple and pair-wise comparisons
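As a sketch of how average Friedman ranks are obtained: within each dataset the methods are ranked by score (rank 1 is best, ties share the mean rank), and the ranks are then averaged over datasets. The scores below are made-up placeholders, not results from this study, and `average_ranks` is a hypothetical helper. For the p-value of the multiple comparison, `scipy.stats.friedmanchisquare` is one available implementation.

```python
def average_ranks(scores):
    """scores: one dict per dataset mapping method name -> metric value (higher is better)."""
    methods = list(scores[0])
    totals = {m: 0.0 for m in methods}
    for row in scores:
        ordered = sorted(methods, key=lambda m: row[m], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over any methods tied with position i.
            while j + 1 < len(ordered) and row[ordered[j + 1]] == row[ordered[i]]:
                j += 1
            tied_rank = ((i + 1) + (j + 1)) / 2  # mean of the tied positions
            for m in ordered[i : j + 1]:
                totals[m] += tied_rank
            i = j + 1
    return {m: totals[m] / len(scores) for m in methods}


# Placeholder F1-scores on three hypothetical assays.
scores = [
    {"RF": 0.20, "RUS": 0.35, "SMO": 0.40, "SMN": 0.45},
    {"RF": 0.10, "RUS": 0.30, "SMO": 0.30, "SMN": 0.50},
    {"RF": 0.25, "RUS": 0.20, "SMO": 0.45, "SMN": 0.40},
]
for method, rank in average_ranks(scores).items():
    print(method, round(rank, 2))
```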
F1 score, MCC and Brier score are more sensitive and consistent metrics
Comparison with Tox21 Data Challenge Winners
Red-color: the highest among all the classifiers (both this study and Tox21 Data Challenge)
Bold font: the best among the Tox21 Challenge participating teams.
Impact of Imbalance Ratio (IR)
There exists a strong negative correlation between the prediction accuracy and the imbalance ratio (IR)
Summary–Handling Imbalance Data in QSAR
• The performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through imbalance handling.
• There exists a strong negative correlation between the prediction accuracy and the imbalance ratio (IR). All methods became less effective when IR exceeded a certain threshold (e.g., >40).
• F1 score, MCC and Brier score are sensitive metrics and are better for performance evaluation than other metrics.
Deep Learning for Toxicity Prediction
• Mayr et al. (2016) DeepTox: Toxicity Prediction using Deep Learning. Front. Environ. Sci.
  – DeepTox pipeline won 7 out of 12 sub-challenges (12 bioassays)
• Liu, R. et al. (2018) Assessing Deep and Shallow Learning Methods for Quantitative Prediction of Acute Chemical Toxicity. Toxicol. Sci. 164, 512–526.
• Idakwo, G. et al. (2019) Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals with High-Throughput Cell-Based Androgen Receptor Bioassay Data. Front. Physiol.
Four assay outcomes:
• Agonist
• Antagonist
• Inactive
• Inconclusive
A data frame with:
• 7665 unique compounds
• 2544 features
• 4 classes
Combine agonistic and antagonistic Androgen Receptor (AR) assays
Data Curation and Preprocessing
Deep Learning Model
• Four class labels
• ReLU activation function
• Automatic feature engineering
Bayesian Hyperparameter Optimization in DL
Implemented in Hyperas, a tool that combines the Keras DL library with the Hyperopt optimization library
The search space included:
– hidden layers {2, 3, 4}
– neurons {32, 64, 128, 256, 512, 1024}
– optimization methods {mini-batch gradient descent, Adam, RMSprop, Adagrad}
– batch size {8, 16, 32, 64, 128}
– learning rate {random uniform distribution between 0 and 1}
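The search space above can be written down as a plain data structure. The sketch below draws configurations from it by uniform random sampling, which is a simpler stand-in and not the Bayesian optimization that Hyperas performs; the key names, the optimizer strings, and `sample_config` are illustrative assumptions rather than Hyperas or Keras API.

```python
import random

# Discrete choices taken from the search space listed above.
SEARCH_SPACE = {
    "hidden_layers": [2, 3, 4],
    "neurons": [32, 64, 128, 256, 512, 1024],
    # "sgd" stands for mini-batch gradient descent (label chosen for illustration).
    "optimizer": ["sgd", "adam", "rmsprop", "adagrad"],
    "batch_size": [8, 16, 32, 64, 128],
}


def sample_config(rng):
    """Draw one hyperparameter configuration uniformly at random."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config["learning_rate"] = rng.uniform(0.0, 1.0)  # uniform between 0 and 1
    return config


rng = random.Random(42)
for _ in range(3):
    print(sample_config(rng))
```

A Bayesian optimizer would replace the independent draws with proposals informed by the validation scores of earlier configurations.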
DL Hyperparameters
– Number of layers
– Number of neurons
– Learning rate
– Epochs (number of passes over the training data)
– Choice of activation function: sigmoid function or Rectified Linear Unit (ReLU)
– Dropout parameter
– Batch size
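For reference, the two activation functions named above have simple closed forms: sigmoid(x) = 1 / (1 + e^(-x)) and ReLU(x) = max(0, x). A minimal scalar sketch follows; the branch on the sign of x is only there to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


print(sigmoid(0.0), relu(-2.0), relu(3.0))  # 0.5 0.0 3.0
```

Sigmoid saturates toward 0 and 1 for large inputs, while ReLU stays linear for positive inputs.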
Workflow
Overview of the machine learning-based SAR approach with a nested double-loop cross-validation strategy for model construction, validation and evaluation
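A nested double-loop cross-validation can be sketched with index splits alone: the outer loop holds out a test fold for final evaluation, and the inner loop splits the remaining data again for model selection. The fold counts, function names, and the lack of shuffling or stratification below are simplifying assumptions; the fold counts used in the study are not given in this section.

```python
def k_fold_indices(indices, k):
    """Yield (train, test) index lists for k folds built by striding."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        # Inner folds are drawn only from the outer training portion.
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


for outer_train, outer_test, inner_splits in nested_cv_splits(20):
    # Tune hyperparameters on inner_splits, refit on outer_train,
    # then evaluate once on outer_test.
    print(len(outer_train), len(outer_test), [len(val) for _, val in inner_splits])
```

Because the outer test fold never enters the inner loop, the final evaluation is not biased by hyperparameter selection.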
Algorithm Comparison (Default)
KNN, k-nearest neighbors; RF, random forest; CART, classification and regression trees; NB, Naïve Bayes; SVM, support vector machine; DNN, deep neural network
Prediction Results of Optimized RF
Macro-averages of five evaluation metrics derived using random forest.
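A macro-average computes the metric separately for each class and then takes the unweighted mean, so rare classes count as much as common ones. The sketch below does this for the F1-score using the four assay outcomes as class names; the predictions are made up for illustration, and scoring a class with no true or predicted members as 0.0 is an assumed convention.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for six compounds.
y_true = ["agonist", "antagonist", "inactive", "inactive", "inconclusive", "inactive"]
y_pred = ["agonist", "inactive", "inactive", "inactive", "inconclusive", "antagonist"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.667
```

In this toy case the single missed antagonist pulls the macro-average down to 0.667 even though four of six compounds are classified correctly.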
RF parameter optimization

Parameter          Initial distribution                        Optimized
Max_depth          2, 3, None                                  None
Criterion          "gini", "entropy"                           gini
Min_samples_leaf   0.5, 1, 5, 10, 20, 25                       10
N_estimators       50, 100, 200, 300, 400                      200
Max_features       "auto", "log2", None, 0.8, 0.5, 0.2, 0.1    0.2
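The optimized values above map naturally onto keyword arguments of scikit-learn's `RandomForestClassifier`, whose parameter names match the table when lowercased. That the study used scikit-learn is an assumption here, and `build_random_forest` is a hypothetical helper. Recent scikit-learn releases may no longer accept `"auto"` for `max_features`, which matters only when re-running the initial search.

```python
# Optimized values from the table, keyed by the lowercase parameter names
# (assumed to correspond to scikit-learn's RandomForestClassifier arguments).
OPTIMIZED_RF_PARAMS = {
    "max_depth": None,
    "criterion": "gini",
    "min_samples_leaf": 10,
    "n_estimators": 200,
    "max_features": 0.2,
}


def build_random_forest(params=OPTIMIZED_RF_PARAMS, random_state=0):
    """Construct a classifier from the optimized settings (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier  # imported lazily

    return RandomForestClassifier(random_state=random_state, **params)
```

Calling `build_random_forest().fit(X_train, y_train)` would then train the model on a descriptor matrix and label vector supplied by the caller.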
Comparison of DL and RF
The DL model with a macro-average F1-score of 0.83 was shown to perform better than RF with 0.56.
Comparison of RF and DL: Confusion Matrix
[Figure: confusion matrices for RF and DL]
Summary–Extend QSAR Modeling Using DL
• Deep learning has great potential to significantly improve the accuracy of in silico predictive toxicology.
  – The hyperparameters of the deep learning model must be optimized.
• The DL model with a macro-average F1-score of 0.83 was shown to perform better than RF with 0.56.
• Both DL and RF algorithms had difficulty predicting the antagonist outcome correctly, but DL did better.
Discussion and Future Efforts
• Large benchmark datasets and data quality
• Improve quantitative toxicity prediction using novel descriptors derived from molecular dynamics simulation, docking and other information
• Feature engineering (feature selection and extraction)
  – Autoencoder
• Develop a multi-model deep learning framework for in silico predictive toxicology
• Ensemble methods
References
• Mayr A, Klambauer G, Unterthiner T, et al (2016) DeepTox: Toxicity Prediction using Deep Learning. Front Environ Sci 3:1–15.
• Huang R, Xia M, Nguyen D-T, et al (2016) Tox21Challenge to Build Predictive Models of Nuclear Receptor and Stress Response Pathways as Mediated by Exposure to Environmental Chemicals and Drugs. Front Environ Sci 3:85.
• Idakwo G, Thangapandian S, Luttrell J, et al (2019) Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals With High-Throughput Cell-Based Androgen Receptor Bioassay Data. Front Physiol 10:1044.
• Mayr et al. (2018) conducted a large-scale comparison of drug target prediction. Chemical Science
• Liu R, et al (2018) Assessing Deep and Shallow Learning Methods for Quantitative Prediction of Acute Chemical Toxicity. Toxicol Sci 164:512–526.
Acknowledgements
– Dr. Ping Gong & Dr. Sundar Thangapandian (ERDC)
  Environmental Laboratory
  U.S. Army Engineer Research and Development Center
– Dr. Huixiao Hong (NCTR)
  Division of Bioinformatics and Biostatistics
  National Center for Toxicological Research
  U.S. Food and Drug Administration
– Gabriel Idakwo & Joseph Luttrell (PhD students at USM)
  School of Computing Sciences and Computer Engineering
  University of Southern Mississippi