
Using AI to Extend QSAR Models

Chaoyang (Joe) Zhang

School of Computing Sciences and Computer Engineering, University of Southern Mississippi

Chaoyang.Zhang@usm.edu

Conflict of Interest Statement

I have no perceived conflicts of interest with the research described in this presentation.

Outline/Objectives

To introduce SAR-based chemical toxicity prediction
To develop machine learning approaches for QSAR modeling
To extend QSAR models using deep learning
To address the remaining challenges and identify future directions for predictive toxicity analysis

In vitro: studies on cell lines

In vivo: studies on animal subjects

In silico: computational experiments

Challenges of in vitro and in vivo studies:
• Ethical (inhumane)
• Economic (time-consuming and expensive)

• Use AI approaches for Structure Activity Relationship (SAR) modeling

Toxicology Study

Chemicals disrupt normal cell functions by binding to and altering:
• Proteins
• DNA
• Lipids
or by reacting with oxygen to form free radicals, which can damage cells.

Molecular Descriptors

Binary descriptor (fingerprint) vectors are mapped by a learned function f to a prediction: Active or Inactive.

SAR-Based Predictive Toxicology
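As an illustration of this descriptor-to-prediction mapping, here is a minimal sketch that computes a binary Morgan fingerprint and feeds it to a trained classifier. RDKit and scikit-learn are assumed tooling for illustration only; the slides do not state which descriptor software was used, and the SMILES strings and labels are toy placeholders.

```python
# Minimal sketch (assumed tooling: RDKit + scikit-learn, not specified in the slides).
# A SMILES string becomes a binary descriptor vector, and a trained classifier f
# maps that vector to Active/Inactive.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_bits(smiles, n_bits=1024, radius=2):
    """Return a binary Morgan (ECFP-like) fingerprint as a numpy array."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp))

# Hypothetical training data: fingerprint rows and toy 0/1 activity labels.
X_train = np.vstack([morgan_bits(s) for s in ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]])
y_train = np.array([0, 1, 0])  # 1 = active, 0 = inactive (toy labels)

f = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Prediction:", "Active" if f.predict([morgan_bits("c1ccc2ccccc2c1")])[0] else "Inactive")
```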

Supervised (inductive) learning: given training data + desired outputs (labels)

Unsupervised learning: given training data only (without desired outputs)

Semi-supervised learning: given training data + a few desired outputs

Reinforcement learning: rewards from a sequence of actions; what actions should an agent take in a particular situation?

Types of Machine Learning

Supervised Learning Framework and Steps
Gather a training set
– Data type, size, characteristics
Determine the input feature representation
– Curse of dimensionality
– Feature engineering (selection and extraction)
Choose learning algorithms
– RF, SVM, Bayesian, Deep Learning
Complete the design
– Optimization, cross-validation
Evaluate the accuracy
– Which evaluation metrics?
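A compact sketch of these five steps, with scikit-learn standing in for the unspecified implementation; the toy fingerprint matrix, feature selector, and metric choices here are placeholders, not the study's actual setup.

```python
# Hedged end-to-end sketch of the five steps above using scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.feature_selection import VarianceThreshold
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 1024))   # Step 1: gather a training set (toy fingerprints)
y = rng.integers(0, 2, size=500)

model = Pipeline([                                   # Steps 2-3: features + learning algorithm
    ("select", VarianceThreshold(threshold=0.0)),    # drop constant bits (curse of dimensionality)
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
cv_f1 = cross_val_score(model, X_tr, y_tr, cv=5, scoring="f1")   # Step 4: cross-validation
model.fit(X_tr, y_tr)
print("CV F1:", cv_f1.mean(), "Test F1:", f1_score(y_te, model.predict(X_te)))  # Step 5: evaluate
```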

Tox21 Data Challenges

The Toxicology in the 21st Century (Tox21) program is a federal collaboration involving NIH, EPA, and FDA.
– The goal of the challenge is to "crowdsource" data analysis by independent researchers to reveal how well they can predict compounds' interference in biochemical pathways using only chemical structure data.
– The aim is to determine which environmental chemicals and drugs are of the greatest potential concern to human health.

https://tripod.nih.gov/tox21/challenge/

Data Statistics–Highly Imbalanced Data

The imbalance ratio (IR) refers to the ratio of the number of instances in the majority class to the number of instances in the minority class.
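For example, an assay with 7,000 inactive and 350 active compounds has IR = 7000 / 350 = 20. A short illustrative computation of this ratio from a label vector (the labels below are made up):

```python
# Compute the imbalance ratio from a label vector (illustrative only).
from collections import Counter

def imbalance_ratio(labels):
    counts = Counter(labels)                        # class -> number of instances
    return max(counts.values()) / min(counts.values())

print(imbalance_ratio(["inactive"] * 7000 + ["active"] * 350))  # 20.0
```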

Machine Learning Approaches and Workflow

Imbalance Handling Techniques

• Random undersampling
• Synthetic minority over-sampling technique (SMOTE)
• SMOTEENN (a combination of the SMOTE and Edited Nearest Neighbor (ENN) algorithms)
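These three techniques correspond to standard resamplers in the imbalanced-learn package; a brief sketch on a toy imbalanced dataset (the data and random seeds are placeholders, not the study's configuration):

```python
# Resampling sketch with imbalanced-learn on a toy imbalanced dataset.
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN

# Toy dataset with ~95% majority class, standing in for the descriptor matrix.
X, y = make_classification(n_samples=2000, n_features=50, weights=[0.95], random_state=0)

X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X, y)  # random undersampling
X_smo, y_smo = SMOTE(random_state=0).fit_resample(X, y)               # SMOTE oversampling
X_smn, y_smn = SMOTEENN(random_state=0).fit_resample(X, y)            # SMOTE + ENN cleaning
print(len(y), len(y_rus), len(y_smo), len(y_smn))
```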

Classification Methods

Four Classification Models
• RF: RF without imbalance handling
• RUS: RF with random undersampling
• SMO: RF with SMOTE
• SMN: RF with SMOTEENN

Random forest (RF) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees.
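A hedged sketch of how the four variants could be assembled, using imbalanced-learn pipelines so that resampling is applied only to the training folds during cross-validation; the exact configuration used in the study is not shown in the slides.

```python
# The four RF variants as pipelines; resampling happens inside each pipeline so that
# cross-validation resamples training data only.
from sklearn.ensemble import RandomForestClassifier
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN

rf = RandomForestClassifier(n_estimators=200, random_state=0)
models = {
    "RF":  Pipeline([("rf", rf)]),                                            # no imbalance handling
    "RUS": Pipeline([("sample", RandomUnderSampler(random_state=0)), ("rf", rf)]),
    "SMO": Pipeline([("sample", SMOTE(random_state=0)), ("rf", rf)]),
    "SMN": Pipeline([("sample", SMOTEENN(random_state=0)), ("rf", rf)]),
}
# Each entry of `models` can be passed to cross_val_score or cross_validate as usual.
```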

Model Evaluation Metrics

Precision = TP / (TP + FP)

Recall (Sensitivity) = TP / (TP + FN)

F1-score = 2 * Precision * Recall / (Precision + Recall)

Specificity = TN / (TN + FP)

Balanced Accuracy (BA) = (Recall + Specificity) / 2

MCC = (TP * TN − FP * FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

AUROC: Area under the ROC (receiver operating characteristic) curve

AUPRC: Area under the precision-recall curve

We are more interested in the minority class of active compounds.

AUROC is not a good metric for evaluating performance on imbalanced classification problems!
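All of these metrics are available in scikit-learn; a small sketch computing them from toy predictions (the labels and probabilities below are placeholders):

```python
# Computing the evaluation metrics above with scikit-learn (toy predictions).
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             balanced_accuracy_score, matthews_corrcoef,
                             roc_auc_score, average_precision_score, brier_score_loss)

y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]          # 1 = active (minority class)
y_prob = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.2, 0.6, 0.05, 0.5]
y_pred = [int(p >= 0.5) for p in y_prob]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("BA:       ", balanced_accuracy_score(y_true, y_pred))
print("MCC:      ", matthews_corrcoef(y_true, y_pred))
print("AUROC:    ", roc_auc_score(y_true, y_prob))
print("AUPRC:    ", average_precision_score(y_true, y_prob))  # area under precision-recall curve
print("Brier:    ", brier_score_loss(y_true, y_prob))
```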

Performance Comparison

Average Friedman ranks for the four classification methods based on the F1-score, AUPRC, AUROC, MCC, or BA metrics

p-values for multiple and pair-wise comparisons

F1 score, MCC and Brier score are more sensitive and consistent metrics

Comparison with Tox21 Data Challenge Winners

Red color: the highest value among all classifiers (both this study and the Tox21 Data Challenge)

Bold font: the best among the Tox21 Challenge participating teams.

Impact of Imbalance Ratio (IR)

There exists a strong negative correlation between the prediction accuracy and the imbalance ratio (IR)

Summary–Handling Imbalance Data in QSAR

• The performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through imbalance handling.

• There exists a strong negative correlation between the prediction accuracy and the imbalance ratio (IR). All methods became less effective when IR exceeded a certain threshold (e.g., >40).

• F1 score, MCC, and Brier score are sensitive metrics and are better for performance evaluation than other metrics.

Deep Learning for Toxicity Prediction
Mayr A, et al (2016) DeepTox: Toxicity Prediction using Deep Learning. Front Environ Sci.
– The DeepTox pipeline won 7 out of 12 sub-challenges (12 bioassays).

Liu R, et al (2018) Assessing Deep and Shallow Learning Methods for Quantitative Prediction of Acute Chemical Toxicity. Toxicol Sci 164:512–526.

Idakwo G, et al (2019) Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals with High-Throughput Cell-Based Androgen Receptor Bioassay Data. Front Physiol.

Four assay outcomes:
• Agonist
• Antagonist
• Inactive
• Inconclusive

A data frame with:
• 7665 unique compounds
• 2544 features
• 4 classes

Combine agonistic and antagonistic Androgen Receptor (AR) assays.

Data Curation and Preprocessing
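The slides do not show the curation code; the sketch below is one plausible way to combine per-compound agonist and antagonist AR assay calls into the four class labels using pandas. The merging rule here is an assumption for illustration, not the authors' exact procedure.

```python
# Hypothetical curation sketch with pandas: combine agonist and antagonist assay calls
# into one of four labels. The labeling rule below is an assumption, not the study's rule.
import pandas as pd

agonist = pd.DataFrame({"compound_id": ["c1", "c2", "c3", "c4"],
                        "agonist_call": ["active", "inactive", "inactive", "inconclusive"]})
antagonist = pd.DataFrame({"compound_id": ["c1", "c2", "c3", "c4"],
                           "antagonist_call": ["inactive", "active", "inactive", "inactive"]})

df = agonist.merge(antagonist, on="compound_id")

def label(row):
    if row["agonist_call"] == "active":
        return "agonist"
    if row["antagonist_call"] == "active":
        return "antagonist"
    if row["agonist_call"] == "inactive" and row["antagonist_call"] == "inactive":
        return "inactive"
    return "inconclusive"

df["class_label"] = df.apply(label, axis=1)
print(df)
```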

Deep Learning Model

• Four class labels
• ReLU activation function
• Automatic feature engineering
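A minimal Keras sketch consistent with these bullets; the layer sizes, dropout rate, and optimizer are placeholders rather than the tuned architecture reported in the study.

```python
# Minimal Keras sketch: feed-forward DNN with ReLU hidden layers and a four-class
# softmax output. Sizes are placeholders, not the optimized values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

n_features, n_classes = 2544, 4   # from the curated data frame described above

model = Sequential([
    Dense(512, activation="relu", input_shape=(n_features,)),
    Dropout(0.5),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(n_classes, activation="softmax"),   # agonist / antagonist / inactive / inconclusive
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```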

Bayesian Hyperparameter Optimization in DL

Implemented with Hyperas, a tool that combines the Keras DL library with the Hyperopt hyperparameter optimization library.

The search space included:
– hidden layers {2, 3, 4}
– neurons {32, 64, 128, 256, 512, 1024}
– optimization methods {mini-batch gradient descent, Adam, RMSprop, Adagrad}
– batch size {8, 16, 32, 64, 128}
– learning rate {random uniform distribution between 0 and 1}
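Because Hyperas wraps Hyperopt, the same search space can be written directly with Hyperopt primitives. The sketch below is hedged: the objective is a stub, and the exact Hyperas template used in the study is not reproduced.

```python
# Sketch of the search space with Hyperopt (which Hyperas builds on). The objective is a
# stub; in practice it would train a Keras model with `params` and return validation loss.
from hyperopt import fmin, tpe, hp, Trials

space = {
    "hidden_layers": hp.choice("hidden_layers", [2, 3, 4]),
    "neurons":       hp.choice("neurons", [32, 64, 128, 256, 512, 1024]),
    "optimizer":     hp.choice("optimizer", ["sgd", "adam", "rmsprop", "adagrad"]),
    "batch_size":    hp.choice("batch_size", [8, 16, 32, 64, 128]),
    "learning_rate": hp.uniform("learning_rate", 0.0, 1.0),
}

def objective(params):
    # Placeholder: build and train a model with `params`, then return its validation loss.
    return params["learning_rate"]

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)
```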

DL Hyperparameters
– Number of layers
– Number of neurons
– Learning rate
– Epochs (number of iterations)
– Choice of activation function (e.g., sigmoid, rectified linear unit (ReLU))
– Dropout parameter
– Batch size

Workflow

Overview of the machine learning-based SAR approach with a nested double-loop cross-validation strategy for model construction, validation and evaluation
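A small sketch of the nested (double-loop) cross-validation idea with scikit-learn: an inner loop selects hyperparameters, and an outer loop estimates performance on data never seen during selection. The dataset and grid below are placeholders.

```python
# Nested double-loop cross-validation sketch: GridSearchCV (inner loop, model selection)
# wrapped in cross_val_score (outer loop, performance estimation). Toy data and grid.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=100, weights=[0.9], random_state=0)

inner = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_features": [0.2, "sqrt"]},
    scoring="f1", cv=StratifiedKFold(5),
)
outer_scores = cross_val_score(inner, X, y, scoring="f1", cv=StratifiedKFold(5))
print("Nested CV F1:", outer_scores.mean())
```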

Algorithm Comparison (Default)

KNN, k-nearest neighbors; RF, random forest; CART, classification and regression trees; NB, Naïve Bayes; SVM, support vector machine; DNN, deep neural network

Prediction Results of Optimized RF

Macro-averages of five evaluation metrics derived using random forest.

Parameter        | Initial distribution                      | Optimized
Max_depth        | 2, 3, None                                | None
Criterion        | "gini", "entropy"                         | "gini"
Min_samples_leaf | 0.5, 1, 5, 10, 20, 25                     | 10
N_estimators     | 50, 100, 200, 300, 400                    | 200
Max_features     | "auto", "log2", None, 0.8, 0.5, 0.2, 0.1  | 0.2

RF parameter optimization
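A hedged sketch of searching that grid with scikit-learn; the slides do not state which search tool was used, and "auto" is replaced by "sqrt" here because the "auto" option was removed in recent scikit-learn releases.

```python
# Randomized search over the RF grid from the table above (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=100, weights=[0.9], random_state=0)

param_distributions = {
    "max_depth":        [2, 3, None],
    "criterion":        ["gini", "entropy"],
    "min_samples_leaf": [0.5, 1, 5, 10, 20, 25],
    "n_estimators":     [50, 100, 200, 300, 400],
    "max_features":     ["sqrt", "log2", None, 0.8, 0.5, 0.2, 0.1],  # "sqrt" stands in for "auto"
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_distributions,
                            n_iter=50, scoring="f1", cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_)
```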

Comparison of DL and RF

The DL model with a macro-average F1-score of 0.83 was shown to perform better than RF with 0.56.

Comparison of RF and DL: Confusion Matrix

Confusion matrices are shown for RF and DL.

Summary–Extend QSAR Modeling Using DL

Deep learning has great potential to significantly improve the accuracy of in silico predictive toxicology.
– The hyperparameters of the deep learning model must be optimized.

The DL model, with a macro-average F1-score of 0.83, was shown to perform better than RF with 0.56.

Both the DL and RF algorithms had difficulty predicting the antagonist outcome correctly, but DL did better.

Discussion and Future Efforts
Large benchmark datasets and data quality
Improve quantitative toxicity prediction using novel descriptors derived from molecular dynamics simulation, docking, and other information
Feature engineering (feature selection and extraction)
– Autoencoder
Develop a multi-model deep learning framework for in silico predictive toxicology
Ensemble methods

References
Mayr A, Klambauer G, Unterthiner T, et al (2016) DeepTox: Toxicity Prediction using Deep Learning. Front Environ Sci 3:1–15.
Huang R, Xia M, Nguyen D-T, et al (2016) Tox21 Challenge to Build Predictive Models of Nuclear Receptor and Stress Response Pathways as Mediated by Exposure to Environmental Chemicals and Drugs. Front Environ Sci 3:85.
Idakwo G, Thangapandian S, Luttrell J, et al (2019) Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals With High-Throughput Cell-Based Androgen Receptor Bioassay Data. Front Physiol 10:1044.
Mayr A, et al (2018) Large-scale comparison of machine learning methods for drug target prediction. Chem Sci.
Liu R, et al (2018) Assessing Deep and Shallow Learning Methods for Quantitative Prediction of Acute Chemical Toxicity. Toxicol Sci 164:512–526.

Acknowledgements
– Dr. Ping Gong & Dr. Sundar Thangapandian (ERDC)
Environmental Laboratory, U.S. Army Engineer Research and Development Center

– Dr. Huixiao Hong (NCTR)
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration

– Gabriel Idakwo & Joseph Luttrell (PhD students at USM)
School of Computing Sciences and Computer Engineering, University of Southern Mississippi
