structural and functional interpretation of qsar...

24
Structural and functional interpretation of QSAR models P.G. Polishchuk 1 , T.M. Khristova 1,2 , L.N. Ognichenko 1 , A.P. Kosinskaya 1 , A.A. Varnek 2 , V.E. Kuz’min 1 1 A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Odessa, Ukraine 2 Strasbourg University, Strasbourg, France 31 August 4 September, 2014 St.Petersburg, Russia

Upload: others

Post on 20-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

  • Structural and functional interpretation of QSAR models

    P.G. Polishchuk1, T.M. Khristova1,2, L.N. Ognichenko1,

    A.P. Kosinskaya1, A.A. Varnek2, V.E. Kuz’min1

    1 A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Odessa, Ukraine2 Strasbourg University, Strasbourg, France

    31 August – 4 September, 2014St.Petersburg, Russia

  • QSAR interpretation: interpretability vs. complexity

    2

    Mo

    del

    inte

    rpre

    tab

    ility

    Model complexity

    Popular misbelief

    DTMLR

    PLS

    ANN

    kNNRF

    SVM

    models ensembles

  • QSAR interpretation approaches

    Model-specific approaches:

    Rule-based (Decision tree)

    Regression coefficients (MLR, PLS)

    Latent variables (PLS)

    Weights and biases (ANN)

    Model-independent approaches:

    Local gradients or partial derivatives

    3i

    iiii

    x

    )xf(x)f(x

    ΔC

  • Model

    QSAR interpretation: common workflow

    Variables contributions

    Structure-property

    relationship

    f(x)Var_1 Var_2

    Mol_1 -0.23 1.82

    Mol_2 2.36 1.27

    Mol_3 5.01 2.30

    Mol_4 0.69 -0.58

    4

  • Matched molecular pairs & molecular transformations

    logS = -3.18 logS = -0.60

    H → OH

    ΔlogS = 2.58

    logS = -2.21 logS = -0.62

    ΔlogS = 1.59

    5Leech A.G. et al, J. Med. Chem. 2006, 49, 6672-6682Sheridan R.P. et al., J. Chem. Inf. Model. 2006, 46, 180-192

  • Exemplified dataset

    6

  • Structural QSAR interpretation

    - =

    logSpred = -1.55 logSpred = -1.61 ΔlogSpred = 0.06

    - =

    logSpred = -1.55 logSpred = -1.35 ΔlogSpred = -0.207

    Polishchuk P.G. et al. Molecular Informatics 2013,32, 843-853

  • Structural QSAR interpretation

    logSpred = -1.93 logSpred = -4.32 ΔlogSpred = 2.39

    - =

    8Polishchuk P.G. et al. Molecular Informatics 2013,32, 843-853

  • Atom-property labeling

    Simplex generation example

    Kuz’min, V. E. et al, Journal of Molecular Modelling 2005, 11, 457-467.Kuz’min, V. E. et al, Journal of Computer-Aided Molecular Design 2008, 22, 403-421.

    Simplex representation of molecular structure (SiRMS)

    9

  • Examples of structural interpretation

    10

  • Solubility (1033 compounds)

    11

    Endpoint ModelSiRMS Dragon

    R2CV RMSE R2

    CV RMSE

    Solubility,logS

    PLS 0.84 0.82 0.91 0.60RF 0.88 0.71 0.91 0.62

    SVM 0.87 0.72 0.92 0.59

    5-fold external cross validation results

  • Mutagenicity (Ames, 4361 compounds)

    12

    Descriptors Algorithm Balanced Accuracy

    SiRMSRF 0.817

    SVM 0.800

    DragonRF 0.816

    SVM 0.793

    5-fold external cross validation results

  • Combined contribution (effect) of fragments

    13

    RF+SiRMS

    Contribution = 0(non-mutagen)

    Contribution = 0(non-mutagen)

    Contribution = 1(mutagen)

  • Questions

    14

    How does the fragmentinfluence the property?

    WHY?!

  • pIC50 = f(A1, A2, A3) = x pIC50 = f(B1, B2, B3) = y

    Functional interpretation of QSAR models

    15

    1, 2, 3 – groups of descriptors represented different physico-chemical factors(charge, H-bonding, etc) of compound A and B.

    A B

    - =

    pIC50 = f(A) = x pIC50 = f(B) = y Contribution(C) = x - y

    Structural interpretation

    C

    pIC50 = f(A1, A2, A3) = x pIC50 = f(A1, A2, B3) = y Contribution3(C) = x - y

    Functional interpretation

    A B

    - =C

  • Antagonists of fibrinogen receptor

    16

  • Arg-mimetic Linker Asp-mimetic

    Frag

    men

    t ex

    amp

    les

    Antagonists of fibrinogen receptor: dataset

    17

    Arg-Gly-Asp

    338 compounds

  • Antagonists of fibrinogen receptor: models

    18

    Algorithm R2 RMSE

    RF 0.72 0.80

    SVM (RBF kernel) 0.69 0.84

    SVM (linear) 0.67 0.88

    PLS 0.67 0.87

    5-fold external cross validation results

  • Structural interpretation

    19

    Arg

    -mim

    eti

    cLi

    nke

    rA

    sp-m

    ime

    tic

  • Functional interpretation

    20

    Arg

    -mim

    eti

    cLi

    nke

    rA

    sp-m

    ime

    tic

  • Functional interpretation of RF model

    21

    Arg214

    Mg2+

    Asp224

    Ser225

    0.070.74

    0.341.77

    0.051.03

    0.420.6

    0.051.09

    0.461.02

    0.040.29

    0.140.81

    electrostaticH-bondinghydrophobicitypolarizability

    Phe160Tyr190

    RF model

  • Functional interpretation of SVM model

    22

    Mg2+

    0.210.71

    0.220.72

    0.080.51

    0.160.56

    0.110.46

    0.220.81

    0.10.130.14

    0.39

    electrostaticH-bondinghydrophobicitypolarizability

    Arg214

    Asp224

    Ser225

    Phe160Tyr190

    SVM-RBF model

  • Conclusions

    23

    Almost any QSAR model can be interpreted using the proposed schemes.

    Results of structural and functional interpretation obtained from different models are well correlated.

    Structural interpretation allows fragments ranking and finding potential alerts.

    Functional interpretation may provide a guess which factors are dominated and influence on the investigated property.

  • A.V. Bogatsky Physico-Chemical Institute, Chemoinformatic group:http://qsar4u.com

    SiRMS project on GitHub:https://github.com/DrrDom/sirms

    Online structural interpretation tool:http://www.physchem.od.ua/compute

    Related projects

    24

    http://qsar4u.com/https://github.com/DrrDom/sirmshttp://www.physchem.od.ua/compute