Martin Ott


Post on 10-Feb-2016






Bioinformatics IV: Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA). Martin Ott. Outline: Introduction; Structures and activities; Regression techniques: PCA, PLS; Analysis techniques: Free-Wilson, Hansch.


  • Bioinformatics IV

    Quantitative Structure-Activity Relationships (QSAR)


    Comparative Molecular Field Analysis (CoMFA)

    Martin Ott

  • Outline

    Introduction
    Structures and activities
    Regression techniques: PCA, PLS
    Analysis techniques: Free-Wilson, Hansch
    Comparative Molecular Field Analysis

  • QSAR: The Setting

    Quantitative structure-activity relationships are used when there is little or no receptor information, but there are measured activities of (many) compounds.

    They are also useful to supplement docking studies, which take much more CPU time.

  • From Structure to Property EC50

  • From Structure to Property LD50

  • From Structure to Property

  • QSAR: Which Relationship?

    Quantitative structure-activity relationships correlate chemical/biological activities with structural features or atomic, group or molecular properties within a range of structurally similar compounds.

  • Free Energy of Binding

    ΔG_binding = ΔG_0 + ΔG_hb + ΔG_ionic + ΔG_lipo + ΔG_rot

    Term        Contribution                                 Value
    ΔG_0        entropy loss (translational + rotational)    +5.4
    ΔG_hb       ideal hydrogen bond                          -4.7
    ΔG_ionic    ideal ionic interaction                      -8.3
    ΔG_lipo     lipophilic contact                           -0.17
    ΔG_rot      entropy loss (rotatable bonds)               +1.4

    (Energies in kJ/mol per unit feature)
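    The additive scheme on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the hydrogen-bond, ionic and lipophilic terms are favourable (negative) and that each term scales linearly with the number of features. The function name and the example ligand are hypothetical.

```python
# Per-feature contributions in kJ/mol, as tabulated on the slide.
# Assumption: hydrogen-bond, ionic and lipophilic terms are favourable (negative).
DG_0 = 5.4       # loss of translational and rotational entropy
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per unit of lipophilic contact
DG_ROT = 1.4     # per rotatable bond frozen on binding


def binding_free_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Sum the additive contributions to the free energy of binding (kJ/mol)."""
    return (DG_0
            + DG_HB * n_hbonds
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_contact
            + DG_ROT * n_rot_bonds)


# Hypothetical ligand: 3 hydrogen bonds, 1 ionic interaction,
# 100 units of lipophilic contact, 4 rotatable bonds.
dg_example = binding_free_energy(3, 1, 100, 4)
print(f"Estimated binding free energy: {dg_example:.1f} kJ/mol")
```

    A more negative sum corresponds to tighter predicted binding; the two entropy terms always count against it.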

  • Free Energy of Binding and Equilibrium Constants

    The free energy of binding is related to the reaction constants of ligand-receptor complex formation:

    ΔG_binding = -2.303 RT log K = -2.303 RT log (k_on / k_off)

    K: equilibrium constant
    k_on, k_off: rate constants of association and dissociation
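    As a numerical illustration, the relation can be evaluated in both directions. The sketch assumes the usual sign convention for an association constant, ΔG = -2.303 RT log K with K = k_on / k_off, and a temperature of 298.15 K. The rate constants are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)


def delta_g_from_k(k_assoc, temperature=298.15):
    """Free energy of binding (kJ/mol) from an association constant K = kon / koff."""
    return -2.303 * R * temperature * math.log10(k_assoc)


def k_from_delta_g(delta_g, temperature=298.15):
    """Association constant from a free energy of binding given in kJ/mol."""
    return 10 ** (-delta_g / (2.303 * R * temperature))


# Hypothetical rate constants: kon = 1e6 per molar per second, koff = 1e-2 per second
k_assoc = 1e6 / 1e-2
dg = delta_g_from_k(k_assoc)
print(f"K = {k_assoc:.3g}, free energy of binding = {dg:.1f} kJ/mol")
```

    Under these assumptions each tenfold increase in K lowers the free energy of binding by about 5.7 kJ/mol at room temperature.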

  • Concentration as Activity Measure

    A critical molar concentration C that produces the biological effect is related to the equilibrium constant K.

    Usually log (1/C) is used (cf. pH).

    For meaningful QSARs, activities need to be spread out over at least 3 log units.

  • Molecules Are Not Numbers!

    Where are the numbers?

    Numerical descriptors

  • An Example: Capsaicin Analogs

    X        EC50 (μM)    log(1/EC50)
    H        11.80        4.93
    Cl        1.24        5.91
    NO2       4.58        5.34
    CN       26.50        4.58
    C6H5      0.24        6.62
    NMe2      4.39        5.36
    I         0.35        6.46
    NHCHO     ?           ?
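    The log(1/EC50) column can be reproduced from the EC50 column, which is a quick check on the units. The sketch below assumes the EC50 values are micromolar, since only that unit gives the tabulated logarithms; the function name is illustrative.

```python
import math


def log_inverse_concentration(conc_micromolar):
    """Return log(1/C), with C converted from micromolar to molar."""
    return -math.log10(conc_micromolar * 1e-6)


# EC50 values from the slide, assumed to be in micromolar
ec50 = {"H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
        "C6H5": 0.24, "NMe2": 4.39, "I": 0.35}

activities = {x: log_inverse_concentration(c) for x, c in ec50.items()}
for substituent, value in activities.items():
    print(f"{substituent:5s} {value:.2f}")

# Spread of the activities in log units
spread = max(activities.values()) - min(activities.values())
```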

  • An Example: Capsaicin Analogs

    MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter; σ = electronic sigma constant (para position); Es = Taft size parameter

    X        log(1/EC50)    MR       π        σ        Es
    H        4.93            1.03     0.00     0.00     0.00
    Cl       5.91            6.03     0.71     0.23    -0.97
    NO2      5.34            7.36    -0.28     0.78    -2.52
    CN       4.58            6.33    -0.57     0.66    -0.51
    C6H5     6.62           25.36     1.96    -0.01    -3.82
    NMe2     5.36           15.55     0.18    -0.83    -2.90
    I        6.46           13.94     1.12     0.18    -1.40
    NHCHO    ?              10.31    -0.98     0.00    -0.98

  • An Example: Capsaicin Analogs

    log(1/EC50) = -0.89 + 0.019 MR + 0.23 π - 0.31 σ - 0.14 Es
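    An equation of this form can be obtained by ordinary least squares on the descriptor table. The sketch below uses NumPy and the seven measured analogs; it is an illustration of the fitting step, not the procedure used for the slide, and the coefficients it yields need not coincide with those quoted above. With seven compounds and five parameters the fit has very few degrees of freedom, so the prediction for NHCHO should be read with caution.

```python
import numpy as np

# Descriptors from the slide, columns: MR, pi, sigma, Es
descriptors = np.array([
    [1.03, 0.00, 0.00, 0.00],     # H
    [6.03, 0.71, 0.23, -0.97],    # Cl
    [7.36, -0.28, 0.78, -2.52],   # NO2
    [6.33, -0.57, 0.66, -0.51],   # CN
    [25.36, 1.96, -0.01, -3.82],  # C6H5
    [15.55, 0.18, -0.83, -2.90],  # NMe2
    [13.94, 1.12, 0.18, -1.40],   # I
])
activity = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

# Prepend a column of ones so the first coefficient is the intercept
design = np.column_stack([np.ones(len(activity)), descriptors])
coef, *_ = np.linalg.lstsq(design, activity, rcond=None)

# Squared correlation coefficient of the fit
fitted = design @ coef
ss_res = np.sum((activity - fitted) ** 2)
ss_tot = np.sum((activity - activity.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Predict the untested NHCHO analog from its descriptors
nhcho = np.array([1.0, 10.31, -0.98, 0.00, -0.98])
predicted_nhcho = float(nhcho @ coef)

print("intercept and coefficients:", np.round(coef, 3))
print("R^2:", round(float(r_squared), 3))
print("predicted log(1/EC50) for NHCHO:", round(predicted_nhcho, 2))
```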

  • Basic Assumption in QSAR

    The structural properties of a compound contribute in a linearly additive way to its biological activity, provided there are no non-linear dependencies of transport or binding on some properties.

  • Molecular Descriptors

    Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
    Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
    Group properties, e.g. Hammett and Taft constants, volume
    2D fingerprints based on fragments
    3D screens based on fragments

  • 2D Fingerprints

    C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO C=C C≡C C=N Am Im
    1 1 1 0 0 1 0 0  1  0 1  1  1  1  1  0  0  0   0  1   0   0   1  0
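    A fingerprint of this kind is a fixed-order list of yes/no answers. The sketch below assumes that the 24 fragment labels on the slide map one-to-one, in order, onto the 24 bits, and it writes the triple-bond fragment as "C#C"; the set of fragments marked as present is read off the bit string rather than derived from a structure.

```python
# Fragment keys in slide order; "C#C" stands for the carbon-carbon triple bond (assumed label)
FEATURES = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
            "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C#C", "C=N",
            "Am", "Im"]


def fingerprint(present):
    """Return a bit string with one position per fragment key, 1 if the fragment occurs."""
    return "".join("1" if feature in present else "0" for feature in FEATURES)


# Fragments set on the slide, read off its bit string
present = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"}
bits = fingerprint(present)
print(bits)
```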

  • Principal Component Analysis (PCA)

    Many (>3) variables to describe objects = high dimensionality of descriptor data
    PCA is used to reduce dimensionality
    PCA extracts the most important factors (principal components or PCs) from the data
    Useful when correlations exist between descriptors
    The result is a new, small set of variables (PCs) which explain most of the data variation

  • PCA From 2D to 1D

  • PCA From 3D to 3D-

  • Different Views on PCA

    Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis.

    In matrix terms, PCA is a decomposition of matrix X into two smaller matrices plus a set of residuals: X = T P^T + R

    Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions.
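    The matrix view can be made concrete with a singular value decomposition. The sketch below is one common way to compute the scores T, loadings P and residuals R; it assumes the columns of X are mean-centred first (no scaling) and uses the capsaicin descriptor table as example data.

```python
import numpy as np


def pca(x, n_components):
    """Return scores T, loadings P, residuals R and explained variance fractions,
    so that the mean-centred data equal T @ P.T + R."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    residuals = centered - scores @ loadings.T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, residuals, explained


# Capsaicin descriptors from the earlier slide, columns: MR, pi, sigma, Es
x = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])

t, p, r, explained = pca(x, n_components=2)
print("fraction of variance per component:", np.round(explained, 3))
```

    Because MR spans a much larger numerical range than the other descriptors, it will dominate unscaled components; scaling each column to unit variance beforehand is a frequent alternative.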

  • Partial Least Squares (PLS)

    y_1 = a_0 + a_1 x_11 + a_2 x_12 + a_3 x_13 + ... + e_1    (compound 1)
    y_2 = a_0 + a_1 x_21 + a_2 x_22 + a_3 x_23 + ... + e_2    (compound 2)
    y_3 = a_0 + a_1 x_31 + a_2 x_32 + a_3 x_33 + ... + e_3    (compound 3)
    ...
    y_n = a_0 + a_1 x_n1 + a_2 x_n2 + a_3 x_n3 + ... + e_n    (compound n)

    Y = XA + E

    X = independent variables
    Y = dependent variables
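    The slide states the regression model but not how PLS estimates the coefficients. The sketch below shows one common variant for a single response (PLS1 with NIPALS-style deflation), written with NumPy. The function name is illustrative, the data are the capsaicin descriptors, and the choice of two components is arbitrary.

```python
import numpy as np


def pls1(x, y, n_components):
    """PLS regression for one response; returns (intercept, coefficients)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xr, yr = x - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = xr.T @ yr                 # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = xr @ w                    # scores of this latent variable
        tt = t @ t
        p = xr.T @ t / tt             # descriptor loadings
        q = yr @ t / tt               # response loading
        xr = xr - np.outer(t, p)      # deflate descriptors
        yr = yr - q * t               # deflate response
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    w_mat = np.column_stack(weights)
    p_mat = np.column_stack(loadings)
    coef = w_mat @ np.linalg.solve(p_mat.T @ w_mat, np.array(y_loadings))
    return y_mean - x_mean @ coef, coef


# Capsaicin data from the earlier slides, columns: MR, pi, sigma, Es
x = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])
y = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

intercept, coef = pls1(x, y, n_components=2)
fitted = intercept + x @ coef
print("PLS coefficients (2 components):", np.round(coef, 3))
```

    With as many components as descriptors, this procedure gives the same coefficients as ordinary least squares; using fewer components is what makes PLS usable when descriptors are numerous and correlated.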

  • PLS Cross-validation

    Squared correlation coefficient R²: value between 0 and 1 (> 0.9), indicating the explanatory power of the regression equation.

    With cross-validation, squared correlation coefficient Q²: value between 0 and 1 (> 0.5), indicating the predictive power of the regression equation.
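    The difference between the two statistics is easiest to see in code. The sketch below uses leave-one-out cross-validation and, to stay short, substitutes ordinary least squares on two capsaicin descriptors (MR and π) for PLS; this swap is an assumption made for illustration. Q² is computed as 1 - PRESS / SS_total, with SS_total taken about the overall mean of the activities.

```python
import numpy as np


def fit_ols(x, y):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, x):
    return coef[0] + x @ coef[1:]


def r2_and_q2(x, y):
    """R^2 from the full fit and leave-one-out Q^2 from held-out predictions."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_res = np.sum((y - predict(fit_ols(x, y), x)) ** 2)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave compound i out
        coef = fit_ols(x[keep], y[keep])
        press += (y[i] - predict(coef, x[i])) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


# Capsaicin data, columns: MR, pi
x = np.array([[1.03, 0.00], [6.03, 0.71], [7.36, -0.28], [6.33, -0.57],
              [25.36, 1.96], [15.55, 0.18], [13.94, 1.12]])
y = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

r2, q2 = r2_and_q2(x, y)
print(f"R^2 = {r2:.3f}, Q^2 = {q2:.3f}")
```

    Q² never exceeds R² for such a fit, because each held-out compound is predicted by a model that has not seen it.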

  • Free-Wilson Analysis

    log (1/C) = Σ a_i x_i + μ

    x_i: presence of group i (0 or 1)
    a_i: activity group contribution of group i
    μ: activity value of unsubstituted compound

  • Free-Wilson Analysis

    Computationally straightforward
    Predictions only for substituents already included
    Requires large number of compounds
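    The Free-Wilson equation amounts to adding up the contributions of the groups that are present. In the sketch below the group names, contributions a_i and the value μ are hypothetical; in practice they would come from a regression over measured compounds.

```python
def free_wilson_activity(groups_present, contributions, mu):
    """log(1/C) = sum of a_i over the groups present (x_i = 1) plus mu."""
    return mu + sum(contributions[group] for group in groups_present)


# Hypothetical group contributions a_i and unsubstituted-compound activity mu
contributions = {"4-Cl": 0.40, "4-Me": 0.15, "3-OH": -0.20}
mu = 5.00

activity = free_wilson_activity({"4-Cl", "3-OH"}, contributions, mu)
print(f"predicted log(1/C) = {activity:.2f}")

# A group without a fitted contribution cannot be predicted
try:
    free_wilson_activity({"4-NO2"}, contributions, mu)
except KeyError as missing:
    print("no contribution available for", missing)
```

    The failed lookup mirrors the limitation listed on the slide: predictions are possible only for substituents already included in the analysis.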

  • Hansch Analysis

    Drug transport and binding affinity depend nonlinearly on lipophilicity:

    log (1/C) = a (log P)² + b log P + c Σσ + k

    P: n-octanol/water partition coefficient
    σ: Hammett electronic parameter
    a, b, c: regression coefficients
    k: constant term
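    Because the equation is a parabola in log P, a negative coefficient a implies an optimal lipophilicity at log P = -b / (2a). The sketch below evaluates the equation and that optimum; all coefficient values are hypothetical and chosen only to make the curvature visible.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """log(1/C) = a (log P)^2 + b log P + c * (sum of sigma) + k."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k


def optimal_log_p(a, b):
    """Position of the activity maximum; only defined for a downward parabola."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical regression coefficients
a, b, c, k = -0.2, 0.8, 0.5, 3.0

best = optimal_log_p(a, b)
for log_p in (best - 1.0, best, best + 1.0):
    value = hansch_activity(log_p, sigma_sum=0.23, a=a, b=b, c=c, k=k)
    print(f"log P = {log_p:.1f}: log(1/C) = {value:.2f}")
```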

  • Hansch Analysis

    Fewer regression coefficients needed for correlation
    Interpretation in physicochemical terms
    Predictions for other substituents possible

  • Pharmacophore

    Set of structural features in a drug molecule recognized by a receptor
    Sample features: H-bond donor, charge, hydrophobic center
    Distances, 3D relationship

  • Pharmacophore Selection

    L = lipophilic site; A = H-bond acceptor; D = H-bond donor; PD = protonated H-bond donor

    [Figure panels: Dopamine; Pharmacophore]

  • Comparative Molecular Field Analysis (CoMFA)

    Set of chemically related compounds
    Common pharmacophore or substructure required
    3D structures needed (e.g., Corina-generated)
    Flexible molecules are folded into pharmacophore constraints and aligned

  • CoMFA Alignment

  • CoMFA Grid and Field Probe (only one molecule shown for clarity)

  • Electrostatic Potential Contour Lines

  • CoMFA Model Derivation

    Molecules are positioned in a regular grid according to alignment.

    Probes are used to determine the molecular field:

    Van der Waals field (probe is neutral carbon): E_vdw = Σ (A_i r_ij^-12 - B_i r_ij^-6)

    Electrostatic field (probe is charged atom): E_c = Σ q_i q_j / (D r_ij)
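    The two field expressions can be evaluated at every grid point around an aligned molecule. The sketch below does this in plain Python for a hypothetical two-atom fragment. The atom coordinates, the A and B parameters, the charges, the grid spacing and the minimum-distance guard are all assumptions chosen for illustration, and units and the Coulomb prefactor are omitted, as in the formulas above.

```python
import itertools
import math


def field_at_point(point, atoms, probe_charge=1.0, dielectric=1.0, min_dist=0.5):
    """Return (E_vdw, E_c) felt by a probe at `point`, summed over all atoms.
    Each atom is a dict with coordinates 'xyz', parameters 'A' and 'B', and charge 'q'."""
    e_vdw = 0.0
    e_c = 0.0
    for atom in atoms:
        # Guard against a grid point sitting on top of an atom
        r = max(math.dist(point, atom["xyz"]), min_dist)
        e_vdw += atom["A"] * r ** -12 - atom["B"] * r ** -6
        e_c += atom["q"] * probe_charge / (dielectric * r)
    return e_vdw, e_c


# Hypothetical two-atom fragment
atoms = [
    {"xyz": (0.0, 0.0, 0.0), "A": 1.0, "B": 2.0, "q": 0.5},
    {"xyz": (1.5, 0.0, 0.0), "A": 1.0, "B": 2.0, "q": -0.5},
]

# Regular grid with a spacing of 2 length units around the fragment
axis = [-4.0, -2.0, 0.0, 2.0, 4.0]
grid = list(itertools.product(axis, repeat=3))

# One row of field values per grid point; across molecules these become descriptor columns
fields = {point: field_at_point(point, atoms) for point in grid}
print(len(fields), "grid points evaluated")
```

    In a CoMFA study the field values of all aligned molecules at all grid points form the X matrix that is then correlated with activity, typically by PLS.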

  • 3D Contour Map for Electronegativity

  • CoMFA Pros and Cons

    Pros:
    Suitable to describe receptor-ligand interactions
    3D visualization of important features
    Good correlation within related set
    Predictive power within scanned space

    Cons:
    Alignment is often difficult
    Training required