The National Technical University of Athens QSAR Group – Overview of Research Activities

ATHENS, August 2008

Structure

The NTUA group emerged from the collaboration between two research laboratories in the School of Chemical Engineering at NTUA: the Laboratory of Process Control and Informatics and the Laboratory of Organic Chemistry.

It is headed by Haralambos Sarimveis, Asst. Professor in Process Control and Informatics, and involves one additional faculty member, one postdoctoral associate, one research associate at Ph.D. level, one software developer and several postgraduate and undergraduate students.

The collaboration between the two laboratories started in 2002, in recognition of the fact that progress in the design of new molecules with improved properties can be accelerated by applying existing quantitative methodologies and by developing new methods based on information sciences, computer technologies and computational intelligence.

Activities and Objectives

Although the group was formed quite recently, it has already published numerous papers in top scientific journals, established collaborations with other research groups (Fleming Research Institute, University of Athens, University of Cyprus, Università degli Studi di Firenze, University of North Carolina, NovaMechanics Ltd) and participated in several research programs.

The group has worked in many application areas (fuels, polymers, food properties), but it has focused on the challenging and important pharmaceutical field, developing QSAR models that predict the activity and toxicity of existing and potential new pharmaceutical compounds.

Supported by its parallel research activities on the simulation of biological and toxicological systems, the development of ADMET and physiologically based pharmacokinetic (PBPK) models and the automation of drug delivery systems, the objective of the NTUA research work is to support the different phases of the drug discovery process, from hit finding through lead optimization. The vision of the group is to contribute to the development of a highly automated system that will optimize the therapy strategy for each individual patient.

Strategy for designing novel compounds using QSAR models

[Flowchart linking three stages: QSAR development, novel structure design and experiment]

QSAR DEVELOPMENT

Database: Compounds – Activity/Property/Toxicity

Descriptor calculation

Variable Selection - Modeling

Model validation:
1. Test set (R2, RMS), cross-validation, Y-randomization
2. Domain of applicability

NOVEL STRUCTURE DESIGN

Design of novel compounds: virtual screening, data mining, inverse QSAR

EXPERIMENT

Experimental synthesis

Experimental evaluation of activity/property/toxicity

QSAR model development 1. Database design

Selection of compounds

Lead compounds and derivatives

Representative of the structures under study

Wide range of structural characteristics

Experimental data (activities, toxicity)

Protocol

Experimental data

Literature

Calculation of descriptors: topological indices (Randić, Kier & Hall), stereochemical indices (molecular volume V), electronic/quantum descriptors (EHOMO, ELUMO), physicochemical descriptors (logP)

Commercial software

In house software

Experimental data

Literature
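As an illustration of what a topological index is, the Randić connectivity index named above can be computed directly from the bond list of a hydrogen-suppressed molecular graph: each bond contributes one over the square root of the product of the degrees of its two atoms. The sketch below is a minimal stand-alone example and not the group's in-house software; the function name `randic_index` and the edge-list input format are assumptions made for illustration.

```python
import math


def randic_index(edges):
    """Randic connectivity index of a hydrogen-suppressed molecular graph.

    edges: list of (i, j) atom-index pairs, one pair per bond.
    """
    # Vertex degree = number of bonds to other heavy atoms.
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    # Sum 1/sqrt(d_i * d_j) over all bonds.
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


# n-butane carbon skeleton: C0-C1-C2-C3
butane = [(0, 1), (1, 2), (2, 3)]
chi_butane = randic_index(butane)

# isobutane carbon skeleton: central C0 bonded to three methyl carbons
isobutane = [(0, 1), (0, 2), (0, 3)]
chi_isobutane = randic_index(isobutane)

print(chi_butane, chi_isobutane)
```

The branched isomer gives the smaller value, which is the kind of structural information such indices are meant to encode.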

QSAR model development 2. Model generation

Variable selection

Elimination stepwise regression (ES-SWR)

Genetic algorithm developed in-house (GASA-RBF)

Modeling methodologies

Linear – Multiple linear regression (MLR), Partial least squares (PLS)

Neural networks – Radial basis function (RBF) networks trained using the fuzzy means algorithm or the subtractive clustering algorithm, both developed in-house

Support Vector Machines (SVM) using the LIBSVM software

QSAR model development 3. Model validation

• Standard statistical indices (R2, RMS, F)

• Predictive ability tested on external data sets

• Cross-validation

• Y-randomization test

• Domain of applicability
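The validation checks listed on this slide can be sketched for a multiple linear regression model as follows. This is only an illustrative sketch on synthetic data, assuming NumPy is available; the helper names (`fit_mlr`, `loo_q2`, `y_randomization`) and the data are hypothetical and are not taken from the group's software. The leave-one-out Q2 is computed against the full-data mean of y, which is one common convention.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares MLR coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rms(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def loo_q2(X, y):
    """Leave-one-out cross-validated R2 (often written Q2)."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop compound i
        coef = fit_mlr(X[keep], y[keep])       # refit without it
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return r2(y, preds)


def y_randomization(X, y, n_trials=50, seed=0):
    """Mean fitted R2 after shuffling y; should be far below the real R2."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        y_perm = rng.permutation(y)
        scores.append(r2(y_perm, predict_mlr(fit_mlr(X, y_perm), X)))
    return float(np.mean(scores))


# Synthetic data: 40 "compounds", 3 descriptors, activity driven by two of them.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=40)

coef = fit_mlr(X, y)
r2_fit = r2(y, predict_mlr(coef, X))
rms_fit = rms(y, predict_mlr(coef, X))
q2_loo = loo_q2(X, y)
r2_random = y_randomization(X, y)
print(r2_fit, rms_fit, q2_loo, r2_random)
```

A model that passes these checks shows a cross-validated Q2 close to the fitted R2 and a Y-randomized R2 near zero; the domain of applicability check is not covered by this sketch.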

Design of novel compounds

Virtual Screening

Structural modifications (insertion, deletion, replacement, etc. of substituents or substructures) and prediction of activity/toxicity from the QSAR model

Data mining

Search for chemical similarity between active compounds and other compounds.

Inverse optimization method

Formulation and solution of mathematical optimization problems with constraints (e.g. connectivity, valence) for the identification of the lead compound with optimal characteristics
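The similarity search described under data mining can be illustrated with the Tanimoto coefficient on binary fingerprints. The slides do not say which similarity measure the group uses, so Tanimoto is an assumption here, chosen because it is a widely used default; the fingerprints (sets of "on" bit positions) and compound names are made up for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: one known active and a small screening library.
active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},   # shares 4 of 6 distinct bits with the active
    "cmpd_B": {2, 3, 5},          # shares no bits
    "cmpd_C": {1, 4, 7, 9, 12},   # identical fingerprint
}

ranking = rank_by_similarity(active, library)
for name, score in ranking:
    print(name, round(score, 3))
```

Compounds at the top of the ranking would then be passed to the QSAR model for activity prediction.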

Case studies: Solving QSPR problems

“Prediction of High Weight Polymers Glass Transition Temperature Using RBF Neural Networks”, Journal of Molecular Structure: THEOCHEM 2005, 716, 193-198

“Prediction of Intrinsic Viscosity in Polymer-Solvent Combinations using a QSPR model”, Polymer 2006, 47, 3240-3248

“A novel QSPR model to predict θ (lower critical solution temperature) in polymer solutions using molecular descriptors”, Journal of Molecular Modeling 2007, 13, 55-64

“Development and Evaluation of a QSPR Model for the Prediction of Diamagnetic Susceptibility”, QSAR & Combinatorial Science 2008, 27(4), 432-436

Case studies: Solving QSAR - QSTR problems

QSAR Problems

“QSAR study on para-substituted aromatic sulfonamides as carbonic anhydrase II inhibitors using topological information indices”, Bioorganic & Medicinal Chemistry 2006, 14(4), 1108-1114

“A Novel QSAR Model for Evaluating and Predicting the Inhibition Activity of Dipeptidyl Aspartyl Fluoromethylketones”, QSAR & Combinatorial Science 2006, 25, 928-935

“A Novel QSAR Model for Predicting Induction of Apoptosis by 4-Aryl-4H-chromenes”, Bioorganic & Medicinal Chemistry 2006, 14, 6686-6694

“A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4]diazepane ureas”, European Journal of Medicinal Chemistry 2008, accepted

QSTR Problems

“A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri”, Molecular Diversity 2006, 10, 213-221

“Prediction of toxicity using a novel RBF neural network training methodology”, Journal of Molecular Modeling 2006, 12, 297-305

Case studies: Virtual Screening – In Silico Lead Optimization

“Identification of a series of novel derivatives as potent HCV inhibitors by a ligand-based virtual screening optimized procedure”, Bioorganic & Medicinal Chemistry 2007, 15, 7237-7247

“Optimization of Biaryl Piperidine and 4-Amino-2-biarylurea MCH1 Receptor Antagonists using QSAR Modeling, Classification Techniques and Virtual Screening”, Journal of Computer-Aided Molecular Design 2007, 21, 251-267

“Investigation of Substituent Effect of 1-(3,3-Diphenylpropyl)-Piperidinyl Phenylacetamides Amides on CCR5 Binding Affinity using QSAR and Virtual Screening Techniques”, Journal of Computer-Aided Molecular Design 2006, 20, 83-95

“A Novel Simple QSAR Model for the Prediction of anti-HIV Activity Using Multiple Linear Regression Analysis”, Molecular Diversity 2006, 10, 405-414

NTUA QSAR Group – QSAR Software under development-1

The user can load existing mol files or create new mol files

NTUA QSAR Group – QSAR Software under development-2

NTUA QSAR Group – QSAR Software under development-3

NTUA QSAR Group – QSAR Software under development-4

NTUA QSAR Group – QSAR Software under development-5

The RBF neural network architecture

A special neural network architecture with important advantages

Simple network topology

Fast training algorithms (usually split into two phases)

Linear relationship between the hidden layer and the output layer

Accurate predictions (in many test cases RBF networks have been shown to give better results than other neural network types)

The RBF neural network topology

[Figure: RBF network with an input layer (x = [x1 x2 x3]), a hidden layer of four radial basis function nodes with centers c = [c1 c2 c3 c4], and an output layer that sums (Σ) the hidden node responses with weights w1, w2, w3, w4]

$$ y = \sum_{j=1}^{4} w_j \, f\left(\lVert \mathbf{x} - \mathbf{c}_j \rVert\right) $$

$$ \lVert \mathbf{x} - \mathbf{c}_j \rVert = \sqrt{\sum_{i=1}^{3} \left(x_i - c_j(i)\right)^2} $$
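The forward pass of the network on this slide (three inputs, four hidden nodes, one output) can be written in a few lines. This is a minimal sketch: the slide does not state which radial basis function f is used, so the Gaussian below is an assumption, and the centers and weights are arbitrary illustrative numbers.

```python
import math


def euclidean(x, c):
    """Distance ||x - c|| between an input vector and a hidden node center."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def gaussian(r, width=1.0):
    """Assumed radial basis function; the slides leave f unspecified."""
    return math.exp(-(r / width) ** 2)


def rbf_output(x, centers, weights, basis=gaussian):
    """y = sum_j w_j * f(||x - c_j||): the output is linear in the weights."""
    return sum(w * basis(euclidean(x, c)) for w, c in zip(weights, centers))


x = [1.0, 0.0, 2.0]
centers = [
    [1.0, 0.0, 2.0],   # c1 coincides with x, so its node responds with f(0) = 1
    [0.0, 0.0, 0.0],   # c2
    [1.0, 1.0, 1.0],   # c3
    [2.0, 2.0, 2.0],   # c4
]
weights = [0.5, -1.0, 2.0, 0.1]

y = rbf_output(x, centers, weights)
print(y)
```

Because the output is linear in the weights once the centers are fixed, the weights can be obtained by ordinary linear least squares, which is why training is usually split into two phases.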

NTUA QSAR Group – The fuzzy means algorithm

An RBF network training algorithm that:

Is very fast, since it requires only one pass of the training examples

Determines the hidden layer structure automatically

Locates the hidden node centers so that they are not close to each other

Provides a solution that does not depend on an initial random selection

(Sarimveis et al., 2002, Industrial and Engineering Chemistry Research)

The fuzzy means algorithm determines the proper number of hidden nodes and calculates the hidden node center locations. The rest of the network parameters are determined using conventional techniques.

The key concept behind the algorithm is the fuzzy partition of the input space into a number of fuzzy subsets.

Fuzzy partition of the input space

[Figure: two-dimensional example. The x1 axis is partitioned into the fuzzy sets a1,1 to a1,5 and the x2 axis into the fuzzy sets a2,1 to a2,5.]

Assuming a system with N input variables, the domain of each input variable is evenly partitioned into a number of triangular fuzzy sets.

Fuzzy partitioning is then extended to the entire input space, so that a number of fuzzy subspaces are created, where each fuzzy subspace is defined as a combination of N particular fuzzy sets, one per input variable.

The multidimensional membership function of an input vector x(k) in a fuzzy subspace A^l is defined in terms of the relative distance

$$ rd_l(\mathbf{x}(k)) = \frac{\left[\sum_{i=1}^{N} \left(a_i^l - x_i(k)\right)^2\right]^{1/2}}{\left[\sum_{i=1}^{N} \left(\delta a_i\right)^2\right]^{1/2}} $$

where a_i^l is the center of the fuzzy set of input variable i that defines A^l, and δa_i is the half-width of the triangular fuzzy sets of input variable i.

Flow chart of the fuzzy means algorithm

1. First data point [x(1) y(1)]: determination of the first fuzzy subspace (hidden neuron center). Set L = 1.

2. New data point [x(k) y(k)]: find the closest of the L fuzzy subspaces selected so far,

$$ rd_{l_0}(\mathbf{x}(k)) = \min_{1 \le l \le L} rd_l(\mathbf{x}(k)) $$

3. Check whether $rd_{l_0}(\mathbf{x}(k)) \le 1$.
   YES: the data point is already covered; continue with the next data point (step 2).
   NO: determination of the next fuzzy subspace (hidden neuron center). Set L = L + 1 and continue with the next data point (step 2).
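The flow chart can be sketched in code as a single pass over the training inputs. This is a simplified reading of the slides and not the published implementation: it assumes the same number of fuzzy sets for every input variable, picks the new center as the grid node nearest to the uncovered data point, and covers only the selection of hidden node centers (the output weights would be fitted afterwards by linear regression). All function names are hypothetical.

```python
import math


def nearest_grid_center(x, lows, highs, n_sets):
    """Center of the fuzzy subspace closest to x (nearest fuzzy set per variable)."""
    center = []
    for xi, lo, hi in zip(x, lows, highs):
        step = (hi - lo) / (n_sets - 1)              # spacing of the set centers
        j = min(max(round((xi - lo) / step), 0), n_sets - 1)
        center.append(lo + j * step)
    return tuple(center)


def relative_distance(x, center, half_widths):
    """rd_l(x): distance to the subspace center, scaled by the half-widths."""
    num = math.sqrt(sum((a - xi) ** 2 for a, xi in zip(center, x)))
    den = math.sqrt(sum(d ** 2 for d in half_widths))
    return num / den


def fuzzy_means_centers(data, lows, highs, n_sets):
    """One pass over the data; returns the selected hidden node centers."""
    half_widths = [(hi - lo) / (n_sets - 1) for lo, hi in zip(lows, highs)]
    # First data point always generates the first center (L = 1).
    centers = [nearest_grid_center(data[0], lows, highs, n_sets)]
    for x in data[1:]:
        closest = min(relative_distance(x, c, half_widths) for c in centers)
        if closest > 1.0:                            # not covered: L = L + 1
            centers.append(nearest_grid_center(x, lows, highs, n_sets))
    return centers


# Two input variables on [0, 1], five fuzzy sets each (centers 0, 0.25, ..., 1).
data = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (0.5, 0.5), (0.55, 0.45)]
centers = fuzzy_means_centers(data, lows=[0.0, 0.0], highs=[1.0, 1.0], n_sets=5)
print(centers)
```

Note that the result depends only on the data and on the number of fuzzy sets, not on any random initialization, and that two selected centers are always distinct grid nodes.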

1st stage of GASA-RBF

Hybrid coding of candidate solutions (chromosomes)

Binary coding for each descriptor (first N genes)

Integer coding for the number of fuzzy sets used in the fuzzy means algorithm

[Figure: example chromosome 0 1 1 0 1 0 1 | 8. The seven binary genes select among the descriptor columns x1(k) to x7(k); the final integer gene, 8, is the number of fuzzy sets.]

Creation of initial population

Descriptors: probability equal to 50% for every digit to receive value 1

Fuzzy sets: random selection from a normal distribution between LB and UB

Objective function

Leave-one-out cross-validation

$$ \mathrm{RMSECV}_j^{\mathrm{GA}} = \sqrt{\frac{\sum_{i=1}^{K} \left(y_i - \hat{y}_{(i),j}\right)^2}{K}} $$
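The chromosome coding and the leave-one-out objective can be sketched as below. Two simplifications are assumptions of this sketch and not part of GASA-RBF: the integer gene is drawn uniformly between LB and UB, and a 1-nearest-neighbour predictor stands in for the RBF network trained by the fuzzy means algorithm (so the integer gene is carried along but not used by the stand-in model). The data are made up.

```python
import math
import random


def random_chromosome(n_descriptors, lb, ub, rng):
    """Binary gene per descriptor (P[1] = 0.5) plus one integer gene in [lb, ub]."""
    bits = [1 if rng.random() < 0.5 else 0 for _ in range(n_descriptors)]
    fuzzy_sets = rng.randint(lb, ub)   # assumption: uniform integer draw
    return bits + [fuzzy_sets]


def one_nn_predict(train_X, train_y, x):
    """Stand-in model: predict the response of the nearest training compound."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def rmsecv(chromosome, X, y):
    """Leave-one-out RMSECV using only the descriptors switched on in the chromosome."""
    selected = [i for i, bit in enumerate(chromosome[:-1]) if bit == 1]
    if not selected:
        return float("inf")            # a model with no descriptors is rejected
    reduced = [[row[i] for i in selected] for row in X]
    sq_err = 0.0
    for k in range(len(y)):
        train_X = reduced[:k] + reduced[k + 1:]   # leave compound k out
        train_y = y[:k] + y[k + 1:]
        sq_err += (y[k] - one_nn_predict(train_X, train_y, reduced[k])) ** 2
    return math.sqrt(sq_err / len(y))


# Descriptor 0 tracks the response; descriptor 1 is misleading.
X = [[0.0, 5.0], [1.0, 0.0], [2.0, 4.9], [3.0, 0.1]]
y = [0.0, 1.0, 2.0, 3.0]

score_good = rmsecv([1, 0, 8], X, y)   # keep descriptor 0 only
score_bad = rmsecv([0, 1, 8], X, y)    # keep descriptor 1 only
example = random_chromosome(7, lb=4, ub=12, rng=random.Random(0))
print(score_good, score_bad, example)
```

A lower RMSECV marks a better chromosome, so the genetic algorithm needs a fitness that decreases with RMSECV.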

1st stage of GASA-RBF (continued)

Reproduction

Roulette wheel selection: each chromosome is allocated a slot on the roulette, with size proportional to its fitness

Cross-over

Strings of genes are exchanged between pairs of chromosomes

b1 b2 … bpos bpos+1 … bn fzb
c1 c2 … cpos cpos+1 … cn fzc

Mutation

Binary genes: flip bit mutation (the values in a small percentage of genes for each population are inverted)

Integer genes: non-uniform mutation

$$ fz_{new} = \begin{cases} fz_{old} + \left(UB - fz_{old}\right)\left(1 - r^{(1 - t/T)^{b}}\right), & \text{if random digit is 0} \\ fz_{old} - \left(fz_{old} - LB\right)\left(1 - r^{(1 - t/T)^{b}}\right), & \text{if random digit is 1} \end{cases} $$

Exploitation operators: intensified search in spaces of high quality solutions

Exploration operators: new solution spaces are explored
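The three operators on this slide can be sketched as follows, for chromosomes stored as a list of binary genes followed by one integer gene. This is an illustrative sketch with hypothetical function names; in particular, the slides do not say how fitness is derived from RMSECV, so `roulette_select` simply expects non-negative fitness values where larger is better, and the integer gene is rounded after mutation.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.random() * sum(fitness)
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if pick <= running:
            return chromosome
    return population[-1]


def one_point_crossover(parent_b, parent_c, pos):
    """Exchange the gene strings after position pos (integer gene included)."""
    return parent_b[:pos] + parent_c[pos:], parent_c[:pos] + parent_b[pos:]


def flip_bit(chromosome, rate, rng):
    """Invert each binary gene with probability rate; the integer gene is kept."""
    bits = [1 - g if rng.random() < rate else g for g in chromosome[:-1]]
    return bits + [chromosome[-1]]


def nonuniform_mutation(fz_old, lb, ub, t, T, b, rng):
    """Non-uniform mutation of the integer gene; steps shrink as t approaches T."""
    r = rng.random()
    shrink = 1.0 - r ** ((1.0 - t / T) ** b)
    if rng.random() < 0.5:                       # "random digit is 0"
        fz_new = fz_old + (ub - fz_old) * shrink
    else:                                        # "random digit is 1"
        fz_new = fz_old - (fz_old - lb) * shrink
    return int(round(fz_new))


rng = random.Random(0)
child_1, child_2 = one_point_crossover([0, 1, 1, 0, 8], [1, 0, 0, 1, 5], pos=2)
mutated = flip_bit(child_1, rate=0.1, rng=rng)
new_fz = nonuniform_mutation(8, lb=4, ub=12, t=3, T=10, b=2, rng=rng)
print(child_1, child_2, mutated, new_fz)
```

Early in the run (small t) the integer gene can jump anywhere between LB and UB, while at t = T the mutation leaves it unchanged.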

2nd stage of GASA-RBF

SIMULATED ANNEALING

Probability of accepting a worse solution:

$$ P = e^{-\frac{f(s_{new}) - f(s_{cur})}{T_k}} $$

Cooling schedule:

$$ T_{k+1} = r\,T_k, \quad k = 0, 1, 2, \ldots, \qquad 0.80 \le r \le 0.99 $$

Initially, almost all solutions are accepted: random search

As T approaches zero, only improving solutions are accepted: local search

The following design parameters must be specified:
1. Initial value of T
2. Strategy for reducing T
3. Final value of T

GENERALIZED SIMULATED ANNEALING

$$ P = e^{-\beta\,\frac{f(s_{new}) - f(s_{cur})}{f(s_{bsf}) - f_{cons}}} $$

No need to determine a cooling schedule

Only β must be determined by the user
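The classical simulated annealing rules on this slide (acceptance probability and geometric cooling) can be sketched as follows. The toy cost function, the neighbour move and the default parameter values are assumptions chosen only to make the example runnable; the generalized variant is not covered here.

```python
import math
import random


def accept_probability(f_new, f_cur, temperature):
    """Improving moves are always accepted; worse ones with P = exp(-(f_new - f_cur)/T)."""
    if f_new <= f_cur:
        return 1.0
    return math.exp(-(f_new - f_cur) / temperature)


def geometric_cooling(t0, r, n_steps):
    """Temperatures T_0, T_1, ..., T_n with T_{k+1} = r * T_k."""
    temps = [t0]
    for _ in range(n_steps):
        temps.append(r * temps[-1])
    return temps


def anneal(f, neighbour, s0, t0=1.0, r=0.9, t_final=1e-3, seed=0):
    """Minimise f starting from s0; returns the best solution visited."""
    rng = random.Random(seed)
    s_cur, best = s0, s0
    temperature = t0
    while temperature > t_final:
        s_new = neighbour(s_cur, rng)
        if rng.random() < accept_probability(f(s_new), f(s_cur), temperature):
            s_cur = s_new
            if f(s_cur) < f(best):
                best = s_cur
        temperature *= r               # cooling step
    return best


def cost(s):
    """Toy objective with its minimum at s = 3."""
    return (s - 3) ** 2


def step(s, rng):
    """Toy neighbourhood: move one unit left or right."""
    return s + rng.choice([-1, 1])


temps = geometric_cooling(1.0, 0.9, 2)
best = anneal(cost, step, s0=10)
print(temps, best, cost(best))
```

The three design parameters listed on the slide appear here as `t0`, `r` and `t_final`.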

NTUA QSAR Group- References

G. Tsekouras, H. Sarimveis and G. Bafas, “A method for fuzzy system identification based on clustering analysis”, Systems Analysis Modeling Simulation, 39, 543-558, 2001.

G. Tsekouras, H. Sarimveis, C. Raptis and G. Bafas, “A fuzzy logic approach for system qualitative characteristics”, Computers & Chemical Engineering, 26, 429-438, 2002.

H. Sarimveis, A. Alexandridis, G. Tsekouras and G. Bafas, “A fast and efficient algorithm for training radial basis function neural networks based on a fuzzy partition of the input space”, Industrial & Engineering Chemistry Research, 41, 751-759, 2002.

G. Tsekouras, H. Sarimveis and G. Bafas, “A simple algorithm for training fuzzy systems using input-output data”, Advances in Engineering Software, 34(5), 247-259, 2003.

H. Sarimveis, A. Alexandridis and G. Bafas, “A fast training algorithm for RBF networks based on subtractive clustering”, Neurocomputing, 51, 501-505, 2003.

H. Sarimveis, A. Alexandridis, S. Mazarakis and G. Bafas, “A new algorithm for developing dynamic radial basis function neural network models based on genetic algorithms”, Computers & Chemical Engineering, 28(1-2), 209-217, 2004.

G. Tsekouras and H. Sarimveis, “A new approach for measuring the validity of the fuzzy c-means algorithm”, Advances in Engineering Software, 35(8-9), 567-575, 2004.

G. Tsekouras, H. Sarimveis, E. Kavakli and G. Bafas, “A hierarchical fuzzy-clustering approach to fuzzy modeling”, Fuzzy Sets and Systems, 150(2), 245-266, 2005.

A. Alexandridis, P. Patrinos, H. Sarimveis and G. Tsekouras, “A two-stage evolutionary algorithm for variable selection in the development of RBF neural network models”, Chemometrics and Intelligent Laboratory Systems, 75(2), 149-162, 2005.

A. Afantitis, G. Melagraki, K. Makridima, A. Alexandridis, H. Sarimveis and O. Iglessi-Markopoulou, “Prediction of High Weight Polymers Glass Transition Temperature Using RBF Neural Networks”, Journal of Molecular Structure: THEOCHEM, 716(1-3), 193-198, 2005.

G. Melagraki, A. Afantitis, H. Sarimveis, O. Iglessi-Markopoulou and C. T. Supuran, “QSAR study on para-substituted aromatic sulfonamides as carbonic anhydrase II inhibitors using topological information indices”, Bioorganic & Medicinal Chemistry, 14(4), 1108-1114, 2006.

G. Melagraki, A. Afantitis, K. Makridima, H. Sarimveis and O. Iglessi-Markopoulou, “Prediction of toxicity using a novel RBF neural network training methodology”, Journal of Molecular Modeling, 12(3), 297-305, 2006.

NTUA QSAR Group- References (continued)

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “Prediction of the Intrinsic Viscosity of Polymer-Solvent Combinations using a QSPR model”, Polymer, 47(9), 3240-3248, 2006.

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “Investigation of Substituent Effect of 1-(3,3-Diphenylpropyl)-Piperidinyl Phenylacetamides Amides on CCR5 Binding Affinity using QSAR and Virtual Screening Techniques”, Journal of Computer-Aided Molecular Design, 20, 83-95, 2006.

G. Melagraki, A. Afantitis, H. Sarimveis, O. Iglessi-Markopoulou and A. Alexandridis, “A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri”, Molecular Diversity, 10(2), 213-221, 2006.

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “A Novel QSAR Model for Predicting Induction of Apoptosis by 4-Aryl-4H-chromenes”, Bioorganic & Medicinal Chemistry, 14, 6686-6694, 2006.

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “A Novel Simple QSAR Model for the Prediction of anti-HIV Activity Using Multiple Linear Regression Analysis”, Molecular Diversity, 10, 405-414, 2006.

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “A Novel QSAR Model for Evaluating and Predicting the Inhibition Activity of Dipeptidyl Aspartyl Fluoromethylketones”, QSAR & Combinatorial Science, 25(10), 928-935, 2006.

G. Melagraki, A. Afantitis, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “A novel QSPR model to predict θ (lower critical solution temperature) in polymer solutions using molecular descriptors”, Journal of Molecular Modeling, 13(1), 55-64, 2007.

G. Melagraki, A. Afantitis, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “Optimization of Biaryl Piperidine and 4-Amino-2-biarylurea MCH1 Receptor Antagonists using QSAR Modeling, Classification Techniques and Virtual Screening”, Journal of Computer-Aided Molecular Design, 21(5), 251-267, 2007.

G. Melagraki, A. Afantitis, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “Identification of a series of novel derivatives as potent HCV inhibitors by a ligand-based virtual screening optimized procedure”, Bioorganic & Medicinal Chemistry, 15, 7237-7247, 2007.

A. Afantitis, G. Melagraki, H. Sarimveis, P. A. Koutentis, J. Markopoulos and O. Iglessi-Markopoulou, “Development and Evaluation of a QSPR Model for the Prediction of Diamagnetic Susceptibility”, QSAR & Combinatorial Science, 27(4), 432-436, 2008.

A. Afantitis, G. Melagraki, H. Sarimveis, O. Iglessi-Markopoulou and G. Kollias, “A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4]diazepane ureas”, European Journal of Medicinal Chemistry, accepted, 2008.