online chemical modeling environment: models

Online chemical modeling environment: models

Iurii Sushko, Sergey Novotarskiy

Thursday, August 13, 2009

Existing alternatives

Classical approach: Weka, R, Mathematica

Advantages:

1. Most flexible

2. Suitable for research and deep analysis

Disadvantages:

1. It’s complex: suitable for mathematicians, computer scientists, and statisticians, but not for chemists and biologists

2. Very tedious data preparation

Community driven source Authority driven source

Collaboration in QSAR

Possibilities for collaboration in QSAR:

1. Use others' data
   a. build models based on others' data
   b. validate your models against others' data

2. Use others' models
   a. validate your data against published models
   b. use the output of published models as an input for new ones
   c. compare the performance of published models with your own

All existing modeling tools lack means of collaboration

OCHEM advantages

Collaboration-targeted features:

1. Tight connection between database and modeling tools

2. Wiki, discussion, comments, tags

Simplified modeling workflow:

1. Sensible defaults for most parameters

2. Only necessary parameters are requested

3. Data representation is targeted at chemists

4. Fine-tuning is possible for experts

Modeling workflow

1. Data preparation

2. Building a model

3. Analysing the model

4. Application of the model

AD

Stage 1 – Data preparation

Data Point (the central record), linked to:

• Property: logP = 0.5, Melting Point = 100 °C
• Condition: temperature, pH, species, tissue, method
• Article: Garberg, P “In vitro models for …”
• Structure: Benzene, Urea, ...
• Introducer: Bill G., Sergey B.
• Date of modification
• Information system
• Tags: Toxicology, Biology, Partition coefficient

Operations on data points:

• Filtering: Toxicology, Biology, Partition coefficient
• Manipulation: Editing
• Organization: Working sets
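The data point record and tag-based filtering shown on this slide can be sketched as follows. This is only an illustration: the class name, field names, and the `filter_by_tag` helper are assumptions, not the system's actual schema, and the example values are the placeholder values from the slide paired arbitrarily with structures, not real measurements.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """One experimental measurement (hypothetical field names)."""
    structure: str    # compound, e.g. "Benzene"
    prop: str         # measured property, e.g. "logP"
    value: float      # measured value
    article: str      # source publication
    introducer: str   # user who entered the record
    conditions: dict = field(default_factory=dict)  # e.g. {"pH": 7.0}
    tags: set = field(default_factory=set)          # e.g. {"Toxicology"}


def filter_by_tag(points, tag):
    """Return the data points that carry the given tag."""
    return [p for p in points if tag in p.tags]


# Placeholder records built from the slide's example labels (not real data).
points = [
    DataPoint("Benzene", "logP", 0.5, "Garberg, P", "Bill G.",
              tags={"Partition coefficient"}),
    DataPoint("Urea", "Melting Point", 100.0, "Garberg, P", "Sergey B.",
              tags={"Toxicology"}),
]

# A "working set" here is simply the filtered list of records.
working_set = filter_by_tag(points, "Partition coefficient")
```

Keeping conditions and tags as open-ended collections mirrors the slide, where a data point can carry any number of them.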


Stage 1: Data preparation

Stage 2: Model building - input data

Stage 2: Model building - descriptors (I)

Stage 2: Model building - descriptors (II)

Stage 2: Model building – descriptors (manual)

Stage 3: Analysing the model (I) - Basic model statistics
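The slide does not list which statistics are reported. As a minimal sketch, assuming the usual regression measures (RMSE, MAE, and the coefficient of determination R²), they could be computed as below; the function name `basic_statistics` is hypothetical.

```python
import math


def basic_statistics(observed, predicted):
    """Compute RMSE, MAE and R^2 for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


stats = basic_statistics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In this example the model always predicts the mean of the observed values, so R² comes out as zero.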

Stage 3: Analysing the model (II) - Applicability domain assessment
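The slides do not say how the applicability domain is assessed. One common distance-based idea, shown here purely as an assumption, is to treat a compound as inside the domain when its mean distance to the nearest training compounds in descriptor space stays under a threshold. The function names, the choice of Euclidean distance, and the threshold are all hypothetical.

```python
import math


def mean_knn_distance(query, training, k=3):
    """Mean Euclidean distance from query to its k nearest training vectors."""
    distances = sorted(math.dist(query, t) for t in training)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def inside_domain(query, training, threshold, k=3):
    """Treat the query as inside the domain if it lies close to the training set."""
    return mean_knn_distance(query, training, k) <= threshold


# Toy two-dimensional descriptor vectors (illustrative only).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = inside_domain((0.0, 0.0), training, threshold=1.0)
far = inside_domain((10.0, 10.0), training, threshold=1.0)
```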

Stage 4: Application of the model - Selection of the model of interest

Model, published by another user

Newly created model

Stage 4: Application of the model - Provide target compounds

Stage 4: Application of the model - Prediction results

Columns shown: target compound, prediction, accuracy assessment

Stage 4: Application of the model - Assessment of accuracy of predictions

Target compound

Need for distribution of calculations

Fact: QSAR modeling is calculation-intensive

Examples of calculations:
• Training of neural network ensembles
• Computing 3D conformations
• Computing complex molecular descriptors

Solution:
• Distributed calculation network
• User can postpone, cancel or fetch task results later
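The task lifecycle described above (submit now, cancel, or fetch the result later) can be sketched with a small in-memory dispatcher. This is a simplified illustration under stated assumptions: the class `MetaServer`, its methods, and the `Status` values are hypothetical names, and a calculation server is modelled as a plain callable rather than a remote machine.

```python
import itertools
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    DONE = "done"
    CANCELLED = "cancelled"


class MetaServer:
    """Toy central dispatcher that tracks tasks by id (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}  # dicts keep insertion order, so this acts as a queue

    def submit(self, task_type, payload):
        """Register a task and return its id so the user can come back later."""
        task_id = next(self._ids)
        self._tasks[task_id] = {"type": task_type, "payload": payload,
                                "status": Status.QUEUED, "result": None}
        return task_id

    def cancel(self, task_id):
        """Cancel a task that has not been processed yet."""
        task = self._tasks[task_id]
        if task["status"] is Status.QUEUED:
            task["status"] = Status.CANCELLED
            return True
        return False

    def run_next(self, worker):
        """Hand the oldest queued task to a calculation server (a callable)."""
        for task_id, task in self._tasks.items():
            if task["status"] is Status.QUEUED:
                task["result"] = worker(task["type"], task["payload"])
                task["status"] = Status.DONE
                return task_id
        return None

    def fetch(self, task_id):
        """Return the result if the task is finished, otherwise None."""
        task = self._tasks[task_id]
        return task["result"] if task["status"] is Status.DONE else None


server = MetaServer()
first = server.submit("descriptors", "Benzene")
second = server.submit("descriptors", "Urea")
server.cancel(second)
server.run_next(lambda kind, payload: f"{kind}:{payload}")
```

Because every task is addressed by its id, the user does not need to stay connected while a long calculation runs.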

Automatic updates and testing

• Calculation servers are updated automatically when a new release becomes available
• Servers are tested automatically after each update
• Tasks that did not pass the tests are disabled, keeping the server functional
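The idea of disabling only the failing task types while the server keeps running can be sketched as a self-test pass. The function `self_test`, the handler table, and the test cases below are hypothetical; the slides do not describe how the tests are actually defined.

```python
def self_test(task_handlers, test_cases):
    """Run one known-answer test per task type; return the types that passed."""
    enabled = set()
    for task_type, handler in task_handlers.items():
        test_input, expected = test_cases[task_type]
        try:
            passed = handler(test_input) == expected
        except Exception:
            # A crashing handler counts as a failed test, not a server failure.
            passed = False
        if passed:
            enabled.add(task_type)
    return enabled


# Two toy task types; "mean" is deliberately broken (division by zero).
handlers = {
    "sum": sum,
    "mean": lambda values: sum(values) / 0,
}
cases = {
    "sum": ([1, 2, 3], 6),
    "mean": ([1, 2, 3], 2.0),
}
enabled = self_test(handlers, cases)
```

After the pass, the server would accept only the task types left in `enabled`.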

Backend - distributed calculation

• Central metaserver, distributed calculation servers
• Automatic server updates, on-the-fly server testing

Basic facts

• About 50,000 experimental measurements on 285 physicochemical properties, published in about 2,000 articles
• Implemented modeling methods: ANN, KNN, MLR, kernel ridge regression
• Integrated descriptors: Dragon, E-State, Fragments
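Of the listed methods, KNN is the simplest to illustrate. The following is a generic textbook sketch of k-nearest-neighbour regression, not the implementation used by the system: the function name `knn_predict`, the Euclidean distance, the unweighted average, and the toy descriptor values are all assumptions.

```python
import math


def knn_predict(query, training, k=3):
    """Predict a property as the mean value of the k nearest training compounds.

    training is a list of (descriptor_vector, measured_value) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Toy one-dimensional descriptors with made-up property values.
training = [
    ((0.0,), 1.0),
    ((1.0,), 2.0),
    ((2.0,), 3.0),
    ((10.0,), 50.0),
]
prediction = knn_predict((1.0,), training, k=3)
```

The distant fourth compound does not influence the prediction because only the three closest neighbours are averaged.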

Backend - basic facts

• Platform: Java EE
• Database: MySQL
• Server: Tomcat
• ORM: Hibernate
• MVC: Spring framework
• Client side: AJAX, HTML + JavaScript


Education