basic concepts and areas of application · chemoinformatics is a generic term that encompasses the...

Post on 26-Jul-2020

5 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

Chemoinformatics: Basic Concepts and Areas of Application

Alexandre Varnek

Laboratory of Chemoinformatics, University of Strasbourg

Double diploma UniStra/KFU

Chem(o)informatics

Cheminformatics

Chemical Informatics

Infochimie

Chémoinformatique

Хемоинформатика

Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information

G. Paris, 1998

Chemoinformatics - definition

Chemoinformatics is the application of informatics methods to solve chemical problems

J. Gasteiger, 2004

Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”

F.K. Brown, 1998

Chemoinformatics is a field based on the representation of molecules as objects

(graphs or vectors) in a chemical space A. Varnek & I. Baskin, 2011

Selected books in chemoinformatics

Paul Emile Lecoq de

Boisbaudran

Gallium discovery:

the first QSAR successful story

Predicted in 1869

Dmitry Mendeleév

Discovered in 1875

Densitypred ≈ 6.0 g/cm3 Densityexp = 4.7 (initial)

Densityexp = 5.935

(corrected)

Chemoinformatics:

new disciline combining several „old“ fields

• Chemical databases

• Structure-Activity modeling (QSAR)

• Structure-based drug design

• Computer-aided synthesis design

Peter Willett Michael Lynch

Corwin Hansch Johann Gasteiger

Irwin D. Kuntz

Elias Corey Ivar Ugi

Hans-Joachim Böhm

• Needs for chemoinformatics

• Fundamentals of chemoinformatics

• Chemical Space paradigm

• Virtual screening approaches

• Perspectives

OUTLOOK

Needs in Chemoinformatics

10

Chemical universe

> 100 M compounds are currently recorded

• How to select useful compounds from this huge dataset ?

• How to design new compounds ?

• How to synthesize these compounds ?

Target Protein

Large libraries

of molecules

High Throughout Screening

Hit

experiment

computations

Virtual

Screening

Small Library of selected hits

Chemical universe:

• > 108 compounds are currently available

• 1033 druglike molecules could potentially be synthesised

(see P. Polischuk, T. Madzidov et al., JCAMD, 2013)

Virtual screening is inevitable to analyse a

huge amount of protein-ligand combinations

Virtual screening must be very fast and efficient !

Human proteome:

• 5000 druggable proteins

Ionic Liquids

Ionic Liquids are composed of

large organic cations:

PF6-, Cl-, BF4

-, CF3SO3-, [CF3SO2)2N]-

and anions:

N RR12

+N R

R

1

2

+ N

N+

R

R

R

1

2

3

N

R

R

R

R1

2

3

4

+N

N+

R

R

1

3

There exist combinations of

ions that could lead to useful ionic

liquids

Ionic Liquids

Large organic cations:

PF6-, Cl-, BF4

-, CF3SO3-, [CF3SO2)2N]-

anions:

N RR12

+N R

R

1

2

+ N

N+

R

R

R

1

2

3

N

R

R

R

R1

2

3

4

+N

N+

R

R

1

3

1018

Virtual screening : finding the needle in the haystack

CHEMICAL DATABASE

~106 – 109

molecules

Chemoinformatics: pattern recognition in chemistry

CHEMICAL DATABASE

~106 – 109

molecules

model

- Specific structural motifs,

- Selected molecular properties (shape, fields, …),

- Interaction patterns,

- Mathematical equations

Activity = F (structure)

VIRTUAL

SCREENING

INACTIVES

HITS ~106 – 109

molecules

CHEMICAL DATABASE

Chemoinformatics: Virtual screening “funnel”

Similarity search

Filters

(Q)SAR

Docking

Pharmacophore

~101 – 103

molecules

Chemoinformatics: Virtual screening “funnel”

Similarity search

Filters

(Q)SAR

Docking

Pharmacophore

VIRTUAL

SCREENING

INACTIVES

HITS

~106 – 109

molecules

CHEMICAL

DATABASE

~101 – 103

molecules

Ligand-based

Structure-based

Chemoinformatics as a

theoretical chemistry discipline

20

Chemoinformatics is defined as individual discipline

characterized by its own molecular model, basic concepts,

major applications and learning approach

21

Theoretical chemistry

Quantum Chemistry

Force Field Molecular Modelling

Chemoinformatics

- Molecular model - Basic concepts - Major applications - Learning approaches

22

Molecular Model

Quantum Chemistry

Force Field Molecular Modelling

Chemoinformatics • molecular graph

• descriptor vector

electrons and nuclei

atoms and bonds

Chemoinformatics is a field based on the representation of molecules as

objects (graphs or vectors) in a chemical space

Chemoinformatics: From Data to Knowledge

know-

ledge

information

data

generalization

context

measurement

or calculation

deductive

learning

inductive

learning

Chemoinformatics learns from experimental data !

Basic concepts

Quantum Chemistry

Force Field

Molecular Modelling

Chemoinformatics chemical space

wave/particle dualism

classical mechanics

Chemical space paradigm

26

Chemical Space representations

graphs-based descriptors -based

SPACE = objects + metric

Graph-based chemical space

A. Schuffenhauer, P. Ertl, et al. J. Chem. Inf. Model., 2007, 47 (1), 47-58

Scaffold Tree

Natural Product Scaffold Tree

Courtesy of P. Ertl

Natural Product Scaffold Tree

Courtesy of P. Ertl

Descriptors-based chemical space

vectorial space defined by molecular descriptors

32

Case study: Hansch Analysis

3 types of physicochemical parameters are used:

• Electronic (s)

• Steric (dEs)

• Hydrophobic (logP)

Biological Activity = f (Physicochemical parameters ) + constant

Activity = a ( log P )2 + b log P + s + dEs + cont

33

Case study: Hansch Analysis

Molecule 1

Molecule 2

34

Molecular Descriptors :

ensemble of topological, electronic, geometry parameters calculated directly

from molecular structure

Descriptors

D1

D2

Di

Molecular graph

-Topological indices,

- Atomic charges,

- Inductive descriptors,

- Substructural fragments,

- Molecular volume and surface, …

Descriptor vector

> 5000 types of descriptors are reported

35

Chemography: Design and visualization of chemical space

Greenland

2.2 M km2

Australia

7.7 km2

Arabian Peninsula

3.5 M km2

Dimensionality Reduction problem

37

Swiss Roll

• GTM relates the latent space with a 2D “rubber sheet” (manifold) injected into

the high-dimensional data space.

• The visualization plot is obtained by projecting the data points onto the manifold

and then letting the “rubber sheet” relax to its original form.

Generative Topography Mapping (GTM)

N. Kireeva, I. Baskin, H. Gaspar, D. Horvath, G. Marcou, A. Varnek Mol. Inf. 2012, 31, 301–312

GTM of a dataset containing 10 activities from DUD

Similarity principle:

similar molecules possess similar properties

39

Chemical Similarity

0.82

0.39

0.84

0.72

0.67

0.64

0.53

0.56

0.52

reference

compound

Similar compounds possess similar properties

Chemical space representation: Activity Landscapes

i

ik

i

iki

kR

RA

= A Expectation of activity in k - node for the training set

logK of Lu3+L complexes

Ak

logKLu

42

Strong binders Weak binders

Activity landscape of lanthanides’ binders

Generative Topographic Mappping

of the set of Ln binders

Contours correspond to different

logK values

H. Gaspar, I. Baskin, G. Marcou, A. Varnek unpublished results

Biopharmaceutics Drug Disposition Classification System

DATASET: 893 drugs

DESCRIPTORS: VolSurf

Case study: classification models for BDDCS classes

Visualization of models’ Applicability Domain

44

CPF ≤ 1, coverage =100 % CPF ≤ 5, coverage = 47 %

BDDCS classes probability distribution

Colored zones on the maps correspond to model’s applicability domain

H. Gaspar, G. Marcou, A. Varnek JCIM, 2013

Class Preference Factor 𝑪𝑷𝑭 = max𝑐 𝑃(𝑘|𝐶)

𝑃(𝑘|𝐶𝑖), ∀𝐶𝑖 ≠ 𝐶

Chemoinformatics:

Properties predictions

46

Quantitative Structure-Activity Relationships

(QSAR)

Activity = F (structure)

= F (descriptors)

machine-learning methods

• neural networks, support vector machine,

random forest, naïve Bayes, PLS, …

A. Varnek & I. Baskin Machine Learning Methods in Chemoinformatics: Quo Vadis?

J. Chem. Inf. Model. 2012, 52, 1413−1437

predictions of > 20 physico-chemical

properties and NMR spectra for

each individual compound

Chemoinformatics tools in SciFinder:

ISIDA virtual screening platform

infochim.u-strasbg.fr/webserv/VSEngine.html

Machine Learning Methods in Chemoinformatics: Quo Vadis ?

A. Varnek and I. Baskin , J Chem. Inf. Mod., 2012, 52, 1413-1437

Chemoinformatics: virtual screening in 3D

Virtual screening : finding the needle in the haystack

CHEMICAL DATABASE

~106 – 109

molecules

What is in common between these two molecules ?

-

+

+ -

- Arg-Gly-Asp-Phe

Tirofiban

Pharmacophore model of ligand complementary to

integrine αIIbβ3

Positive charge,

H-donor

Negative charge,

H-acceptor

15.5 Å

5 Å

- +

Hydrophobic

interactions

-

+ + -

pKi = 7.51

TanimotoCombo = 0.74

pKi = 7.82

TanimotoCombo = 0.67

pKi = 7.82

Molecular Shape similarity analysis

Molecular fields

56

Lock Key

Ligand-Protein complex

+

Hermann Emil Fischer

Ligand-to-protein docking :

Lock-and-key paradigm

Selected in silico designed compounds that were synthesized

and successfully tested for bioactivity

G. Schneider J Comput Aided Mol Des (2012) 26:115–120

Chemoinformatics: areas of application

- Drug design (pharmacodynamics and pharmacokinetics),

- Prediction of physico-chemical properties,

- Materials design,

- Synthesis design,

- Molecular spectra simulations

Chemoinformatics: perspectives

60

Assessment of biological activity

61

Assessment of side effects

62 See review by D. Rognan, British Journal of Pharmacology (2007), 1–15

Chemoinformatics : Complexity challenge

P. Csermely1 et al. Pharmacology & Therapeutics, 2012

64

Day 1: Databases

Veli-Pekka Hyttinen

Timur Madzidov

Gilles Marcou Dragos Horvath

Chemical Databases: Encoding, Storage and Search

of Chemical Structures

SciFinder - The choice for chemistry research

Tutorial with ChemAxon

Day 2: QSAR

Igor Tetko

Igor Baskin

Obtaining, Validation and Application of SAR/QSAR Models

SAR/QSAR Modelling: state of the art

Tutorial with OChem

Alex Tropsha

ADMET Predictions

Day 3: virtual screening in 3D

Conformational Sampling

Pharmacophore and Its Applications

Tutorial with LigandScoute

Molecular Docking Methods

Gilles Marcou

Dragos Horvath

Thierry Langer

Sharon Bryant

Gilles Marcou Dragos Horvath

Tutorial with LeadIt

Day 4: Drug Design applications

Konstantin Balakin

Vladimir Poroikov

Computational Mapping Tools for Drug Discovery

Drug Design & Discovery in Academia

staff

Invited Professors

at UniStra Invited Lecturers

Visiting scientists

Visiting friends

top related