basic concepts and areas of application · chemoinformatics is a generic term that encompasses the...

1 Chemoinformatics: Basic Concepts and Areas of Application Alexandre Varnek Laboratory of Chemoinformatics, University of Strasbourg

Post on 26-Jul-2020



Chemoinformatics: Basic Concepts and Areas of Application

Alexandre Varnek

Laboratory of Chemoinformatics, University of Strasbourg


Double diploma UniStra/KFU


Chem(o)informatics

Cheminformatics

Chemical Informatics

Infochimie

Chémoinformatique

Хемоинформатика


Chemoinformatics: definitions

“Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information”

G. Paris, 1998

“Chemoinformatics is the application of informatics methods to solve chemical problems”

J. Gasteiger, 2004

“Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”

F.K. Brown, 1998

“Chemoinformatics is a field based on the representation of molecules as objects (graphs or vectors) in a chemical space”

A. Varnek & I. Baskin, 2011


Selected books in chemoinformatics


Gallium discovery: the first successful QSAR story

Predicted in 1869 by Dmitry Mendeleev

Discovered in 1875 by Paul Emile Lecoq de Boisbaudran

Density (predicted) ≈ 6.0 g/cm³

Density (experimental) = 4.7 g/cm³ (initial), 5.935 g/cm³ (corrected)


Chemoinformatics: a new discipline combining several “old” fields

• Chemical databases

• Structure-Activity modeling (QSAR)

• Structure-based drug design

• Computer-aided synthesis design

Peter Willett, Michael Lynch, Corwin Hansch, Johann Gasteiger, Irwin D. Kuntz, Elias Corey, Ivar Ugi, Hans-Joachim Böhm


OUTLINE

• Needs for chemoinformatics

• Fundamentals of chemoinformatics

• Chemical Space paradigm

• Virtual screening approaches

• Perspectives


Needs in Chemoinformatics


Chemical universe

More than 100 million compounds are currently recorded

• How to select useful compounds from this huge dataset?

• How to design new compounds?

• How to synthesize these compounds?


Target protein and large libraries of molecules

Experiment: High Throughput Screening → Hit

Computations: Virtual Screening → Small library of selected hits


Chemical universe:

• > 10⁸ compounds are currently available

• 10³³ drug-like molecules could potentially be synthesised

(see P. Polishchuk, T. Madzhidov et al., JCAMD, 2013)

Virtual screening is indispensable for analysing the huge number of possible protein-ligand combinations

Virtual screening must be very fast and efficient!

Human proteome:

• 5000 druggable proteins
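A back-of-the-envelope count shows why speed matters. The sketch below multiplies the ~10⁸ available compounds by the ~5000 druggable proteins listed above; the throughput of 1000 protein-ligand evaluations per second is a purely hypothetical assumption, not a figure from the slide.

```python
# Back-of-the-envelope size of the virtual screening problem.
# Compound and target counts follow the slide; the throughput is a hypothetical assumption.
N_COMPOUNDS = 10**8          # compounds currently available
N_DRUGGABLE = 5000           # druggable proteins in the human proteome
EVALS_PER_SECOND = 1000      # hypothetical screening throughput

SECONDS_PER_YEAR = 365 * 24 * 3600


def screening_cost(n_compounds: int, n_targets: int,
                   evals_per_second: float) -> tuple[int, float]:
    """Return (number of protein-ligand pairs, CPU-years needed at the given rate)."""
    pairs = n_compounds * n_targets
    years = pairs / evals_per_second / SECONDS_PER_YEAR
    return pairs, years


pairs, years = screening_cost(N_COMPOUNDS, N_DRUGGABLE, EVALS_PER_SECOND)
print(f"{pairs:.1e} pairs, ~{years:.0f} CPU-years")
```

Even at this assumed rate the 5 × 10¹¹ pairs take roughly 16 CPU-years, and the 10³³ potentially synthesisable molecules are far beyond exhaustive evaluation.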


Ionic Liquids

Ionic liquids are composed of large organic cations and anions:

cations: [Figure: structures of nitrogen-containing organic cations with substituents R1–R4]

anions: PF6⁻, Cl⁻, BF4⁻, CF3SO3⁻, [(CF3SO2)2N]⁻


Ionic Liquids

There exist 10¹⁸ combinations of ions that could lead to useful ionic liquids

[Figure: large organic cations (substituents R1–R4) combined with the anions PF6⁻, Cl⁻, BF4⁻, CF3SO3⁻, [(CF3SO2)2N]⁻]


Virtual screening: finding the needle in the haystack

CHEMICAL DATABASE: ~10⁶–10⁹ molecules


Chemoinformatics: pattern recognition in chemistry

CHEMICAL DATABASE (~10⁶–10⁹ molecules) → model:

- Specific structural motifs,

- Selected molecular properties (shape, fields, …),

- Interaction patterns,

- Mathematical equations

Activity = F (structure)


Chemoinformatics: Virtual screening “funnel”

CHEMICAL DATABASE (~10⁶–10⁹ molecules) → VIRTUAL SCREENING → HITS (~10¹–10³ molecules) and INACTIVES

Virtual screening methods: similarity search, filters, (Q)SAR, pharmacophore, docking


Chemoinformatics: Virtual screening “funnel”

CHEMICAL DATABASE (~10⁶–10⁹ molecules) → VIRTUAL SCREENING → HITS (~10¹–10³ molecules) and INACTIVES

Similarity search, filters, (Q)SAR, pharmacophore, docking

Ligand-based and structure-based approaches
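The funnel can be read as a chain of successively more expensive stages, each discarding compounds before the next one runs. The sketch below assumes every molecule is already annotated with a few precomputed properties. The slide does not say which filters are used, so a strict variant of Lipinski's rule of five (no violations tolerated) serves as an illustrative first filter, and the `qsar_score` field with its 0.5 cut-off stands in for a hypothetical (Q)SAR model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Molecule:
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol/water partition coefficient
    h_donors: int
    h_acceptors: int
    qsar_score: float   # hypothetical precomputed model output in [0, 1]


def rule_of_five(m: Molecule) -> bool:
    """Strict rule of five (no violations allowed), used here as an example filter."""
    return (m.mol_weight <= 500 and m.logp <= 5
            and m.h_donors <= 5 and m.h_acceptors <= 10)


def qsar_filter(m: Molecule, cutoff: float = 0.5) -> bool:
    """Keep molecules that the hypothetical model scores as likely active."""
    return m.qsar_score >= cutoff


def run_funnel(library: list[Molecule],
               stages: list[tuple[str, Callable[[Molecule], bool]]]) -> list[Molecule]:
    """Apply stages in order, so later (costlier) stages see fewer compounds."""
    survivors = library
    for stage_name, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        print(f"{stage_name}: {len(survivors)} left")
    return survivors


library = [
    Molecule("A", 320.4, 2.1, 2, 5, 0.81),
    Molecule("B", 612.7, 6.3, 4, 11, 0.92),  # fails the rule of five
    Molecule("C", 287.3, 1.4, 1, 4, 0.12),   # fails the score cut-off
    Molecule("D", 455.5, 4.2, 3, 8, 0.66),
]
hits = run_funnel(library, [("filters", rule_of_five), ("(Q)SAR", qsar_filter)])
print([m.name for m in hits])
```

Ordering cheap filters before costly ones (such as docking) is what gives the funnel its shape: the expensive methods only ever see the survivors.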


Chemoinformatics as a theoretical chemistry discipline


Chemoinformatics is defined as an individual discipline characterized by its own molecular model, basic concepts, major applications and learning approaches


Theoretical chemistry

Quantum Chemistry

Force Field Molecular Modelling

Chemoinformatics

Compared in terms of: molecular model, basic concepts, major applications, learning approaches


Molecular Model

Quantum Chemistry: electrons and nuclei

Force Field Molecular Modelling: atoms and bonds

Chemoinformatics: molecular graph, descriptor vector

Chemoinformatics is a field based on the representation of molecules as objects (graphs or vectors) in a chemical space
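The two chemoinformatics representations can be illustrated on a small molecule. The sketch below encodes the hydrogen-suppressed graph of ethanol (CH3-CH2-OH) as atom and bond lists, then derives a toy descriptor vector from it. The choice of descriptors (element counts, bond count, ring count) is an illustrative assumption, not a standard descriptor set.

```python
from collections import Counter

# Hydrogen-suppressed molecular graph of ethanol: vertices are heavy atoms, edges are bonds.
atoms = ["C", "C", "O"]            # vertex labels (element symbols)
bonds = [(0, 1, 1), (1, 2, 1)]     # (atom index, atom index, bond order)


def descriptor_vector(atoms: list[str], bonds: list[tuple[int, int, int]]) -> list[int]:
    """Map a connected molecular graph to a fixed-length vector [nC, nN, nO, n_bonds, n_rings]."""
    counts = Counter(atoms)                 # missing elements count as 0
    n_rings = len(bonds) - len(atoms) + 1   # cyclomatic number, valid for a connected graph
    return [counts["C"], counts["N"], counts["O"], len(bonds), n_rings]


print(descriptor_vector(atoms, bonds))   # [2, 0, 1, 2, 0]
```

Every molecule, whatever its size, ends up as a vector of the same length, which is what makes it a point in a descriptor space. For benzene (six carbon atoms, six ring bonds) the same function reports one ring.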


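Chemoinformatics: From Data to Knowledge

data (from measurement or calculation) → information (data placed in context) → knowledge (generalization)

Inductive learning proceeds upward from data to knowledge; deductive learning proceeds downward from knowledge to data.

Chemoinformatics learns from experimental data!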


Basic concepts

Quantum Chemistry: wave/particle dualism

Force Field Molecular Modelling: classical mechanics

Chemoinformatics: chemical space


Chemical space paradigm


Chemical Space representations: graph-based and descriptor-based

SPACE = objects + metric


Graph-based chemical space


Scaffold Tree

A. Schuffenhauer, P. Ertl, et al. J. Chem. Inf. Model., 2007, 47 (1), 47-58


Natural Product Scaffold Tree

Courtesy of P. Ertl


Natural Product Scaffold Tree

Courtesy of P. Ertl


Descriptor-based chemical space: a vector space defined by molecular descriptors
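In a descriptor space, “objects + metric” becomes concrete: each molecule is a point, and a distance function compares points. The sketch below assumes three hypothetical molecules described by two descriptors (log P and molecular weight), standardizes each descriptor so that neither dominates the distance through its units, and uses the Euclidean metric; other metrics are equally valid choices.

```python
import math
from statistics import mean, pstdev

# Hypothetical descriptor vectors: (logP, molecular weight in g/mol).
space = {
    "mol1": (1.2, 180.0),
    "mol2": (1.5, 190.0),
    "mol3": (4.8, 420.0),
}


def standardize(points: dict[str, tuple[float, ...]]) -> dict[str, tuple[float, ...]]:
    """Z-score each descriptor column so all descriptors contribute on a comparable scale."""
    columns = list(zip(*points.values()))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]   # guard against constant columns
    return {name: tuple((x - m) / s for x, m, s in zip(vec, mus, sigmas))
            for name, vec in points.items()}


def euclidean(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(query: str, points: dict[str, tuple[float, ...]]) -> str:
    """Return the molecule closest to `query` in the descriptor space."""
    return min((n for n in points if n != query),
               key=lambda n: euclidean(points[query], points[n]))


z = standardize(space)
print(nearest_neighbour("mol1", z))   # mol2
```

Without standardization, molecular weight (hundreds of g/mol) would swamp log P (a few units) in the distance, so the choice of scaling is part of the choice of metric.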


Case study: Hansch Analysis

3 types of physicochemical parameters are used:

• Electronic ($\sigma$)

• Steric ($E_s$)

• Hydrophobic ($\log P$)

Biological Activity = f (Physicochemical parameters) + constant

$$\text{Activity} = a(\log P)^2 + b\log P + \rho\sigma + \delta E_s + \text{const}$$
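The quadratic term in log P is what lets a Hansch model predict an optimal lipophilicity. Writing the equation as Activity = a(log P)² + b log P + ρσ + δEs + const (ρ and δ are assumed names for the electronic and steric coefficients), setting the derivative 2a log P + b to zero gives log P₀ = -b/(2a), a maximum when a < 0. The sketch below evaluates the equation and this optimum; all coefficient values are hypothetical and only illustrate the shape of the curve, since in practice they are fitted by regression on a congeneric series.

```python
from dataclasses import dataclass


@dataclass
class HanschModel:
    """Activity = a*(logP)^2 + b*logP + rho*sigma + delta*Es + const (coefficients hypothetical)."""
    a: float
    b: float
    rho: float
    delta: float
    const: float

    def activity(self, logp: float, sigma: float, es: float) -> float:
        return (self.a * logp ** 2 + self.b * logp
                + self.rho * sigma + self.delta * es + self.const)

    def optimal_logp(self) -> float:
        """Vertex of the parabola in log P; it is a maximum only when a < 0."""
        if self.a >= 0:
            raise ValueError("no finite optimum unless a < 0")
        return -self.b / (2 * self.a)


model = HanschModel(a=-0.25, b=1.5, rho=0.8, delta=0.4, const=2.0)
print(model.optimal_logp())              # 3.0
print(model.activity(3.0, 0.0, 0.0))     # 4.25
```

Compounds that are either too hydrophilic or too lipophilic fall on the two flanks of the parabola, which is the classic Hansch explanation for an activity optimum along a homologous series.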


Case study: Hansch Analysis

Molecule 1

Molecule 2


Molecular descriptors: topological, electronic and geometrical parameters calculated directly from the molecular structure

Molecular graph → descriptors D1, D2, …, Di → descriptor vector

Examples:

- Topological indices,

- Atomic charges,

- Inductive descriptors,

- Substructural fragments,

- Molecular volume and surface, …

More than 5000 types of descriptors have been reported
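Topological indices are the simplest descriptors, computed from the molecular graph alone. As one classic example (chosen here for illustration; the slide does not single it out), the Wiener index sums the shortest-path distances, counted in bonds, over all pairs of heavy atoms. The sketch below computes it by breadth-first search on a hydrogen-suppressed graph given as an adjacency list, assuming the graph is connected.

```python
from collections import deque
from itertools import combinations


def shortest_paths_from(graph: dict[int, list[int]], source: int) -> dict[int, int]:
    """Breadth-first search: topological distance (number of bonds) from source to every atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in graph[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def wiener_index(graph: dict[int, list[int]]) -> int:
    """Sum of topological distances over all unordered atom pairs (connected graph assumed)."""
    all_dist = {a: shortest_paths_from(graph, a) for a in graph}
    return sum(all_dist[i][j] for i, j in combinations(graph, 2))


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(n_butane), wiener_index(isobutane))   # 10 9
```

The two C4 isomers have identical atom and bond counts yet different Wiener indices, with the branched isomer getting the smaller value: topological indices capture connectivity that simple counts miss.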


Chemography: Design and visualization of chemical space


Dimensionality Reduction problem

Greenland: 2.2 M km²

Australia: 7.7 M km²

Arabian Peninsula: 3.5 M km²


Generative Topographic Mapping (GTM)

Swiss Roll

• GTM relates the latent space to a 2D “rubber sheet” (manifold) embedded in the high-dimensional data space.

• The visualization plot is obtained by projecting the data points onto the manifold and then letting the “rubber sheet” relax to its original form.

N. Kireeva, I. Baskin, H. Gaspar, D. Horvath, G. Marcou, A. Varnek, Mol. Inf. 2012, 31, 301–312


GTM of a dataset covering 10 activity classes from DUD (Directory of Useful Decoys)

Similarity principle: similar molecules possess similar properties


Chemical Similarity

[Figure: a reference compound and nine analogues with similarity values 0.82, 0.39, 0.84, 0.72, 0.67, 0.64, 0.53, 0.56, 0.52]

Similar compounds possess similar properties
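Similarity values like these are coefficients between 0 and 1. The slide does not say which fingerprint or coefficient produced them; a common choice is the Tanimoto coefficient on binary fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|. The sketch below assumes fingerprints are given as sets of “on” feature identifiers (the feature IDs and compound names are invented) and ranks database compounds by their similarity to a reference.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0   # convention chosen here: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


reference = {1, 4, 7, 9, 12}
database = {
    "cmpd_1": {1, 4, 7, 9, 13},      # 4 bits shared with the reference
    "cmpd_2": {2, 5, 8},             # no bits shared
    "cmpd_3": {1, 4, 7, 9, 12, 15},  # superset of the reference
}

# Rank compounds from most to least similar to the reference.
ranked = sorted(database, key=lambda n: tanimoto(reference, database[n]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(reference, database[name]), 2))
```

Here cmpd_3 (T = 5/6) ranks above cmpd_1 (T = 4/6), and under the similarity principle it would be the first candidate to share the reference compound's properties.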


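Chemical space representation: Activity Landscapes

$$\bar{A}_k = \frac{\sum_i R_{ik} A_i}{\sum_i R_{ik}}$$

$\bar{A}_k$: expectation of activity in node $k$ for the training set, where $R_{ik}$ is the responsibility of node $k$ for compound $i$ and $A_i$ is the activity of compound $i$

log K of Lu³⁺–L complexes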
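The landscape value of a node is a responsibility-weighted average of training activities, Ā_k = Σ_i R_ik A_i / Σ_i R_ik. The sketch below assumes the responsibilities R_ik have already been produced by a trained GTM (here a made-up matrix for 3 compounds and 2 nodes whose rows sum to 1) and computes the expected activity of each node; nodes that receive no responsibility are left undefined (NaN).

```python
def node_activities(responsibilities: list[list[float]], activities: list[float]) -> list[float]:
    """A_k = sum_i R_ik * A_i / sum_i R_ik for every node k (R has shape compounds x nodes)."""
    n_nodes = len(responsibilities[0])
    landscape = []
    for k in range(n_nodes):
        weight = sum(row[k] for row in responsibilities)
        weighted = sum(row[k] * a for row, a in zip(responsibilities, activities))
        landscape.append(weighted / weight if weight > 0 else float("nan"))
    return landscape


# Hypothetical responsibilities of 2 nodes for 3 training compounds (each row sums to 1).
R = [[0.75, 0.25],
     [0.5, 0.5],
     [0.0, 1.0]]
logK = [8.0, 6.0, 2.0]   # hypothetical activities, e.g. log K values
print(node_activities(R, logK))
```

Colouring each node of the 2D map by this value produces the activity landscape: regions of strong and weak binders become visible as zones of high and low Ā_k.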


log K (Lu)


Activity landscape of lanthanide binders

Generative Topographic Mapping of the set of Ln binders

Contours correspond to different log K values (strong binders vs. weak binders)

H. Gaspar, I. Baskin, G. Marcou, A. Varnek unpublished results


Case study: classification models for BDDCS classes

Biopharmaceutics Drug Disposition Classification System (BDDCS)

DATASET: 893 drugs

DESCRIPTORS: VolSurf

Visualization of models’ Applicability Domain


BDDCS classes probability distribution

CPF ≤ 1: coverage = 100 %; CPF ≤ 5: coverage = 47 %

Colored zones on the maps correspond to the model’s applicability domain

H. Gaspar, G. Marcou, A. Varnek JCIM, 2013

Class Preference Factor:

$$\mathrm{CPF} = \frac{\max_{C} P(k \mid C)}{P(k \mid C_i)}, \quad \forall C_i \neq C$$
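The Class Preference Factor compares, for a map node k, the likelihood P(k|C) of the preferred class with that of the other classes. One reading, adopted here as an assumption: since the preferred class must outweigh every other class C_i, the binding ratio is the one against the runner-up, and a node stays in the applicability domain when that ratio reaches the chosen threshold (consistent with coverage falling from 100 % at threshold 1 to 47 % at threshold 5). The per-class likelihoods below are hypothetical.

```python
def class_preference_factor(likelihoods: dict[str, float]) -> tuple[str, float]:
    """Return (preferred class, CPF) for one node from P(k|C) of each class C.

    Assumed reading: CPF = P(k|C_best) / max over C_i != C_best of P(k|C_i).
    """
    if len(likelihoods) < 2:
        raise ValueError("need at least two classes")
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (_, p_second) = ranked[0], ranked[1]
    cpf = p_best / p_second if p_second > 0 else float("inf")
    return best, cpf


def in_applicability_domain(likelihoods: dict[str, float], threshold: float) -> bool:
    """Keep the node only if its preferred class is at least `threshold` times more likely."""
    return class_preference_factor(likelihoods)[1] >= threshold


# Hypothetical likelihoods P(k|C) of one map node for the four BDDCS classes.
node = {"class 1": 0.5, "class 2": 0.125, "class 3": 0.0625, "class 4": 0.0625}
print(class_preference_factor(node))
print(in_applicability_domain(node, 1), in_applicability_domain(node, 5))
```

This node has CPF = 4, so it lies inside the domain at threshold 1 but outside it at threshold 5; raising the threshold trades coverage for more confident predictions.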


Chemoinformatics: property prediction


Quantitative Structure-Activity Relationships (QSAR)

Activity = F (structure) = F (descriptors)

Machine-learning methods:

• neural networks, support vector machines, random forest, naïve Bayes, PLS, …

A. Varnek & I. Baskin, Machine Learning Methods in Chemoinformatics: Quo Vadis? J. Chem. Inf. Model. 2012, 52, 1413−1437
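As a minimal illustration of Activity = F (descriptors), the sketch below uses k-nearest-neighbour regression, which turns the similarity principle directly into a predictor: a new compound's activity is the mean activity of its k closest training compounds in descriptor space. kNN is not named in the list above; it is used here only because it is short to write out, and the training data are invented.

```python
import math


def knn_predict(train_X: list[list[float]], train_y: list[float],
                query: list[float], k: int = 3) -> float:
    """Predict activity as the mean activity of the k nearest training compounds."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the training-set size")
    # Pair each training compound's Euclidean distance to the query with its activity.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    return sum(y for _, y in distances[:k]) / k


# Hypothetical training set: scaled descriptor vectors and measured activities.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.15, 0.15]]
train_y = [6.1, 6.3, 4.0, 4.2, 6.2]
print(knn_predict(train_X, train_y, [0.12, 0.18], k=3))
```

The query falls among the three compounds with activities 6.1 to 6.3, so the prediction lands near 6.2; a query far from every training compound would still get a prediction, which is why applicability-domain checks accompany QSAR models.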


Chemoinformatics tools in SciFinder:

predictions of more than 20 physico-chemical properties and of NMR spectra for each individual compound


ISIDA virtual screening platform

infochim.u-strasbg.fr/webserv/VSEngine.html


Machine Learning Methods in Chemoinformatics: Quo Vadis?

A. Varnek and I. Baskin, J. Chem. Inf. Model., 2012, 52, 1413−1437


Chemoinformatics: virtual screening in 3D


Virtual screening: finding the needle in the haystack

CHEMICAL DATABASE: ~10⁶–10⁹ molecules


What do these two molecules have in common?

[Figure: the peptide Arg-Gly-Asp-Phe and Tirofiban, with their positively (+) and negatively (−) charged groups marked]


Pharmacophore model of a ligand complementary to integrin αIIbβ3

Features: positive charge, H-donor; negative charge, H-acceptor; hydrophobic interactions

Distances between features: 15.5 Å and 5 Å
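Pharmacophore searching reduces to checking whether a conformer places its features at the required distances. The sketch below assumes, for illustration only, that the 15.5 Å on the slide is the distance between the positive and the negative charge centres, and accepts a match within ±1 Å (also an assumption). The coordinates are made up and are not taken from Tirofiban or any real conformer.

```python
import math
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str                          # e.g. "positive", "negative", "hydrophobic"
    xyz: tuple[float, float, float]    # coordinates in Å


def matches_pair_constraint(features: list[Feature], kind_a: str, kind_b: str,
                            target: float, tolerance: float = 1.0) -> bool:
    """True if some kind_a feature and some kind_b feature lie target ± tolerance Å apart."""
    for fa in features:
        for fb in features:
            if fa.kind == kind_a and fb.kind == kind_b:
                if abs(math.dist(fa.xyz, fb.xyz) - target) <= tolerance:
                    return True
    return False


# Hypothetical feature coordinates of one conformer of a candidate ligand.
conformer = [
    Feature("positive", (0.0, 0.0, 0.0)),
    Feature("negative", (15.2, 1.0, 0.5)),
    Feature("hydrophobic", (7.0, 3.0, 0.0)),
]
print(matches_pair_constraint(conformer, "positive", "negative", target=15.5))
```

A full pharmacophore query would combine several such distance constraints and test them over many conformers per molecule, since a flexible ligand may match in only some of its shapes.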


Molecular Shape similarity analysis

pKi = 7.51, TanimotoCombo = 0.74

pKi = 7.82, TanimotoCombo = 0.67

pKi = 7.82


Molecular fields


Ligand-to-protein docking: Lock-and-key paradigm (Hermann Emil Fischer)

Lock + Key → Ligand-Protein complex


Selected in silico designed compounds that were synthesized and successfully tested for bioactivity

G. Schneider J Comput Aided Mol Des (2012) 26:115–120


Chemoinformatics: areas of application

- Drug design (pharmacodynamics and pharmacokinetics),

- Prediction of physico-chemical properties,

- Materials design,

- Synthesis design,

- Molecular spectra simulations


Chemoinformatics: perspectives


Assessment of biological activity


Assessment of side effects

See review by D. Rognan, British Journal of Pharmacology (2007), 1–15


Chemoinformatics: Complexity challenge

P. Csermely et al. Pharmacology & Therapeutics, 2012


Day 1: Databases

Veli-Pekka Hyttinen

Timur Madzhidov

Gilles Marcou Dragos Horvath

Chemical Databases: Encoding, Storage and Search

of Chemical Structures

SciFinder - The choice for chemistry research

Tutorial with ChemAxon


Day 2: QSAR

Igor Tetko

Igor Baskin

Obtaining, Validation and Application of SAR/QSAR Models

SAR/QSAR Modelling: state of the art

Tutorial with OChem

Alex Tropsha

ADMET Predictions


Day 3: virtual screening in 3D

Conformational Sampling

Pharmacophore and Its Applications

Tutorial with LigandScout

Molecular Docking Methods

Gilles Marcou

Dragos Horvath

Thierry Langer

Sharon Bryant

Gilles Marcou Dragos Horvath

Tutorial with LeadIt


Day 4: Drug Design applications

Konstantin Balakin

Vladimir Poroikov

Computational Mapping Tools for Drug Discovery

Drug Design & Discovery in Academia


Staff

Invited Professors at UniStra

Invited Lecturers

Visiting scientists

Visiting friends