Szabolcs Csepregi*, Szilárd Dóránt, Nóra Máté, Miklós Vargyas, Péter Kovács, György Pirok, Ferenc Csizmadia

First presented at Applications of Cheminformatics and Chemical Modelling to Drug Discovery · 8-19 Nov 2004. Updated: April 2005

Structural Search Using ChemAxon Tools

Contents

Structural search in cheminformatics

The JChem suite of tools

Structural search in JChem
• Interfaces
• Database solutions: JChem Base, Cartridge
• Standardization
• Search features

MCS/MCES and Library MCS

R-group decomposition

The Chemical Terms language

Future plans

All examples generated by ChemAxon’s Marvin

Structural search in cheminformatics

A few examples that highlight the diversity of applications:

• Compound registration: duplicate checking
• Database search, e.g. for imidazole derivatives
• Pharmacophoric group identification (JChem Screen, JKlustor)
• Functional group identification
• Cleavage bond identification (JChem Fragmenter)
• Virtual reaction processing (JChem Reactor)
• Standardization (canonicalization of structures, JChem Standardizer)
• Toxic fragment identification (superstructure search)

Search types in JChem

• ABAS (Atom By Atom Search) or structural search:
  – Exact
  – Substructure
  – Superstructure
  – MC(E)S: maximum common (edge) substructure
  – R-group decomposition (identify ligands of a given scaffold)

• Similarity search:
  – Different descriptors
  – Different metrics

ABAS search interfaces

• JSP (Java Server Pages): web GUI for database
  – Similarity & structural search
  – Substructure highlighting
  – Additional constraints
  – Insert, modify, delete

• Command line utility: jcsearch, for files and DB

• Java API
  – isMatching(): only checks whether there is a match
  – findFirst(), findNext(), findAll(): enumerate all possible matchings

• Cartridge: access all functionality from SQL

• Chemical Terms
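
The difference between a yes/no check and enumerating every matching can be sketched with a toy backtracking matcher. This is not the ChemAxon Java API: the names `find_all` and `is_matching` only echo the method names listed above, molecules are plain Python tuples, and bond orders, stereo and query features are ignored (all of these are simplifying assumptions).

```python
# Toy atom-by-atom search. A molecule is (atoms, bonds): a list of element
# symbols and a set of frozenset({i, j}) atom-index pairs.

def neighbors(bonds, i):
    """Indices of the atoms bonded to atom i."""
    return {j for b in bonds if i in b for j in b if j != i}

def find_all(query, target):
    """Yield every mapping {query atom index: target atom index}."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target

    def extend(mapping):
        k = len(mapping)                      # next query atom to place
        if k == len(q_atoms):
            yield dict(mapping)               # copy: mapping is mutated later
            return
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[k]:
                continue
            # every already placed query neighbor must be bonded to t
            if all(frozenset({mapping[n], t}) in t_bonds
                   for n in neighbors(q_bonds, k) if n in mapping):
                mapping[k] = t
                yield from extend(mapping)
                del mapping[k]

    yield from extend({})

def is_matching(query, target):
    """Stop at the first matching instead of enumerating all of them."""
    return next(find_all(query, target), None) is not None

ethanol = (["C", "C", "O"], {frozenset({0, 1}), frozenset({1, 2})})
c_o = (["C", "O"], {frozenset({0, 1})})
c_c = (["C", "C"], {frozenset({0, 1})})

print(is_matching(c_o, ethanol))        # is there any C-O bond?
print(list(find_all(c_c, ethanol)))     # both orientations of the C-C query
```

The symmetric C-C query is reported twice, once per orientation, which is what an order-sensitive hit list looks like; a yes/no check can stop at the first mapping.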

ABAS options

• General options:
  – Order-sensitive hits
  – Pre-assignment of query and target atoms
  – Consider stereo or not, absolute stereo (ignore chiral flag)
  – Timeout limit
  – Exact charge/radical/isotope/query features/bond/stereo matching
  – Double bond stereo: no check/marked/all double bonds
  – Chemical Terms filter expression
  – etc.

• Database search:
  – Maximum search time/number of hits
  – Additional SQL SELECT expression for prefiltering
  – Output table
  – Reverse hits mode

Structural search in database

• Search: two-stage method:
  – Rapid pre-screening based on chemical hashed fingerprints
  – ABAS (isMatching)

• Duplicate check at compound registration:
  – Hash code: primary filter
  – ABAS (isMatching)

• Standardization

• Caching of structures and fingerprints allows top performance
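
A minimal sketch of the two-stage idea, under loud assumptions: the fingerprint here hashes only atom symbols and bonded symbol pairs with a deliberately trivial `toy_hash`, whereas real chemical hashed fingerprints encode longer paths and use far more bits. The function names, the 64-bit length and the three-molecule table are all hypothetical. The sketch shows the screening rule (every bit set in the query must be set in the target), a false positive that only the atom-by-atom stage rejects, and a fingerprint cache computed once per table.

```python
FP_BITS = 64  # toy length, chosen only for this illustration

def toy_hash(text):
    """Deliberately simple, deterministic hash (an assumption of this sketch)."""
    return sum(ord(c) for c in text) % FP_BITS

def fingerprint(mol):
    """Set one bit per atom symbol and per bonded symbol pair."""
    atoms, bonds = mol
    features = set(atoms) | {"-".join(sorted((atoms[i], atoms[j]))) for i, j in bonds}
    fp = 0
    for f in features:
        fp |= 1 << toy_hash(f)
    return fp

def abas(query, target):
    """Stage 2: naive backtracking atom-by-atom check (bond orders ignored)."""
    (qa, qb), (ta, tb) = query, target
    t_bonds = {frozenset(b) for b in tb}

    def place(mapping):                 # mapping[q] = target index of query atom q
        k = len(mapping)
        if k == len(qa):
            return True
        for t in range(len(ta)):
            if t in mapping or ta[t] != qa[k]:
                continue
            others = [j if i == k else i for i, j in qb if k in (i, j)]
            if all(frozenset((mapping[o], t)) in t_bonds for o in others if o < k):
                if place(mapping + [t]):
                    return True
        return False

    return place([])

def two_stage_search(query, table, fp_cache):
    q_fp = fingerprint(query)
    # Stage 1: cheap bitwise screen; it may pass false positives, never drops a hit
    candidates = [name for name in table if fp_cache[name] & q_fp == q_fp]
    # Stage 2: expensive exact check on the survivors only
    hits = [name for name in candidates if abas(query, table[name])]
    return candidates, hits

table = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "ethane": (["C", "C"], [(0, 1)]),
    "methanediol": (["O", "C", "O"], [(0, 1), (1, 2)]),
}
fp_cache = {name: fingerprint(mol) for name, mol in table.items()}  # computed once

query = (["C", "O", "O"], [(0, 1), (0, 2)])   # a carbon bearing two oxygens
candidates, hits = two_stage_search(query, table, fp_cache)
print(candidates, hits)
```

Ethane is rejected by the screen alone; ethanol passes the screen because it has C, O and a C-O bond, and is only rejected by the atom-by-atom stage.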

Import with JChem Base Manager

JChem Base molecular file formats and integration

Import formats:
• SMILES
• MDL molfile (v2000 and v3000)
• MDL SDF
• RXN
• RDF
• MRV
• CML
• PDB
• Sybyl molfile
• XYZ
• Gaussian cube
• Image formats for export (JPG, PNG, SVG)

Database engines:
• Oracle
• MySQL
• MS SQL Server
• PostgreSQL
• MS Access
• DB2
• etc.

OS: any operating system running Java
• Windows
• Linux
• Mac OS X
• Solaris
• etc.

JChem Base performance (1)

Compound registration:

Number of compounds   Elapsed time,              Elapsed time,
                      duplicates not checked     duplicates checked
10,000                32 s                       45 s
100,000               4 min 11 s                 6 min 20 s
200,000               8 min 17 s                 12 min 26 s

Substructure search in a table of 3 million compounds
(columns: query structure, number of hits, search time in seconds; the query structures were drawn on the slide, and each value below is a search time and a hit count run together):

10.749740
1.20
0.9936
0.112

Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
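
The registration timings above can be turned into throughput and duplicate-check overhead with a few lines of arithmetic. The sketch only restates the slide's numbers; the variable names are illustrative.

```python
def seconds(minutes=0, secs=0):
    return 60 * minutes + secs

# number of compounds: (elapsed without duplicate check, with duplicate check)
timings = {
    10_000: (seconds(0, 32), seconds(0, 45)),
    100_000: (seconds(4, 11), seconds(6, 20)),
    200_000: (seconds(8, 17), seconds(12, 26)),
}

summary = {
    n: {"rate_per_s": n / plain, "dup_check_overhead": checked / plain}
    for n, (plain, checked) in timings.items()
}

for n, row in summary.items():
    print(f"{n:>7,} compounds: {row['rate_per_s']:.0f} per second without check, "
          f"duplicate check multiplies the time by {row['dup_check_overhead']:.2f}")
```

On these figures, registration runs at roughly 300 to 400 compounds per second, and duplicate checking adds about 40 to 50 percent to the elapsed time.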

JChem Base performance (2)

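Similarity search: Tanimoto > 0.8
(columns: query structure, number of hits, search time in seconds; the query structures were drawn on the slide, and each value below is a search time and a hit count run together):

1.3336
1.3156
1.524

Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i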

JChem Cartridge for Oracle

Oracle is extended to support chemical database operations using the JChem Cartridge for Oracle

Examples:

Substructure search displaying ID, SMILES codes, and molweight:

    SELECT cd_id, cd_smiles, cd_molweight FROM my_structures
    WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;

Similarity search filtered by predicted pKa values, displaying predicted logP and logD values:

    SELECT cd_id, jc_logP(cd_smiles), jc_logD(cd_smiles, 7.4) FROM my_structures
    WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8
    AND jc_pKa(cd_smiles, 'acidic', 1) < 4;

JChem Cartridge for Oracle

Chemical Terms examples:

• Number of compounds in table “nci_10m” that contain benzene and conform to Lipinski's rule of 5:

    SELECT count(*) FROM nci_10m
    WHERE jc_compare(structure,'c1ccccc1','sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1

• Compounds in table “nci_10m” that contain 3-bromoindole, with restrictions on TPSA, molecular weight, and rotatable bond and aromatic ring counts:

    SELECT cd_structure FROM nci_10m
    WHERE jc_compare(structure,'Brc1cnc2ccccc12','sep=! t:s!ctFilter:(PSA() <= 200) && (rotatableBondCount() <= 10) && (mass() <= 500) && (aromaticRingCount() <= 4) ') = 1

• New interface to ChemAxon API features from SQL, accessible from non-Java programs as well.

• Enhanced performance of certain SQL queries.

Query features 1. Atomic features

• Query atom types: any, hetero, list, not list

• Pseudo atoms, e.g. “Resin”

• Explicit lone pairs (also match implied lone pairs)

• Charge, isotope, radical

• Query properties:

  Symbol   Description
  H<n>     Total hydrogen count
  a        Aromatic
  A        Aliphatic
  R<n>     Ring count in SSSR
  r<n>     Ring size in SSSR
  v<n>     Valence
  X<n>     Connectivity

Query features 2. Atomic SMARTS features

• SMARTS atoms

• Additional query properties:

  Symbol        Description
  D<n>          Degree
  h<n>          Implicit H count
  & ; , !       Logical operators
  $(<smarts>)   Recursive SMARTS
  +0, -0        Zero charge

• Example: carbonyl C, but not amide

Query features 3. Bond features & components

• Query bond types: any, single or double, single or aromatic, double or aromatic

• Bond topology: chain/ring

• SMARTS bonds:

  Symbol       Description
  - = #        Single, double, triple
  :            Aromatic
  & , ; !      Logical operators
  @            Ring bond
  / \ /? \?    Directional bond (cis/trans)

• Component level grouping:

  Symbol       Description
  (C.C)        Same component
  (C).(C)      Different component
  C.C          No component restrictions
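
The three grouping rules in the table reduce to one test on connected components once the two query carbons have been matched to target atoms. The sketch below labels components with a small union-find and checks the rule; the function names and the `mode` strings are assumptions made for this illustration, not JChem options.

```python
def components(n_atoms, bonds):
    """Label each atom with the id of its connected component (union-find)."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n_atoms)]

def grouping_ok(comp, a, b, mode):
    """mode: 'same' for (C.C), 'different' for (C).(C), 'any' for C.C."""
    if mode == "same":
        return comp[a] == comp[b]
    if mode == "different":
        return comp[a] != comp[b]
    return True

# Target "CC.C": ethane (atoms 0 and 1) plus a separate methane (atom 2)
comp = components(3, [(0, 1)])

print(grouping_ok(comp, 0, 1, "same"))        # both carbons in ethane
print(grouping_ok(comp, 0, 2, "same"))        # carbons in different molecules
print(grouping_ok(comp, 0, 2, "different"))
```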

Stereo searching 1. Double bonds

• Levels of check:
  – All
  – Only marked double bonds (MDL: stereo care flag)
  – None

• Meanings of the double bond depictions (the depictions themselves are images on the original slide):
  – Cis
  – Trans
  – Cis or trans (unknown)
  – Not trans
  – Not cis

Stereo searching 2. Tetrahedral chirality

• Stereo bond types: up, down, up or down

• Relative stereo configuration

• Chiral flag model

• Enhanced stereo representation: AND<n>, OR<n>, ABS groups

Reaction search

• Reactants, agents, products

• Transformation recognition (mapping)

• Stereospecific reactions (inversion, retention)

• Reactant grouping

R-group search

• Scaffold, R-group definitions

• Monovalent, divalent R-groups

• R-logic:
  – Occurrence
  – If-then
  – RestH

Hydrogens

• H representations:
  – Explicit
  – Implicit
  – Query H count (total or implicit)

• Example (figure on the original slide): which hydrogens are considered in ABAS, for explicit H, implicit H and query H count, in the query and in the target

Standardization

• Removal of explicit hydrogens

• Aromatic bonds

• Mesomers

• Tautomers

• Counterions

• Stereo representation

Standardization - Aromaticity

• Representations: Kekulé, aromatic

• Example: two Kekulé representations of the same structure (drawn on the original slide) do not match each other

• Two options available: ChemAxon & Daylight aromatization
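
Why two Kekulé forms fail to match, and why a common aromatic form fixes it, can be shown on a bare six-membered ring. The `aromatize` function below applies a toy rule (a six-ring of strictly alternating single and double bonds becomes aromatic); it is an assumption of this sketch and is neither the ChemAxon nor the Daylight aromatization method.

```python
def kekule_ring(first_double):
    """Six-membered ring with alternating bond orders.

    Bonds are keyed by atom pair; first_double (0 or 1) selects which of the
    two alternating patterns carries the double bonds.
    """
    return {
        frozenset((i, (i + 1) % 6)): (2 if i % 2 == first_double else 1)
        for i in range(6)
    }

def aromatize(bonds):
    """Toy rule: strictly alternating 6-ring bonds are all marked aromatic."""
    orders = [bonds[frozenset((i, (i + 1) % 6))] for i in range(6)]
    alternating = all(orders[i] != orders[(i + 1) % 6] for i in range(6))
    return {b: "aromatic" for b in bonds} if alternating else dict(bonds)

form_a = kekule_ring(0)   # double bonds on 0-1, 2-3, 4-5
form_b = kekule_ring(1)   # double bonds on 1-2, 3-4, 5-0

print(form_a == form_b)                         # bond orders differ pairwise
print(aromatize(form_a) == aromatize(form_b))   # identical after aromatization
```

Comparing bond orders atom pair by atom pair treats the two forms as different structures; once both are converted to the same aromatic representation the comparison succeeds, which is why standardization precedes search and duplicate checking.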

Standardization Example

(Figure: example structures before and after standardization.)

Similarity search

• Descriptors:
  – Chemical hashed fingerprint
  – 2D (topological) pharmacophore fingerprint
  – BCUT
  – Structural keys
  – Hypothesis fingerprints: minimum, average

• Dissimilarity metrics:
  – Tanimoto: standard, scaled, asymmetric
  – Euclidean: standard, normalized, weighted, asymmetric
  – Optimized for a set of actives
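
For binary fingerprints, the standard Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either, and the matching dissimilarity is one minus that value. A minimal sketch with fingerprints given as sets of on-bit positions follows; the bit sets, the names and the value returned for two empty fingerprints are illustrative assumptions, and the scaled, asymmetric and weighted variants listed above are not covered.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0          # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def dissimilarity(a, b):
    return 1.0 - tanimoto(a, b)

def similar(query_fp, table, threshold=0.8):
    """Names of table entries whose similarity to the query reaches the threshold."""
    return [name for name, fp in table.items() if tanimoto(query_fp, fp) >= threshold]

query_fp = {1, 2, 3, 4, 5}
table = {
    "close_analogue": {1, 2, 3, 4, 5, 6},   # 5 shared bits out of 6
    "distant": {1, 2, 3, 9},                # 3 shared bits out of 6
}

print(tanimoto(query_fp, table["close_analogue"]))
print(dissimilarity(query_fp, table["distant"]))
print(similar(query_fp, table))
```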

MC(E)S 1. Pairs of molecules

• The largest connected common subgraph

• Application: reaction automapping in Marvin

MCS 2. Library MCS

The LibMCS program rapidly creates a hierarchy of MCSs for a library.

Applications:
• Identification of the most frequently occurring MCS
• Focused set analysis
• Clustering based on common substructures

Hierarchy calculation performance

Library size   Time (s)   Clusters   Top-level clusters   No. of levels

NCI (small molecules, random, diverse sets)
500            6.8        279        14                   5
1,000          13.4       440        16                   6
5,000          141        851        42                   5

d2 inhibitors (medium sized molecules, low diversity)
500            10.8       243        3                    7
1,000          25.6       495        9                    6

Thrombin inhibitors (medium sized molecules, medium diversity)
1,000          67         528        8                    6
3,000          242        1186       12                   6

R-group decomposition

JChem is able to identify the ligands of a given scaffold at specified substitution positions:

(Figure: a query scaffold and a library go into R-group decomposition, which produces the result.)
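
Once a scaffold has been matched onto a target, collecting the ligands is a graph walk away from each substitution position. The sketch below is a toy under stated assumptions: the scaffold match is supplied by hand rather than found by search, every non-scaffold neighbor of an attachment atom is lumped into one ligand, and ligands that bridge back to the scaffold are not treated specially. All names are hypothetical.

```python
from collections import deque

def r_groups(target_atoms, target_bonds, scaffold_match, attachment_points):
    """Return {R label: ligand atom symbols} for each substitution position.

    scaffold_match maps scaffold atom index -> target atom index;
    attachment_points maps an R label -> scaffold atom index.
    """
    adj = {i: set() for i in range(len(target_atoms))}
    for i, j in target_bonds:
        adj[i].add(j)
        adj[j].add(i)
    core = set(scaffold_match.values())

    result = {}
    for label, s_atom in attachment_points.items():
        start = [n for n in adj[scaffold_match[s_atom]] if n not in core]
        seen, queue = set(start), deque(start)
        while queue:                         # breadth-first walk outside the core
            a = queue.popleft()
            for n in adj[a] - core - seen:
                seen.add(n)
                queue.append(n)
        result[label] = [target_atoms[i] for i in sorted(seen)]
    return result

# Target Cl-C-C-O-C; scaffold C-C matched onto target atoms 1 and 2
target_atoms = ["Cl", "C", "C", "O", "C"]
target_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
scaffold_match = {0: 1, 1: 2}
attachment_points = {"R1": 0, "R2": 1}

ligands = r_groups(target_atoms, target_bonds, scaffold_match, attachment_points)
print(ligands)
```

Here R1 comes out as the chlorine and R2 as the two-atom O-C group hanging off the second scaffold carbon.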

Page 31: 1 Szabolcs Csepregi*, Szilárd Dóránt, Nóra Máté, Miklós Vargyas, Péter Kovács, György Pirok, Ferenc Csizmadia First presented at Applications of Cheminformatics

Slide 31

Structural Search Using ChemAxon Tools — April 2005 31

Applications of Chemical Terms

Chemical Terms (CT) in each application area:

• Virtual synthesis: reaction and synthesis rules
• Pharmacophore analysis: pharmacophore definitions
• Drug design: goal functions
• Structural search: advanced query expressions, e.g. in the Cartridge

Chemical Terms

Chemical Terms examples

• Searching:

    match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);

• Goal functions:

    inhibitor = inhibitor.mol;
    (similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);

• Filtering:

    (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10);

Elements of the language

• Structure matching functions (describing functional groups, reaction sites, similarity, etc.)

• Property calculations (partial charge distribution, pKa, logP, electrophilicity, etc.)

• Arithmetic and logic operators
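
The logic of the filtering and searching expressions can be mirrored in plain Python over precomputed values. This is a sketch, not Chemical Terms itself: the property dictionaries hold hypothetical numbers rather than values calculated by JChem, and the reading of the searching expression assumes the usual C/Java precedence in which `&&` binds tighter than `||`.

```python
def lipinski_ok(p):
    """Mirror of the filtering expression, on a dict of precomputed properties."""
    return (p["mass"] <= 500 and p["logP"] <= 5
            and p["donorCount"] <= 5 and p["acceptorCount"] <= 10)

def search_expr(has_olefin, has_pyridine, sulfur_count, mass):
    """Mirror of the searching expression.

    Under the assumed precedence it reads (A && !B && C) || D, so a light
    molecule passes regardless of the structural conditions. Python's
    and/or follow the same relative precedence.
    """
    return has_olefin and not has_pyridine and sulfur_count == 0 or mass < 300

# Hypothetical property values, for illustration only
compounds = {
    "compound_a": {"mass": 180.2, "logP": 1.2, "donorCount": 1, "acceptorCount": 4},
    "compound_b": {"mass": 612.7, "logP": 6.3, "donorCount": 2, "acceptorCount": 9},
}

passing = [name for name, props in compounds.items() if lipinski_ok(props)]
print(passing)
print(search_expr(False, True, 3, 250))   # passes only through the mass clause
print(search_expr(True, True, 0, 400))    # pyridine present and too heavy
```

If the intent of the searching expression is to require the mass limit together with the structural conditions, explicit parentheses around the alternatives make that unambiguous.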

Chemical Terms

Some available functions

• Structural search (match, matchcount)
• Partial charge distribution
• pKa, logP, logD, major microspecies
• Polarizability
• Topological Polar Surface Area
• Number of rotatable bonds, rings, aromatic rings, etc.
• Number of HB donors/acceptors
• Exact mass
• Arithmetic and logic operators
• Extensible: your own Java plugins can be easily added
• Etc.

Future plans

• More query features (e.g. link nodes, ring bond count, unsaturated atom)

• Flexible search options: tautomeric search, ignore bond types, salts, etc.

• Search targets having R-groups (Markush structures)

• etc.

Summary

Structural search provides a useful set of tools for chemists and cheminformaticians.

The ChemAxon JChem suite contains a broad range of chemical search facilities, and the benchmark results presented here illustrate the high performance of JChem search.

The new Chemical Terms language complements structural search and makes data mining easier.

Links

• Home page: www.chemaxon.com

• Forum: www.chemaxon.com/forum

• Animated demos and tutorials: www.chemaxon.com/demos

• Presentations and posters: www.chemaxon.com/conf

Máramaros köz 3/a, Budapest, 1037 Hungary

www.chemaxon.com

Thank you for your attention