Szabolcs Csepregi*, Szilárd Dóránt, Nóra Máté, Miklós Vargyas, Péter Kovács, György Pirok, Ferenc Csizmadia
First presented at Applications of Cheminformatics and Chemical Modelling to Drug Discovery, 8-19 Nov 2004. Updated April 2005.
Structural Search Using ChemAxon Tools
Slide 2
Contents
Structural search in cheminformatics
The JChem suite of tools
Structural search in JChem
• Interfaces
• Database solutions: JChem Base, Cartridge
• Standardization
• Search features
MCS/MCES and Library MCS
R-group decomposition
The Chemical Terms language
Future plans
All examples generated by ChemAxon’s Marvin
Slide 3
Structural search in cheminformatics
A few examples to highlight the diversity of applications:
• Compound registration: duplicate checking
• Database search, e.g. for imidazole derivatives
• Pharmacophoric group identification (JChem Screen, JKlustor)
• Functional group identification
• Cleavage bond identification (JChem Fragmenter)
• Virtual reaction processing (JChem Reactor)
• Standardization (canonicalization of structures, JChem Standardizer)
• Toxic fragment identification (superstructure search)
Slide 4
Search types in JChem
• ABAS (Atom By Atom Search) or structural search:
– Exact
– Substructure
– Superstructure
– MC(E)S: maximum common (edge) substructure
– R-group decomposition (identify ligands of a given scaffold)
• Similarity search:
– Different descriptors
– Different metrics
Slide 5
ABAS search interfaces
• JSP (Java Server Pages): web GUI for database
– Similarity & structural search
– Substructure highlighting
– Additional constraints
– Insert, modify, delete
• Command line utility jcsearch: for files and DB
• Java API
– isMatching(): only checks whether there is a match
– findFirst(), findNext(), findAll(): enumerate all possible matchings
• Cartridge: access all functionality from SQL
• Chemical Terms
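The difference between a yes/no check in the style of isMatching() and an enumeration in the style of findAll() can be illustrated with a toy backtracking atom-by-atom matcher. This is only a sketch under simplifying assumptions, not ChemAxon's implementation: the `Mol` class, the function names `find_all` and `is_matching`, and the restriction to element symbols and untyped bonds are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple


class Mol:
    """Toy molecule (hypothetical): element symbols plus undirected bonds."""

    def __init__(self, atoms: List[str], bonds: List[Tuple[int, int]]):
        self.atoms = atoms
        self.bonds: Set[FrozenSet[int]] = {frozenset(b) for b in bonds}

    def bonded(self, i: int, j: int) -> bool:
        return frozenset((i, j)) in self.bonds


def find_all(query: Mol, target: Mol) -> Iterator[Dict[int, int]]:
    """Yield every mapping (query atom -> target atom) that preserves
    element symbols and all query bonds (substructure semantics)."""

    def extend(mapping: Dict[int, int]) -> Iterator[Dict[int, int]]:
        q = len(mapping)  # index of the next query atom to place
        if q == len(query.atoms):
            yield dict(mapping)
            return
        for t in range(len(target.atoms)):
            if t in mapping.values():
                continue  # target atom already used
            if query.atoms[q] != target.atoms[t]:
                continue  # element mismatch
            # Each bond from q to an already mapped query atom
            # must also exist between the mapped target atoms.
            if all(target.bonded(t, mapping[p])
                   for p in mapping if query.bonded(q, p)):
                mapping[q] = t
                yield from extend(mapping)
                del mapping[q]  # backtrack

    yield from extend({})


def is_matching(query: Mol, target: Mol) -> bool:
    """Stop at the first hit instead of enumerating all of them."""
    return next(find_all(query, target), None) is not None


# Heavy atoms of ethanol (C-C-O) as target, a C-O fragment as query.
ethanol = Mol(["C", "C", "O"], [(0, 1), (1, 2)])
c_o = Mol(["C", "O"], [(0, 1)])
print(is_matching(c_o, ethanol))
print(list(find_all(c_o, ethanol)))
```

Because `find_all` is a generator, `is_matching` pays only for the first hit, which is the reason a separate cheap check is useful when the mappings themselves are not needed.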
Slide 6
ABAS options
• General options:
– Order-sensitive hits
– Pre-assignment of query and target atoms
– Consider stereo or not, absolute stereo (ignore chiral flag)
– Timeout limit
– Exact charge/radical/isotope/query features/bond/stereo matching
– Double bond stereo: no check/marked/all double bonds
– Chemical Terms filter expression
– etc.
• Database search:
– Maximum search time/number of hits
– Additional SQL SELECT expression for prefiltering
– Output table
– Reverse hits mode
Slide 7
Structural search in database
• Search: two-stage method:
– Rapid pre-screening based on chemical hashed fingerprints
– ABAS (isMatching)
• Duplicate check at compound registration:
– Hash code: primary filter
– ABAS (isMatching)
• Standardization
• Caching of structures and fingerprints allows top performance
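The two-stage idea can be sketched as follows: a target can only contain the query if every bit set in the query fingerprint is also set in the target fingerprint, so a cheap bitwise test rejects most rows before the costly atom-by-atom step. This is a minimal sketch, not JChem's code; the integer fingerprints, the table layout and the `abas` callback are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """Substructure pre-screen: all query bits must be set in the target."""
    return (query_fp & target_fp) == query_fp


def two_stage_search(query_fp: int,
                     table: Dict[str, int],
                     abas: Callable[[str], bool]) -> List[str]:
    """Return ids that pass the fingerprint screen and the ABAS check.

    `table` maps a structure id to its (hypothetical) fingerprint;
    `abas` stands in for the expensive atom-by-atom confirmation.
    """
    hits = []
    for cd_id, target_fp in table.items():
        if not passes_screen(query_fp, target_fp):
            continue  # cheap rejection, no graph matching needed
        if abas(cd_id):  # screen hits may still be false positives
            hits.append(cd_id)
    return hits


# Made-up 4-bit fingerprints; "c" passes the screen but fails ABAS.
table = {"a": 0b1011, "b": 0b0011, "c": 0b1110}
checked: List[str] = []


def fake_abas(cd_id: str) -> bool:
    checked.append(cd_id)
    return cd_id != "c"


print(two_stage_search(0b1010, table, fake_abas))
print(checked)
```

The screen can produce false positives (hashed bits collide) but no false negatives, which is why the second stage is still required for every surviving candidate.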
Slide 8
Import with JChem Base Manager
Slide 9
JChem Base molecular file formats and integration
Import formats:
• SMILES
• MDL molfile (v2000 and v3000)
• MDL SDF
• RXN
• RDF
• MRV
• CML
• PDB
• Sybyl molfile
• XYZ
• Gaussian cube
• Image formats for export (JPG, PNG, SVG)
Database engines:
• Oracle
• MySQL
• MS SQL Server
• PostgreSQL
• MS Access
• DB2
• etc.
OS: any operating system running Java
• Windows
• Linux
• Mac OS X
• Solaris
• etc.
Slide 10
JChem Base performance (1)
Compound registration (elapsed time):
Number of compounds    Duplicates not checked    Duplicates checked
10,000                 32s                       45s
100,000                4min 11s                  6min 20s
200,000                8min 17s                  12min 26s
Substructure search in a table of 3 million compounds:
Query    Number of hits    Search time (s)
(four query structures, shown as images on the slide; the remaining two columns are fused in the transcript as 10.749740, 1.20, 0.9936, 0.112)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
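The registration timings on this slide can be turned into throughput figures with a few lines of arithmetic. The sketch below uses only the elapsed times quoted above; the helper names `to_seconds` and `throughput` are made up for the example.

```python
import re


def to_seconds(text: str) -> int:
    """Convert strings such as '32s' or '4min 11s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)min\s*)?(\d+)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + int(m.group(2))


def throughput(n_compounds: int, elapsed: str) -> float:
    """Compounds registered per second, rounded to one decimal."""
    return round(n_compounds / to_seconds(elapsed), 1)


# (number of compounds, duplicates not checked, duplicates checked)
REGISTRATION = [
    (10_000, "32s", "45s"),
    (100_000, "4min 11s", "6min 20s"),
    (200_000, "8min 17s", "12min 26s"),
]

for n, plain, checked in REGISTRATION:
    print(n, throughput(n, plain), throughput(n, checked))
```

On these numbers the rate is roughly 300 to 400 compounds per second without duplicate checking and roughly 220 to 270 with it, so duplicate checking costs about a third of the throughput on the quoted hardware.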
Slide 11
JChem Base performance (2)
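Similarity search: Tanimoto > 0.8
Query    Number of hits    Search time (s)
(three query structures, shown as images on the slide; the remaining two columns are fused in the transcript as 1.3336, 1.3156, 1.524)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i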
Slide 12
JChem Cartridge for Oracle
Slide 13
JChem Cartridge for Oracle
Oracle is extended to support chemical database operations using the JChem Cartridge for Oracle
Examples:
Substructure search displaying ID, SMILES codes, and molweight:
SELECT cd_id, cd_smiles, cd_molweight FROM my_structures
WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Similarity search filtered with predicted pKa values, which displays predicted logP and logD values:
SELECT cd_id, jc_logP(cd_smiles), jc_logD(cd_smiles, 7.4) FROM my_structures
WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8
AND jc_pKa(cd_smiles, 'acidic', 1) < 4;
Slide 14
JChem Cartridge for Oracle
Chemical Terms examples:
• Number of compounds in table “nci_10m” that contain benzene and conform to the Lipinski rule of 5:
SELECT count(*) FROM nci_10m WHERE jc_compare(structure,'c1ccccc1','sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1
• Compounds in table “nci_10m” that contain 3-bromoindole, with limits on TPSA, molecular weight, rotatable bond count and aromatic ring count:
SELECT cd_structure FROM nci_10m WHERE jc_compare(structure,'Brc1cnc2ccccc12','sep=! t:s!ctFilter:(PSA() <= 200) && (rotatableBondCount() <= 10) && (mass() <= 500) && (aromaticRingCount() <= 4) ') = 1
• New interface to ChemAxon API features from SQL, accessible from non-Java programs as well.
• Enhanced performance of certain SQL queries.
Slide 15
Query features 1. Atomic features
• Query atom types: any, hetero, list, not list
• Pseudo atoms e.g. “Resin”
• Explicit lone pairs (they match implied lone pairs as well)
• Charge, isotope, radical
• Query properties:
Symbol Description
H<n> Total hydrogen count
a Aromatic
A Aliphatic
R<n> Ring count in SSSR
r<n> Ring size in SSSR
v<n> Valence
X<n> Connectivity
Slide 16
Query features 2. Atomic SMARTS features
• SMARTS atoms
• Additional query properties:
Symbol Description
D<n> Degree
h<n> Implicit H count
& ; , ! Logical operators
$(<smarts>) Recursive SMARTS
+0, -0 Zero charge
• Example: carbonyl C, but not amide
Slide 17
Query features 3. Bond features & components
• Query bond types: any, single or double, single or aromatic, double or aromatic
• Bond topology: chain/ring
• SMARTS bonds:
Symbol Description
- = # Single, double, triple
: Aromatic
& , ; ! Logical operators
@ Ring bond
/ \ /? \? Directional bond (cis/trans)
• Component level grouping:
Symbol Description
(C.C) Same component
(C).(C) Different component
C.C No component restrictions
Slide 18
Stereo searching 1. Double bonds
• Levels of check:
– All
– Only marked double bonds (MDL: stereo care flag)
– None
• Meanings of the depicted double bond configurations:
– Cis
– Trans
– Cis or trans (unknown)
– Not trans
– Not cis
Slide 19
Stereo searching 2. Tetrahedral chirality
• Stereo bond types: up, down, up or down
• Relative stereo configuration
• Chiral flag model
• Enhanced stereo representation: AND<n>, OR<n>, ABS groups
Slide 20
Reaction search
• Reactants, agents, products
• Transformation recognition (mapping)
• Stereospecific reactions (inversion, retention)
• Reactant grouping
Slide 21
R-group search
• Scaffold, R-group definitions
• Monovalent, divalent R-groups
• R-logic:
– Occurrence
– If-then
– RestH
Slide 22
Hydrogens
• H representations:
– Explicit
– Implicit
– Query H count (total or implicit)
• Example (figure): hydrogens considered in ABAS, comparing explicit H, implicit H and query H count in query and target structures
Slide 23
Standardization
• Removal of explicit hydrogens
• Aromatic bonds
• Mesomers
• Tautomers
• Counterions
• Stereo representation
Slide 24
Standardization - Aromaticity
• Representations: Kekulé, aromatic
• Example (figure): two Kekulé representations of the same aromatic ring don’t match each other
• Two options available: ChemAxon & Daylight aromatization
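Why two Kekulé forms fail to match, and why aromatization repairs this, can be shown with a toy bond table. The sketch assumes a fixed atom numbering (as in a substituted ring, where symmetry cannot rescue the match) and uses a deliberately naive rule: a ring whose bond orders alternate between 1 and 2 is relabelled as aromatic. It is not the ChemAxon or Daylight aromatization algorithm, and the names `kekule_ring` and `aromatize` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Union

Bonds = Dict[FrozenSet[int], Union[int, str]]


def kekule_ring(size: int, first_order: int) -> Bonds:
    """Ring 0-1-...-(size-1)-0 with alternating bond orders,
    starting with `first_order` (1 or 2) on bond 0-1."""
    other = 3 - first_order
    return {
        frozenset((i, (i + 1) % size)): first_order if i % 2 == 0 else other
        for i in range(size)
    }


def aromatize(bonds: Bonds, ring: List[int]) -> Bonds:
    """Toy rule: if bond orders strictly alternate 1/2 around `ring`,
    mark every ring bond as 'ar'. No electron counting is done."""
    n = len(ring)
    keys = [frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)]
    orders = [bonds[k] for k in keys]
    alternating = all(
        {orders[i], orders[(i + 1) % n]} == {1, 2} for i in range(n)
    )
    result = dict(bonds)
    if alternating:
        for k in keys:
            result[k] = "ar"
    return result


ring = list(range(6))
form_a = kekule_ring(6, first_order=2)  # double bond on 0-1
form_b = kekule_ring(6, first_order=1)  # single bond on 0-1

print(form_a == form_b)                                    # raw Kekulé forms
print(aromatize(form_a, ring) == aromatize(form_b, ring))  # after aromatization
```

Compared bond by bond, the raw forms differ on every ring bond, while the aromatized forms are identical, which is the reason query and target are standardized the same way before searching.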
Slide 25
Standardization Example
(Figure: two example structures, each shown before and after standardization)
Slide 26
Similarity search
• Descriptors:– Chemical hashed fingerprint
– 2D (topological) pharmacophore fingerprint
– BCUT
– Structural keys
– Hypothesis fingerprints: minimum, average
• Dissimilarity Metrics:
– Tanimoto: standard, scaled, asymmetric
– Euclidean: standard, normalized, weighted, asymmetric
– Optimized for a set of actives
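For orientation, the standard forms of the two metric families named above can be written in a few lines. This is a sketch of the textbook definitions only, with fingerprints stored as Python integers by assumption; the scaled, asymmetric, normalized, weighted and optimized variants listed on the slide are not covered.

```python
import math
from typing import Sequence


def tanimoto(a: int, b: int) -> float:
    """Standard Tanimoto similarity of two bit fingerprints:
    shared on-bits divided by on-bits present in either."""
    common = bin(a & b).count("1")
    either = bin(a | b).count("1")
    return common / either if either else 1.0


def tanimoto_dissimilarity(a: int, b: int) -> float:
    """Dissimilarity form: 0 for identical fingerprints, 1 for disjoint ones."""
    return 1.0 - tanimoto(a, b)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard Euclidean distance between two descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Made-up 4-bit fingerprints sharing 2 of 4 on-bits.
print(tanimoto(0b1110, 0b0111))
print(tanimoto_dissimilarity(0b1110, 0b0111))
print(euclidean([0.0, 0.0], [3.0, 4.0]))
```

A threshold such as "Tanimoto > 0.8" is a similarity cut-off; expressed as a dissimilarity it corresponds to a value below 0.2.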
Slide 27
MC(E)S 1. Pairs of molecules
• The largest connected common subgraph
• Application: reaction automapping in Marvin
Slide 28
MCS 2. Library MCS
The LibMCS program rapidly builds a hierarchy of MCSs for a compound library.
Applications:
• Identification of the most frequently occurring MCS
• Focused set analysis
• Clustering based on common substructures
Slide 29
Hierarchy calculation performance
Library size    Time (s)    Clusters    Top-level clusters    No. of levels
NCI (small molecules, random, diverse sets)
500             6.8         279         14                    5
1,000           13.4        440         16                    6
5,000           141         851         42                    5
d2 inhibitors (medium sized molecules, low diversity)
500             10.8        243         3                     7
1,000           25.6        495         9                     6
Thrombin inhibitors (medium sized molecules, medium diversity)
1,000           67          528         8                     6
3,000           242         1186        12                    6
Slide 30
R-group decomposition
JChem is able to identify the ligands of a given scaffold at specified substitution positions:
(Figure: a query scaffold and a library are passed to R-group decomposition, which produces the result)
Slide 31
Applications of Chemical Terms
Chemical Terms (CT) supplies:
• virtual synthesis: reaction and synthesis rules
• pharmacophore analysis: pharmacophore definitions
• drug design: goal functions
• structural search: advanced query expressions, e.g. in the Cartridge
Slide 32
Chemical Terms
Chemical Terms examples
searching:
match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);
goal functions:
inhibitor = inhibitor.mol;
(similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);
filtering:
(mass() <= 500) &&
(logP() <= 5) &&
(donorCount() <= 5) &&
(acceptorCount() <= 10);
Elements of the language
• structure matching functions (describing functional groups, reaction sites, similarity, etc.)
• property calculations (partial charge distribution, pKa, logP, electrophilicity, etc.)
• arithmetic and logic operators
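The logic of the Lipinski-style filtering expression can be mirrored outside Chemical Terms once the four properties are available. The sketch below is only an illustration: it assumes the property values have already been computed elsewhere, the dictionary keys simply reuse the Chemical Terms function names, and the two records are made-up numbers rather than real compounds.

```python
from typing import Dict, Iterable, List

Props = Dict[str, float]


def lipinski_ok(p: Props) -> bool:
    """Same four conditions as the Chemical Terms filter, joined with AND."""
    return (
        p["mass"] <= 500
        and p["logP"] <= 5
        and p["donorCount"] <= 5
        and p["acceptorCount"] <= 10
    )


def filter_ids(records: Dict[str, Props]) -> List[str]:
    """Return the ids of all records that pass the filter."""
    return [cd_id for cd_id, props in records.items() if lipinski_ok(props)]


def violations(p: Props) -> Iterable[str]:
    """List which of the four limits a record exceeds."""
    limits = {"mass": 500, "logP": 5, "donorCount": 5, "acceptorCount": 10}
    return [name for name, limit in limits.items() if p[name] > limit]


# Hypothetical, hand-written property values for illustration only.
records = {
    "cmpd-1": {"mass": 320.4, "logP": 2.1, "donorCount": 2, "acceptorCount": 5},
    "cmpd-2": {"mass": 612.7, "logP": 6.3, "donorCount": 1, "acceptorCount": 8},
}

print(filter_ids(records))
print(violations(records["cmpd-2"]))
```

Inside the Cartridge the same conditions are evaluated per row by the ctFilter option, so no properties need to be exported first.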
Slide 33
Chemical Terms
Some available functions
• Structural search (match, matchcount)
• Partial charge distribution
• pKa, logP, logD, major microspecies
• Polarizability
• Topological Polar Surface Area
• Number of rotatable bonds, rings, aromatic rings, etc.
• Number of HB donors/acceptors
• Exact mass
• Arithmetic and logic operators
• Extensible: your own Java plugins can be easily added
• etc.
Slide 34
Future plans
• More query features (e.g. link nodes, ring bond count, unsaturated atom)
• Flexible search options: tautomeric search, ignore bond types, salts, etc.
• Search targets having R-groups (Markush structures)
• etc.
Slide 35
Summary
Structural search provides a useful set of tools for chemists and cheminformaticians.
The ChemAxon JChem suite contains a broad range of chemical search facilities, and the benchmark results presented here illustrate the high performance of JChem search.
The new Chemical Terms language complements structural search and makes data mining easier.
Slide 36
Links
• Home page: www.chemaxon.com
• Forum: www.chemaxon.com/forum
• Animated demos and tutorials: www.chemaxon.com/demos
• Presentations and posters: www.chemaxon.com/conf
Slide 37
Máramaros köz 3/a, Budapest, 1037 Hungary
www.chemaxon.com
Thank you for your attention