Szilárd Dóránt

May, 2005

JChem Base chemical database

Contents

• Introduction
• Structural overview
• Compatibility
• Administration
• JChem tables
• Fingerprints
• Structural search
• Structure cache
• Standardization
• Search options
• JSP example
• API examples
• Performance
• Future plans

Introduction

JChem Base provides high-performance, Java-based tools for the storage, search, and retrieval of chemical structures and associated data.

These components can be integrated into web-based or standalone applications in association with other ChemAxon tools.

Structural overview

Layers, from the database up to the user:

• RDBMS (e.g. Oracle, MySQL, etc.): storage and security
• JDBC driver: standard interface to the RDBMS
• JChem Base API:
  – Chemical logic
  – Structure cache
• Application, or web application (JSP) accessed from a web browser

Compatibility and integration

File formats:
• SMILES
• MDL molfile (v2000 and v3000)
• MDL SDF
• RXN
• RDF
• MRV

Integration:
• 100% Java
• Extensive API
• JChem Cartridge for Oracle

Database engines:
• Oracle
• MySQL
• MS SQL Server
• PostgreSQL
• MS Access
• DB2
• etc.

Operating systems:
• Windows
• Linux
• Mac OS X
• Solaris
• etc.

Administration with JChemManager

User interface for:
• creating tables
• import
• export
• deleting rows
• dropping tables

Most functions are also available from the command line.

The property table

The property table stores information about JChem structure tables, including:

• Fingerprint parameters
• Custom standardization rules
• Recent changes (to optimize cache updates)
• Other table options and information
• Database-related licence keys

More than one property table can be used; each property table represents a particular JChem environment.

The structure of JChem tables

Column name     Explanation
cd_id           unique numeric identifier in the table
cd_structure    the imported structure in its original format, unmodified except for the removal of data fields
cd_smiles       the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
cd_formula      the formula of the standardized structure
cd_molweight    the molecular weight of the standardized structure
cd_hash         hash code used for duplicate filtering (PERFECT search)
cd_flags        row-specific options, e.g. overriding the chiral flag
cd_timestamp    the date and time the row was inserted
cd_fp…          fingerprint columns
[user fields]   custom data fields added by the user
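
The role of a hash column in duplicate filtering can be illustrated with a small in-memory sketch. It is not JChem code: the class `DuplicateFilter` is hypothetical, `String.hashCode()` stands in for whatever hash JChem stores in cd_hash, and plain string equality stands in for the exact structure comparison (which assumes the strings are already in one canonical, standardized form). The negative return value for an existing structure mirrors the convention shown in the insertion API example later in the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of hash-based duplicate filtering (hypothetical, not the JChem implementation). */
final class DuplicateFilter {
    // hash -> ids of rows sharing that hash (plays the role of an index on cd_hash)
    private final Map<Integer, List<Integer>> idsByHash = new HashMap<>();
    // id -> standardized structure string (plays the role of cd_smiles)
    private final Map<Integer, String> structureById = new HashMap<>();
    private int nextId = 1;

    /** Returns the new id if inserted, or the negated id of the existing duplicate. */
    int register(String standardized) {
        int hash = standardized.hashCode(); // assumption: stand-in for the stored hash code
        List<Integer> bucket = idsByHash.computeIfAbsent(hash, h -> new ArrayList<>());
        // Only rows with the same hash need the expensive exact comparison.
        for (int id : bucket) {
            if (structureById.get(id).equals(standardized)) {
                return -id;
            }
        }
        int id = nextId++;
        bucket.add(id);
        structureById.put(id, standardized);
        return id;
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        System.out.println(filter.register("c1ccccc1")); // 1
        System.out.println(filter.register("CCO"));      // 2
        System.out.println(filter.register("c1ccccc1")); // -1 (duplicate of row 1)
    }
}
```

The point of the design is that the hash narrows the comparison to a handful of candidate rows, so registration cost does not grow with a full scan of the table.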

Chemical Hashed Fingerprints

• Chemical Hashed Fingerprints encode structural patterns in bit strings

• If structure A is a substructure of structure B, every bit that is set in A’s fingerprint is also set in B’s fingerprint (A and B below denote the fingerprints):

$$A \mathbin{\&} B = A$$

• Tanimoto similarity of hashed fingerprints can be used for diversity analysis and similarity search:

$$T_{sim}(X, Y) = \frac{\mathrm{BitCount}(X \mathbin{\&} Y)}{\mathrm{BitCount}(X) + \mathrm{BitCount}(Y) - \mathrm{BitCount}(X \mathbin{\&} Y)}$$
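
Both fingerprint relations can be tried out on plain bit strings. The following is a minimal sketch using `java.util.BitSet` from the standard library; the class name `FingerprintMath` and the hand-picked bit positions are illustrative assumptions, not ChemAxon's fingerprint generator.

```java
import java.util.BitSet;

/** Toy illustration of the substructure screen and Tanimoto similarity on bit strings. */
final class FingerprintMath {

    /** True if (query & target) == query, i.e. every query bit is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet common = (BitSet) query.clone();
        common.and(target);
        return common.equals(query);
    }

    /** BitCount(x & y) / (BitCount(x) + BitCount(y) - BitCount(x & y)). */
    static double tanimoto(BitSet x, BitSet y) {
        BitSet common = (BitSet) x.clone();
        common.and(y);
        int both = common.cardinality();
        int either = x.cardinality() + y.cardinality() - both;
        // Assumption: two empty fingerprints are treated as identical.
        return either == 0 ? 1.0 : (double) both / either;
    }

    /** Builds a fingerprint with the listed bits set (hand-picked, not real hashes). */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet(512);
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet a = bits(3, 17, 42);
        BitSet b = bits(3, 17, 42, 100, 255, 300);
        System.out.println(passesScreen(a, b)); // true: a's bits are a subset of b's
        System.out.println(passesScreen(b, a)); // false
        System.out.println(tanimoto(a, b));     // 0.5 (3 common bits, 6 bits in the union)
    }
}
```

Passing the screen is only a necessary condition: different patterns can hash to the same bits, so a candidate that passes may still fail the exact comparison.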

Structural search in database

A two-stage method provides optimal performance:

1. Rapid pre-screening reduces the number of possible hit candidates
   – Chemical Hashed Fingerprints are used for substructure and superstructure searches
   – The hash code is used for duplicate filtering (usually during compound registration)

2. A graph search algorithm determines the final hit list
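
The two stages can be sketched as a simple loop: a cheap bit test discards most rows, and only the survivors reach the expensive exact check. Everything here is a hypothetical illustration: the `TwoStageSearch` and `Entry` names, the hand-picked fingerprints, and in particular the exact matcher, where `String.contains` is a crude stand-in for a real graph search.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.BiPredicate;

/** Toy two-stage substructure search: fingerprint pre-screen, then an exact matcher. */
final class TwoStageSearch {

    /** One cached row: id, fingerprint, and a structure string (all toy data). */
    static final class Entry {
        final int id;
        final BitSet fingerprint;
        final String structure;

        Entry(int id, BitSet fingerprint, String structure) {
            this.id = id;
            this.fingerprint = fingerprint;
            this.structure = structure;
        }
    }

    /** Builds a fingerprint with the listed bits set (hand-picked, not real hashes). */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    /** Stage 1: a target can only contain the query if no query bit is missing from it. */
    static boolean passesPrescreen(BitSet queryFp, BitSet targetFp) {
        BitSet missing = (BitSet) queryFp.clone();
        missing.andNot(targetFp);
        return missing.isEmpty();
    }

    /** Runs stage 1 on every row and stage 2 (exactMatcher) only on the survivors. */
    static List<Integer> search(BitSet queryFp, String query, List<Entry> table,
                                BiPredicate<String, String> exactMatcher) {
        List<Integer> hits = new ArrayList<>();
        for (Entry e : table) {
            if (!passesPrescreen(queryFp, e.fingerprint)) {
                continue; // screened out cheaply
            }
            if (exactMatcher.test(query, e.structure)) {
                hits.add(e.id); // confirmed by the exact check
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(
            new Entry(1, bits(1, 2, 3), "CCO"),     // passes both stages
            new Entry(2, bits(1, 2), "CC"),         // rejected by the pre-screen
            new Entry(3, bits(1, 2, 3, 9), "CCN")); // passes the pre-screen, fails the exact check
        // Assumption: String.contains stands in for the graph search algorithm.
        List<Integer> hits = search(bits(1, 3), "CO", table, (q, s) -> s.contains(q));
        System.out.println(hits); // [1]
    }
}
```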

Structure Cache

• Contains fingerprints for screening and ChemAxon Extended SMILES for atom-by-atom search (ABAS)

• Instant access to the structures for the search process

• Reduced load on the database server

• Incremental update ensures minimum overhead after changes in the table

• Small memory footprint due to
  – SMILES compression
  – Optimized storage technique

• Approximately 100 MB of memory is needed for 1 million typical drug-like structures (using 512-bit fingerprints)
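
A rough back-of-the-envelope reading of that memory figure, assuming decimal megabytes and an even split across structures (neither is stated in the slides):

$$\frac{512\ \text{bits}}{8\ \text{bits/byte}} = 64\ \text{bytes per fingerprint}, \qquad 10^{6} \times 64\ \text{bytes} = 64\ \text{MB}$$

$$\frac{100\ \text{MB}}{10^{6}\ \text{structures}} = 100\ \text{bytes per structure}, \qquad 100 - 64 = 36\ \text{bytes}$$

Under these assumptions the fingerprints alone account for about 64 MB, which would leave roughly 36 bytes per structure for the compressed SMILES and any per-row bookkeeping.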

Standardization

• Default standardization includes:

– Hydrogen removal

– Aromatization

• Custom standardization can be set for each table by supplying an XML configuration file, either at table creation or in the “Regenerate” dialog of JChem Manager (jcman)

Custom Standardization Example

[Figure: an example structure before and after custom standardization]

Database search options

• Maximum search time / number of hits
• SQL SELECT statement for pre-filtering
• Ordering of results
• Result table
• Inverse hit list
• Chemical Terms filter constraint

JSP example application

• Open source, customizable

• Features:

– Substructure, Superstructure, Exact and Similarity search

– Molecular Descriptor similarity search with descriptor coloring

– Substructure hit alignment and coloring, inverse hit list

– Chemical Terms filter

– Import / Export

– Export of hits

– Insert / Modify / Delete structures

API example : connecting to a database

    import java.sql.Connection;
    import chemaxon.jchem.db.ConnectionHandler;

    ConnectionHandler ch = new ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();
    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    // …
    // closing the connection:
    ch.close();

API example : database import

    import chemaxon.jchem.db.Importer;

    // ch is a connected ConnectionHandler
    Importer importer = new Importer();
    importer.setConnectionHandler(ch);
    importer.setInput("sample.sdf");
    // importer.setInput(is); // alternatively, a stream can be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false); // filters out duplicates

    // specifying table field - SDFile field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);
    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");

API example : database export

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import chemaxon.jchem.db.Exporter;

    // ch is a connected ConnectionHandler
    Exporter exporter = new Exporter();
    exporter.setConnectionHandler(ch);
    exporter.setTableName("structures");
    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");
    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");
    int exportedCount = exporter.writeAll();
    os.close();
    System.out.println("Exported " + exportedCount + " structures");

API example : database search

    import chemaxon.jchem.db.JChemSearch;

    // ch is a connected ConnectionHandler
    JChemSearch searcher = new JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");
    // a query that returns cd_id values can be used for prefiltering
    // (cd_id is qualified because both tables have a cd_id column):
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE "
        + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
    searcher.setWaitingForResult(true);  // otherwise runs in a separate thread
    searcher.setStructureCaching(true);  // caching speeds up the search
    searcher.run();
    // getting the results as cd_id values:
    int[] results = searcher.getResults();

API example : inserting a structure

    import chemaxon.jchem.db.UpdateHandler;

    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1"); // the structure
    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, new Double(8.5));
    uh.setDuplicateFiltering(true); // filtering duplicate structures
    int id = uh.execute(true); // getting back the cd_id of the inserted structure
    if (id > 0) {
        System.out.println("Inserted, cd_id value : " + id);
    } else {
        System.out.println("Already exists with cd_id value : " + (-id));
    }
    // storing update information; the database connection remains open:
    uh.close();

Performance (1)

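Compound registration:

Number of compounds    Elapsed time (duplicates not checked)    Elapsed time (duplicates checked)
10,000                 32s                                      45s
100,000                4min 11s                                 6min 20s
200,000                8min 17s                                 12min 26s

Substructure search in a table of 3 million compounds:

[Table with columns Query, Number of hits, Search time (s). The query structures were images, and the two numeric columns are fused in the extracted text; the rows read, as extracted: 0.112, 0.9936, 1.20, 10.749740]

Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i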
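
The registration timings can be turned into throughput figures with simple arithmetic. For the 200,000-compound run (8min 17s = 497 s without duplicate checking, 12min 26s = 746 s with it):

$$\frac{200{,}000}{497\ \text{s}} \approx 402\ \text{compounds/s}, \qquad \frac{200{,}000}{746\ \text{s}} \approx 268\ \text{compounds/s}$$

$$\frac{746\ \text{s}}{497\ \text{s}} \approx 1.5$$

On this hardware, duplicate checking therefore lengthened that run by about 50%; for the 10,000-compound run the same ratio is 45 s / 32 s ≈ 1.4.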

Performance (2)

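Similarity search (Tanimoto > 0.8):

[Table with columns Query, Number of hits, Search time (s). The query structures were images, and the two numeric columns are fused in the extracted text; the rows read, as extracted: 1.524, 1.3156, 1.3336]

Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i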

Future plans

• Additional layer: JChem Server (later also as grid)

• Structural keys as optional extension to current fingerprints

• Tables for storing query structures

• Tables for storing general (Markush) structures

• Partial clean option for hit alignment

• Installer

• etc.

Summary

ChemAxon’s JChem Base toolkit provides sophisticated methods to deal with chemical structures and associated data.

The use of fingerprints and the structure cache provides high search performance.

Links

• JChem home page: www.jchem.com
• Live demos: www.jchem.com/examples
• API documentation: www.jchem.com/doc/api
• Brochure: www.chemaxon.com/brochures/JChemBase.pdf

Máramaros köz 3/a, Budapest, 1037 Hungary

www.chemaxon.com

Thank you for your attention