Experiences and adventures with noSQL and its applications to cheminformatics data

Valery Tkachenko, Antony Williams, Ken Karapetyan, Alexey Pshenichnov, Mikhail Rybalkin

ACS, 248th National Meeting

San Francisco, CA

August 14th 2014



Chemistry research

Standard designs

Scientific article

Compounds

Reaction

Analytical Data

Text and References

Compounds model

Reaction data model

Analytical Data Model

Research Knowledge Model

RSC Databases

• RSC Compounds
• RSC Reactions
• RSC Spectra
• RSC Crystals
• RSC Polymers
• RSC Materials
• RSC Assays
• RSC Algorithms
• RSC Models
• …and so on

Compounds domain

Reactions domain

Analytical data domain

Crystallography domain

APIs, endpoints and widgets

Technical view - unification

Chemistry Validation and Standardization Platform

Input pipeline

Output pipeline

Scaling approaches

Federated linked system

Federated repositories

• Privacy

• Security
• Authenticity
• Safety
• Deployment
• Access
• Etc.

SQL database issues

• Scalability

noSQL cartridge performance

| # | Query (SMILES) | Bingo Standalone Module: hits | Standalone: time, first 10000 hits (s) | Standalone: time, all hits (s) | Bingo for SQL Server: hits | SQL Server: time (s) | Time difference (s): how much faster the standalone version is |
|---|---|---|---|---|---|---|---|
| 1 | C(CN)C(F)F | 53684 | 4.51 | 14.91 | 53684 | 55.56 | 40.65 |
| 2 | [SiH](C)(C)C[SiH2]C | 2740 | 7.59 | 7.59 | 2740 | 26.99 | 19.41 |
| 3 | C1(=O)CCCC1C | 57397 | 7.69 | 25.86 | 57397 | 39.59 | 13.73 |
| 4 | C1CCCC=1N=C | 1652 | 6.21 | 6.21 | 1652 | 27.62 | 21.40 |
| 5 | C(OCC)/C=C\C | 672278 | 1.75 | 77.88 | 672278 | 94.25 | 16.37 |
| 6 | C1N(C=NC1)CC | 21986 | 11.92 | 14.50 | 21986 | 43.16 | 28.66 |
| 7 | C1(C)=CCC(C)=C1 | 8982 | 7.16 | 7.16 | 8982 | 10.11 | 2.95 |
| 8 | P(=O)(O)CCCC | 28403 | 7.34 | 9.25 | 28403 | 12.18 | 2.93 |
| 9 | C1(I)=CN=CC=C1 | 3012 | 8.36 | 8.36 | 3012 | 9.18 | 0.82 |
| 10 | C1(CC)=CC=CO1 | 291005 | 1.48 | 24.67 | 291005 | 39.79 | 15.13 |
| 11 | SC1N=CC=CC=1 | 130485 | 2.08 | 17.92 | 130485 | 24.09 | 6.17 |
| 12 | C1N=NSC=1Cl | 2348 | 5.61 | 5.61 | 2348 | 8.33 | 2.72 |
| 13 | C(/C1CC1)=N\N | 4769 | 7.82 | 7.82 | 4769 | 11.43 | 3.62 |
| 14 | N1=CC=CN=C1 | 1166342 | 0.82 | 70.29 | 1166342 | 141.77 | 71.48 |
| 15 | C1=NON=C1C | 15718 | 7.30 | 7.79 | 15718 | 34.17 | 26.38 |
| 16 | C(CCC)CC | 6541041 | 0.46 | 421.05 | 6541041 | 664.98 | 243.94 |
| 17 | C1(=NN=CN1)C | 864513 | 1.08 | 56.80 | 864513 | 137.24 | 80.45 |
| 18 | [Cr](=O)O[Cr]([O-])=O | 127 | 3.36 | 3.36 | 127 | 6.94 | 3.58 |
| 19 | [BH2-]1[NH2+]C=CCN1 | 1 | 2.36 | 2.36 | 1 | 6.70 | 4.35 |
| 20 | P(O)(OC)CC | 37364 | 4.09 | 7.22 | 37364 | 9.95 | 2.72 |
| 21 | C(CN)S(=O)=O | 301168 | 2.99 | 29.10 | 301168 | 41.71 | 12.61 |
| 22 | C(C)COC=O | 1236803 | 0.82 | 123.43 | 1236803 | 149.82 | 26.39 |
| 23 | N1(NNCN1)C | 122 | 4.68 | 4.68 | 122 | 8.13 | 3.45 |
| 24 | C(=C)/C=N\C=N | 23916 | 9.93 | 12.21 | 23916 | 18.50 | 6.29 |
| 25 | CNCCOC | 4503264 | 0.62 | 262.43 | 4503264 | 441.03 | 178.60 |
| 26 | C(CO)CCS | 148233 | 11.49 | 128.54 | 148233 | 34.09 | -94.45 |
| 27 | C1(S)C=CNC=1 | 35624 | 7.49 | 10.81 | 35624 | 23.02 | 12.21 |
| 28 | S([O-])(=O)(=O)CC | 17442 | 7.79 | 9.11 | 17442 | 15.67 | 6.56 |
| 29 | P(OPN)(N)N | 75 | 4.73 | 4.73 | 75 | 15.05 | 10.32 |
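The time-difference column in the Bingo timing table can be re-derived from the other columns. Below is a minimal Python sketch, assuming the difference is the Bingo for SQL Server time minus the standalone module's time to return all hits (this agrees with every row of the table to within 0.01 s of rounding). The `Row` type and the `time_difference` and `speedup` helpers are hypothetical names for illustration, and only four rows are copied from the table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    smiles: str
    hits: int
    standalone_all_s: float  # Bingo Standalone Module, time to return all hits
    sql_server_s: float      # Bingo for SQL Server, query time
    reported_diff_s: float   # "Time difference" column as printed on the slide


# Four rows copied from the table (rows 1, 14, 16 and 26).
ROWS = [
    Row("C(CN)C(F)F", 53684, 14.91, 55.56, 40.65),
    Row("N1=CC=CN=C1", 1166342, 70.29, 141.77, 71.48),
    Row("C(CCC)CC", 6541041, 421.05, 664.98, 243.94),
    Row("C(CO)CCS", 148233, 128.54, 34.09, -94.45),
]


def time_difference(row: Row) -> float:
    """Seconds saved by the standalone module (negative means it was slower)."""
    return round(row.sql_server_s - row.standalone_all_s, 2)


def speedup(row: Row) -> float:
    """Ratio of SQL Server time to standalone time (> 1 means standalone is faster)."""
    return row.sql_server_s / row.standalone_all_s


for row in ROWS:
    diff = time_difference(row)
    # The recomputed value should match the slide to within rounding (0.01 s).
    assert abs(diff - row.reported_diff_s) <= 0.011, row.smiles
    print(f"{row.smiles:<12} diff={diff:8.2f} s  speedup={speedup(row):5.2f}x")
```

Reporting the ratio alongside the absolute difference makes the large-result queries easier to compare. Row 26 (C(CO)CCS) is the only query in the table where the standalone module was slower, which is why its difference is negative.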

Thank you

Email: [email protected]

Slides: http://www.slideshare.net/valerytkachenko16