UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom Antony Williams, Valery Tkachenko and Richard Kidd ACS Dallas March 2014


Page 1: UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom Antony Williams,

UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom

Antony Williams, Valery Tkachenko and Richard Kidd

ACS Dallas

March 2014

Page 2

UK Chemical Database Service

• The National Chemical Database Service is for UK academics – see later for Rest of World

Page 3

Vision for the Service PART 1

• Provide access to databases and services of interest to the academic community to serve their needs. Access to services to include:

  • Crystallography data – organic and inorganic materials

  • Thermophysical data

  • Reactions data including retrosynthetic analysis

  • Prediction technologies – name generation, physicochemical parameters, NMR prediction

Page 4
Page 5

Service Rollout

• Many services are hosted in the cloud

• Access through login/password, IP authentication or Shibboleth authentication

• Lots of hard work in a very short time – so much thanks to all of the service providers

• More providers stepped up to help – ChemAxon

• Crystallography concern (understatement!)

Page 6

Feedback from Community

• Converted initial public negativity spike on Twitter pre-release to very positive feedback post-release

• Training required – onsite training sessions organized

• Available Chemicals Directory is big plus!

• Concerns with Retrosynthetic Analysis tool

Page 7

Usage

• Majority of usage is for crystallography data – previous provider had same bias

• Usage is increasing month-by-month

• Still heavily underused, and in many cases awareness is low

Page 8

Vision for the Service PART 2

• Response to the call for proposals included our vision for a 21st Century data repository

• At a time of Open Access, Open Data and funding agency requirement to make data public – build a data repository

• Funding is split for licensing content and services (VAST MAJORITY) and some funding for research and development

Page 9

An Initial “Vague” Vision Set

• Manage “all” of the chemistry data associated with chemical substances

• Data to be downloadable, reusable, interactive

• Build a platform that enables the scientist

• Data storage, validation, standardization and curation

• Collaborative data sharing

• Provide data platform that can enable and enhance publishing of scientific papers

Page 10

Data Repository

• Registration of chemical compounds

• Deposition of chemical syntheses

• Addition of analytical data

• Integration to electronic notebooks

• Rewards and recognition for data sharing

• Document processing

• Hosting of data as private, embargoed or public

Page 11

What we will deliver for all data

• Simple interfaces for uploading of data

• Embeddable widgets and programming interfaces to utilize in in-house systems, ELNs

• Automated harvesting approaches – sweeping directories for data

• Data validation where possible
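
The "sweeping directories for data" idea can be pictured as a periodic directory sweep. The following is a minimal sketch only: the extension list and the `sweep` helper are assumptions made for illustration and are not taken from the service.

```python
from pathlib import Path

# Hypothetical set of extensions treated as depositable data (an assumption).
DATA_EXTENSIONS = {".jdx", ".dx", ".mzml", ".mol", ".sdf", ".cif"}

def sweep(root, seen=None):
    """Return data files under `root` that were not reported by an earlier sweep."""
    seen = set() if seen is None else seen
    found = []
    for path in sorted(Path(root).rglob("*")):
        is_data = path.is_file() and path.suffix.lower() in DATA_EXTENSIONS
        if is_data and path not in seen:
            seen.add(path)      # remember it so the next sweep skips it
            found.append(path)
    return found
```

Passing the same `seen` set to successive calls means each file is handed on for deposition once; a real harvester would also need to track modified files.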

Page 12

Input data pipeline

[Pipeline diagram; labels as extracted from the slide]

• Deposition sources: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; Documents; API, FTP, etc

• Deposition Gateway

• Staging databases (raw data)

• Processing modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module

• Validated data: Compounds, Reactions, Spectra, Materials, Articles / CSSP

• All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
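
The per-slice security model can be sketched as below. This is an illustration under stated assumptions: the class and field names are hypothetical, and the rule that an embargo lifts on a fixed date is an assumption, since the slide only names the three states.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"

@dataclass
class DataSlice:
    source: str                            # data source / collection name
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None   # assumed: only used for EMBARGOED slices

def can_read(slice_, user, today):
    """Owners always read; others read public slices or slices whose embargo has expired."""
    if user == slice_.owner or slice_.visibility is Visibility.PUBLIC:
        return True
    if slice_.visibility is Visibility.EMBARGOED and slice_.embargo_until is not None:
        return today >= slice_.embargo_until
    return False
```

Keeping the check at slice level, rather than per record, matches the "simple" model described: one decision covers every compound, reaction or spectrum in that slice.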

Page 13

Compounds upload

• Draw chemicals in the interface (JavaScript editors – PC, Mac, Tablets, Phones)

• Drag and drop of compounds

• Automated generation of properties – formulae, Mw, Mi, physchem properties

• Metadata input forms

• Bulk upload
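
As an illustration of the automated property generation, the sketch below derives Mw (average molecular weight) and Mi (monoisotopic mass) from a molecular formula. It is not the service's implementation: the element tables cover only four elements with rounded values, and the parser handles flat formulae without brackets or charges.

```python
import re

# Rounded values for a few elements only; a real system would use a full table.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def parse_formula(formula):
    """Parse a flat formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def mass(formula, table):
    """Sum the per-element masses from `table` over the parsed formula."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())

mw = mass("C2H6O", AVERAGE)        # average molecular weight of ethanol
mi = mass("C2H6O", MONOISOTOPIC)   # monoisotopic mass of ethanol
```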

Page 14

Depositions Gateway User Interface

Page 15

Depositions Gateway User Interface

Page 16

Chemical Validation and Standardization

Page 17
Page 18

Reactions

• Hosting of reaction data – standard “document formats” – full flexibility but limiting – extraction of data from embedded objects

• Encourage template formats – using ELNs for example, community agreed templates

Page 19
Page 20
Page 21
Page 22

Electronic Notebook Data

• Development work integrating chemistry into the Southampton Labtrove notebook

• Stoichiometry table development

• Analytical data integration

• “ChemTrove” rolled out to a small test group in January

Page 23

Micropublishing Syntheses

Page 24

ChemSpider SyntheticPages

Page 25
Page 26

Requirements

• Community agreement on acceptable templates for CSSP/Reactions deposition

• Data Model deposition based on mappings between template and CSSP model

• Adoption of Labtrove interface for deposition
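
The mapping between a community template and the deposition data model might look like the sketch below. All field names on both sides are hypothetical; the actual templates and the CSSP model are not shown in the slides.

```python
# Hypothetical template headings mapped to hypothetical model fields.
TEMPLATE_TO_MODEL = {
    "Title": "title",
    "Reactants": "reactants",
    "Procedure": "procedure",
    "Yield (%)": "yield_percent",
}
REQUIRED = {"title", "reactants", "procedure"}   # assumed mandatory fields

def map_deposition(template_record):
    """Map a filled-in template to model fields and list required fields left empty."""
    mapped = {}
    for field, value in template_record.items():
        target = TEMPLATE_TO_MODEL.get(field)
        if target is not None and value not in ("", None):
            mapped[target] = value               # unknown headings are ignored
    missing = sorted(REQUIRED - set(mapped))
    return mapped, missing
```

A deposition would be accepted only when `missing` is empty, which is one reason an agreed template matters: the mapping table can only be written once the headings are fixed.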

Page 27

What we will deliver

• Micropublishing platform for submission of

  • Protocols and Procedures

  • Reactions

  • Safety and Hazard data (LATER)

• Template-based submissions of procedures

• Matched to ELN submissions

• Full details for user submission versus mapped submission into database

Page 28

Reaction Deposition/Validation

Page 29

Reaction Deposition/Validation

Page 30

Spectral Data

• Support for “structure identification” is a must – “greatest value” for reference and lookup

• Support for data standards primarily – JCAMP, mzML, SPC

• Want to support ASSIGNED data formats

• Hold binary files but prefer standards – WHY?

Page 31

Raw Spectral Data

Page 32

10 years from now…

• Binary file formats generally need the original data processing software to deal with them – from Bruker, Agilent, Jeol, Thermo, Waters and others

• While we can store the original raw data files for posterity, should we? This has been one focus for data repositories
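
One way to see why text standards age better than vendor binaries: a JCAMP-DX file is labelled plain text, so a few lines of code can read it without the instrument software. The sketch below is deliberately limited and makes its assumptions explicit: it handles only uncompressed, whitespace-separated `(X++(Y..Y))` tables, ignores scaling factors, and the `SAMPLE` content is invented for illustration.

```python
SAMPLE = """##TITLE=example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
400 0.91 0.90 0.88
403 0.85 0.80 0.79
##END=
"""

def read_jcamp(text):
    """Return (header dict, list of (first_x, [y values]) rows) from JCAMP-DX text."""
    header, rows, in_table = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            in_table = label == "XYDATA"   # data rows follow this record
            header[label] = value.strip()
        elif in_table:
            numbers = [float(tok) for tok in line.split()]
            rows.append((numbers[0], numbers[1:]))
    return header, rows

header, rows = read_jcamp(SAMPLE)
```

Real files often use compressed number encodings and `XFACTOR`/`YFACTOR` scaling, which a production reader would need to support.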

Page 33

This is way more useful

Page 34

Processed data…

Spectral searching is made possible

Spectral matching is possible
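
Once spectra are held as processed peak data, matching reduces to comparing numbers. The sketch below uses binned cosine similarity, a common scoring choice picked here for illustration; the slides do not say which matching method the service uses, and the bin width is an arbitrary assumption.

```python
import math

def bin_peaks(peaks, width=1.0):
    """Sum intensities of (position, intensity) peaks into bins of the given width."""
    binned = {}
    for position, intensity in peaks:
        key = int(position // width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_score(peaks_a, peaks_b, width=1.0):
    """Cosine similarity of two binned peak lists; 0.0 when no bins are shared."""
    a, b = bin_peaks(peaks_a, width), bin_peaks(peaks_b, width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because the score depends only on relative intensities, a query spectrum recorded at a different overall signal level still matches its reference.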

Page 35

This is what we really want…

Page 36
Page 37
Page 38

Addition of Analytical Data

• Spectral Container is in development using componentized widgets for display

• NIST spectra converted into standardized JCAMP format for deposition - 296,103 spectra deposited

• 10% of remaining NIST spectra need to be curated as there are obvious structure issues

Page 39

JavaScript viewer for NMR, MS, IR

Page 40

Depositions Gateway User Interface

Page 41

Document processing

Page 42

Depositions Gateway User Interface

Page 43

User Interface Approach

[Architecture diagram; labels as extracted from the slide]

• Data tier: Compounds, Reactions, Spectra, Materials, Documents

• Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API

• User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets

• User interface tier (examples): Electronic Laboratory Notebook, Analytical Laboratory application, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc)
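
The tiering can be illustrated with one data type. In this sketch every class, method and record is hypothetical; it only shows the layering idea, in which widgets talk to an API and never to the database directly.

```python
class CompoundsData:
    """Data tier: an in-memory stand-in for the compounds database."""
    def __init__(self):
        self._rows = {1: {"name": "ethanol", "formula": "C2H6O"}}

    def fetch(self, compound_id):
        return self._rows.get(compound_id)

class CompoundsAPI:
    """Data access tier: the only layer that touches the data tier."""
    def __init__(self, data):
        self._data = data

    def get_compound(self, compound_id):
        row = self._data.fetch(compound_id)
        if row is None:
            raise KeyError(f"no compound with id {compound_id}")
        return dict(row)   # copy, so callers cannot mutate stored data

def compound_widget(api, compound_id):
    """User interface components tier: renders one compound as an HTML fragment."""
    c = api.get_compound(compound_id)
    return f'<div class="compound">{c["name"]} ({c["formula"]})</div>'
```

An application in the user interface tier, such as an ELN or inventory tool, would then compose widgets like `compound_widget` rather than issue its own queries.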

Page 44
Page 45
Page 46
Page 47


Page 48

[Use-case diagram; labels as extracted from the slide]

Analytical Chemist: Characterize, Measure, Search, Store

Synthetic Chemist: Search (synthetic procedure), Document (publish synthetic procedure), Retrosynthetic analysis

Page 49

[Use-case diagram; labels as extracted from the slide]

Medicinal Chemist: Search (against database of properties), Source (find vendor), Analyse (cluster, dock, screen)

Computational Chemist: Search or Develop algorithm, Store results, Run calculations

Further use cases shown: Synthesize, Measure activity

Page 50

Present activities for ACS Fall

• Deposition process development of compounds, reactions and spectral data by end of Spring

  • FTP, DropBox, Web-upload, ELN integration

• Compounds, Reactions, Spectral data search, display, download

• Data sharing – private, public, collaborative

• Metadata, metadata, metadata standards!

• Open Sourcing Chemical Registry System including CVSP

Page 51

UK Chemical Database Service

• The National Chemical Database Service is for UK academics

• What would be necessary to make this available for “Rest of World”, a single institution, an organization?

• It’s not really technology…that’s scale out and can be handled

• It’s negotiation with database providers, pricing, login/authentication, localization?

Page 52

Acknowledgments

• Jeremy Frey and Simon Coles, University of Southampton

• Will Dichtel and Leah McEwan, Cornell University

• Stuart Chalk, University of North Florida

• Bob Hanson and Bob Lancashire, Jmol and JSpecView

Page 53

Thank you

Email: [email protected]

ORCID: 0000-0002-2668-4821

Twitter: @ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams