Presentation of ChemSpider at PubChem Public Meeting

ChemSpider: Creating a Structure Centric Community for Chemists. Antony Williams, [email protected]

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015


DESCRIPTION

An overview of ChemSpider given at the PubChem Public Advisory Board Meeting in 2007

TRANSCRIPT

Page 1: Presentation of ChemSpider at PubChem Public Meeting

ChemSpider

Creating a Structure Centric Community for Chemists

Antony Williams

[email protected]@chemspider.com

Page 2: Presentation of ChemSpider at PubChem Public Meeting

Building a Structure Centric Community for Chemists

The ChemSpider Mission

Build a structure centric community for chemists by:
- Providing an environment for structure drawing, manipulation, visualization, modeling, databasing and searching
- Providing methods by which to deposit, curate and enhance data associated with chemical structures
- Providing structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data

Page 3: Presentation of ChemSpider at PubChem Public Meeting

Execution of the Mission, September 2007

- An online database of nearly 20 million structures (should be >21 million following the latest depositions)
- Systems in place for:
  - Single structure and data collection depositions (in beta testing)
  - Association of analytical data with structures
  - Ability to curate data for each individual record
- Indexing of and integration to:
  - More than 80 individual databases
  - Patents from the US and European Patent offices (SureChem)

Page 4: Presentation of ChemSpider at PubChem Public Meeting

Execution of the Mission, September 2007

- Text-based searching of over 50,000 Open Access articles (110,000 have been indexed but are not online yet; structure searching is coming)
- Over 100,000 identifiers curated
- Average of 1200 unique users per day
- A series of web services for people to access a number of our capabilities
- Multiple collaborations now in place

Page 5: Presentation of ChemSpider at PubChem Public Meeting

Flexible Boolean Searching

Page 6: Presentation of ChemSpider at PubChem Public Meeting

Flexible Boolean Searching

Page 7: Presentation of ChemSpider at PubChem Public Meeting

Flexible Boolean Searching

Page 8: Presentation of ChemSpider at PubChem Public Meeting

Search result: 49 hits in 0.8 seconds

Page 9: Presentation of ChemSpider at PubChem Public Meeting

Integrated Visualization Tools

Page 10: Presentation of ChemSpider at PubChem Public Meeting

Integrated Analytical Data Management for Public Domain Data

Page 11: Presentation of ChemSpider at PubChem Public Meeting

Integrated Access to Open Access Literature

Text-based searching of over 50,000 Open Access Chemistry Articles

Page 12: Presentation of ChemSpider at PubChem Public Meeting

External Integrations - Google

Search Across Google Using InChI string

Page 13: Presentation of ChemSpider at PubChem Public Meeting

External Integrations – Patents

SureChem Portal

Page 14: Presentation of ChemSpider at PubChem Public Meeting

How do people generally use ChemSpider?

- Searching for chemical structures, in rank order, via:
  - Trade names, synonyms and registry numbers
  - Structure identifiers such as SMILES or InChI
  - Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  - Systematic names: IUPAC or CAS Index name
- Structure-based searching of patents
- Text-based searching of Open Access articles
- Generation of physicochemical properties
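The mass-based searches mentioned on this slide can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the record set, the function names, and the ppm tolerance are hypothetical and do not describe ChemSpider's actual implementation. The sketch computes a monoisotopic mass from an elemental composition and returns the records that fall inside a tolerance window around a measured mass.

```python
from typing import Dict, List

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}

# Hypothetical in-memory records: compound name -> elemental composition.
RECORDS: Dict[str, Dict[str, int]] = {
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "aspirin": {"C": 9, "H": 8, "O": 4},
    "1-butanol": {"C": 4, "H": 10, "O": 1},
}


def monoisotopic_mass(composition: Dict[str, int]) -> float:
    """Sum the monoisotopic masses of all atoms in the composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def search_by_mass(target: float, tolerance_ppm: float = 10.0) -> List[str]:
    """Return names of records whose mass lies within +/- tolerance_ppm of target."""
    window = target * tolerance_ppm / 1e6
    hits = [
        name
        for name, composition in RECORDS.items()
        if abs(monoisotopic_mass(composition) - target) <= window
    ]
    return sorted(hits)
```

A mass spectrometrist would typically pass the measured neutral mass and widen or narrow `tolerance_ppm` to match the instrument's accuracy; a real service would query an indexed mass column rather than scan a dictionary.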

Page 15: Presentation of ChemSpider at PubChem Public Meeting

Curators - An Active Community

- Active curation is happening every day now
- Roboticized curation is underway – scripting to strip obvious errors
- Visit the blog posts for detail (www.chemspider.com/blog)

Page 16: Presentation of ChemSpider at PubChem Public Meeting

Quality is a Major Issue

- PubChem structure-identifier pairs are proliferating
- Care is needed, or at least cleansing of the data

Page 17: Presentation of ChemSpider at PubChem Public Meeting

Quality is a Major Issue

Other Databases…

1-Butyl alcohol, 1-Hydroxybutane, 1-butanol, Alcool butylique, Butan-1-ol, Butanol-1, Butanolen, Butanolo, Butyl alcohol, Butyl hydroxide, Butyl orthotitanate, Butyl titanate, Butyl titanate (IV), Butyl zirconate, Butylowy alkohol, Butyric alcohol, Butyric or normal primary butyl alcohol, Hemostyp, Methylolpropane, Propylcarbinol, Propylmethanol, Tetrabutoxytitanium, Tetrabutoxyzirconium, Tetrabutyl orthotitanate, Tetrabutyl titanate, Tetrabutyl zirconate, Titanium butoxide (Ti), Titanium tetrabutoxide, Titanium tetrabutylate, Zirconic acid butyl ester, Zirconium tetrabutoxide, n-Butan-1-ol, n-Butanol, n-Butanolbutanolen, n-Butyl alcohol, n-Butylalkohol, propyl carbinol
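The synonym list on this slide mixes names for 1-butanol with names for titanium and zirconium butoxides. One kind of scripted check that could strip such obvious errors is sketched below. It is a hypothetical heuristic, not ChemSpider's curation code: the stem table and function names are assumptions, and a real curation pipeline would need far broader rules. The idea is to flag any synonym whose name implies an element that is absent from the record's molecular formula.

```python
import re
from typing import List, Set

# Hypothetical name stems that imply a given element is present in the compound.
ELEMENT_STEMS = {"titan": "Ti", "zircon": "Zr"}


def elements_in_formula(formula: str) -> Set[str]:
    """Extract element symbols from a simple molecular formula such as 'C4H10O'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_suspect_synonyms(formula: str, synonyms: List[str]) -> List[str]:
    """Return synonyms that mention an element missing from the formula."""
    present = elements_in_formula(formula)
    suspects = []
    for name in synonyms:
        lowered = name.lower()
        for stem, element in ELEMENT_STEMS.items():
            if stem in lowered and element not in present:
                suspects.append(name)
                break  # one mismatch is enough to flag this name
    return suspects
```

Run against the formula C4H10O, names such as "Butyl titanate" and "Zirconium tetrabutoxide" are flagged for human review while "n-Butanol" and "Propylcarbinol" pass. Flagging rather than deleting keeps a curator in the loop.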

Page 18: Presentation of ChemSpider at PubChem Public Meeting

Quality is a Major Issue

Page 19: Presentation of ChemSpider at PubChem Public Meeting

Curating on ChemSpider

Page 20: Presentation of ChemSpider at PubChem Public Meeting

Curating PubChem Data

- The PubChem team is not resourced to curate the data
- The data should be curated
- ChemSpider has created an environment to validate and curate the data
- Curation is underway
- We will feed back curated data to PubChem on an ongoing basis

Page 21: Presentation of ChemSpider at PubChem Public Meeting

ChemSpider and PubChem

- ChemSpider will deposit our entire database of structures to PubChem following our latest deposition and deduplication cycle (within a month, we hope)
- ChemSpider is curating data and will submit back to PubChem

At 9:13am today:

Page 22: Presentation of ChemSpider at PubChem Public Meeting

Online Deposition System in Beta

Page 23: Presentation of ChemSpider at PubChem Public Meeting

Provide Tools for Developers

Page 24: Presentation of ChemSpider at PubChem Public Meeting

Provide Tools for Developers

Page 25: Presentation of ChemSpider at PubChem Public Meeting

Targets for 2007

End-of-year intentions for ChemSpider include:
- Adding more databases to the index
- Enhancing integrations to other structure drawing packages
- Adding property prediction algorithms from partners. More predicted properties are to go online shortly. Calculations for >20 million structures are time-consuming!
- Expanding analytical data handling – presently working with a publisher regarding hosting the data for their publications
- Enhancing the patent integration
- Expanding the Open Access article index to >250,000 articles
- Making Medline structure searchable by text mining

Page 26: Presentation of ChemSpider at PubChem Public Meeting

Targets for End of 2007

- Source funding to continue the ChemSpider project
- Deliver on projects with collaborators:
  - ChemModLab with NCSU and NISS for QSAR-based virtual screening. ZINC contains 4.6 million commercially available compounds. ChemSpider has about 10 million commercially available compounds – 3D optimized structures will be generated shortly
  - Simbiosys has developed groundbreaking technologies in terms of the speed of virtual screening by docking against targets. ChemSpider ligands will be used in virtual screens
  - Connectivities between ChemSpider and Chembench (Alex Tropsha at UNC Chapel Hill) will be enabled

Page 27: Presentation of ChemSpider at PubChem Public Meeting

Making the Web Structure Searchable

- The InChIString and InChIKey will help make the web structure searchable
- InChIStrings are not indexed correctly and the shift is to the InChIKey
- “Someone” must host the InChIKey lookup table relating to InChIStrings
- “Someone” must provide scalable online tools for the capture, databasing and searching of InChIs
- InChIs do NOT make the web substructure or similarity of structure searchable. An index will.
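A lookup table is needed because an InChIKey is a fixed-length hash of the InChI string and cannot be reversed to recover the structure. The sketch below illustrates the idea with a hypothetical in-memory table; the function names are assumptions, and the two entries use the standard InChI format, which postdates this 2007 talk and is included only for illustration.

```python
from typing import Dict, Optional

# Hypothetical lookup table: InChIKey -> InChI string.
LOOKUP: Dict[str, str] = {
    # ethanol
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    # 1-butanol
    "LRHPLDYGYMQRHN-UHFFFAOYSA-N": "InChI=1S/C4H10O/c1-2-3-4-5/h5H,2-4H2,1H3",
}


def resolve(inchikey: str) -> Optional[str]:
    """Return the InChI string for a key, or None if the key is not in the table."""
    return LOOKUP.get(inchikey.strip().upper())


def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block, which hashes the connectivity layers."""
    return inchikey.split("-")[0]
```

Ethanol and 1-butanol are close homologues, yet their connectivity blocks share nothing useful, which is why a key supports exact-match lookup only and why substructure or similarity searching needs a separate structure index.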

Page 28: Presentation of ChemSpider at PubChem Public Meeting

Conclusion

- ChemSpider is successfully building a structure centric community for chemists
- Over 1200 chemists per day utilize ChemSpider to help answer questions and solve their problems
- A path forward to enhance the service has been defined

Page 29: Presentation of ChemSpider at PubChem Public Meeting

Acknowledgments

- Thousands of users for their feedback and ongoing encouragement
- The “naysayers” – criticism, when taken constructively, can drive creative actions
- Our advisory group of scientists, specialists and friends
- The bloggers coming to the ChemSpider Blog and ChemSpider News: www.chemspider.com/blog, www.chemspider.com/news