![Page 1: Recent Developments in Chemical Database Searching Gary Wiggins E-mail: wiggins@indiana.edu Indiana University School of Informatics ACS Wabash Valley](https://reader036.vdocuments.net/reader036/viewer/2022062321/56649f225503460f94c3b1c1/html5/thumbnails/1.jpg)
Recent Developments in Chemical Database Searching
Gary Wiggins
E-mail: wiggins@indiana.edu
Indiana University School of Informatics
ACS Wabash Valley Section
November 4, 2004
The Current Database Environment
• Interdisciplinary science
• Consolidation of the Scientific-Technical-Medical (STM) publishing world
• Databases covering different formats: encyclopedias, treatises, review serials
• Influence of the Web
  – Move to open access journals
  – Different cultures in the chemistry publishing environment compared to that in biology
Huge Size of the Chemical Literature
• ~ 50 million chemical substances
• ~ 6 million reagents
• ~ 7 million published reactions
• ~16,000 protein crystal structures
• ~250,000 small molecule x-ray structures
--Robert Glen and Susan Aldridge (2002)
Growth of Articles in CA
Year Articles Abstracted
1907 7,994
1945 22,824
1960 104,484
1970 230,902
1980 407,342
1990 394,945
2000 573,469
Source: http://www.cas.org/EO/casstats.pdf
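The growth in the table can be summarized as an average compound annual rate between any two listed years. The following is a minimal sketch of that arithmetic; the function name `annual_growth` and the dictionary `ARTICLES` are illustrative names, and the only data used are the counts in the table.

```python
# Article counts copied from the table above (year -> articles abstracted in CA).
ARTICLES = {
    1907: 7994,
    1945: 22824,
    1960: 104484,
    1970: 230902,
    1980: 407342,
    1990: 394945,
    2000: 573469,
}


def annual_growth(start_year, end_year, counts=ARTICLES):
    """Average compound annual growth rate between two years in the table."""
    years = end_year - start_year
    ratio = counts[end_year] / counts[start_year]
    return ratio ** (1.0 / years) - 1.0


if __name__ == "__main__":
    listed = sorted(ARTICLES)
    # Growth between consecutive rows of the table.
    for first, second in zip(listed, listed[1:]):
        print(f"{first}-{second}: {annual_growth(first, second):+.1%} per year")
    # Growth over the whole period covered by the table.
    print(f"1907-2000: {annual_growth(1907, 2000):+.1%} per year")
```

Over the whole period the rate works out to a little under 5% per year, and the 1980 to 1990 interval is the only one in the table with a decline.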
Vendors and Publishers
• Partnership between commercial vendors and abstracting/indexing services (and to some extent with journal publishers)
  – Most activity in online searching started in the 1970s
  – Comparatively little change in the vendors’ search systems until relatively recently
• Aggregation of databases
• Cross-file searching
• Command-driven access
Vendors of Chemical Databases
• STN International (http://info.cas.org/stn.html)
  – SciFinder and SciFinder Scholar (http://www.cas.org/)
• ISI Thomson (http://www.isinet.com)
• QuestelOrbit (http://www.questel.orbit.com/index.htm)
  – Merged Markush Service
• Dialog (http://www.dialog.com/)
• MDL (http://www.mdl.com/)
• US National Library of Medicine (http://www.nlm.nih.gov/)
• Ovid Technologies (http://www.ovid.com/)
• CSA (Cambridge Scientific Abstracts) (http://www.csa.com/)
• Chemical Information System (http://www.nisc.com/cis/qcis1.asp)
• knovel (http://www.knovel.com/)
• Technical Database Services (http://www.tdsonline.com/)
STN International
• Partnership among Chemical Abstracts Service, FIZ Karlsruhe, and the Japan Science and Technology Corporation
• Has over 200 STM databases
  – STN Database Summary Sheets: http://info.cas.org/ONLINE/DBSS/dbsslist.html
  – Includes some databases also available free through other venues (e.g., Medline, GenBank)
Features in Commercial Systems
• Special Boolean operators (proximity, adjacency, etc.)
• Truncation (wild cards and left-hand or right-hand truncation)
• Controlled vocabulary tools (MeSH, CAS’s Index Guide, CA Lexicon)
• Classification of the documents
  – PACS (Physics and Astronomy Classification Scheme)
  – CA Sections/Subsections
• Structure searching (usually ranging from exact to full substructure search)
• Searchable numeric and other data
• Data analysis tools
• Current awareness options
Command Language Systems
• Allow field-directed searches
• Incorporate sophisticated Boolean relationships
  – AND, OR, NOT
  – Adjacency, proximity, logical linking to the same field or sub-field of a record
    • Number of intervening words can be specified
• Drawback: User must learn the commands
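To make the operator types above concrete, here is a toy sketch of Boolean, adjacency, and proximity matching over the words of a single field. It is not the syntax or implementation of STN, Dialog, or any other vendor system; the function names `tokenize` and `near` and the `max_between` parameter are invented for illustration. The sample title is the 1901 JACS paper quoted elsewhere in this talk.

```python
import re


def tokenize(text):
    """Lowercase one field of a record and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(tokens, term_a, term_b, max_between=0, ordered=False):
    """True if the two terms occur with at most max_between words between them.

    ordered=True additionally requires term_a to come before term_b, so
    max_between=0 with ordered=True models strict adjacency.
    """
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    for i in positions_a:
        for j in positions_b:
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= max_between + 1:
                return True
    return False


title = tokenize("The boiling-point curve for mixtures of ethyl alcohol and water.")

# Boolean AND / NOT on the same field.
print("boiling" in title and "water" in title)      # both terms present
print("alcohol" in title and "benzene" not in title)

# Adjacency: "ethyl" immediately followed by "alcohol".
print(near(title, "ethyl", "alcohol", max_between=0, ordered=True))

# Proximity: "alcohol" and "water" with at most one intervening word.
print(near(title, "alcohol", "water", max_between=1))
```

A command-language system expresses the same ideas with operators typed into the query; the drawback noted above is that each vendor's operators have to be learned.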
User-Oriented Software
• Front-end systems to mask command language
  – STN’s SciFinder (& SF Scholar)
  – STN on the Web, STNEasy, STN Express
  – CrossFire Commander and MDL DiscoveryGate
  – Questel-ORBIT’s QWeb and Imagination
SciFinder and SciFinder Scholar
• Access to the CAplus, Registry, CHEMCATS, CHEMLIST files, plus Medline (1957-) and links to the Web
• Easy structure searching capabilities + 3D visualization (with ViewerLite)
• Integrated with ChemPort for easy access to the primary journal and other literature
• Citation searching (from 1998)
• SFS pricing tied to number of seats
• Industrial version permits customized usage and other features not found in SFS
Limitations in SciFinder Scholar
• OK for organic compounds, author and orientational topic searches, many reactions
• PROBLEMATIC for some organometallics and most inorganics
• NOT REALLY FEASIBLE (use STN) for:
  – polymers
  – most sequences/subsequences
  – all materials
  – some reactions (AutoFix imposes limits!)
  – "comprehensive" topic searches
--Bert Zass, CHMINF-L, 11/4/2004
Saving Records in SFS
• Download limit of 100 records
DiscoveryGate for Academics
• CrossFire Beilstein
• CrossFire Gmelin
• MDL® Available Chemicals Directory
• MDL® Screening Compounds Directory
• MDL® Reference Library of Synthetic Methodology
• MDL® Solid-Phase Organic Reactions
• ORGSYN (Organic Syntheses) Database
• Encyclopedia of Reagents for Organic Synthesis
• Comprehensive Organic Functional Group Transformations
• Comprehensive Asymmetric Catalysis
• MDL® Comprehensive Medicinal Chemistry
• MDL® Drug Data Report
• MDL® Metabolite Database
• MDL® Toxicity Database
• ChemInform Reaction Library
• Current Synthetic Methodology
• Derwent Journal of Synthetic Methods
• National Cancer Institute Database
• http://www.mdl.com/solutions/solutions_for/academics/dg_academics.jsp
Main Chemical Databases
• Chemical Abstracts
• Beilstein/Gmelin
• Cambridge Structural Database
• Protein Data Bank
• Many other relevant databases
CAS DBs: CA File
• CA File, a bibliographic database covering journal articles (from ~8000 journals), technical reports, conference proceedings, dissertations, patents and other literature
• 1907 to the present; full indexing has been added for all records retrospectively
• Linked through the Registry Number to compound data
• CAplus File, includes CA File data plus e-journals, some preprints, and all articles from ~1500 key chemical journals within one week of receipt
Relative Contributions of Literature Types to CA
Used with the permission of Chemical Abstracts Service (CAS), a division of the American Chemical Society, from: http://www.cas.org/casdb.html
Old References Recently Added to CA Database
The boiling-point curve for mixtures of ethyl alcohol and water. Noyes, William A.; Warfel, R. R. Rose Polytechnic Institute, Terre Haute, Journal of the American Chemical Society (1901), 23(7), 463-8. CODEN: JACSAT ISSN: 0002-7863. Journal written in English. CAN 0:1311 AN 1906:1311 CAPLUS (Copyright 2004 ACS on SciFinder (R))
Abstract
In the determination with small amounts of alcohol, the readings of the thermometer were taken when the vapors first entered the condenser, as after boiling for a few minutes a relatively large proportion of the alcohol present would be found in the upper layers and in the condenser. The thermometer under these conditions registered about 0.3 higher. An examination of the table and curve revealed that the minimum boiling point is for alcohol of 96% by weight. The curve was steeper on the side toward absolute alcohol. Alcohol of 90.7% had the same boiling point as absolute alcohol.
Special Fields in the CA File
• In addition to the standard bibliographic citation data, have:
  – Controlled Terms (CT)
  – Classification Codes (CC: the 80 section codes into which the content of the paper CA is divided: http://www.cas.org/PRINTED/sects.html)
  – Document type (DT)
  – Language Code (LA)
  – Role (RL)
CAS DBs: Registry File
• “Authority” file that lets indexers and searchers definitively identify a substance as new or find a previous entry
• Contains all types of chemical substances, including biomolecules
• Best file for chemical names
• Many physical properties being added
• Linked to CA and other files through the Registry Number (RN)
CAS Registry Number
• Serves as the accession number in the Registry File
• The RN itself carries no chemical meaning
  – Example: Isatin is 91-56-5
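Although the number says nothing about the chemistry of the substance, its final digit is a check digit, which makes mistyped Registry Numbers easy to catch before a search is run. The sketch below is added here for illustration and was not part of the slides. It uses the published CAS check-digit rule: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the check digit. The function name `is_valid_cas_rn` is illustrative.

```python
import re

# A CAS RN has 2-7 digits, a hyphen, 2 digits, a hyphen, and 1 check digit.
_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn):
    """Return True if rn has the CAS RN layout and a correct check digit."""
    match = _RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    # Registry Numbers that appear in this talk, plus one deliberate typo.
    for rn in ["91-56-5", "6713-27-5", "6402-36-4", "751481-24-0", "91-56-4"]:
        print(rn, is_valid_cas_rn(rn))
```

For isatin, 91-56-5, the weighted sum is 6×1 + 5×2 + 1×3 + 9×4 = 55, and 55 mod 10 = 5, which matches the final digit.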
Registry File Contents
• Includes synonyms, molecular formulas, alloy composition tables, classes for polymers, nucleic acid and protein sequences, ring analysis data, and structure diagrams
• Also: experimental and calculated property data from various sources as well as super roles and document type information from CAplus
Registry File Contents
• 72,297,557 substances have an RN in the Registry File as of 9/26/2004
• All substances in CAS files plus others
• Many physical constants now added to the records, most of them calculated
  – Lipinski Rule of Five values
  – BP, MP, Density, Optical Rotatory Power, Refractive Index
  – Data for 3D visualization
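As a reminder of what the Rule of Five values are used for, the sketch below checks a set of property values against the commonly cited thresholds: molecular weight at most 500, log P at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. It is only an illustration. The class and function names are invented, the property values in the example are made-up placeholders rather than Registry File data, and nothing here reflects how CAS calculates or stores these values.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Property values of the kind listed for a substance (hypothetical container)."""
    mol_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p):
    """Return the names of the Rule of Five criteria that the values violate."""
    checks = {
        "molecular weight > 500": p.mol_weight > 500,
        "log P > 5": p.log_p > 5,
        "H-bond donors > 5": p.h_bond_donors > 5,
        "H-bond acceptors > 10": p.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


if __name__ == "__main__":
    # Placeholder values, invented for this example only.
    small = Properties(mol_weight=300.0, log_p=2.0, h_bond_donors=1, h_bond_acceptors=4)
    large = Properties(mol_weight=650.0, log_p=6.2, h_bond_donors=6, h_bond_acceptors=12)
    print(rule_of_five_violations(small))
    print(rule_of_five_violations(large))
```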
Traumatic Acid Registry File Record
Traumatic Acid: SFS eScience
Size of the Registry File
Date: Sunday, 9/26/2004
Count: 24,205,177 organic and inorganic substances
       48,092,380 sequences
Most recent CAS Registry Number: 751481-24-0
CAS DBs: CASReact
• Derived from journal and patent documents from 1840 to date
• Contains both single-step and multistep reactions
• Structure searchable
• Contains yield data, reaction conditions, etc.
CAS Databases: Other
• CHEMCATS--information about commercially available chemicals and their worldwide suppliers
• CHEMLIST--contains chemical substances on national inventories
• MARPAT--more than 500,000 Markush structure records for patents found in the CA File with patent publication year 1988 to the present
• TOXCENTER--covers the pharmacological, biochemical, physiological, and toxicological effects of drugs and other chemicals
Beilstein Database
• Covers organic chemistry back to 1771
• Includes many physical properties
• Includes reaction information
• Structure searchable
• Available on the CrossFire Commander system for academic institutions
• Available on STN and Dialog vendor systems
Gmelin Database
• Covers inorganic and organometallic chemistry back to 1772
• Includes many physical and chemical properties
• Not searchable for reactions
• Accessible through the CrossFire Commander system for academic institutions
Reaction Databases
• CASReact
• SPRESI
• Organic Syntheses
  – Free version: http://chemfinder.cambridgesoft.com/reactions/orgsyn.asp
• ISI’s Index Chemicus
• e-EROS (Encyclopedia of Reagents for Organic Synthesis)
• MDL’s Integrated Major Reference Works
  – Reactions indexed with InfoChem’s Reaction Classification Code, based on the degree of specificity around the reacting center:
  – http://www.infochem.de/eng/index.htm
Cross-Product Approaches
• MDL/InfoChem’s Integrated Major Reference Works
  – Thieme’s Science of Synthesis (successor to Houben–Weyl)
  – Springer’s Comprehensive Asymmetric Catalysis and their Glycoscience
  – Elsevier Science’s Comprehensive Organic Functional Group Transformations
  – Wiley’s Encyclopedia of Reagents for Organic Synthesis
  – Links to primary journal literature
Physical Property Databases
• Beilstein & Gmelin
• CRC Handbook (CHEMnetBASE)
• Ei ChemVillage
• knovel
  – Perry’s Chemical Engineers’ Handbook
  – Lange’s Handbook of Chemistry
• Landolt-Börnstein
Spectral Databases
• Bio-Rad
• Aldrich
• NIST Chemistry WebBook
• Some high-quality free databases on the Web, e.g.,
  – SDBS, Integrated Spectral Data System for Organic Compounds: http://www.aist.go.jp/RIODB/SDBS/menu-e.html
SDBS IR Spectrum for Traumatic Acid
CCDC
Cambridge Structural Database
• Bibliographic, chemical and crystallographic information for:
  – organic molecules
  – metal-organic compounds
• 3D structures have been determined using:
  – X-ray diffraction
  – neutron diffraction
• The CSD records:
  – 3D atomic coordinate data for at least all non-H atoms
CSD components
• ConQuest: search and information retrieval
• Mercury: structure visualization
• Vista: numerical analysis
• PreQuest: database creation
Isatin on the CSD
Other Structural Databases
• Protein Data Bank for polypeptides and polysaccharides having more than 24 units (free): http://www.rcsb.org/pdb/
• Nucleic Acid Database for oligonucleotides (free): http://ndbserver.rutgers.edu/
• Inorganic Crystal Structure Database http://www.fiz-informationsdienste.de/en/DB/icsd/
• CRYSTMET® for metals and alloys http://www.tothcanada.com/
Materials Chemistry Databases
• TDS specializes in chemical engineering data. Includes:
  – American Institute of Chemical Engineers’ DIPPR Pure Component Data
    • 29 fixed-value properties and 13 temperature-dependent properties for about 1600 industrial chemicals
• http://www.tdsonline.com/
Patent Databases
• Derwent World Patents Index
• USPATFULL
• PCTFULL (WIPO/PCT Patents Full Text)
• INPADOC (INternational PAtent DOcumentation Center)
• IFIPAT
• CA and CAplus
Chemical Information System
• 34 environmental databases
  – Originally developed by the US National Institutes of Health and the Environmental Protection Agency
• Covers over 515,000 compounds
  – Toxicological and/or carcinogenic research data
  – Information on handling hazardous materials
  – Chemical/physical property information
  – Regulations
  – Safety and health effects information
  – Pharmaceutical data
• http://www.nisc.com/cis/qcis1.asp
Hybrid Links to the Web
• STN’s eScience
  – http://www.escience.org/
• Elsevier Science’s Scirus
  – http://www.scirus.com/srsapp/
Electronic Journals
• Coverage in some cases back to the 17th century
• Most major publishers’ backfiles are now online
• DOI: http://www.doi.org/
  – Turn a DOI into a URL by prefixing it with http://dx.doi.org/
• SFX: http://www.exlibrisgroup.com/sfx.htm
• MDL’s Litlink
• CrossRef: http://www.crossref.org/
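The DOI-to-URL step mentioned in the list above is simple string handling: the resolver address goes in front of the DOI. The following is a small sketch, assuming the `http://dx.doi.org/` resolver named on the slide. The function name `doi_to_url` is illustrative, the optional `doi:` label handling is an added convenience, and the DOI in the example is just a sample value.

```python
from urllib.parse import quote

# Resolver address as given on the slide.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi, resolver=RESOLVER):
    """Build a resolvable URL by putting the resolver address in front of a DOI."""
    doi = doi.strip()
    # Accept the common "doi:" label in front of the identifier.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))
```

Linking services such as SFX and CrossRef rely on the same principle: a stable identifier is resolved to whatever the current location of the article is.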
CrossRef Search
• CrossRef Search http://www.crossref.org/crossrefsearch.html
• Pilot initiative running in 2004 in collaboration with Google
• Includes the content of 29 publishers (out of the 650 CrossRef publishers and societies)
• Now covers approximately 3.4 million research articles.
Shift from Ownership to Licensing of Journals
• IUB Chemistry Library e-journals
  – http://www.indiana.edu/~libchem/402ejrnl.html
• Archival issues
  – Publisher archives (usually 2-3 locations)
  – LOCKSS: http://lockss.stanford.edu/
  – Libraries often have no archival rights
Single Publisher Databases
• Elsevier’s ScienceDirect and their encyclopedia DBs
  – Scirus: http://www.scirus.com/srsapp/
• Wiley’s journal, book, and encyclopedia DBs: http://www3.interscience.wiley.com/
• American Chemical Society journals
  – http://pubs.acs.org/
Getting at the Data
• STN’s Information Keep & Share Program
  – http://info.cas.org/copyright/index.html
• SciFinder Scholar download restrictions: 100 items at a time
Data Analysis Tools
• STN’s Analyze and Tabulate feature
• STN Express with Discover! (Analysis Edition)
• Limited access because of A&I publishers’ reluctance to release the data
Free Services
• ChemFinder
  – http://chemfinder.cambridgesoft.com/
• ChemIDplus
  – http://chem.sis.nlm.nih.gov/chemidplus/
• Frederick/Bethesda Data and Online Services
  – http://cactus.nci.nih.gov/
• PubMed
  – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
• DOE’s STI Information Bridge
  – http://www.osti.gov/bridge/
Budapest Open Access Initiative
• Based on:
  – Self archiving by authors
  – Open Access journals, e.g., BioMed Central
• http://www.soros.org/openaccess/
Open Access
• Institute of Physics: most papers free for 30 days after publication
  – http://www.iop.org/EJ/ and http://www.iop.org/EJ/journal/NJP
• Public Library of Science
  – http://www.publiclibraryofscience.org
• Highwire Press
  – http://www.highwire.org/
• PubMed Central
  – http://www.pubmedcentral.nih.gov/
Opposition to Open Access
• Reacting to NIH’s proposed policy on open access, C&EN Editor Rudy Baum says:
“[This] action will inflict long-term damage on the communication of scientific results and on maintenance of the archive of scientific knowledge.”
-- C&EN, September 20, 2004, p. 7
Open Access + Semantic Web
• "Almost all of an author's output (compounds, spectra, reactions, properties, etc.) is nowadays computerised and in principle redistributable to the community for re-use. Few journals actively validate the primary data (e.g. spectra) involved in a publication (chemical crystallography being a clear exception where data are intensively reviewed by machine). We reassert that chemists must now move towards publishing their collective knowledge in a systematic and easily accessible form for re-use and innovation....
Open Access + Semantic Web
• We urge that authors, funders, editors, publishers and readers move further towards the following protocol:
  [1] All information should be ultimately machine-understandable in XML....
  [2] Machine-understandable information for a compound should include a connection table, the IUPAC unique identifier (INChI) which guarantees that the connection table can be checked and regenerated, and a name....
  [3] Rights metadata.”
-- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)
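To give a feel for what a record following the three points of the protocol might look like, the sketch below builds a small XML document holding a name, an identifier string, a connection table, and a rights statement. It is only an illustration. The element and attribute names are invented for this example and are not CML or any other published schema, and the rights text is a placeholder. The identifier for water is written in the present-day InChI string format, which postdates this talk.

```python
import xml.etree.ElementTree as ET


def compound_record(name, inchi, atoms, bonds, rights):
    """Build a small XML record for one compound (hypothetical element names)."""
    root = ET.Element("compound")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "identifier", {"type": "inchi"}).text = inchi
    # Connection table: one element per atom, one per bond.
    table = ET.SubElement(root, "connectionTable")
    for atom_id, element in atoms:
        ET.SubElement(table, "atom", {"id": atom_id, "element": element})
    for first, second, order in bonds:
        ET.SubElement(table, "bond", {"atoms": f"{first} {second}", "order": str(order)})
    # Rights metadata travels with the data.
    ET.SubElement(root, "rights").text = rights
    return root


water = compound_record(
    name="water",
    inchi="InChI=1S/H2O/h1H2",
    atoms=[("a1", "O"), ("a2", "H"), ("a3", "H")],
    bonds=[("a1", "a2", 1), ("a1", "a3", 1)],
    rights="placeholder rights statement",
)

xml_text = ET.tostring(water, encoding="unicode")
print(xml_text)

# A program can read the record back without any human interpretation.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("identifier"), len(parsed.findall("connectionTable/atom")))
```

The point of the protocol is the last step: once the data are in this form, another program can read them back without anyone retyping a structure from a printed page.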
Future
• XML and metadata– Dymond (DYnamic Metadata ON Demand)
• Virtual journals (Virtual Journal of Nanoscale Science and Technology)
• Copyright question and open access resolution
• Legal protection of databases
• Impact of INChI and CML
• Demise of Abstracting and Indexing Services?
Conclusion
• “The main challenge is for chemists to recognise the value of making their data machine-understandable, rather than destroying it with traditional paper or slide-focused publication and dissemination processes.”
-- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)
Parting words . . .
If you're not part of the solution, you're part of the precipitate!
Searches
• Isatin (RN 91-56-5)
• Moronic Acid (RN 6713-27-5)
• Traumatic Acid (RN 6402-36-4)
• Others: http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm
Beilstein Structure Search
R1 = O or S
R2 = H, OH, OMe, CH3, or CO2H
X = any halogen
? = any bond value
Beilstein Property Search
• Find the compounds in the Beilstein CrossFire database that have structure keyword "stereo compound" and molecular formula C29H36O8 and melting points in the range 258-271 Celsius.
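The three criteria in this search combine with a logical AND. A minimal sketch of that logic over in-memory records is shown below; the `Compound` class, its field names, and the sample records are hypothetical and made up for illustration, and this is not the CrossFire query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    """Hypothetical record; field names are illustrative, not Beilstein field codes."""
    name: str
    formula: str
    melting_point_c: float
    keywords: set = field(default_factory=set)


def property_search(records, keyword, formula, mp_low, mp_high):
    """Return records that satisfy all three criteria at once (logical AND)."""
    return [
        r for r in records
        if keyword in r.keywords
        and r.formula == formula
        and mp_low <= r.melting_point_c <= mp_high
    ]


# Made-up records: only the first one meets every criterion.
records = [
    Compound("hypothetical A", "C29H36O8", 260.0, {"stereo compound"}),
    Compound("hypothetical B", "C29H36O8", 240.0, {"stereo compound"}),  # melting point too low
    Compound("hypothetical C", "C29H36O8", 265.0, set()),                # keyword missing
    Compound("hypothetical D", "C10H8", 80.0, {"stereo compound"}),      # wrong formula
]

hits = property_search(records, "stereo compound", "C29H36O8", 258, 271)
print([c.name for c in hits])  # prints ['hypothetical A']
```

Each extra criterion can only narrow the answer set, which is why combining a formula with a property range is an effective way to reach a short hit list.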
Bibliography
• Culp, F. Bartow. "Ten or so things that every chemistry librarian absolutely, positively has to have to keep from being an absolute plonk." Sci-Tech News, February 2004, 58(1), 9. (Also published as: SLA Chemistry Division E-Newsletter Winter 2004, 18(3), 19-20.) http://www.sla.org/division/dche/Newsletters/Feb_2004.pdf
• Gasaway, Laura. “The open archives movement.” Information Outlook October 2004, 8(10), 36, 39-40.
• Glen, Robert; Aldridge, Susan. “Developing tools and standards in molecular informatics.” Chemical Communications 2002, (23), 2745-2747. DOI: 10.1039/b207793k
http://xlink.rsc.org/?DOI=b207793k
Bibliography
• Huber, C.; Porter, K. “Cheap tricks.” http://www.indiana.edu/~cheminfo/workshop/cheap.html
• McLeland, Le-Nhung. What every chemist should know about patents. http://www.chemistry.org/portal/resources/?id=1b41692a6cf811d6f8dd6ed9fe800100
• Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Y. “Representation and use of chemistry in the global electronic age.” Forthcoming article in: Organic & Biomolecular Chemistry.
http://www.ch.ic.ac.uk/rzepa/obc/ (preprint)
• Wagner, A. Ben. “Finding physical properties of chemicals: A practical guide for scientists, engineers, and librarians.” Science & Technology Libraries 2001, 21(3/4), 27-45. (published Fall 2003) Text for personal and professional use available at:
http://ublib.buffalo.edu/libraries/asl/staff/documents/wagner_phys_prop_stl_art.pdf
Bibliography
• Wiggins, Gary. “Overview of databases/data sources.” in Gasteiger, Johannes, ed. Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes. Wiley-VCH: 2003, v. 2, pp. 496-506.
http://www.indiana.edu/~cheminfo/C571/wiggins_chapter_2003.pdf
• Wiggins, Gary. “Teaching chemical literature, databases, and chemical informatics.” CPT; Committee on Professional Training [newsletter] Spring 2004, 4(1), 1-2.
http://www.chemistry.org/portal/resources/ACS/ACSContent/education/cpt/nl_cpt_spring2004.pdf