ChemSpider as a Platform for Crowd Participation in Curating Chemistry

Antony Williams, IDCC, Chicago, December 2010

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015


DESCRIPTION

This is a presentation I gave at the International Digital Curation Conference in Chicago on December 7th, 2010 (#idcc10). The presentation discusses the issues of data quality and the need for collective, crowdsourced efforts to improve the quality of chemistry-related data on the Internet.

TRANSCRIPT

Page 1: ChemSpider as a Platform for Crowd Participation in Curating Chemistry

ChemSpider as a Platform for Crowd Participation in Curating Chemistry

Antony Williams

IDCC, Chicago, December 2010

Page 2

WARNING: Chemistry is Dangerous

Page 3

Di-Hydrogen Monoxide

Page 4

Di-Hydrogen Monoxide

2H

Page 5

Di-Hydrogen Monoxide

2H + 1O

Page 6

Di-Hydrogen Monoxide

H2O

Page 7

Di-Hydrogen Monoxide

H2O

Water

Page 8

It’s all on Wikipedia…

Page 9

Chemistry on the Internet – Not All Bad

100s of websites hosting chemistry-related data
Chemistry information is generally “compound-based”

Chemical “structures”
Identifiers, names and synonyms
Properties
Analytical data
How to synthesize
Articles, patents, safety information

Chemistry “language and dialects”

Page 10

Dialects describing chemicals

Page 11

A Pragmatic Vision

“Build a Structure-Centric Community”

Integrate chemistry across the internet based on “chemical structure”

A “structure-based hub” to information and data
Let chemists contribute their own data
Allow the community to curate & annotate data

Page 12

www.chemspider.com

Page 13

Answering Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-heptanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Aspirin?
What is the NMR spectrum of Benzoic Acid?
What are the safety handling issues for toluene?

Page 14

Search for a Chemical…by name

Page 15

Available Information…

Linked to chemical vendors, safety data, toxicity, metabolism…

Page 16

Available Information…

Page 17

ChemSpider Today

Almost 25 million unique chemicals
Over 400 data sources
Grows daily – community and RSC depositions
Community annotation and curation

We curate, edit, change, enhance data daily

Page 18

Three Years of Experience

Internet-based chemistry is a mess!

Public compound databases are contaminated

The annotation/curation of data online is difficult

Most database hosts are non-responsive to feedback – “We are a host/repository of data”

Who cares?

Page 19

Linked Data on the Web

Page 20

Where is chemistry online?

Encyclopedic articles (Wikipedia)
Chemical vendor databases
Metabolic pathway databases
Property databases
Patents with chemical structures
Drug discovery data
Scientific publications
Compound aggregators
Blogs/Wikis and Open Notebook Science

Page 21

What is the Structure of Vitamin K?

Page 22

MeSH – Medical Subject Headings

Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

Page 23

What is the Structure of Vitamin K1?

Page 24

What is the Structure of Vitamin K1?

Page 25

Chemical Abstracts “Common Chemistry” Database

Page 26

Wikipedia

Page 27

Page 28

Incorrect Structures

Page 29

Lack of Stereochemistry

Page 30

Does stereochemistry matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

Page 31

Page 32

PubChem

Page 33

Page 34

Page 35

What’s Methane?

Page 36

What’s Methane?

Page 37

What ELSE is Methane???

Page 38

Internet-Based Chemistry is a Mess

Algorithms can only get you so far

Human curation is necessary

Only the crowds can help with big data…

ChemSpider is approaching 25 million compounds

Page 39

Search “Vitamin H”

Page 40

Search “Vitamin H”

Page 41

“Curate” Identifiers

Page 42

“Curate” Identifiers

Page 43

“Curate” Identifiers

Page 44

Crowd-sourcing Chemistry Curation

Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Page 45

“Curate” Identifiers

General curation activities

Remove incorrect names
Correct spellings
Add multilingual names
Add alternative names

In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually

130 people have participated in validation or annotation. “Crowds” can be quite small!
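
The curation activities listed above (removing incorrect names, validating structure-identifier relationships) can be pictured as a small data model. What follows is a minimal sketch in Python and is not ChemSpider's actual implementation: the `IdentifierLink` class, its `validate`/`reject` methods, the `curated_names` helper and the two-vote threshold are all hypothetical, chosen only to illustrate how even a small crowd could confirm or deprecate a name attached to a record.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierLink:
    """Hypothetical link between one structure record and one name/identifier."""
    record_id: int
    name: str
    approvals: set = field(default_factory=set)   # curators who confirmed the name
    rejections: set = field(default_factory=set)  # curators who flagged it as wrong

    def validate(self, curator: str) -> None:
        # A curator holds one opinion at a time, so clear any earlier rejection.
        self.rejections.discard(curator)
        self.approvals.add(curator)

    def reject(self, curator: str) -> None:
        self.approvals.discard(curator)
        self.rejections.add(curator)

    def status(self, threshold: int = 2) -> str:
        """Assumed rule: a net margin of `threshold` votes settles the link."""
        margin = len(self.approvals) - len(self.rejections)
        if margin >= threshold:
            return "validated"
        if margin <= -threshold:
            return "deprecated"
        return "unreviewed"


def curated_names(links, threshold=2):
    """Return the names that curators have validated, sorted alphabetically."""
    return sorted(link.name for link in links if link.status(threshold) == "validated")


# Illustrative data only; record id 1 is a placeholder, not a real ChemSpider ID.
links = [
    IdentifierLink(1, "Vitamin H"),
    IdentifierLink(1, "Biotin"),
    IdentifierLink(1, "Vitamin K"),  # deliberately wrong name for this record
]
for curator in ("alice", "bob"):
    links[0].validate(curator)
    links[1].validate(curator)
    links[2].reject(curator)

print(curated_names(links))
print(links[2].status())
```

Storing curator names in sets means a repeated vote by the same person does not count twice, which matters when the "crowd" is only a handful of people.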

Page 46

Crowdsourcing Works

The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation

Curators at different levels check each other’s work
Wikipedia is the modern primary example
Some curators are “madmen”…

Page 47

Crowdsourcing Works

The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation

Curators at different levels check each other’s work
Wikipedia is the modern primary example
Some curators are “madmen”…
The Oxford English Dictionary

Page 48

Vancomycin – Curate This!!!

Page 49

Vancomycin on ChemSpider

1 compound – 3 days

Page 50

Crowdsourced “Annotations”

Users can add:

Descriptions/Syntheses/Commentaries
Links to articles
Spectral data
Photos
MP3 files
Videos

Page 51

Multimedia Content Holder

Page 52

Gaming for Curation of Spectra

Page 53

ChemSpider Everywhere

Crowdsourced Curation of Spectra

Page 54

Data Curation

Page 55

True Curation of Data

Page 56

ChemSpider SyntheticPages

Page 57

Columns: Drug Name, Generic Name, ChEBI, ChemSpider, CAS Com. Chem, ChemIDPlus, DailyMed, DrugBank, PubChem, Wikipedia

Spiriva, Tiotropium Bromide: No Hits, No Hits, 4/0
Depakote, Valproate semisodium: No Structure
Basen, Voglibose: No Hits, No Hits, 2/1
Symbicort 1) Budesonide: 8/1
Symbicort 2) Formoterol: WRONG, No Hits, 6/1
Vytorin 1) Ezetimibe: No Hits
Vytorin 2) Simvastatin: 2/1
Taxol, Paclitaxel: 44/1
Thalidomid, Thalidomide: No Hits
Zocor, Simvastatin: 2/1
Crestor, Rosuvastatin: No Hits, 2/1

Page 58

Sharing Our Activities

Presently defining approaches with other public compound databases to share results of curation activities

Member of a large European project to link data from the Life Sciences. Sharing results of curation is essential.

Making curation and contribution interfaces mobile

Page 59

Mobile ChemSpider

Page 60

First request to Database Hosts!

Every public compound database host should add ONE feature – “Leave Comments”

Page 61

Second request to Database Hosts!

Show Comments

Page 62

Question Quality

Page 63

Thank you

Email: [email protected]
Twitter: ChemConnector
Blog: www.chemspider.com/blog
Personal Blog: www.chemconnector.com
Slides: www.slideshare.net/AntonyWilliams