Crowdsourcing Chemistry for the Community – 5 Years of Experiences

Antony Williams

NFAIS, February 28th 2012

The World of Online Chemistry

Safety data

Toxicity data

Blogs and Wikis

Property databases

Experimental results

Scientific publications

Compound aggregators

Open Notebook Science

Metabolic pathway databases

Encyclopedic articles (Wikipedia)

If it was not just about me…

We might have a community built encyclopedia

I might know where the best restaurants are

I might get good advice on books to read

I might know which movies to watch

I might know which plumber to call

Data might just be Open

Collaborative Knowledge Management

QUESTION

Are you involved with assisting chemists, pharmaceutical scientists, etc. in sourcing information about Chemistry?

1. Yes

2. No

Chemistry Databases on the Internet

Public databases are “trusted” as primary sources

Trust is granted without investigation of the content

Online data vary dramatically in quality!

Examples…

With Great Fanfare…

NPC Browser http://tripod.nih.gov/npc/

How many contribute to clean-up?

Fewer than a dozen contributors to data

The majority are project members

The crowd is small…

What you might not know about Chemistry Databases on the Internet

Data-sharing between the databases is cyclic – proliferating errors – “Linked Data”

What is the Structure of Vitamin K?

MeSH

A lipid cofactor that is required for normal blood clotting.

Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

What is the Structure of Vitamin K1?

QUESTION

Who has heard of ChemSpider as a chemistry database?

1. Yes

2. No

ChemSpider

We Want to Answer Questions

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Available Information…

Crowdsourced “Annotations”

Users can add:

Descriptions/Syntheses/Commentaries

Links to PubMed articles

Links to articles via DOIs

Spectral data

Crystallographic Information Files

Photos

MP3 files

Videos

QUESTION

Did you know that ChemSpider was OWNED by the Royal Society of Chemistry?

1. Yes

2. No

Public Domain Databases

Our databases are a mess…

Non-curated databases are proliferating errors

We source and deposit data between databases

Original sources of errors hard to determine

Curation is time-consuming and challenging

Stop Whining – Fix it

Crowdsourced Curation

Crowdsourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Search “Vitamin H”

“Curate” Identifiers

Validated Name-Structure Dictionaries

Chemical name dictionaries are used for:

Text-mining (publications, patents): used to index PubMed and link to Google Patents

Linking to other databases (think Biology!): when structures are not available, drug names link

Searching the web: names link to structures, which link to InChIs

Why are Dictionaries important?

The Final Search Strategy

Many Names, One Structure

I want to know about “Vincristine”

Vincristine: Identifiers and Properties

Vincristine: Patents Linked by Name

Text-Mining Depends on Dictionaries

Curated Dictionaries Matter

Originally 15 compounds “called” Yohimbine

54 Skeletons for Yohimbine
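The dependence of text-mining on name dictionaries can be illustrated with a minimal sketch. It is not ChemSpider's pipeline: the `name_dictionary` contents, the placeholder structure keys, and the `find_mentions` helper are all assumptions made for illustration. The point is only that every mention found in a document inherits whatever structure the dictionary assigns to that name.

```python
import re

# Hypothetical name -> structure-key dictionary. The keys are placeholders,
# not real database identifiers. Oncovin is a trade name for vincristine.
name_dictionary = {
    "vincristine": "STRUCTURE-1",
    "oncovin": "STRUCTURE-1",
    "yohimbine": "STRUCTURE-2",
}

def find_mentions(text, dictionary):
    """Return (matched text, structure key) pairs for dictionary names in text."""
    # Longest names first, so a longer synonym wins over a shorter one it contains.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE
    )
    return [
        (m.group(1), dictionary[m.group(1).casefold()])
        for m in pattern.finditer(text)
    ]

mentions = find_mentions(
    "Oncovin (vincristine) was compared with yohimbine.", name_dictionary
)
for name, structure in mentions:
    print(name, "->", structure)
```

If the dictionary maps a name to the wrong structure, every document mention of that name is linked to the wrong compound, which is why many structures sharing one name is a curation problem.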

Sharing ChemSpider curation

Data Curation Sharing - Proof of Concept

Identifier Dictionaries

Reciprocal curation processes…share curation

A series of “added” and “removed” synonyms against structures for matching.
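One way a consumer might apply such a series of “added” and “removed” synonyms is sketched below. The record layout (action, structure key, synonym), the placeholder structure keys, and the `apply_curation` helper are assumptions for illustration only. The slides do not describe the actual exchange format.

```python
from collections import defaultdict

# Hypothetical curation feed: (action, structure key, synonym).
# Structure keys are placeholders, not real ChemSpider records.
curation_feed = [
    ("added", "STRUCTURE-A", "biotin"),
    ("added", "STRUCTURE-A", "vitamin H"),
    ("added", "STRUCTURE-B", "vitamin H"),
    ("removed", "STRUCTURE-B", "vitamin H"),  # a curator rejected this name
]

def apply_curation(dictionary, feed):
    """Apply added/removed synonym records to a structure -> names mapping."""
    for action, structure, synonym in feed:
        name = synonym.casefold()  # compare names case-insensitively
        if action == "added":
            dictionary[structure].add(name)
        elif action == "removed":
            dictionary[structure].discard(name)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return dictionary

local_dictionary = apply_curation(defaultdict(set), curation_feed)
for structure, names in sorted(local_dictionary.items()):
    print(structure, sorted(names))
```

Replaying the records in order lets a second database reach the same curated state without repeating the manual review.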

Announced 9 months ago – only one consumer

Who will participate???

Community Contribution to ChemSpider

www.SpectralGame.com

http://www.jcheminf.com/content/1/1/9

Curation through “gaming”

Data Curation

Reversed Spectrum

True Curation of Data

ChemSpider SyntheticPages

Submission Process

Simple template-based submission process

Submissions reviewed by editorial board.

Online Peer Review process

Crowdsourced expansion?

A few regular dedicated authors only

Online peer review and feedback small but useful

Crowdsourcing – does it work?

192 people EVER have deposited or curated data

ChemSpider SyntheticPages: small group of authors

Database hosts make the largest contributions

ChemSpider staff tend to do the most curation

Contributions

Curations

2009 – 8255 curations by 43 people

2010 – 10014 curations by 66 people

2011 – 16025 curations by 116 people
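As a rough reading of these figures, the short calculation below divides each year's curations by the number of people. The numbers are copied from the slide. The `per_person` helper is only an illustration, and a simple average hides how unevenly the work is distributed.

```python
# Figures copied from the slide: year -> (curations, people).
curation_stats = {
    2009: (8255, 43),
    2010: (10014, 66),
    2011: (16025, 116),
}

def per_person(stats):
    """Average curations per person for each year."""
    return {
        year: curations / people
        for year, (curations, people) in stats.items()
    }

averages = per_person(curation_stats)
for year, avg in sorted(averages.items()):
    print(f"{year}: {avg:.0f} curations per person")
```

On these numbers the average is roughly 192, 152, and 138 curations per person for 2009, 2010, and 2011: the number of curators grows faster than the number of curations.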

“Crowdsourcing” – the crowd is small!

www.SciMobileApps.com

8 contributors only…in 7 months

www.SciDBs.com

7 contributors only…in 6 months

www.ScientistsDB.com

38 contributors …in 6 weeks

What encourages participation?

“Interested” parties contribute

Marketing and self-promotion are primary reasons for participation

There are very few “selfless” participants

Relationships garner contributions…

Crowdsourcing across drug discovery

Open PHACTS: partnership between the European Community and European pharma companies

Freely accessible for knowledge discovery and verification.

Data on chemistry and biology

Pharmacological profiles

Proprietary and public data sources.

How will it improve?

Participation and contribution

Conclusions

For chemistry, crowdsourced deposition, annotation, and curation work, but engagement to date is low

Primary challenge – engaging the community to help create what they want. Rewards and recognition?

MORE collaboration can benefit us all

Indicators are good for small but continued growth

Thank you

Email: williamsa@rsc.org

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams
