Crowdsourcing Chemistry for the Community – 5 Years of Experiences Antony Williams NFAIS, February 28 th 2012

Page 1

Crowdsourcing Chemistry for the Community – 5 Years of Experiences

Antony Williams

NFAIS, February 28th 2012

Page 2

The World of Online Chemistry

Safety data

Toxicity data

Blogs and Wikis

Property databases

Experimental results

Scientific publications

Compound aggregators

Open Notebook Science

Metabolic pathway databases

Encyclopedic articles (Wikipedia)

Page 3

If it was not just about me…

Page 4

If it was not just about me…

We might have a community built encyclopedia

I might know where the best restaurants are

I might get good advice on books to read

I might know which movies to watch

I might know which plumber to call

Data might just be Open

Page 6

Collaborative Knowledge Management

Page 7

QUESTION

Are you involved with assisting chemists, pharmaceutical scientists, etc. in sourcing information about Chemistry?

1. Yes

2. No

Page 8

Chemistry Databases on the Internet

Public databases are “trusted” as primary sources

Trust is granted without investigation of the content

Online data vary dramatically in quality!

Examples…

Page 9

With Great Fanfare…

Page 10

NPC Browser http://tripod.nih.gov/npc/

Page 11
Page 12

NPC Browser http://tripod.nih.gov/npc/

Page 13

How many contribute to clean-up?

Fewer than a dozen contributors to data

The majority are project members

The crowd is small…

Page 14

What you might not know about Chemistry Databases on the Internet

Data-sharing between the databases is cyclic – proliferating errors – “Linked Data”

Page 15

What is the Structure of Vitamin K?

Page 16

MeSH

A lipid cofactor that is required for normal blood clotting.

Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

Page 17

What is the Structure of Vitamin K1?

Page 18

QUESTION

Have you heard of ChemSpider as a chemistry database?

1. Yes

2. No

Page 19

ChemSpider

Page 20

We Want to Answer Questions

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 21

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Page 22

Available Information….

Page 23

Crowdsourced “Annotations”

Users can add

Descriptions/Syntheses/Commentaries

Links to PubMed articles

Links to articles via DOIs

Add spectral data

Add Crystallographic Information Files

Add photos

Add MP3 files

Add Videos

Page 24
Page 25

QUESTION

Did you know that ChemSpider was OWNED by the Royal Society of Chemistry?

1. Yes

2. No

Page 26

Public Domain Databases

Our databases are a mess…

Non-curated databases are proliferating errors

We source and deposit data between databases

Original sources of errors hard to determine

Curation is time-consuming and challenging

Page 27

Stop Whining – Fix it

Page 28

Crowdsourced Curation

Crowdsourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Page 29

Search “Vitamin H”

Page 30

“Curate” Identifiers

Page 31

“Curate” Identifiers

Page 32

Validated Name-Structure Dictionaries

Chemical name dictionaries are used for:

Text-mining (publications, patents)

Used to index PubMed and link to Google Patents

Linking to other databases – think Biology!

When structures are not available drug names link

Searching the web

Names link to structures link to InChIs
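As an illustration of why a validated name-structure dictionary matters for text-mining, here is a minimal sketch of a dictionary lookup over free text. The dictionary contents, the placeholder identifiers (`STRUCTURE-A`, `STRUCTURE-B`), and the function name `find_structures` are all assumptions made for this example; they are not ChemSpider data or any ChemSpider API.

```python
import re

# Hypothetical, hand-made dictionary: synonym (lowercase) -> structure key.
# The keys are placeholder strings, not real InChIs or ChemSpider IDs.
NAME_TO_STRUCTURE = {
    "vitamin h": "STRUCTURE-A",
    "biotin": "STRUCTURE-A",
    "aspirin": "STRUCTURE-B",
    "acetylsalicylic acid": "STRUCTURE-B",
}


def find_structures(text, dictionary):
    """Return sorted (name, structure) pairs for dictionary names found in text."""
    hits = set()
    lowered = text.lower()
    for name, structure in dictionary.items():
        # Word boundaries avoid matching a name inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            hits.add((name, structure))
    return sorted(hits)


sentence = "Biotin, also sold as Vitamin H, was compared with aspirin."
matches = find_structures(sentence, NAME_TO_STRUCTURE)
for name, structure in matches:
    print(name, "->", structure)
```

Two different names in the sentence resolve to the same structure key, which is the "many names, one structure" situation; a wrong synonym in the dictionary would silently link the text to the wrong compound.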

Page 33

Why are Dictionaries important?

Page 34

The Final Search Strategy

Page 35

Many Names, One Structure

Page 36

I want to know about “Vincristine”

Page 37

Vincristine: Identifiers and Properties

Page 38

Vincristine: Patents Linked by Name

Page 39

Text-Mining Depends on Dictionaries

Page 40

Curated Dictionaries Matter

Page 41

Originally 15 compounds “called” Yohimbine

54 Skeletons for Yohimbine

Page 42

Sharing ChemSpider curation

Page 43

Data Curation Sharing - Proof of Concept

Page 44

Identifier Dictionaries

Reciprocal curation processes…share curation

A series of “added” and “removed” synonyms against structures for matching.

Announced 9 months ago – only one consumer

Who will participate???
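The slide describes the shared curation as a series of “added” and “removed” synonyms against structures. A minimal sketch of how a consuming database might apply such a series follows. The feed layout (action, structure key, synonym), the placeholder key `STRUCTURE-A`, and the function name `apply_curation` are assumptions for illustration only; the actual exchange format is not given in the slides.

```python
# Hypothetical local data: structure key -> set of synonyms held by a consumer.
local_synonyms = {
    # "vitamin k" is a deliberately wrong synonym to be cleaned up.
    "STRUCTURE-A": {"biotin", "vitamin h", "vitamin k"},
}

# Hypothetical curation feed: (action, structure key, synonym).
curation_feed = [
    ("removed", "STRUCTURE-A", "vitamin k"),
    ("added", "STRUCTURE-A", "Vitamin B7"),
]


def apply_curation(synonyms, feed):
    """Return a new mapping with the feed's additions and removals applied."""
    # Copy the sets so the caller's data is left untouched.
    updated = {key: set(names) for key, names in synonyms.items()}
    for action, key, name in feed:
        names = updated.setdefault(key, set())
        if action == "added":
            names.add(name.lower())
        elif action == "removed":
            names.discard(name.lower())
        else:
            raise ValueError(f"unknown action: {action}")
    return updated


curated = apply_curation(local_synonyms, curation_feed)
print(sorted(curated["STRUCTURE-A"]))
```

Returning a new mapping rather than editing in place lets a consumer review the difference before accepting another database's curation.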

Page 45

Community Contribution to ChemSpider

Page 46

www.SpectralGame.com

http://www.jcheminf.com/content/1/1/9

Page 47

Curation through “gaming”

Page 48

Data Curation

Page 49

Reversed Spectrum

Page 50

True Curation of Data

Page 51

ChemSpider SyntheticPages

Page 52

ChemSpider SyntheticPages

Page 53

Submission Process

Simple template-based submission process

Submissions reviewed by editorial board.

Online Peer Review process

Crowdsourced expansion?

A few regular dedicated authors only

Online peer review and feedback small but useful

Page 54

Crowdsourcing – does it work?

192 people have EVER deposited or curated data

ChemSpider SyntheticPages: small group of authors

Database hosts make the largest contributions

ChemSpider staff tend to do the most curation

Page 55

Contributions

Page 56

Curations

2009 – 8255 curations by 43 people

2010 – 10014 curations by 66 people

2011 – 16025 curations by 116 people

“Crowdsourcing” – the crowd is small!
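To make the "crowd is small" point concrete, the yearly figures above can be turned into average curations per person. This is a small worked calculation using only the three totals and head counts listed on this slide; the variable names are chosen for this sketch.

```python
# Yearly figures from the slide: year -> (curations, people).
curations = {
    2009: (8255, 43),
    2010: (10014, 66),
    2011: (16025, 116),
}

# Average number of curations per contributing person in each year.
per_person = {
    year: total / people for year, (total, people) in curations.items()
}

for year in sorted(per_person):
    total, people = curations[year]
    print(f"{year}: {total} curations / {people} people "
          f"= {per_person[year]:.0f} per person")
```

On these numbers the average per person falls each year while the number of contributors grows, which suggests a few heavy curators joined by a growing number of occasional ones, although the slide gives no per-person breakdown to confirm that.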

Page 57

www.SciMobileApps.com

8 contributors only…in 7 months

Page 58

www.SciDBs.com

7 contributors only…in 6 months

Page 59

www.ScientistsDB.com

38 contributors …in 6 weeks

Page 60

What encourages participation?

“Interested” parties contribute

Marketing and self-promotion are primary reasons for participation

There are very few “selfless” participants

Relationships garner contributions…

Page 61

Crowdsourcing across drug discovery

Open PHACTS: partnership between the European Community and European Pharma Companies

Freely accessible for knowledge discovery and verification

Data on chemistry and biology

Pharmacological profiles

Proprietary and public data sources

Page 62
Page 63

How will it improve?

Participation and contribution

Page 64

Conclusions

For chemistry, crowdsourced deposition, annotation, and curation work, but engagement to date is low

Primary challenge – engaging the community to help create what they want. Rewards and recognition?

MORE collaboration can benefit us all

Indicators are good for small but continued growth

Page 65

Thank you

Email: [email protected]

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams