data citations: who cares?
DESCRIPTION
Who cares how research data is attributed and cited? Lots of people. Presented by Heather Piwowar to the DataONE summer internship 2010 group on data citation.
TRANSCRIPT
Data citation... Who cares?
Heather Piwowar
DataONE postdoc with Dryad and NESCent
DataONE summer internship meeting
July 7, 2010
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
Probably.
In theory.
?
• Genbank
• PDB
http://www.oxfordjournals.org/nar/database/cap/
http://www.flickr.com/photos/archeon/2941655917/
Data citation...
[Diagram with nodes labeled “dataset” and “paper”]
• Alas, no unique standard identifier
• URL
• accession number
• DOI
• citation to paper
• citation to database
• reference to supplementary material
• search strategy
Example: full-text phrases containing “... accessed”
“submitted”
“downloaded”
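The phrase-based search strategy above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the talk: the three verbs come from the slide, while the function name `find_data_mentions`, the naive sentence splitting, and the sample text are hypothetical.

```python
import re

# Verbs taken from the slide; everything else here is an assumption.
DATA_VERBS = ("accessed", "submitted", "downloaded")
VERB_PATTERN = re.compile(r"\b(" + "|".join(DATA_VERBS) + r")\b", re.IGNORECASE)


def find_data_mentions(full_text):
    """Return the sentences that contain one of the data-related verbs."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    return [s for s in sentences if VERB_PATTERN.search(s)]


# Made-up sample text, for illustration only.
sample = (
    "We studied gene expression in mice. "
    "Raw data were downloaded from a public repository. "
    "Results are discussed below."
)
print(find_data_mentions(sample))
```

A filter like this only narrows down candidate sentences; deciding whether a sentence really describes data reuse would still need manual review.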
• Citations are indexed and machine-extractable
• understand current practice
• articulate best practices
Who cares?
1. Data creators
• personal reward
• motivation:
  • “if it really helped”
  • even esoteric datasets are useful
• how prevalent is scooping?
• alert to possible misuses
• grounded requirements
2. Data reusers
• clear guidelines are helpful
• what has been reused, for what?
• what hasn't?
3. Repository creators, maintainers
• funding
• how much metadata
• how to format
• what additional tools are useful
• lifecycle of data
4. Funders
• most, best science for their money
• cost/benefit of mandate
• inform funding decisions:
  • what has been extra useful?
  • what hasn't?
• what support is needed
5. Journals
• increasingly called upon to mandate or fund:
  • how to decide
  • how to rationalize
• another avenue to compete
6. Information scientists
• extension of citation analysis for studying information behaviour
6. Me
Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets
Reuse estimate
• 2703 submissions in 2007
• GSE* in PubMed Central
• Exclude author overlap
• Exclude data creation
  • automatically, manually
• 139
• 520
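The automatic part of a reuse estimate like this might look roughly like the sketch below. It assumes that “GSE*” refers to accession numbers of the form GSE followed by digits, and it covers only two of the listed steps: finding accession mentions in full text and excluding author overlap. Excluding data-creation papers is not attempted. The function name `candidate_reuses`, the record layout, and all sample records are hypothetical.

```python
import re

# Assumption: accessions look like "GSE" followed by digits.
GSE_PATTERN = re.compile(r"\bGSE\d+\b")


def candidate_reuses(articles, creators_by_accession):
    """Return (article_id, accession) pairs with no author overlap."""
    results = []
    for article in articles:
        authors = {name.lower() for name in article["authors"]}
        # De-duplicate accessions mentioned more than once in an article.
        for accession in sorted(set(GSE_PATTERN.findall(article["text"]))):
            creators = {
                name.lower()
                for name in creators_by_accession.get(accession, ())
            }
            if authors & creators:
                # Shared author: likely the data creators citing themselves.
                continue
            # Accessions with unknown creators are kept as candidates.
            results.append((article["id"], accession))
    return results


# Made-up records, for illustration only.
articles = [
    {"id": "a1", "authors": ["A. Creator"],
     "text": "Data were submitted as GSE1111."},
    {"id": "a2", "authors": ["B. Reuser"],
     "text": "We downloaded GSE1111 from the repository."},
]
creators = {"GSE1111": ["A. Creator"]}
print(candidate_reuses(articles, creators))
```

Exact name matching is a crude test for author overlap, so candidates produced this way would still need the manual pass that the slide mentions.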
7. You
8. Your mom
9. These mice
http://www.flickr.com/photos/ryanr/142455033/
10. Scientific progress
• trace errors, fraud
• increase transparency
• more efficient and effective
you can not manage what you do not measure
quote: Lord Kelvin
http://www.flickr.com/photos/archeon/2941655917/
science about our science
http://www.flickr.com/photos/druclimb/293046352/
questions?
Thanks to:
NSF, DataONE, NESCent, Dryad
UBC Dept of Zoology
NLM, U of Pittsburgh Dept of Biomedical Informatics
Open science online community and those who release their articles, datasets and photos openly