Revisiting Self-Deposit of Scientific Data
Darren Hardy, Stanford University
Open Repositories, 10 June 2015, Indianapolis, IN
Why share?
• Sharing scientific data is increasingly valuable
  – Reproducible, open science
  – Furthering investigation, innovation
  – “share [data], and do so in such a way that the data are interpretable and reusable by others” (Borgman 2012)
Why repositories?
• Repositories are in a position to facilitate sharing
  – “The centerpiece of such data sharing [for reuse] is the digital repository, which acts as the foundation for surrounding value-added services supporting and promoting effective publication, discovery, and dissemination of research data” (Abrams et al. 2013)
But when researchers self-deposit scholarly scientific data, what are their expectations for services?
Share Data
• Here’s my data…
• Email it!
• Preparation… not likely
• Citation… “personal communication”
• Access… email only
• Preservation… nope
• Discovery… nope
• Rights… nope
✔
Self-Publish Data
• Here’s my data…
• Personal or project website, maybe a file-sharing service like Dropbox
• Preparation… maybe
• Citation… via URL
• Access… as long as the website works
• Preservation… nope
• Discovery… not assured, maybe Google works
• Rights… maybe
😃
Self-Deposit Data
• Here’s my data…
• Deposited in an institutional repository
• Preparation… recommended, with suggestions
• Citation… persistent
• Access… ensured, data & metadata
• Preservation… long-term
• Discovery… many indexes
• Rights… explicit, multiple choices
Example
• Marine ecologist Malin Pinsky
• Published research on Pacific salmon conservation
  – Article: Pinsky et al. 2009, Conservation Biology 23(3)
  – Visible: used in testimony before the US Senate in 2010
• Self-published GIS data on his personal website
• Graduated from Stanford, went to Rutgers
• Website taken down(!)… 404 Not Found
• Then, self-deposited into the Stanford repository
  – Now, discovery, access, and preservation services
Scientific data visualized as a paper map in Pinsky et al. (2009)
Self-deposit can provide direct data access: download the actual data!
…with auxiliary downloads
…with citation services
…with discovery services
• Via SearchWorks, our library catalog
• Via EarthWorks, our GIS data search engine
• Via Google, etc.: “pinsky salmon data”
  – The Stanford self-deposit is the first hit
(again) …with direct data access
Stanford Digital Repository (sdr.stanford.edu)
• Self-deposit interface to a Hydra repository
  – 2+ years in production
  – 300+ depositors
  – 2,000+ deposits
  – 20,000+ deposited files
  – 3+ TB preserved
• Self-training via video and a quickstart guide
• But no added services for scientific data
Barriers vs. Expectations
• Participation: no extra work for depositors
• Metadata creation: no extra work for depositors
• Data preparation: will this be a requirement of open science?
• Resource limitations: who will write the code? Who will shepherd deposits?
Are we at an impasse?
• Librarian-mediated approaches are very resource-intensive
• Software and services are often resource-limited
Closing the gap
• Mitigate workflows for librarians, curators
• Improve the value proposition for depositors
  – Data preparation, metadata description, upload, visualization, annotation, sharing, publication, access, rights, preservation, citation, related work, ontology, discovery, social media