revisiting self-deposit of scientific data darren hardy stanford university open repositories, 10...

Revisiting Self-Deposit of Scientific Data

Darren Hardy, Stanford University

Open Repositories, 10 June 2015, Indianapolis, IN

Why share?

• Sharing scientific data is increasingly valuable
  – Reproducible, open science
  – Furthering investigation, innovation
  – “share [data], and do so in such a way that the data are interpretable and reusable by others” (Borgman 2012)

Why repositories?

• Repositories are in a position to facilitate sharing
  – “The centerpiece of such data sharing [for reuse] is the digital repository, which acts as the foundation for surrounding value-added services supporting and promoting effective publication, discovery, and dissemination of research data” (Abrams et al. 2013)

But when researchers self-deposit scholarly scientific data, what are their expectations for services?

Share Data
• Here’s my data…
• Email it!

• Preparation… not likely
• Citation… “personal communication”
• Access… email only
• Preservation… nope
• Discovery… nope
• Rights… nope

✔

Self-Publish Data
• Here’s my data…
• Personal or project website, maybe a file-sharing service like Dropbox

• Preparation… maybe
• Citation… via URL
• Access… as long as the website works
• Preservation… nope
• Discovery… not assured; maybe Google works
• Rights… maybe

😃

Self-Deposit Data
• Here’s my data…
• Deposited in an institutional repository

• Preparation… recommended, with suggestions
• Citation… persistent
• Access… ensured, for both data and metadata
• Preservation… long-term
• Discovery… many indexes
• Rights… explicit, multiple choices

Example
• Marine ecologist Malin Pinsky
• Published research on Pacific salmon conservation
  – Article: Pinsky et al. 2009, Conservation Biology 23(3)
  – Visibility: used in testimony before the US Senate in 2010

• Self-published GIS data on his personal website
• Graduated from Stanford, went to Rutgers
• Website taken down(!)… 404 Not Found

• Then, self-deposited into the Stanford repository
  – Now has discovery, access, and preservation services

Scientific data visualized as a paper map in Pinsky et al. (2009)

Self-deposit can provide direct data access: download the actual data!

…with auxiliary downloads

…with citation services

…with discovery services

• Via SearchWorks, our library catalog
• Via EarthWorks, our GIS data search engine
• Via Google, etc.: “pinsky salmon data”
  – The Stanford self-deposit is the first hit

(again) …with direct data access

Stanford Digital Repository (sdr.stanford.edu)

• Self-deposit interface to a Hydra repository
  – 2+ years in production
  – 300+ depositors
  – 2,000+ deposits
  – 20,000+ deposited files
  – 3+ TB preserved

• Self-training via video and a quickstart guide
• But no added services for scientific data

Barriers vs. Expectations
• Participation… no extra work for depositors
• Metadata creation… no extra work for depositors
• Data preparation… will this be a requirement of open science?
• Resource limitations… who will write the code? Who will shepherd deposits?

Are we at an impasse?

• Librarian-mediated approaches are very resource-intensive

• Software and services are often resource-limited

Closing the gap

• Ease the workflow burden on librarians and curators
• Improve the value proposition for depositors
  – Data preparation, metadata description, upload, visualization, annotation, sharing, publication, access, rights, preservation, citation, related work, ontology, discovery, social media
