Revisiting Self-Deposit of Scientific Data Darren Hardy Stanford University Open Repositories, 10 June 2015, Indianapolis, IN

TRANSCRIPT

Page 1: Revisiting Self-Deposit of Scientific Data Darren Hardy Stanford University Open Repositories, 10 June 2015, Indianapolis, IN

Revisiting Self-Deposit of Scientific Data

Darren Hardy
Stanford University

Open Repositories, 10 June 2015, Indianapolis, IN

Why share?

• Sharing scientific data is increasingly valuable
– Reproducible, open science
– Furthering investigation, innovation
– “share [data], and do so in such a way that the data are interpretable and reusable by others” (Borgman 2012)

Why repositories?

• Repositories are in a position to facilitate sharing
– “The centerpiece of such data sharing [for reuse] is the digital repository, which acts as the foundation for surrounding value-added services supporting and promoting effective publication, discovery, and dissemination of research data” (Abrams et al. 2013)

But, when researchers self-deposit scholarly scientific data, what are their expectations for services?

Share Data
• Here’s my data…
• Email it!

• Preparation… not likely
• Citation… “personal communication”
• Access… email only
• Preservation… nope
• Discovery… nope
• Rights… nope

Self-Publish Data
• Here’s my data…
• Personal or project website, maybe file sharing service like Dropbox

• Preparation… maybe
• Citation… via URL
• Access… as long as website works…
• Preservation… nope
• Discovery… not assured, maybe Google works
• Rights… maybe

😃

Self-Deposit Data
• Here’s my data…
• Deposited in institutional repository

• Preparation… recommended with suggestions
• Citation… persistent
• Access… ensured, data & metadata
• Preservation… long-term
• Discovery… many indexes
• Rights… explicit, multiple choices
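The three sharing modes above can be lined up side by side. The following is a minimal Python sketch and not part of the original talk: the table layout and the helper name `compare` are assumptions made for illustration, while the cell values are copied from the three preceding slides.

```python
# Illustrative only: structure and names are assumptions; values come from the slides.
SERVICES = ["Preparation", "Citation", "Access", "Preservation", "Discovery", "Rights"]

# One row per sharing mode, one entry per service (same order as SERVICES).
MODES = {
    "Share (email)": [
        "not likely", "personal communication", "email only",
        "nope", "nope", "nope",
    ],
    "Self-publish": [
        "maybe", "via URL", "as long as website works",
        "nope", "not assured, maybe Google works", "maybe",
    ],
    "Self-deposit": [
        "recommended with suggestions", "persistent", "ensured, data & metadata",
        "long-term", "many indexes", "explicit, multiple choices",
    ],
}


def compare(service):
    """Return {mode: value} for one service across the three sharing modes."""
    i = SERVICES.index(service)
    return {mode: values[i] for mode, values in MODES.items()}


if __name__ == "__main__":
    for service in SERVICES:
        print(service)
        for mode, value in compare(service).items():
            print(f"  {mode}: {value}")
```

Reading the table by service rather than by mode shows where the modes diverge: for example, `compare("Preservation")` is "nope" for both email and self-publishing and "long-term" only for self-deposit.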

Example
• Marine ecologist Malin Pinsky
• Published research on Pacific salmon conservation
– Article: Pinsky et al. 2009, Conservation Biology 23(3)
– Visible: Used in testimony before the US Senate in 2010
• Self-published GIS data on his personal website
• Graduated from Stanford, went to Rutgers
• Website taken down(!)… 404 Not Found
• Then, self-deposited into Stanford repository
– Now, discovery, access, and preservation services

Scientific data visualized as paper map in Pinsky et al. (2009)

Self-Deposit can provide direct data access
Download the actual data!

…with auxiliary downloads

…with citation services

…with discovery services

• Via SearchWorks, our library catalog
• Via EarthWorks, our GIS data search engine
• Via Google, etc. “pinsky salmon data”
– Stanford self-deposit is first hit

(again) …with direct data access

Stanford Digital Repository (sdr.stanford.edu)

• Self-deposit interface to a Hydra repository
– 2+ years in production
– 300+ depositors
– 2,000+ deposits
– 20,000+ deposited files
– 3+ TB preserved
• Self-training via video, quickstart guide
• But, no added services for scientific data

Barriers vs. Expectations
• Participation: no extra work for depositors
• Metadata creation: no extra work for depositors
• Data preparation: will this be a requirement of open science?
• Resource limitations: who will write the code? shepherd deposits?

Are we at an impasse?

• Librarian-mediated approaches are very resource-intensive

• Software and services are often resource-limited

Closing the gap

• Mitigate workflows for librarians, curators
• Improve the value proposition for depositors
– Data preparation, metadata description, upload, visualization, annotation, sharing, publication, access, rights, preservation, citation, related work, ontology, discovery, social media