Data Sharing, Small Science, and Institutional Repositories Melissa H. Cragin & Carole L. Palmer Center For Informatics Research in Science and Scholarship Grad. School of Library and Information Science, University of Illinois Jacob R. Carlson & Michael Witt Purdue University Libraries


Post on 16-Dec-2015


Data Sharing, Small Science, and Institutional Repositories

Melissa H. Cragin & Carole L. Palmer
Center for Informatics Research in Science and Scholarship
Grad. School of Library and Information Science, University of Illinois

Jacob R. Carlson & Michael Witt
Purdue University Libraries

A view from the Institutional Repository

Advancing university-based cyberinfrastructure is dependent on our understanding of how to support data practices and needs.

Sharing is at the heart of success: collecting, storing, and making use of data can only come after the means for sharing are in place.

We cannot collect and curate all data, particularly in a way that facilitates effective re-use. We will need to work with researchers to develop selection and appraisal guidelines, and data services.

Data Curation Profiles Project

Project focus: which data are researchers willing to share, when, and with whom?

Objectives:
  derive requirements for managing data sets in IRs
  develop policies for archiving and access
  identify librarian roles & skill sets for supporting data management, sharing & curation

Biochemistry
Biology
Civil Engineering
Electrical Engineering
Food Sciences
Earth and Atmospheric Sciences
Soil Science

Anthropology
Geology
Plant Sciences
Kinesiology
Speech and Hearing
Earth and Atmospheric Sciences
Soil Science

Methods

Institutional Review Board for approval of Human Subjects Research

increasingly focused, materials-based interviews
  Pre-interview Worksheet
  Requirements Worksheet
  “data set” samples

Data Curation Profiles
http://www.datacurationprofiles.org/

[Chart: Faculty Population for Initial Needs Assessment by Department]

Values shown, largest to smallest: 43, 37, 24, 17, 16, 14, 13, 12, 10, 10, 8, 7, 7, 7, 7, 7, 6, 6, 5, 5, 5, 5, 4

Categories shown: Illinois State Surveys; No. Dept/s with <4 faculty; Natural Res & Env Sci; Civil & Environmental Eng; Veterinary Sciences; Crop Sciences; Plant Biology; Architecture and Landscape Architecture; Agricultural Engineering; Geography; Geology; Agr & Cons Econ; Animal Sciences; Atmospheric Sciences; Food Science & Human Nutrition; Mechanical & Industrial Eng; Animal Biology; Waste Management Research Ctr; Anthropology; Electrical & Computer Eng; Materials Science & Engineering; Urban & Reg Planning; Chemistry

“Faculty of the Environment” Data Needs Project
Collaborators: Bryan Heidorn, Michelle Wander, U of I Environmental Council

Smallish Science

single PI (often)
often dependent on graduate students
ad hoc data management systems
idiosyncratic sharing practices
“success” dependent on using one’s own data

But…
may be working at community level
may be producing all digital data
may be conducting “data-driven” science
may be producing very large data sets

Data Characteristics

Crystallography

Type:
  1. “Raw data”: most information rich, long-term value for re-use
  …
  4. “CIF file” – crystallography exchange: most commonly shared data type
Format:
  1. Binary data – image
  4. Crystallographic Information File – text (field-wide standard for numerical data)
Size:
  1. Each image or “frame” is ¼ to 1 MB; a set is approx. 2,400 frames = approx. 1 GB
  4. > 500 KB
Intellectual Property/Data Owners:
  Service model: provide a service to chemists by solving crystal structures
  Ownership of the data is ambiguous, and requires negotiation before data “hand-off”
Accessibility:
  Field-wide repositories
  Many journals require deposit of CIF files
  OAI-PMH tools becoming available for CIF files

Geology

Type:
  1. “Reduced spreadsheet” – table with average values for multiple observations: most often requested by others
Format:
  1. Excel spreadsheet
Size:
  1. spreadsheet size – under 1 MB
Intellectual Property/Data Owners:
  Depends on source of funding: governmental and private grants, gov. institutions, industry
  Ownership of and rights to the data range from full to very limited, some long-term “embargoes”
Accessibility:
  Difficult and ad hoc
  Well-known researchers receive direct requests for data, often based on publications

Profiling complexities & differences

Findings

Distinguishing exchange from open sharing
  exchange: sharing amongst collaborators is a primary concern, often with significant barriers
  (more) open access: limited by need for control and reward system, but also

Sharing with wider “publics” is conditioned by both data management pressures and personal experience
  the “known person – cost” algorithm
  incidents of misuse

What is most easily or willingly shared is not always the data that has the most re-use value

Examples of what, and when

Field: Atmospheric science
  Specific research area: severe weather modeling
  Form to be shared: compressed output of the model
  Formats: Vis5D
  Type of data set: 1 file / dataset
  Size: 10-100 MB
  Shared when? 4-6 month embargo

Field: Agronomy
  Specific research area: water quality, drainage, and plant growth
  Form to be shared: cleaned and reviewed sensor and hand-collected sample data
  Formats: .xls
  Type of data set: approx. 100 files
  Size: ~1 MB each, up to 20 MB
  Shared when? After publication

Field: Geology
  Specific research area: rock, water and microbes
  Form to be shared: averaged sensor and hand-collected sample data; photographs
  Formats: .xls; jpg
  Type of data set: 1 file; images
  Size: < 1 MB
  Shared when? After publication

Field: Civil Engineering
  Specific research area: traffic movement
  Form to be shared: cleaned and normalized sensor data
  Formats: MySQL (postgresql)
  Type of data set: 1 database
  Size: approx. 1000 K/day
  Shared when? 1 month to 1 year embargo

Implications for Institutional Repositories

embargo services are a *must* (~66%, 14/20)

clear, explicit data citation information in IR records

disconnect: application of metadata standards highly important, but many unaware of existing standards

preservation services are needed to support re-use: 11/19 participants said their data would be useful for more than 10 years.

Supporting the science process

data exchange infrastructure

support for data management planning

data literacy instruction - integral to scientific information work

Broader implications for academic institutions

Leadership Opportunities for Libraries

Thank you

This research is supported by the Institute of Museum and Library Services (IMLS), grant # LG-06-070032-07.

D. Scott Brandt, PI

Co-PIs: M. Witt & J. Carlson (Purdue) and C. Palmer & S. Shreeves (UIUC)

RAs: D. Leiter (Purdue) and M. Kogan (UIUC)