DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake Carlson)

Upload: university-of-california-curation-center

Post on 06-May-2015

Page 1: DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake Carlson)

Logistics for Webinar

You must call in for audio: 866-740-1260 access code 9870179#

Participants muted

Ask questions in chat any time

20 minutes for Q&A

Recording & slides, schedule of webinars: blog.dmptool.org/webinar-series

DMPTool Webinar Series 8: Data Curation Profiles & the DMPTool

Sponsored by IMLS

13 August 2013

Page 2

28 May Introduction to the DMPTool

4 June Learning about data management: Resources, tools, materials

18 June Customizing the DMPTool for your institution

25 June Environmental Scan: Who's important at your campus

9 July Promoting institutional services; EZID Outreach Made Simple!

16 July Health Sciences & DMPTool - Lisa Federer, UCLA

30 July Digital humanities and the DMPTool - Miriam Posner, UCLA

13 Aug Data curation profiles and the DMPTool – Jake Carlson, Purdue

27 Aug Talking points for meeting with institutional stakeholders

10 Sep Tools and resources that work with/complement the DMPTool

Beyond funder requirements: more extensive DMPs

Case studies 1 – How librarians have successfully used the tool

Case studies 2 – How librarians have successfully used the tool

Outreach Kit Introduction

Certification program introduction

blog.dmptool.org/webinar-series

Page 3

Data Curation Profiles & the DMPTool

Jake Carlson

Associate Professor of Library Science / Data Services Specialist

Purdue University Libraries

Page 4

Road Map

• History / Background of the DCP Toolkit

• Comparing the DMP and the DCP

• Case Study in using the DCP

Page 5

“Investigating Data Curation Profiles across Research Domains”

• Awarded in 2007 to Purdue Libraries and Graduate School of Library and Information Science at UIUC

• Goals of the project:

– To understand the practices, attitudes and needs of researchers in managing and sharing their data.

– To identify possible roles for librarians to facilitate data sharing and curation.

– To develop a tool for librarians to gather information on researcher needs for their data.

Page 6

Interview areas: 20 faculty, 12 disciplines

Agronomy & Soil Science (Purdue & UIUC),

Anthropology (UIUC), Biochemistry (Purdue),

Biology (Purdue), Civil Engineering (Purdue),

Earth & Atmospheric Sciences (Purdue & UIUC),

Electrical & Computer Engineering (Purdue),

Food Science (Purdue), Geology (UIUC),

Horticulture & Plant Science (Purdue & UIUC),

Kinesiology (UIUC), Speech and Hearing (UIUC)

Page 7

What we asked …

• Research Data Lifecycle (story of the data)

• Characteristics of the Data

• Data Management / Storage

• Data Dissemination and Sharing

• Data Preservation and Repositories

• Roles for Libraries and Librarians

Page 8

Prioritize your needs for the following types of services

The ability to cite this dataset in my publications

The ability for researchers within my discipline to easily find this dataset

The ability for researchers outside of my discipline to easily find this dataset

The ability for people to easily discover this dataset using Google

Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories. Georgia Tech, Atlanta, GA.

n=19

Page 9

Prioritize your needs for the following types of services

The ability for me to submit this dataset to a repository myself

The process of submitting this dataset to a repository is automated

The ability to make these data accessible in multiple formats

The ability of the repository to provide version control for the data

Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories. Georgia Tech, Atlanta, GA.

n=19

Page 10

An interview-based tool for gathering:

• Information about a particular data set.

• What a researcher is doing to manage / curate the data set.

• What a researcher would like to do with the data.

http://datacurationprofiles.org

Page 11

DCP Sections

• Information about the Data and its Context

  – Overview of the Research

    • Focus

    • Intended Audience

    • Funding

  – Data Kinds and Stages

    • Data Narrative (data lifecycle)

    • Target Data for Sharing

    • Use/re-use Value

    • Contextual Narrative

Page 12

Data Stage | Output | Typical File Size | Format | Other / Notes
Primary Data | | | |
Raw | Sensor data | 100k in 1 file per day | proprietary to the sensor | FTP downloads are mostly automated.
Processing Stage 1 | Sensor data – open/accessible format | Roughly 6kb | .csv / .xls | Data are formatted into .csv before being reformatted into a MySQL database.
Processed | Data vectors | 800 records per intersection per day | SQL / .xls | Data are extracted from the MySQL database for analysis purposes.
Analyzed | Charts/graphs | | .xls / .emf | Charts and graphs used for interpretation.
Published | Charts/graphs | | .ppt | Data are presented via PowerPoint.
Ancillary Data | | | |
Image | Stills taken from video | | .gif / .jpg / .ppt | Images generated from video.
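
A table like the one above can also be kept in machine-readable form, so a librarian can query it. The following is only a minimal sketch and is not part of the DCP Toolkit. The class name `DataStage`, its field names, and the helper `stages_using` are hypothetical. The row values are taken from the slide, with the notes shortened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStage:
    """One row of the data stages table (field names are hypothetical)."""
    group: str         # "Primary Data" or "Ancillary Data"
    stage: str
    output: str
    typical_size: str  # empty string where the slide gives no value
    formats: tuple     # formats as listed on the slide
    notes: str


STAGES = [
    DataStage("Primary Data", "Raw", "Sensor data", "100k in 1 file per day",
              ("proprietary to the sensor",), "FTP downloads are mostly automated."),
    DataStage("Primary Data", "Processing Stage 1", "Sensor data, open/accessible format",
              "Roughly 6kb", (".csv", ".xls"), "Formatted into .csv before loading into MySQL."),
    DataStage("Primary Data", "Processed", "Data vectors",
              "800 records per intersection per day", ("SQL", ".xls"),
              "Extracted from MySQL for analysis."),
    DataStage("Primary Data", "Analyzed", "Charts/graphs", "",
              (".xls", ".emf"), "Used for interpretation."),
    DataStage("Primary Data", "Published", "Charts/graphs", "",
              (".ppt",), "Presented via PowerPoint."),
    DataStage("Ancillary Data", "Image", "Stills taken from video", "",
              (".gif", ".jpg", ".ppt"), "Images generated from video."),
]


def stages_using(fmt, stages=STAGES):
    """Return the names of the stages that list the given format."""
    return [s.stage for s in stages if fmt in s.formats]


print(stages_using(".xls"))
print(stages_using(".ppt"))
```

A structure like this makes it easy to see, for example, which stages already exist in a widely readable format and which depend on proprietary ones.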

Page 13

More DCP Sections

• Information about Needs

  – Intellectual Property

  – Organization and description of data

  – Ingest

  – Access

  – Discovery

  – Tools

  – Interoperability

  – Measuring Impact

  – Data Management

  – Preservation

Page 14
Page 15

Context

• The DMPTool focuses on a specific context: developing a data management plan for submission to a funding agency.

• The DCP Toolkit focuses on a broad context: understanding the researcher’s data and needs well enough to respond.

Page 16

Timing

• The DMPTool is for use in the “Planning Stages” of the Data Lifecycle.

• The DCP Toolkit is for use in the “Active Data Stages” of the Data Lifecycle.

Page 17

“The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.

Page 18

Structure

• The DMPTool’s structure is based on the specific elements of the funding agency’s data management plan.

• The DCP Toolkit is modular in nature. Questions and sections can be changed.

Page 19

Level of Investment

• Generating a DMP using the DMPTool is a short-term investment.

• Generating a DCP is a longer-term investment, but with a potentially large payoff.

Page 20

Sharable Output

• Data management plans are intended to be submitted to a funding agency, not to be shared publicly.

• Data curation profiles are intended to be shared with others.

Page 21

http://docs.lib.purdue.edu/dcp

Page 22
Page 23

• Both tools seek to help researchers identify and address needs in managing and curating data.

• In particular, both tools aim to foster the creation of data that are discoverable, accessible, well-described and usable by others.

Page 24

“The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.

Page 25

• Both tools can be used to help librarians connect with researchers about their data.

• Both organizations recognize and support the roles of librarians in providing services to support the data lifecycle.

Page 26

Case Study: Water Quality Field Station

with Marianne Bracke

Agricultural Sciences Information Specialist

Associate Professor of Library Science

Purdue University Libraries

Page 27

The Water Quality Field Station

Located on a 991-acre farm facility northwest of Purdue; opened in 1992.

Used to identify agricultural practices that minimize movement of agricultural chemicals into water supplies.

Informs the development of new and more ecologically-balanced technologies for crop production.

Page 28

Graduate Students

Graduate students are on the front lines of data.

Sharing data locally, between graduate students, was challenging to do.

Page 29

Project Steps

Identify: Utilize Data Curation Profiles to collect information about current data gathering, workflow and documentation.

Analyze: Identify common issues and needs as observed in the Data Curation Profiles.

Assess: Produce a report with recommendations and possible approaches to addressing issues and needs.

Page 30

Identify

Six interviews with graduate students conducted in the summer of 2011.

Developed Data Curation Profiles from these interviews.

Reviewed DCPs for needs.

Page 31

Analyze

There is a lack of clear and shared expectations on how data should be documented, described and organized.

Locally – practice varies by individual and by circumstance, previous training / experience, intended use, etc.

Discipline – there is a lack of standards specifically for Agronomy data.

Page 32

Analyze

Data are not being generated or processed in ways that could facilitate sharing externally, or even locally at Purdue or within the lab.

Inheriting data from previous graduate students was common and potentially problematic.

Many graduate students who had received data reported some problems understanding or making use of the data.

Page 33

Analyze

Graduate students stated that they lack the knowledge and skills to document, describe, organize and manage their data.

These activities tend to be done in relative isolation from the lab, or even the advisor.

Physical lab notebooks are still the primary means of documentation / provenance.

Page 34

Assess

Page 35

DMP & DCP Connections

Developing a DMP may uncover issues that merit further investigation through a DCP.

Uncovering data management issues through a DCP could inform data management planning.

Page 36

Another Case Study with DCPs

http://www.dlib.org/dlib/july13/wright/07wright.html

Page 37

Thanks! Any Questions?

Jake Carlson

Associate Professor of Library Science / Data Services Specialist

Purdue University

[email protected]

Page 38

blog.dmptool.org/webinar-series

From Flickr by Jeff Keacher

In 2 weeks: Talking Points for Meeting with Stakeholders

Presenter: Dan Phipps

Tuesday 27 Aug @ 10am PT

Register now!

Page 39

blog.dmptool.org/webinar-series/

Page 40

Questions?

Email: [email protected] [email protected]

Twitter: @TheDMPTool

Blog: blog.dmptool.org

Facebook: Facebook.com/DMPTool