Agile Curation: 2015 AGU Presentation


Upload: josh-young

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Agile Curation: 2015 AGU Presentation

Agile Data Curation: A Conceptual Framework and Approach for Practitioner Data Management

Presenting Author: Josh Young (1)

Co-Authors: Karl Benedict (2) and Christopher Lenhardt (3)

1. University Corporation for Atmospheric Research (UCAR) Unidata Program Center, Boulder, USA
2. University of New Mexico, Albuquerque, USA
3. Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, USA

Page 2: Agile Curation: 2015 AGU Presentation

Scope

Imagine a project:
• that includes a well-thought-out and documented data management plan,
• and robust implementation of that plan throughout the project and beyond.
• This talk is not for that project; it is for the rest of us.

Page 3: Agile Curation: 2015 AGU Presentation

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: public access and reproducibility

• Risk of becoming dark data (Heidorn, 2008)

Page 4: Agile Curation: 2015 AGU Presentation

Why care about external access?

• Intangibles for an Investigator
  • Maybe someday I’ll benefit from someone else’s data
  • Maybe I’ll learn something through informal dialogue
  • Most science funding is from public resources and should/could be considered a public trust resource
  • Peer pressure
• Tangibles for an Investigator
  • Increased efficiency
  • My funders require it.

Page 5: Agile Curation: 2015 AGU Presentation

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: greater impact

Agile Curation

Page 6: Agile Curation: 2015 AGU Presentation

Internal Workflows

Page 7: Agile Curation: 2015 AGU Presentation

Public-Access Workflows

Page 8: Agile Curation: 2015 AGU Presentation

Agile Curation:

• Means taking implementable steps to improve data management for external access.

• Philosophically, it attempts to apply lessons from agile software development to data management.

Page 9: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 2nd Generation

1) Delivery, access, use and citation of research data are the primary measures of success.

2) Maximize the impact of research data through the continuous integration of curation activities.

3) Support unanticipated needs for and uses of research data (and documentation) and develop flexible systems to capture new uses.

Page 10: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 2nd Generation

4) Make data open and accessible as early in the process as possible.

5) Encourage crowd-sourced / community feedback to improve and enhance the data. Provide basic metadata for data available early in the process even if the data are not finalized.

6) Identify key individuals in a research project who have the requisite motivation, knowledge, or ability to learn, and get out of their way.
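
As one illustration of principle 5, the sketch below shows what "basic metadata" for not-yet-final data might look like in practice. It is a minimal sketch, not a method from the talk: the `BasicMetadata` class, its field names, and the `missing_fields` helper are all hypothetical and do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date


@dataclass
class BasicMetadata:
    """Hypothetical minimal record; field names are illustrative, not a standard."""
    title: str
    creator: str
    description: str
    status: str = "draft"        # signals that the data are not finalized
    version: str = "0.1"
    last_updated: str = field(default_factory=lambda: date.today().isoformat())
    feedback_contact: str = ""   # where community feedback can be sent


def missing_fields(record: BasicMetadata) -> list[str]:
    """Return the names of required text fields that are still empty."""
    required = ("title", "creator", "description", "feedback_contact")
    return [name for name in required if not getattr(record, name).strip()]


def to_json(record: BasicMetadata) -> str:
    """Serialize the record so it can be published alongside the data."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are made up for illustration only.
record = BasicMetadata(
    title="Example stream gauge readings",
    creator="Example Lab",
    description="Preliminary readings; quality control is still in progress.",
)
print(missing_fields(record))
print(to_json(record))
```

The record can be published with `status` left at `"draft"` and filled in over time; `missing_fields` simply reports which gaps remain, rather than blocking release until the metadata are complete.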

Page 11: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 2nd Generation continued

7) Data creators and data curators should work closely throughout the data life story to ensure the most efficient and streamlined process.

8) Identify the most effective method(s) for maintaining close communication between the data creators and curators involved and use them.

9) Target the steady delivery of incremental improvements to research data discovery, access and use that is consistent with a sustainable level of effort and available funding.

Page 12: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 2nd Generation continued

10) Start with the basics and only make systems more complex as needed, while maintaining a low bar to entry.

11) Continuous attention to technical excellence and good design enhances agility.

12) Continuously develop a community of data providers, curators and users that participate in the evolution of the research data systems.

Page 13: Agile Curation: 2015 AGU Presentation

What happens next?

• Case studies documentation:
  • To clarify and/or verify these principles
  • To provide workflow examples that can be adopted or revised for reuse
• Nascent community of interest within the Research Data Alliance

Page 14: Agile Curation: 2015 AGU Presentation

Scope

Imagine a project:
• that includes a well-thought-out data management plan,
• and robust implementation of that plan throughout the project.
• This talk is not for that project; it is for the rest of us.

Page 15: Agile Curation: 2015 AGU Presentation

Unidata is one of the University Corporation for Atmospheric Research (UCAR)'s Community Programs (UCP), and is funded primarily by the National Science Foundation (Grant NSF-1344155).

Page 16: Agile Curation: 2015 AGU Presentation

Questions?

Contact me at: [email protected] @unidata_josh 303-497-8646

Page 17: Agile Curation: 2015 AGU Presentation

Background

Page 18: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 1st Generation

1) Access to data is the first goal;

2) Generative value is supported (Zittrain, 2006);

3) Researcher involvement through a participatory framework that aligns data management with scientific research processes (Yarmey and Baker, 2013);

4) Projects will utilize free open-source resources to the greatest extent practical;

5) Community participation increases project capacity;

Josh Young
Based on 2014 poster
Page 19: Agile Curation: 2015 AGU Presentation

Agile Curation Principles, 1st Generation part 2

6) Data management requirements and practices evolve as the research project proceeds;

7) Bright and dedicated individuals can learn appropriate skills and respond to the demands of their particular project, as they proceed;

8) Approaches apply across scales;

9) Consider technical debt;

10) Data evaluation can be conducted through use and feedback;

Page 20: Agile Curation: 2015 AGU Presentation

How we got here

• Idea formulated during discussion of Data Management Lifecycles at GeoData 2014
• Principles drafted for AGU 2014
• Two Research Data Alliance (RDA) Birds of a Feather sessions to explore community experiences