2016 ocean sciences meeting tutorial

Moving Beyond Planning to Implementation: Open-Source Tools… Josh Young Ocean Sciences Meeting February 24, 2016


Post on 11-Apr-2017


Moving Beyond Planning to Implementation: Open-Source Tools…

Josh Young

Ocean Sciences Meeting

February 24, 2016

Who is Unidata?

Why at Ocean Sciences?

Scope

Imagine a project:

• that includes a well-thought-out and documented data management plan,

• and robust implementation of that plan throughout the project and beyond.

• This talk is not for that project; it is for the rest of us.

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: public access & reproducibility

Risk of becoming dark data (Heidorn, 2008)

Why care about external access?

• Intangibles for an Investigator

  • Maybe someday I’ll benefit from someone else’s data

  • Maybe I’ll learn something through informal dialogue

  • Most science funding is from public resources and should/could be considered a public trust resource

  • Peer pressure

• Tangibles for an Investigator

  • Increased efficiency

  • My funders require it.

So why do we care about data management?

• Internal reasons: do good research, write papers, get tenure, win more grants.

• External reasons: greater impact

Internal Workflows

Public-Access Workflows

What is the DMRC & do we really need another Data Plan Project?

• Probably not

• The DMRC is not a Data Plan tool

• Unidata community requested help with implementation

• Therefore, the DMRC is primarily a curated list of tools for implementation

The DMRC

What the DMRC Offers

• Highlights requirements from funding agencies;

• Points to Best Practices developed by others in the Data Management space;

• Sorts available tools by best practice;

• Details available tools.

Requirements

• Highlight data management funding requirements from NASA, NOAA, and NSF

• These are the agencies that fund our community, so we try to stay up to date, but remember that the agency-posted information is always the authority

Activity Best Practices & Possible Tools

Activity column based on DataONE Best Practices

The DMRC Points to Tools


The DMRC Explains the LDM

The DMRC Explains the TDS

The DMRC Explains RAMADDA

What We Are Exploring

• Dataverse by Harvard

• Designed for sharing, archiving, and citing data

• Allows you to create a DOI

• Allows you to store and make data accessible in perpetuity

What We Are Exploring

Known Dataverse Characteristics:

• Largest single file limited to 10 GB

• No limit to number of files

• Users create their own Dataverse

• Designate private or public

• Open to data from all science disciplines

• Does not corrupt at least some software files (e.g. IDV bundles)

• FREE
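
A practical consequence of the 10 GB single-file limit listed above is that it helps to check a project directory before attempting a deposit. The following is a minimal sketch of such a check, not part of Dataverse or the DMRC. The function names `scan_directory` and `find_oversized` are hypothetical, and the limit is assumed to mean decimal gigabytes (10 × 10^9 bytes), which the slide does not specify.

```python
from pathlib import Path

# Assumption: the "10 GB" limit from the slide, read as decimal gigabytes.
MAX_FILE_BYTES = 10 * 10**9


def scan_directory(root):
    """Map each regular file under root (as a relative path string) to its size in bytes."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def find_oversized(sizes, limit=MAX_FILE_BYTES):
    """Return the sorted names of files whose size exceeds the limit."""
    return sorted(name for name, size in sizes.items() if size > limit)


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory and report files to split or compress.
    for name in find_oversized(scan_directory(".")):
        print(f"too large for a single upload: {name}")
```

Files flagged this way would need to be split, compressed, or hosted elsewhere; since there is no limit on the number of files, splitting large model output into several smaller files is one option.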

What We Are Exploring

Possible Dataverse Contributions:

• Description (providing DOIs)

• Sharing (access in perpetuity)

• Preservation (static copy in perpetuity)

• Cost (free): very suitable for projects that might otherwise become long-tail data

Activity Best Practices & Possible Tools

Activity column based on DataONE Best Practices

Open Source Access to Code

We Welcome Your Resource Suggestions!

• Please visit: http://goo.gl/forms/Ngp4Xu9nGr

Example Workflow Implementation

• Radar and Lidar data from the University of Wyoming King Air

• Millersville University Plains Elevated Convection at Night (PECAN) data

• North Carolina State University WRF North Atlantic Model Outputs

?

Part of a larger effort: Agile Data Curation

• Means taking implementable steps to improve data management for external access.

• Philosophically, it attempts to apply lessons from agile software development to data management.

Agile Curation Principles, 2nd Generation

(J.Young, K.Benedict, & C. Lenhardt, AGU 2015 Fall Meeting)

1) Delivery, access, use and citation of research data are the primary measures of success.

2) Maximize the impact of research data through the continuous integration of curation activities.

3) Support unanticipated needs for and uses of research data (and documentation) and develop flexible systems to capture new uses.

Agile Curation Principles, 2nd Generation

4) Make data open and accessible as early in the process as possible.

5) Encourage crowd-sourced / community feedback to improve and enhance the data. Provide basic metadata for data available early in the process even if the data are not finalized.

6) Identify key individuals in a research project who have the requisite motivation, knowledge, or ability to learn, and get out of their way.

Agile Curation Principles, 2nd Generation continued

7) Data creators and data curators should work closely throughout the data life story to ensure the most efficient and streamlined process.

8) Identify the most effective method(s) for maintaining close communication between the data creators and curators involved and use them.

9) Target the steady delivery of incremental improvements to research data discovery, access and use that is consistent with a sustainable level of effort and available funding.

Agile Curation Principles, 2nd Generation continued

10) Start with the basics and only make systems more complex as needed, while maintaining a low bar to entry.

11) Continuous attention to technical excellence and good design enhances agility.

12) Continuously develop a community of data providers, curators and users that participate in the evolution of the research data systems.

We Welcome Your Stories

• Please email: [email protected]

Balancing infrastructure development & scientific advancement to create sustainable, multidisciplinary solutions

M. Chan

• Advance science

• Meet grand challenges

• Leverage shared cyberinfrastructure technology

NSF’s EarthCube

[Diagram labels: CyberInfrastructure, Science, RCNs, Building Blocks, Interactive Activities, End User Workshops, EC Committees, Goals]

Get Involved!

[Diagram labels: Science Committee, Technology & Architecture Committee, Liaison Team, Leadership Council, Office, Council of Data Facilities, Engagement Team]

• Talk to EarthCube Participants!

• Attend EarthCube Workshops!

• Join the mailing list at earthcube.org

• Apply for funding (EC Travel Grants, Distinguished Lecturers)

• Follow on Twitter: @earthcube

Unidata is one of the University Corporation for Atmospheric Research (UCAR)'s Community Programs (UCP), and is funded primarily by the National Science Foundation (Grant NSF-1344155).