amanda whitmire maura valentino osu libraries opp workshop series 5 december 2012

Post on 28-Dec-2015


Where’s Your Data?

Amanda Whitmire

Maura Valentino

OSU Libraries

OPP Workshop Series

5 December 2012

Why is a Librarian asking?

We are curious.

We manage information.

Data are a kind of information.

TAKING CARE OF YOUR DATA

What’s your plan?

GOAL:

Achievable habits for building data management best practices into your workflow

Research data is:

“…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”

U.S. Office of Management and Budget, Circular A-110

Data curation is:

“…management activities required to maintain research data long-term such that it is available for reuse and preservation.”

Wikipedia

CURATION ≠ ARCHIVAL

“It is obvious that making data widely available is an essential element of scientific research.”

Science editorial, “Making Data Maximally Available,” 11 Feb 2011

The case for data management, stewardship, curation, etc.

$

Common missteps

“Why can’t I open this WordPerfect document?”
“I think those data are on a ZipDisk somewhere…”
“Oh, that dataset is on our group server…”
“I never actually gave my advisor the final dataset…”
“My laptop got stolen, so I lost the data…”
“It was so long ago, I can’t remember…”

Research data lifecycle

New research question posed

Research planning & design

Data collection & description

Data processing & analysis

Dissemination & publication of findings

Data archiving

Accessible data located

Data transformed / repurposed

Research Cycle

How can we help?

New research question posed

Research planning & design

Data collection & description

Data processing & analysis

Dissemination & publication of findings

Data archiving

Accessible data located

Data transformed / repurposed

Research Cycle

Where to start?

Make a plan. Consider:

How much data?

Resources needed

Roles & responsibilities

Metadata

Data formats

Data storage

Ethics & consent

Copyright (open data)

Sharing

A few tidbits

Data storage & curation

Anticipate:
• Volume / file type(s)
• Raw data vs. processed/analyzed data
• File naming conventions
• Privacy concerns
• Storage practice
• Backup plans (LOCKSS, checksums)
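The backup item above mentions checksums. As a minimal sketch (the function names `sha256_of`, `build_manifest` and `verify_manifest` are illustrative and not taken from the workshop), one way to confirm that a backup copy still matches the original is to record a SHA-256 digest for every file and re-check the digests later:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose contents changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A typical use would be to build the manifest from the original data folder, store it alongside the data, and run `verify_manifest` against each backup copy; an empty list means every recorded file is present and unchanged.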

File naming conventions

1. Be consistent
• Have conventions for naming: (1) Directory structure (2) Folder names (3) File names
• Always include the same information (e.g. date and time)
• Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)

2. Be descriptive
• Try to keep file and folder names under 32 characters

example: Project_instrument_location_YYYYMMDDhhmmss_extra.ext

SG157_20100426_001.raw (raw data)

SG157_20100426_001.mat (working data)

ESPOMZ_SG157_20100426_001.txt (shareable)
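Names that follow a convention can be generated and checked by a script instead of being typed by hand. A minimal sketch, assuming the shape of the example names above (descriptive parts, a YYYYMMDD date, a three-digit sequence number, an extension); `build_name`, `check_name`, `NAME_PATTERN` and `MAX_LENGTH` are hypothetical helpers, not part of the workshop materials:

```python
import re
from datetime import date, datetime

# Assumed pattern: one or more alphanumeric parts, a YYYYMMDD date,
# a three-digit sequence number, and an extension.
NAME_PATTERN = re.compile(r"^(?:[A-Za-z0-9]+_)+(\d{8})_\d{3}\.[A-Za-z0-9]+$")
MAX_LENGTH = 32  # names should stay under this many characters


def build_name(parts, day, sequence, extension):
    """Join descriptive parts, a date and a sequence number in a fixed order."""
    stem = "_".join(list(parts) + [day.strftime("%Y%m%d"), f"{sequence:03d}"])
    return f"{stem}.{extension}"


def check_name(name):
    """Return True if the name matches the pattern, is short enough,
    and its eight-digit field parses as a real YYYYMMDD date."""
    match = NAME_PATTERN.match(name)
    if match is None or len(name) >= MAX_LENGTH:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```

For example, `build_name(["SG157"], date(2010, 4, 26), 1, "raw")` reproduces the raw-data name shown above, and `check_name` rejects a name whose date field is written in MMDDYYYY order, such as `SG157_04262010_001.raw`.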

Legal and ethical considerations

Intellectual property
• Office for Commercialization & Corporate Development (OCCD)
• Copyright

Licensing

Charging for data?

Data attribution & citation

Human subjects? Informed consent & anonymization prior to publishing

Resources @ OSU:
• Office of Research Integrity, Institutional Review Board (IRB)
• Responsible Conduct of Research (RCR) Program

Archiving and preservation

Policies

Preservation options

Types of repositories

Costs and benefits

University of Southampton, School of Electronics & Computer Science, Southampton, UK, 2005

A word about backups…

Metadata

“The metadata accompanying your data should be written for a user 20 years into the future -- what does that person need to know to use your data properly? Prepare the metadata for a user who is unfamiliar with your project, methods, or observations.”

Oak Ridge National Laboratory Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)

What is Metadata?

Metadata is “data about data”

WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE is it geographically?
HOW were the data developed?
WHY were the data developed?
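The six questions above map naturally onto a small structured record stored next to a data file. A minimal sketch, assuming a JSON "sidecar" file; the class name, field names and helper functions are invented for illustration and do not follow any formal metadata standard:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetDescription:
    who: str    # who created the data
    what: str   # what the data contain
    when: str   # when the data were created
    where: str  # where the data were collected
    how: str    # how the data were developed
    why: str    # why the data were developed


def to_sidecar_json(description):
    """Serialize the description so it can be saved next to the data file."""
    return json.dumps(asdict(description), indent=2)


def missing_fields(description):
    """List the questions that have been left blank."""
    return [name for name, value in asdict(description).items() if not value.strip()]
```

Running `missing_fields` before a dataset is shared is a cheap way to notice that one of the six questions was never answered.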

Metadata schemes

Dublin Core (DC), Darwin Core (DwC), EML, DDI, NBII, FGDC/CSDGM, ISO 19139, ISO 19115, DIF, LDIF, e-GMS, AGLS, METS, MODS, PREMIS, OAI-PMH, MARC, CDWA, CIDOC/CRM, DACS, DIG35, GILS, GML, ISBD, LCSH, KML, MARCXML, MEI, MODS, MIX, OAIS, ANSI/NISO Z39.88, PBCore, PRISM, QDC, RDF, SGML, VSO, XML, XMP

X

Metadata schemes

“Metadata schemes are like toothbrushes – everybody agrees that you should use one, but nobody wants to use someone else’s.”

You already use metadata…

-23

87

48

Metadata in use

State | City | Location | Date | Time | Temperature (F)

Alaska | Anchorage | City Hall | 2/12/2010 | 1400 | -23
Florida | Miami | Weather Center | 2/12/2010 | 1400 | 87
New York | New York | Empire State Building | 2/12/2010 | 1400 | 48

Metadata in real life

You use it all the time…

Major metadata standards

Darwin Core | biological diversity, taxonomy

Dublin Core | general

DDI (Data Documentation Initiative) | social and behavioral sciences data

DIF (Directory Interchange Format) | environmental sciences

EML (Ecological Metadata Language) | ecology

FGDC/CSDGM (Federal Geographic Data Committee/Content Standard for Digital Geospatial Metadata) | geographic data

NBII (National Biological Information Infrastructure) | biology

http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.10

Metadata activity!

Take it away, Maura…

Let’s Describe this Dataset

Bright orange Garibaldi fish, Hypsypops rubicundus, California, USA

Ornate Butterfly fish, Chaetodon ornatissimus, Indo-Pacific

Scenario 1

Research on whether preschoolers learn colors and patterns better from real-life examples.

Scenario 2

Research on what fish are local to a particular area. The photos are the data.

Scenario 3

Research into specific details of specific types of fish.

File/Folder Organization

You have monitors attached to 18 athletes (6 tennis players, 6 golfers, 6 rowers) for 7 days. Each day you get 2 readouts for each athlete, 1 for heart rate and 1 for body temperature. You transfer the data to Excel. Name and organize the files for this experiment.
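One possible answer, offered as a sketch rather than the workshop's own solution: one folder per sport, one subfolder per athlete, and one file per athlete, day and measurement. The project code, the athlete ID scheme (`T01`, `G01`, `R01`), the measurement codes, the `.xlsx` extension and the start date are all assumptions made for illustration.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

SPORTS = {"tennis": "T", "golf": "G", "rowing": "R"}  # assumed ID prefixes
MEASUREMENTS = ("hr", "temp")  # heart rate, body temperature
ATHLETES_PER_SPORT = 6
DAYS = 7


def planned_files(project, start_day):
    """List every expected file path: project/sport/athlete/file name."""
    paths = []
    for sport, prefix in SPORTS.items():
        for number in range(1, ATHLETES_PER_SPORT + 1):
            athlete = f"{prefix}{number:02d}"
            for offset in range(DAYS):
                stamp = (start_day + timedelta(days=offset)).strftime("%Y%m%d")
                for measurement in MEASUREMENTS:
                    name = f"{project}_{athlete}_{measurement}_{stamp}.xlsx"
                    paths.append(PurePosixPath(project, sport, athlete, name))
    return paths
```

The count is a useful sanity check: 3 sports × 6 athletes × 7 days × 2 readouts gives 252 files, so a folder tree with fewer files than that is missing data.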

Think about your own data

– What types of data need to be described?

– What are the relationships between them?

– What descriptive metadata can you find?

– What metadata is being captured automatically?

– What other descriptive metadata do you need to help users find your data?

– What metadata do you need to help other scientists reproduce your data or use it for comparison?

– What events have the data undergone, or will they undergo?

– For how long do you want to retain the data?

– How intensive are your preservation needs?

– How diverse is your user base? Does this influence your preservation needs?

Data Management Plans


The types of data

Data & metadata standards | format and content

Policies for access and sharing

Policies and provisions for re-use

Plans for archiving data

{Budget} $$$

Use available resources

http://www.dataone.org/data-management-planning

https://dmp.cdlib.org/

Contact information

Amanda Whitmire | Data Management Specialist

amanda.whitmire@oregonstate.edu

Maura Valentino | Metadata Librarian

maura.valentino@oregonstate.edu

fin
