Data Management: Documentation & Metadata Types of Documentation

Upload: julianna-cora-garrett

Post on 22-Dec-2015


Data Management: Documentation & Metadata

Types of Documentation

2

Data Documentation (Metadata)

• Informal or formal methods to describe your data

• Important if you want to reuse your own data in the future

• Also necessary when sharing your data

3

You’re already documenting your data

• Notebook
  – Paper
  – Digital
  – Lab

• Folders with notes, text files

• Sources, experiments or surveys, procedures, etc.

4

Documentation in Research

Project Documentation

• Context of data collection

• Data collection methods

• Structure, organization of data files

• Data sources used

• Data validation, quality assurance

• Transformations of data from the raw data through analysis

• Information on confidentiality, access and use conditions

Dataset Documentation

• Variable names and descriptions

• Explanation of codes and schemas used

• Algorithms used to transform data

• File format and software (including version) used

5

Types of Documentation

Documentation for understanding & re-use

• Readme File

• Data Dictionary

• Codebook

6

ReadMe

• Describes the core documentation about an investigation and its data files

• Typically a simple text file

• Can describe the individual file(s) and/or data package as a whole
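As a rough illustration of the "simple text file" idea, the sketch below assembles a skeleton ReadMe for a data package. The section headings are assumptions loosely based on the project- and dataset-level items listed earlier in this deck, and the function name `build_readme`, the package title, and the file name are hypothetical. It is one possible layout, not a standard template.

```python
from datetime import date

# Section headings are illustrative assumptions, not a required standard.
README_SECTIONS = [
    "Context of data collection",
    "Data collection methods",
    "Data sources",
    "Access and use conditions",
]


def build_readme(title, files, created):
    """Return the text of a skeleton ReadMe for a data package.

    title   -- name of the data package
    files   -- dict mapping each file name to a one-line description
    created -- datetime.date on which the ReadMe was written
    """
    lines = [f"README for: {title}", f"Created: {created.isoformat()}", ""]
    # One placeholder block per package-level section.
    for section in README_SECTIONS:
        lines += [section.upper(), "(to be filled in)", ""]
    # Describe the individual files in the package.
    lines.append("FILES")
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


readme_text = build_readme(
    "Example survey dataset",                       # hypothetical title
    {"responses.csv": "one row per respondent"},    # hypothetical file
    created=date(2020, 1, 1),
)
print(readme_text)
```

The same text could be saved next to the data as a plain `README.txt`, so it travels with the files it describes.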

7

ReadMe Example - Dataset

8

Data Dictionary

• Provides definitions of the data fields in a data file

• Gives more detail on the variables and observations of a file

• Used to understand the data and the databases that contain it

• Identifies data elements and their attributes, including names, definitions, units of measure and other information

• Often organized as a table
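To make the table idea concrete, here is a minimal sketch that builds a data dictionary for a small CSV file. The column names, definitions, and units are invented for illustration. The definitions are written by hand, since code cannot infer what a field means; the script only pairs them with the field names and an example value taken from the data.

```python
import csv
import io

# Hypothetical data file contents, inlined so the sketch is self-contained.
RAW = """site_id,temp_c,collected_on
A1,12.5,2015-06-01
A2,13.1,2015-06-02
"""

# Hand-written (definition, units) pairs for each field; all hypothetical.
DEFINITIONS = {
    "site_id": ("Identifier of the sampling site", ""),
    "temp_c": ("Water temperature", "degrees Celsius"),
    "collected_on": ("Date the sample was collected", "YYYY-MM-DD"),
}


def build_dictionary(raw_text, definitions):
    """Return one dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Flag fields that nobody has documented yet.
        definition, units = definitions.get(name, ("UNDOCUMENTED", ""))
        example = rows[0][name] if rows else ""
        entries.append(
            {"name": name, "definition": definition,
             "units": units, "example": example}
        )
    return entries


dictionary = build_dictionary(RAW, DEFINITIONS)
for e in dictionary:
    print(f"{e['name']:<14}{e['definition']:<34}{e['units']:<18}{e['example']}")
```

Marking unknown fields as `UNDOCUMENTED` rather than skipping them is a design choice: it makes gaps in the documentation visible.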

9

Data Dictionary Example

10

What is a Codebook?

• Typical in social sciences research

• Includes elements similar to readme and dictionary
  – Project level information (e.g. survey design and methodology)
  – Response codes for each variable
  – Codes used to indicate nonresponse and missing data

http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook

11

What is a Codebook?

• Additionally, codebooks may also contain:
  – A copy of the survey questionnaire (if applicable)
  – Exact questions and skip patterns used in a survey
  – Frequencies of response

• Quite long!

http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
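The response codes, missing-data codes, and response frequencies described in the two codebook slides can be sketched as a small data structure. Everything here is made up for illustration: the variable name `q1_satisfaction`, the question text, the code values (including -8 and -9 for missing data), and the responses. Real codebooks define their own conventions.

```python
from collections import Counter

# Hypothetical codebook entry for a single survey variable.
CODEBOOK = {
    "q1_satisfaction": {
        "question": "How satisfied are you with the service?",
        "codes": {1: "Dissatisfied", 2: "Neutral", 3: "Satisfied"},
        # Codes that indicate nonresponse or missing data.
        "missing": {-8: "Don't know", -9: "No response"},
    }
}

responses = [3, 1, -9, 3, 2, -8, 3]  # made-up coded responses


def decode(variable, value, codebook):
    """Translate a coded value to its label; missing codes become None."""
    entry = codebook[variable]
    if value in entry["missing"]:
        return None
    return entry["codes"][value]


def frequencies(variable, values, codebook):
    """Count responses per label, keeping missing-data codes as categories."""
    entry = codebook[variable]
    labels = {**entry["codes"], **entry["missing"]}
    return Counter(labels[v] for v in values)


decoded = [decode("q1_satisfaction", v, CODEBOOK) for v in responses]
freq = frequencies("q1_satisfaction", responses, CODEBOOK)
for label, count in freq.most_common():
    print(f"{label}: {count}")
```

Without the codebook, a value such as -9 could easily be mistaken for a real measurement, which is why the missing-data codes are documented alongside the valid ones.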

12

Other Examples of Data Documentation

• Lab notebooks

• Software syntax

• Programming code

• Instrument settings and/or calibration

• Provenance of sources of data

• Embedded metadata (e.g. EXIF, FITS)