Post on 11-Jan-2016

Curating and Managing Research Data for Re-Use

Review & Processing

Jared Lyle

We Are Here Today: Review & Processing

http://weknowmemes.com/2011/12/this-is-my-room-what-i-think-it-looks-like-what-my-mom-thinks-it-looks-like/

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

Review

• Documentation
• Data
• [Disclosure Review]

Is the data collection complete, accurate, and well-documented?

Documentation

http://dx.doi.org/10.3886/ICPSR31521.v1

Essential Descriptive Elements

• Basic front matter
• Variable level details
• Methodology

Documentation: Front Matter

Title

Principal Investigator(s)

http://dx.doi.org/10.3886/ICPSR31521.v1

Description

Documentation: Front Matter

Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009. Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009 [Computer file]. ICPSR28401-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-10-27. doi:10.3886/ICPSR28401.v1

Documentation: Variable-level Details

National Longitudinal Study of Adolescent Health (Add Health), 1994-1995. Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

• Variable Name
• Variable Label
• Variable Type
• Question Text
• Values
• Value Labels
• Missing Data
• Summary Statistics
• Constructed Variables
• Skip Patterns
• Notes
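The variable-level elements listed above can be gathered into one record per variable. The sketch below is only an illustration: the class name, field names, question wording, and value labels are all hypothetical and do not follow any ICPSR schema.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One codebook entry. Field names are illustrative assumptions."""
    name: str                                          # variable name, e.g. "Q14"
    label: str                                         # short description of content
    var_type: str                                      # e.g. "numeric" or "character"
    question_text: str = ""                            # wording shown to the respondent
    value_labels: dict = field(default_factory=dict)   # code -> label text
    missing_codes: set = field(default_factory=set)    # codes treated as missing
    constructed: bool = False                          # True if derived from other variables
    skip_pattern: str = ""                             # who was routed past the question
    notes: str = ""

    def undocumented_missing_codes(self) -> set:
        """Return missing-data codes that have no value label."""
        return self.missing_codes - set(self.value_labels)


# Hypothetical entry; the question wording and labels are made up.
health = VariableEntry(
    name="Q14",
    label="Q14: Assessment of R's Health",
    var_type="numeric",
    question_text="How would you rate your health?",
    value_labels={1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor", 99: "No Answer"},
    missing_codes={88, 99},
)
print(health.undocumented_missing_codes())
```

Keeping missing-data codes next to the value labels makes it easy to spot a code, such as 88 here, that is declared missing but never explained.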

Documentation: Variable-level Details (examples)

American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt

Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962

United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621

United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982

Documentation: Methodology

• Sample design: A description of how the cases that appear in the study were selected, including details about target populations, sampling frames, sample sizes, sampling errors, and sampling methods.
• Data collection procedures: The methods used to collect the data (e.g., telephone, mail, computer-assisted). Where applicable, this includes the exact instructions and protocols used by interviewers when they collected the data.
• Data processing: The activities and quality checks performed on the data collection to generate the final data products from the raw collected data. If files were merged, a full description of the process should be provided.
• Weighting: Where applicable, a description of the criteria for using weights in the analysis of a data collection, including how the weights were created, all weighting formulae or coefficients, a definition of their elements, and an indication of how the formulae are applied to the data.
• Confidentiality issues: Where applicable, a discussion of any confidentiality issues in the data, as well as the steps taken to mitigate disclosure risk.

Other Documentation

• Questionnaire
• User Guide
• Handbook
• Manual
• Report
• Table
• User Agreement
• Errata

Useful Resources: Description

• ICPSR, “What is a codebook?” http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
• Institute for Health and Care Research Quality Handbook http://www.emgo.nl/kc/preparation/data%20collection/3%20Codebook.html
• Princeton University Data and Statistical Services, “How to Use a Codebook” http://dss.princeton.edu/online_help/analysis/codebook.htm
• UCLA Social Science Data Archive, “Codebooks” http://dataarchives.ss.ucla.edu/tutor/tutcode.htm

Data

Data Labels

• Does each variable have a variable name and label?

• Do all categorical variables have value labels?
• Are labels consistent?
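The label questions above lend themselves to a simple automated pass. The following is a minimal sketch, assuming the variable metadata has already been read into a plain dictionary; the layout, function names, variable names, and sample labels are all hypothetical.

```python
def review_labels(variables):
    """List label problems in a variable dictionary (hypothetical layout).

    `variables` maps a variable name to a dict with the keys "label",
    "categorical", "values" (codes observed in the data) and
    "value_labels" (code -> label text).
    """
    problems = []
    for name, info in variables.items():
        if not info.get("label", "").strip():
            problems.append(f"{name}: missing variable label")
        if info.get("categorical"):
            observed = set(info.get("values", []))
            labeled = set(info.get("value_labels", {}))
            unlabeled = sorted(observed - labeled, key=str)
            if unlabeled:
                problems.append(f"{name}: codes without value labels {unlabeled}")
    return problems


def inconsistent_labels(variables, shared_codes):
    """For codes meant to be shared across variables, report differing label text."""
    seen = {}
    for info in variables.values():
        for code, text in info.get("value_labels", {}).items():
            if code in shared_codes:
                seen.setdefault(code, set()).add(text.strip().lower())
    return {code: sorted(texts) for code, texts in seen.items() if len(texts) > 1}


# Made-up metadata for two variables.
variables = {
    "EMPSTAT": {
        "label": "Respondent's Employment Status",
        "categorical": True,
        "values": [1, 2, 9],
        "value_labels": {1: "Self-employed", 2: "Somewhere-else", 9: "No answer"},
    },
    "Q14": {
        "label": "",
        "categorical": True,
        "values": [1, 2, 3, 9],
        "value_labels": {1: "Excellent", 2: "Good", 9: "NA"},
    },
}
label_problems = review_labels(variables)
label_conflicts = inconsistent_labels(variables, shared_codes={9})
print(label_problems)
print(label_conflicts)
```

The consistency check is limited to codes the reviewer names in `shared_codes`, since an ordinary code such as 1 legitimately means different things in different variables.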

Naming Conventions: Variables

Variable Names:

• One-up numbers (V1, V2)
• Question numbers (Q1, Q2)
• Mnemonic names (age, race)
• Prefix, root, suffix systems (FAED, MOED)

Naming Conventions: Variables

Variable Labels:

• Item/Question number
• Indicate variable content
• Indicate if variable constructed

Q14: Assessment of R’s Health

Naming Conventions: Values

Value Labels:

• Mutually exclusive, exhaustive, and defined
• Preserve original information
• Retain original coding scheme

Respondent’s Employment Status
Self-employed (1)
Somewhere-else (2)
No answer (9)
Not applicable (BK)

Missing Data

• Are there missing data?
• Are missing data labeled?

77 = Inapplicable
88 = Don’t Know
99 = No Answer
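As a rough illustration of how labeled missing-data codes such as the three above might be handled during review, the sketch below separates them from valid values and tallies each kind. The function name, the sample variable, and its values are assumptions made for the example.

```python
from collections import Counter

# Codes taken from the slide; everything else here is hypothetical.
MISSING_CODES = {77: "Inapplicable", 88: "Don't Know", 99: "No Answer"}


def split_missing(values, missing_codes=MISSING_CODES):
    """Separate valid values from missing ones and count each missing type."""
    valid = []
    tally = Counter()
    for v in values:
        if v is None:
            # Blank cells carry no label, so they are reported separately.
            tally["(blank / system missing)"] += 1
        elif v in missing_codes:
            tally[missing_codes[v]] += 1
        else:
            valid.append(v)
    return valid, dict(tally)


# Made-up responses to a "number of children" question.
children = [2, 0, 99, 1, 88, None, 77, 99]
valid, tally = split_missing(children)
print(valid)
print(tally)
```

A tally like this answers both questions at once: whether missing data exist, and whether every missing value falls under a documented label. It only works when the codes cannot collide with legitimate values, which is worth confirming for variables such as age.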

Values

• Are the values reasonable (for example, date variables contain dates, gender variables don't have 10 categories, variables aren't all system missing)?

• Are there weight variables? If so, are they well documented?
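The plausibility checks named in the first question above (dates that are really dates, a sensible number of categories, columns that are not entirely missing) could be sketched as follows. The function name, the `kind` labels, the ISO date format, and the category threshold are assumptions for illustration, not an established procedure.

```python
from datetime import datetime


def check_values(name, values, kind, max_categories=5):
    """Return a list of plausibility issues for one variable."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return [f"{name}: every value is missing"]

    issues = []
    if kind == "date":
        for v in present:
            try:
                # Assumes dates are stored as YYYY-MM-DD strings.
                datetime.strptime(v, "%Y-%m-%d")
            except (ValueError, TypeError):
                issues.append(f"{name}: {v!r} is not a valid date")
    elif kind == "categorical":
        distinct = set(present)
        if len(distinct) > max_categories:
            issues.append(
                f"{name}: {len(distinct)} categories found, expected at most {max_categories}"
            )
    return issues


# Made-up columns.
print(check_values("INTDATE", ["2009-03-14", "2009-02-30", "n/a"], "date"))
print(check_values("GENDER", list(range(1, 11)), "categorical", max_categories=3))
print(check_values("WEIGHT1", [None, "", None], "numeric"))
```

Whether weight variables are well documented still needs a human reader; a script can only confirm that the weight column is present and populated.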

Matching Data & Documentation

• Do the data match the documentation? Are values and/or labels listed in one but not in the other?

• Are all codes in the data valid (documented) according to the data collection instrument or PI's codebook?

• Are there duplicate records?
• Does the spelling look OK?
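Two of the checks above, undocumented codes and duplicate records, can be sketched in a few lines. This assumes the data are a list of dictionaries and the codebook is a mapping from variable name to its documented codes; the variable names, the case identifier, and the sample rows are invented for the example.

```python
from collections import Counter


def undocumented_codes(records, codebook):
    """Return {variable: [codes present in the data but absent from the codebook]}."""
    found = {}
    for var, documented in codebook.items():
        observed = {row[var] for row in records if var in row}
        extra = sorted(observed - set(documented), key=str)
        if extra:
            found[var] = extra
    return found


def undocumented_variables(records, codebook):
    """Return variables that appear in the data but not in the codebook."""
    in_data = {var for row in records for var in row}
    return sorted(in_data - set(codebook))


def duplicate_records(records, key_fields):
    """Return key tuples that occur more than once."""
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in records)
    return [key for key, n in counts.items() if n > 1]


# Made-up data: code 7 is not documented, and case 2 appears twice.
records = [
    {"CASEID": 1, "EMPSTAT": 1},
    {"CASEID": 2, "EMPSTAT": 2},
    {"CASEID": 3, "EMPSTAT": 7},
    {"CASEID": 2, "EMPSTAT": 2},
]
codebook = {"EMPSTAT": {1, 2, 9}}

print(undocumented_codes(records, codebook))
print(undocumented_variables(records, codebook))
print(duplicate_records(records, ["CASEID"]))
```

Flagged items are prompts for a question to the data producer rather than errors to fix silently, in keeping with "do no harm."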

Processing History

Useful Resources: Data

• UK Data Archive, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.data-archive.ac.uk/create-manage/document/data-level?index=1
• ICPSR Guide to Social Science Data Preparation and Archiving: Phase 3: Data Collection and File Creation, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3quant.html

Activity

• Review the following data output and report any issues you find.

Examples of What to Look For:


[Disclosure Review]

Discussion

• How much cleaning do you do to a data collection?
• When is it appropriate to change the ‘original order’ of a data collection?
• How many processing details do you include in the study documentation?

Example: Review @ICPSR

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

We Are Here Today: Review & Processing