curating and managing research data for re-use review & processing jared lyle

52
Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Upload: jeffry-ferguson

Post on 11-Jan-2016

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Curating and Managing Research Data for Re-Use

Review & ProcessingJared Lyle

Page 2: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

We Are Here Today: Review & Processing

Page 3: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

http://weknowmemes.com/2011/12/this-is-my-room-what-i-think-it-looks-like-what-my-mom-thinks-it-looks-like/

Page 4: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

Page 5: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

Page 6: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Review

• Documentation• Data• [Disclosure Review]

Page 7: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Is the data collection complete, accurate, and well-documented?

Page 8: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation

http://dx.doi.org/10.3886/ICPSR31521.v1

Page 9: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Essential Descriptive Elements

• Basic front matter• Variable level details• Methodology

Page 10: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Front Matter

Title

Principal Investigator(s)

http://dx.doi.org/10.3886/ICPSR31521.v1

Page 11: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Description

Documentation: Front Matter

Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009. Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009 [Computer file]. ICPSR28401-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-10-27. doi:10.3886/ICPSR28401.v1

Page 12: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details

National Longitudinal Study of Adolescent Health (Add Health), 1994-1995 (National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

Page 13: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Variable Name

Documentation: Variable-level Details

Page 14: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Variable Label

Documentation: Variable-level Details

Page 15: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Variable Type

Documentation: Variable-level Details

Page 16: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Question Text

Documentation: Variable-level Details

Page 17: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Values

Documentation: Variable-level Details

Page 18: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Value Labels

Documentation: Variable-level Details

Page 19: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Missing Data

Documentation: Variable-level Details

Page 20: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Summary Statistics

Documentation: Variable-level Details

Page 21: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Constructed Variables

Documentation: Variable-level Details

Page 22: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details

Skip Patterns

Notes

Page 23: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details(examples)

American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt

Page 24: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details(examples)

Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Resarch, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962

Page 25: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details(examples)

United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621

Page 26: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Documentation: Variable-level Details(examples)

United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982

Page 27: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

• Sample design: A description of how the cases that appear in the study were selected, including details about target populations, sampling frames, sample sizes, sampling errors, and sampling methods.

• Data collection procedures: The methods used to collect the data (e.g., telephone, mail, computer-assisted). Where applicable, this includes the exact instructions and protocols used by interviewers when they collected the data.

• Data processing: The activities and quality checks performed on the data collection to generate the final data products from the raw collected data. If files were merged , a full description of the process should be provided.

Documentation: Methodology

Page 28: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

• Weighting: Where applicable, a description of the criteria for using weights in the analysis of a data collection, including how the weights were created, all weighting formulae or coefficients, a definition of their elements, and an indication of how the formulae are applied to the data.

• Confidentiality issues: Where applicable, a discussion of any confidentiality issues in the data, as well as the steps taken to mitigate disclosure risk.

Documentation: Methodology

Page 29: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Other Documentation

• Questionnaire• User Guide• Handbook• Manual• Report• Table• User Agreement• Errata

Page 30: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Useful Resources: DescriptionICPSR, “What is a codebook?” http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook

Institute for Health and Care Research Quality Handbook http://www.emgo.nl/kc/preparation/data%20collection/3%20Codebook.html Princeton University Data and Statistical Services, “How to Use a Codebook” http://dss.princeton.edu/online_help/analysis/codebook.htm UCLA Social Science Data Archive, “Codebooks” http://dataarchives.ss.ucla.edu/tutor/tutcode.htm

Page 31: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Data

Page 32: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Data Labels

• Does each variable have a variable name and label?

• Do all categorical variables have value labels?• Are labels consistent?

Page 33: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Naming Conventions: Variables

Variable Names:

•One-up numbers (V1, V2)•Question numbers (Q1, Q2)•Mnemonic names (age, race)•Prefix, root, suffix systems (FAED, MOED)

Page 34: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Naming Conventions: Variables

Variable Labels:

•Item/Question number•Indicate variable content•Indicate if variable constructed

Q14: Assessment of R’s Health

Page 35: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Naming Conventions: Values

Value Labels:

•Mutually exclusive, exhaustive, and defined•Preserve original information•Retain original coding scheme

Respondent’s Employment StatusSelf-employed (1)Somewhere-else (2) No answer (9)Not applicable (BK)

Page 36: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Missing Data

• Are there missing data?• Are missing data labeled?

77 = Inapplicable88 = Don’t Know99 = No Answer

Page 37: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Values

• Are the values reasonable (for example, date variables contain dates, gender variables don't have 10 categories, variables aren't all system missing)?

• Are there weight variables? If so, are they well documented?

Page 38: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Matching Data & Documentation

• Do the data match the documentation? Are values and/or labels listed in one but not in the other?

• Are all codes in the data valid (documented) according to the data collection instrument or PI's codebook?

• Are there duplicate records?• Does the spelling look OK?

Page 39: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Processing History

Page 40: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Useful Resources: DataUK Data Archive, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.data-archive.ac.uk/create-manage/document/data-level?index=1

ICPSR Guide to Social Science Data Preparation and Archiving: Phase 3: Data Collection and File Creation, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3quant.html

Page 41: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Activity

• Review the following data output and report any issues you find.

Page 42: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Examples of What to Look For:

42

Page 43: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

43

Examples of What to Look For:

Page 44: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

44

Examples of What to Look For:

Page 45: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

45

Examples of What to Look For:

Page 46: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

46

Examples of What to Look For:

Page 47: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

47

Examples of What to Look For:

Page 48: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

[Disclosure Review]

Page 49: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Discussion• How much cleaning do you do to a data

collection?• When is it appropriate to change the ‘original

order’ of a data collection?• How many processing details do you include

in the study documentation?

Page 50: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

Example: Review @ICPSR

Page 51: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

Page 52: Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle

We Are Here Today: Review & Processing