Kathleen Diviak, PhD John O’Keefe Data Management Core Institute for Health Research and Policy University of Illinois at Chicago March 14, 2017


Page 1: Kathleen Diviak, PhD John O’Keefe Data Management Core ...Root-key measured topic-Educ Status, alcohol use Suffix-item number if part of scale or version number of variable Variable

Kathleen Diviak, PhD
John O’Keefe

Data Management Core
Institute for Health Research and Policy

University of Illinois at Chicago
March 14, 2017

Page 2

Study Design

Collection

Storage: Short & Long

Organizing

Documentation

Cleaning, Entry, & Verification

Sharing & Archiving

Page 3

https://www.youtube.com/watch?v=N2zK3sAtr-4

Page 4

How it Relates to Responsible Conduct of Research

Data Ownership: Concerns who has the legal rights to the data and who retains the data after the project is completed, including the PI's right to transfer their data between institutions

Data Collection: Concerns collecting data in a consistent, systematic manner throughout the project (reliability) and establishing an ongoing system for evaluating and recording changes to the project protocol (validity)

Data Storage: Concerns the amount of data that should be stored - enough so that project results can be reconstructed

Data Protection: Concerns protecting both written and electronic data from physical damage as well as damage to data integrity, including tampering or theft

Data Retention: Concerns how long project data needs to be retained according to various sponsors' and funders' guidelines, and the importance of secure destruction of data

Data Analysis: Concerns how raw data is chosen, evaluated, and interpreted into meaningful and significant conclusions that other researchers and the public can understand and use

Data Sharing: Concerns how project data is disseminated to other researchers and the general public to share important or useful research results; also, when data should not be shared

Data Reporting: Concerns publication of conclusive findings after the project is completed

Source: Guidelines for Responsible Data Management in Scientific Research, course materials from the Office of Research Integrity, https://ori.hhs.gov/images/ddblock/data.pdf

Page 5

Page 6

Maintaining your data – access, security

Maintain an understanding of your own data

Ability to work with your research team, critical for collaboration
◦ Across disciplines, institutions

Aids sharing and archiving
◦ Many funders and journals require some form of data sharing
◦ Archives have requirements and standards

Aids reproducibility: by you, by others

Protection from allegations of scientific misconduct

Page 7

Page 8

What type of information and data will be stored?
◦ IRB, human subjects protections
◦ Administrative or business files
◦ Data

Who requires access to each category?
◦ How sensitive is your research data?
◦ As much as possible, limit access to identifiable information

How does your research group, lab, or project operate?

Page 9

Establish a hierarchical organization system and ensure all project staff understand the system
◦ Where are draft documents stored? Final versions?

Folder Names: major concepts, activities, functions, studies
◦ Names should be self-explanatory to all who use them
◦ Current versus OLD or Archived version folders

Depends on your lab, your group, type of study, cross-sectional vs. longitudinal (wide or long formats)
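One way to make such a hierarchy stick is to script it, so that every project starts from the same layout. The sketch below is only an illustration: the folder names and the function `create_project_tree` are hypothetical and not taken from the presentation, so adapt them to how your own group works.

```python
from pathlib import Path

# Hypothetical layout; rename or extend to fit your own lab or study.
PROJECT_FOLDERS = [
    "admin",
    "irb",
    "data/raw",
    "data/clean",
    "data/archive_OLD",
    "documentation",
    "syntax",
    "output",
]


def create_project_tree(root, folders=PROJECT_FOLDERS):
    """Create each folder under root; folders that already exist are left alone."""
    root = Path(root)
    created = []
    for rel in folders:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Running the function a second time on the same root changes nothing, which makes it safe to share with all project staff.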

Page 10

Page 11

Naming
◦ Establish a convention and stick to it – need buy-in from all study staff who name files
◦ Names should be self-explanatory to all who access them
◦ Avoid duplication

Develop consistent plans for managing drafts, new versions, dates, & backups

Don’t save the same document in multiple locations
◦ Use the default extensions in your file names
◦ Use file logs that describe key changes to file versions over time
◦ Include enough detail to distinguish each file from another, but not so much that the length is unwieldy
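The naming advice above can be enforced with a small helper rather than left to memory. This is a minimal sketch under assumptions: the pattern `study_content_date_version.ext`, the study name `ASTHMA`, and the functions `build_file_name` and `log_entry` are all hypothetical examples, not a convention prescribed by the presentation.

```python
import re
from datetime import date


def build_file_name(study, content, when, version, ext):
    """Return a name such as 'ASTHMA_baseline-survey_2017-03-14_v02.csv'.

    The study_content_date_version pattern is an assumed convention.
    """
    for part in (study, content):
        # Keep names portable: letters, digits, and hyphens only.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"use only letters, digits, and hyphens: {part!r}")
    return f"{study}_{content}_{when.isoformat()}_v{version:02d}.{ext}"


def log_entry(file_name, author, change):
    """Format one tab-separated line for a plain-text file log."""
    return f"{file_name}\t{author}\t{change}"
```

For example, `build_file_name("ASTHMA", "baseline-survey", date(2017, 3, 14), 2, "csv")` returns `ASTHMA_baseline-survey_2017-03-14_v02.csv`. Putting the date in ISO order means an alphabetical sort is also a chronological one.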

Page 12

Variables – names and labels, units of measurement, descriptions, data types

Systematic variable naming convention
◦ Mnemonic: MomEduc, DadEduc
◦ Question/variable numbers: Q1, Q2, Q3a, Q3b or v1, v2, v3
◦ Prefix-root-suffix (mixed case & underscores helpful)
  Prefix: wave, appointment, unit of analysis, computed items
  Root: key measured topic (Educ Status, alcohol use)
  Suffix: item number if part of a scale, or version number of the variable

Variable labels can contain exact question wording and question number, or may have to be truncated

Response values: 0=no & 1=yes, or was it 1=no & 2=yes?

Missing values – describe why missing
◦ Left blank, not collected, cannot understand, not applicable, etc.

Some programs allow custom-defined variable attributes

Branching logic/Skip rules
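The prefix-root-suffix idea and the documentation of response and missing values can be sketched in code. Everything specific here is an assumption made for illustration: the variable name `w1_AlcUse_03`, the negative missing-value codes, and the helper names are hypothetical and do not come from the presentation.

```python
import re

# Assumed pattern: prefix_root_suffix, e.g. "w1_AlcUse_03"
# (wave 1, alcohol use, item 3 of a scale).
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z0-9]+)_(?P<root>[A-Za-z]+)_(?P<suffix>[A-Za-z0-9]+)$"
)


def parse_variable_name(name):
    """Split a prefix_root_suffix name into its parts, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow prefix_root_suffix")
    return match.groupdict()


# A minimal data dictionary entry: label, value codes, and missing codes.
# The negative missing codes are an illustrative choice, not a standard.
DATA_DICTIONARY = {
    "w1_AlcUse_03": {
        "label": "Q3. (exact question wording goes here)",
        "type": "integer",
        "values": {0: "no", 1: "yes"},
        "missing": {
            -7: "not applicable (skipped by branching logic)",
            -8: "left blank",
            -9: "not collected",
        },
    },
}


def describe_value(variable, code):
    """Look up what a stored code means, checking missing codes too."""
    entry = DATA_DICTIONARY[variable]
    if code in entry["values"]:
        return entry["values"][code]
    if code in entry["missing"]:
        return "missing: " + entry["missing"][code]
    raise KeyError(f"undocumented code {code} for {variable}")
```

Writing the codes down once, next to the variable, is what settles the "0=no & 1=yes, or 1=no & 2=yes" question later.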

Page 13

Always keep the raw data
◦ For access by others, make the files “read only”

Always use scripts, syntax, or code to track changes to your data, construct scales, & run analyses

While you can track changes via notes and construct scales/run analyses via point-and-click, scripts, syntax, & code provide easy documentation, can be easily revised and re-run (saving time), and protect against research misconduct

Scale construction, data transformations
◦ Always retain original files/variables & distinguish computed or recoded variables from raw versions
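Both habits on this page, protecting the raw file and keeping recoded variables separate from the originals, can be expressed in a few lines. This is a sketch under assumptions: the function names, the `_r` suffix for recoded variables, and the example codes are hypothetical. Note that on Windows `os.chmod` only toggles the read-only flag.

```python
import os
import stat


def make_read_only(path):
    """Remove write permission so the raw file is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def add_recoded_column(rows, source, target, mapping):
    """Return new rows with a recoded copy of `source` stored in `target`.

    The original column is kept unchanged; values without a mapping
    become an empty string so they can be reviewed.
    """
    if target == source:
        raise ValueError("target must differ from source to keep the raw value")
    return [{**row, target: mapping.get(row[source], "")} for row in rows]
```

For instance, recoding a 1=no / 2=yes item into 0/1 with `add_recoded_column(rows, "Q1", "Q1_r", {"1": "0", "2": "1"})` leaves `Q1` as collected and adds `Q1_r` beside it.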

Page 14

Syntax, code, and scripts are key to good documentation
◦ Header section
  Project name & grant number
  PI & Coder/Analyst name
  Software name, version
  Denote key sections of code if very involved and long
  Which file(s) – input data, output data, output/report

Inline documentation - comments throughout, e.g., removal of outliers, recoding cleaned responses, logic used
◦ Operations – transformations (recodes), computations (scales), subset, weights, & analyses
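A script that follows this outline might look like the sketch below. The header fields mirror the list above; the file paths, function names, the 1-to-5 response range, and the `min_valid` rule are hypothetical placeholders chosen for illustration.

```python
"""
Project:   (project name), grant (grant number)
PI:        (name)
Analyst:   (name)
Software:  Python (version)
Input:     data/raw/survey_raw.csv        (hypothetical path)
Output:    data/clean/survey_scored.csv   (hypothetical path)
Purpose:   Recode items and compute a scale score.
"""

# --- Section 1: transformations (recodes) ---------------------------------


def reverse_code(value, low=1, high=5):
    """Reverse a response on a low..high scale (1 becomes 5, 2 becomes 4)."""
    return high + low - value


# --- Section 2: computations (scales) -------------------------------------


def scale_mean(responses, min_valid=2):
    """Mean of the answered items; None if too few items were answered.

    None marks an unanswered item. The min_valid rule is an assumption.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < min_valid:
        return None
    return sum(answered) / len(answered)
```

The section banners and docstrings do the work of inline documentation: a reader can see what was recoded, how the scale was built, and which rule handled missing items.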

Page 15

Metadata – “a set of data that describes and gives information about other data”

Good documentation also describes your study and its data, in context
◦ details that explain the context for others (e.g., smoking cessation intervention for parents of pediatric asthma ER patients)

May include: a readme file, data dictionary, codebooks, protocols, file logs, lab notebook, and other documentation that helps you and others understand and make sense of your data in the short and long term

Page 16

Data collection methods and sources used
◦ interviews, chart review, observation, surveys

How are the files organized/structured

Quality assurance: what quality control and data validation methods were used

File formats, software used, which version

Ethics – confidentiality, who can access, conditions of use

Page 17

Reduce risk of damage or loss
◦ What are you storing (e.g., paper, electronic, biological samples)
◦ Limit access – whose position requires access
◦ Know your departmental resources and policies
◦ Data saved in multiple locations
  3 is recommended (2 local and 1 remote location)
  Remember human subjects protections and how/where your IRB application said you would store data
◦ Know your backup schedule
  Network, done for you or do you need to initiate backups
◦ Use a reliable medium
  USB drives are easily lost or damaged
◦ Test your backups, systems for deleting older backups
◦ Include processes in work flow
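"Test your backups" can be made routine by comparing checksums after each copy. The following is a minimal sketch, assuming every backup location is reachable as a folder path (for example a mounted network drive for the remote copy); the function names are hypothetical, and any storage location must still match what your IRB application states.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_verify(source, destinations):
    """Copy source into each destination folder and confirm the copies match.

    Returns the list of copies; raises IOError if any checksum differs.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"backup at {copy} does not match the original")
        copies.append(copy)
    return copies
```

A checksum comparison only shows that a copy was identical when it was made; restoring a file from each location from time to time is still the real test.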

Page 18

Confidential or Sensitive Data
◦ Where is this stored, how long do you need access to the data, who requires access, when can it be de-identified

Physical security – network data stored, other mediums – videos, audio, images, etc.

Passwords/Keys

Staff training – MOST CRITICAL
◦ Plans only work when followed, people do unexpected things/make mistakes, include all staff handling data (remember volunteers)

Page 19

To ensure access to your data in the future:

Non-proprietary

Unencrypted & uncompressed

Standard representation (ASCII, Unicode)

Formats: .txt, .csv, .pdf, .jpg, .tiff
◦ What is common for your research area, software used, etc.
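As an illustration of the non-proprietary, standard-representation advice, tabular data can be written out as UTF-8 comma-separated text with nothing more than a standard library. This is a sketch, not a required workflow; the function names are hypothetical, and note that value labels held by a statistical package do not travel with a .csv file, so the data dictionary must accompany it.

```python
import csv


def export_rows_to_csv(rows, fieldnames, path):
    """Write rows (a list of dicts) as UTF-8, comma-separated plain text."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_rows_from_csv(path):
    """Read the file back, e.g. to confirm the export is usable."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Reading the export back in, as `read_rows_from_csv` does, is a quick check that the archived copy can be opened without the original software.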

Page 20

Expectation from both Funders & Journals
◦ Replicate and verify your results
◦ Allow future research – new research questions, new analyses, combined datasets
◦ IRB, human subjects considerations need to be addressed but do not limit data sharing
◦ Think through all privacy and confidentiality issues with your data and address before sharing/archiving
◦ What aspects should be shared? Which destroyed?
  small geographic areas, rare populations, linked datasets,
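One simple screen for the small-area and rare-population concern is to count how many records share each combination of potentially identifying fields before a file is shared. The sketch below is illustrative only: the function `small_cells`, the field names, and the default threshold of 5 are assumptions, not rules from the presentation, and your IRB or archive may require something different.

```python
from collections import Counter


def small_cells(rows, keys, threshold=5):
    """Return combinations of `keys` shared by fewer than `threshold` rows.

    The threshold of 5 is an arbitrary placeholder, not a rule from the
    presentation.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}
```

Any combination the function returns is a candidate for coarsening (for example, reporting a larger geographic unit) or for leaving out of the shared file.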

Page 21

Kathi Diviak, John O’Keefe, & Dick Campbell

We provide free consultation to IHRP affiliated researchers
◦ develop a data management plan for proposals
◦ help plan data collection strategies and approaches
◦ develop budget estimates for data collection/processing
◦ recommendations for documenting, organizing, storing, sharing, and archiving your research data.

Email [email protected] to schedule a meeting

Page 22

We can provide a range of data management services to IHRP affiliated researchers including:
◦ research protocol development
◦ web-based survey development, data entry and processing
◦ staff training
◦ assistance preparing data for archiving

Costs for these services can be written into grant proposal budgets or charged to existing grants.

Email [email protected] for more information