TRANSCRIPT
Kathleen Diviak, PhD; John O’Keefe
Data Management Core, Institute for Health Research and Policy
University of Illinois at Chicago, March 14, 2017
Study Design
Collection
Storage: Short & Long
Organizing
Documentation
Cleaning, Entry, & Verification
Sharing & Archiving
https://www.youtube.com/watch?v=N2zK3sAtr-4
How it Relates to Responsible Conduct of Research
Data Ownership: Concerns who has the legal rights to the data and who retains the data after the project is completed, including the PI's right to transfer their data between institutions
Data Collection: Concerns collecting data in a consistent, systematic manner throughout the project (reliability) and establishing an ongoing system for evaluating and recording changes to the project protocol (validity)
Data Storage: Concerns the amount of data that should be stored - enough so that project results can be reconstructed
Data Protection: Concerns protecting both written and electronic data from physical damage as well as damage to data integrity, including tampering or theft
Data Retention: Concerns how long project data needs to be retained according to various sponsors' and funders' guidelines, and the importance of secure destruction of data
Data Analysis: Concerns how raw data is chosen, evaluated, and interpreted into meaningful and significant conclusions that other researchers and the public can understand and use
Data Sharing: Concerns how project data is disseminated to other researchers and the general public to share important or useful research results; also, when data should not be shared
Data Reporting: Concerns publication of conclusive findings after the project is completed
Guidelines for Responsible Data Management in Scientific Research Course Materials From the Office of Research Integrity https://ori.hhs.gov/images/ddblock/data.pdf
Maintaining your data – access, security
Maintain an understanding of your own data
Ability to work with your research team, critical for collaboration
◦ Across disciplines, institutions
Aids sharing and archiving
◦ Many funders and journals require some form of data sharing
◦ Archives have requirements and standards
Aids reproducibility: by you, by others
Protection from allegations of scientific misconduct
What type of information and data will be stored?
◦ IRB, human subjects protections
◦ Administrative or business files
◦ Data
Who requires access to each category?
◦ How sensitive is your research data?
◦ As much as possible, limit access to identifiable information
How does your research group, lab, or project operate?
Establish a hierarchical organization system and ensure all project staff understand the system
◦ Where are draft documents stored? Final versions?
Folder Names: major concepts, activities, functions, studies
◦ Names should be self-explanatory to everyone who uses them
◦ Current versus OLD or Archived version folders
Depends on your lab, your group, type of study, cross-sectional vs. longitudinal (wide or long formats)
Naming
◦ Establish a convention and stick to it; this needs buy-in from all study staff who name files
◦ Names should be self-explanatory to everyone who accesses the files
◦ Avoid duplication
Develop consistent plans for managing drafts, new versions, dates, & backups
Don’t save the same document in multiple locations
◦ Use the default extensions in your file names
◦ Use file logs that describe key changes to file versions over time
◦ Include enough detail to distinguish each file from another, but not so much that the name becomes unwieldy
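One way to make a naming convention stick is to build and check file names in code rather than by hand. The sketch below assumes a hypothetical convention of `<project>_<content>_<YYYYMMDD>_v<NN>.<ext>`; the pattern, the `build_name` and `is_valid_name` helpers, and the example project name are illustrative assumptions, not a convention prescribed in this presentation.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<content>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.[a-z0-9]+$")

def build_name(project: str, content: str, when: date, version: int, ext: str) -> str:
    """Assemble a file name that follows the assumed convention."""
    return f"{project.lower()}_{content.lower()}_{when:%Y%m%d}_v{version:02d}.{ext.lower()}"

def is_valid_name(filename: str) -> bool:
    """Return True if the file name matches the assumed convention."""
    return NAME_PATTERN.match(filename) is not None

name = build_name("asthma", "baseline", date(2017, 3, 14), 2, "csv")
print(name)                                    # asthma_baseline_20170314_v02.csv
print(is_valid_name(name))                     # True
print(is_valid_name("final FINAL copy.xlsx"))  # False
```

A check like `is_valid_name` could be run over a project folder to flag files that drifted from the agreed convention.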
Variables – names and labels, units of measurement, descriptions, data types
Systematic variable naming convention
◦ Mnemonic: MomEduc, DadEduc
◦ Question/variable numbers: Q1, Q2, Q3a, Q3b or v1, v2, v3
◦ Prefix-root-suffix (mixed case & underscores helpful)
  Prefix: wave, appointment, unit of analysis, computed items
  Root: key measured topic, e.g., Educ Status, alcohol use
  Suffix: item number if part of a scale, or version number of the variable
Variable labels can contain the exact question wording and question number, but may have to be truncated
Response values: document the coding (was it 0=no & 1=yes, or 1=no & 2=yes?)
Missing values: describe why missing
◦ Left blank, not collected, cannot understand, not applicable, etc.
Some programs allow custom-defined variable attributes
Branching logic/skip rules
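The variable documentation described above can be captured as a small codebook entry. The following is a minimal sketch, assuming a prefix-root-suffix name joined with underscores; the variable name `w1_AlcUse_03`, the missing-value codes (-9, -8, -7), and the skip rule are all hypothetical examples, not values from any actual study.

```python
from dataclasses import dataclass, field

def make_var_name(prefix: str, root: str, suffix: str = "") -> str:
    """Join prefix, root, and optional suffix with underscores."""
    parts = [prefix, root] + ([suffix] if suffix else [])
    return "_".join(parts)

@dataclass
class CodebookEntry:
    name: str
    label: str
    data_type: str
    values: dict = field(default_factory=dict)          # valid response codes
    missing_values: dict = field(default_factory=dict)  # code -> reason missing
    skip_rule: str = ""

    def decode(self, code: int) -> str:
        """Translate a stored code into its documented meaning."""
        if code in self.missing_values:
            return f"missing: {self.missing_values[code]}"
        return self.values.get(code, "undocumented code")

# Hypothetical entry: wave 1, alcohol use, item 3 of a scale
entry = CodebookEntry(
    name=make_var_name("w1", "AlcUse", "03"),
    label="Q3. (exact question wording goes here)",
    data_type="integer",
    values={0: "no", 1: "yes"},
    missing_values={-9: "left blank", -8: "not applicable", -7: "not collected"},
    skip_rule="Asked only if Q2 = 1 (hypothetical)",
)

print(entry.name)        # w1_AlcUse_03
print(entry.decode(1))   # yes
print(entry.decode(-8))  # missing: not applicable
```

Recording the value coding next to the variable removes the "was it 0/1 or 1/2?" guesswork later on.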
Always keep the raw data
◦ For access by others, make the files “read only”
Always use scripts, syntax, or code to track changes to your data, construct scales, & run analyses
You can track changes via notes and construct scales or run analyses via point-and-click, but scripts, syntax, & code provide easy documentation, can be easily revised and re-run (saving time), and protect against research misconduct
Scale construction, data transformations
◦ Always retain original files/variables & distinguish computed or recoded variables from raw versions
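As an illustration of both points, the sketch below marks a raw file read-only and writes a recode into a new column instead of overwriting the raw one. The file name `survey_raw.csv`, the column names `smoke_raw` and `smoke_r`, and the 1=no/2=yes coding are hypothetical; file permissions alone are not a substitute for access controls on shared storage.

```python
import csv
import os
import stat
import tempfile

def make_read_only(path: str) -> None:
    """Remove write permission so the raw file is not edited by accident."""
    os.chmod(path, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)

def add_recode(rows: list) -> list:
    """Add a recoded column next to the raw one instead of overwriting it."""
    recoded = []
    for row in rows:
        new_row = dict(row)
        # Raw smoke_raw is coded 1=no, 2=yes (hypothetical); recode to 0/1.
        # Any other value (e.g. blank) becomes None.
        new_row["smoke_r"] = {"1": 0, "2": 1}.get(row["smoke_raw"])
        recoded.append(new_row)
    return recoded

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "survey_raw.csv")
    with open(raw_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "smoke_raw"])
        writer.writeheader()
        writer.writerows([{"id": "1", "smoke_raw": "1"}, {"id": "2", "smoke_raw": "2"}])
    make_read_only(raw_path)

    with open(raw_path, newline="") as f:
        rows = add_recode(list(csv.DictReader(f)))
    print(rows)

    # Restore write permission so the temporary directory can be cleaned up
    os.chmod(raw_path, stat.S_IWRITE | stat.S_IREAD)
```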
Syntax, code, and scripts are key to good documentation
◦ Header section
  Project name & grant number
  PI & Coder/Analyst name
  Software name, version
  Denote key sections of code if very involved and long
  Which file(s): input data, output data, output/report
◦ Inline documentation: comments throughout, e.g., removal of outliers, recoding cleaned responses, logic used
◦ Operations: transformations (recodes), computations (scales), subset, weights, & analyses
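A documented script along these lines might look like the sketch below. The header fields are left as placeholders to fill in, and the age cutoffs and the mean-based scale score are hypothetical choices made only to give the inline comments something to describe.

```python
"""
Project:   (project name)            Grant: (grant number)
PI:        (PI name)                 Analyst: (coder/analyst name)
Software:  Python 3 (record the exact version used)
Input:     (input data file)
Output:    (output data file, output/report file)
Purpose:   Remove out-of-range ages, then compute a scale score.
"""

# ---- Section 1: cleaning ----------------------------------------------
def drop_out_of_range(ages, low=18, high=99):
    """Keep ages within [low, high]; the bounds are hypothetical cutoffs."""
    return [a for a in ages if low <= a <= high]

# ---- Section 2: scale construction ------------------------------------
def scale_score(items):
    """Mean of the answered items; None marks a missing response."""
    answered = [x for x in items if x is not None]
    return sum(answered) / len(answered) if answered else None

print(drop_out_of_range([17, 34, 250, 61]))  # [34, 61]
print(scale_score([1, 2, None, 3]))          # 2.0
```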
Metadata – “a set of data that describes and gives information about other data”
Good documentation also describes your study and its data, in context
◦ Details that explain the context for others (e.g., smoking cessation intervention for parents of pediatric asthma ER patients)
May include: a readme file, data dictionary, codebooks, protocols, file logs, lab notebook, and other documentation that helps you and others understand and make sense of your data in the short and long term
Data collection methods and sources used
◦ Interviews, chart review, observation, surveys
How the files are organized/structured
Quality assurance: what quality control and data validation methods were used
File formats, software used, which version
Ethics – confidentiality, who can access, conditions of use
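The documentation items listed above can be checked for completeness before a readme file is written. This is a minimal sketch: the field names in `REQUIRED_FIELDS` paraphrase the items above, and the helper names and placeholder values are assumptions for illustration only.

```python
# Field names paraphrase the documentation items listed above (an assumption)
REQUIRED_FIELDS = [
    "Study description",
    "Data collection methods",
    "File organization",
    "Quality assurance",
    "File formats and software",
    "Access and conditions of use",
]

def missing_fields(meta: dict) -> list:
    """List required readme fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not meta.get(name)]

def build_readme(meta: dict) -> str:
    """Render the filled-in fields as plain text, one 'Field: value' per line."""
    lines = [f"{name}: {meta[name]}" for name in REQUIRED_FIELDS if meta.get(name)]
    return "\n".join(lines) + "\n"

meta = {
    "Study description": "Smoking cessation intervention for parents of pediatric asthma ER patients",
    "Data collection methods": "interviews, chart review",
    "File organization": "(describe folder structure)",
    "File formats and software": ".csv, plain text",
}

print(missing_fields(meta))  # ['Quality assurance', 'Access and conditions of use']
print(build_readme(meta))
```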
Reduce risk of damage or loss
◦ What are you storing (e.g., paper, electronic, biological samples)
◦ Limit access – whose position requires access
◦ Know your departmental resources and policies
◦ Data saved in multiple locations
  3 is recommended (2 local and 1 remote location)
  Remember human subjects protections and how/where your IRB application said you would store data
◦ Know your backup schedule
  Network: is it done for you, or do you need to initiate backups?
◦ Use a reliable medium
  USB drives are easily lost or damaged
◦ Test your backups, systems for deleting older backups
◦ Include processes in workflow
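"Test your backups" can be partly automated by comparing checksums of each copy against the original. The sketch below uses temporary folders as stand-ins for a second local drive and a remote location; the `back_up` and `sha256_of` helpers and the folder names are hypothetical, and any real storage location still has to match what the IRB application specifies.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source: str, destinations: list) -> list:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        os.makedirs(folder, exist_ok=True)
        copy_path = shutil.copy2(source, folder)
        if sha256_of(copy_path) != expected:
            raise IOError(f"Backup verification failed: {copy_path}")
        copies.append(copy_path)
    return copies

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "data.csv")
    with open(source, "w") as f:
        f.write("id,score\n1,10\n")
    # Stand-ins for a second local drive and a remote share
    copies = back_up(source, [os.path.join(tmp, "local2"), os.path.join(tmp, "remote")])
    verified = all(sha256_of(c) == sha256_of(source) for c in copies)
    print(len(copies), verified)
```

Together with the original, the two verified copies give the three locations recommended above.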
Confidential or Sensitive Data
◦ Where is this stored, how long do you need access to the data, who requires access, when can it be de-identified
Physical security – where network data is stored, other media (videos, audio, images, etc.)
Passwords/Keys
Staff training – MOST CRITICAL
◦ Plans only work when followed; people do unexpected things and make mistakes; include all staff handling data (remember volunteers)
To ensure access to your data in the future:
Non-proprietary
Unencrypted & uncompressed
Standard representation (ASCII, Unicode)
Formats: .txt, .csv, .pdf, .jpg, .tiff
◦ What is common for your research area, software used, etc.
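For tabular data, exporting to plain CSV is one way to meet these criteria. The sketch below serializes rows with Python's standard `csv` module; the `to_csv_text` helper, the column names, and the `export.csv` file name in the commented-out lines are hypothetical.

```python
import csv
import io

def to_csv_text(rows: list, fieldnames: list) -> str:
    """Serialize rows to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [{"id": 1, "MomEduc": 3}, {"id": 2, "MomEduc": 1}]
text = to_csv_text(rows, ["id", "MomEduc"])
print(text)

# Writing with an explicit Unicode encoding keeps the file readable elsewhere:
# with open("export.csv", "w", encoding="utf-8", newline="") as f:
#     f.write(text)
```

A CSV file carries no variable labels or value codes, so it should be archived together with the codebook or data dictionary.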
Expectation from both Funders & Journals
◦ Replicate and verify your results
◦ Allow future research – new research questions, new analyses, combined datasets
◦ IRB, human subjects considerations need to be addressed but do not limit data sharing
◦ Think through all privacy and confidentiality issues with your data and address them before sharing/archiving
◦ What aspects should be shared? Which destroyed?
  Small geographic areas, rare populations, linked datasets
Kathi Diviak, John O’Keefe, & Dick Campbell
We provide free consultation to IHRP affiliated researchers
◦ Develop a data management plan for proposals
◦ Help plan data collection strategies and approaches
◦ Develop budget estimates for data collection/processing
◦ Recommendations for documenting, organizing, storing, sharing, and archiving your research data
Email [email protected] to schedule a meeting
We can provide a range of data management services to IHRP affiliated researchers, including:
◦ Research protocol development
◦ Web-based survey development, data entry and processing
◦ Staff training
◦ Assistance preparing data for archiving
Costs for these services can be written into grant proposal budgets or charged to existing grants.
Email [email protected] for more information