What You Need to Know about Project Workflow
Presented by: Yoonsang Kim, PhD
With contributions by: Mike Berbaum, PhD; Oksana Pugach, PhD
IHRP Methodology Research Core Chalk Talk, Feb. 12, 2012
• Data management: data entry, data cleaning, logic check, data dictionary
• Folder management: folder structure, file naming scheme, file description
• Article sharing
• Digital repository
Data Management
1. Data Entry
• Decide how to enter the data in e-form: MS Access, REDCap, Epi Info, SPSS, TeleForm
• Double-entry
• Set up some rules:
1. How to enter missing values (e.g., "." or "-99")
2. Do not enter text as a value if it can be avoided (e.g., for a dichotomous choice, enter 1 and 0)
3. Data dictionary (version 1)
• Take a note if you find inconsistent answers, possible errors, etc. (e.g., skip pattern)
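The missing-value rule above can be enforced when the entered data are read back in. A minimal sketch in Python, assuming the two codes from the slide ("." and "-99"); the helper name parse_value and the decision to treat an empty cell as missing are assumptions for illustration, not part of the talk.

```python
# Codes agreed on for "missing" (from the rule above). Treating an empty
# cell as missing is an extra assumption made for this sketch.
MISSING_CODES = {".", "-99", ""}


def parse_value(raw):
    """Return None for a missing code, otherwise the entry as a float."""
    text = raw.strip()
    if text in MISSING_CODES:
        return None
    return float(text)


entered = ["1", "0", ".", "-99", " 1 "]
parsed = [parse_value(v) for v in entered]
print(parsed)
```

Mapping every missing code to a single value at read time keeps a code such as -99 from being averaged in as if it were real data.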
MS Access database
Data Management
2. Data Cleaning
• Double-entry check
• Are the variables in the possible range? (e.g., age = -10)
• Duplicate IDs? Missing IDs?
• Program to prevent entering implausible values; available in MS Access, REDCap
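The cleaning checks listed above can be scripted so they are repeatable. The following is only a sketch in plain Python: the records, the field names, the age range of 0 to 120, and all function names are hypothetical and should be replaced by the study's own definitions.

```python
from collections import Counter

# Hypothetical records after data entry; None marks a missing ID.
records = [
    {"id": "001", "age": 34},
    {"id": "002", "age": -10},   # implausible value
    {"id": "002", "age": 51},    # duplicate ID
    {"id": None, "age": 47},     # missing ID
]

AGE_RANGE = (0, 120)  # assumed plausible range; set per study protocol


def out_of_range(rows, field, low, high):
    """Rows whose non-missing value in `field` falls outside [low, high]."""
    return [r for r in rows if r[field] is not None and not low <= r[field] <= high]


def duplicate_ids(rows):
    """IDs that appear more than once (missing IDs are ignored here)."""
    counts = Counter(r["id"] for r in rows if r["id"] is not None)
    return sorted(i for i, n in counts.items() if n > 1)


def missing_ids(rows):
    """Zero-based positions of rows with no ID."""
    return [pos for pos, r in enumerate(rows) if r["id"] is None]


def double_entry_mismatches(first, second):
    """Compare two independent entries of the same forms, keyed by ID.

    Returns (id, field, value_in_first, value_in_second) per disagreement.
    Only IDs present in both entries are compared.
    """
    diffs = []
    for rec_id in sorted(set(first) & set(second)):
        for field in sorted(first[rec_id]):
            a, b = first[rec_id][field], second[rec_id].get(field)
            if a != b:
                diffs.append((rec_id, field, a, b))
    return diffs


bad_ages = out_of_range(records, "age", *AGE_RANGE)
dups = duplicate_ids(records)
no_id = missing_ids(records)

entry_1 = {"001": {"age": 34, "sex": 1}, "003": {"age": 29, "sex": 0}}
entry_2 = {"001": {"age": 43, "sex": 1}, "003": {"age": 29, "sex": 0}}
mismatches = double_entry_mismatches(entry_1, entry_2)
```

Each check returns the offending rows or IDs instead of fixing them, so the correction can be made against the paper form and the raw data stay untouched.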
Data Management
3. Logic Check
• Skip pattern in the survey
• Any inconsistent answers in related questions?
• If inconsistent answers are found during data entry
– One suggestion: enter the values marked in the survey. Fix it later via "logic check"
– Another suggestion: program the entry form before data entry to prevent them
• Make a decision and stick to it
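A logic check for a skip pattern can be written as a small rule that is run over every record after entry. This sketch assumes a made-up two-question skip pattern; the question names, the coding (1 = yes, 0 = no), and the function check_skip_pattern are illustrative only.

```python
# Hypothetical skip pattern: Q2 ("cigarettes per day") is answered only when
# Q1 ("current smoker", 1 = yes, 0 = no) is 1. None means the item was blank.
def check_skip_pattern(row):
    """Return a list of human-readable problems for one survey record."""
    problems = []
    if row["q1_smoker"] == 0 and row["q2_cigs_per_day"] is not None:
        problems.append("Q2 answered although Q1 = no")
    if row["q1_smoker"] == 1 and row["q2_cigs_per_day"] is None:
        problems.append("Q2 blank although Q1 = yes")
    return problems


survey = [
    {"id": "001", "q1_smoker": 1, "q2_cigs_per_day": 10},
    {"id": "002", "q1_smoker": 0, "q2_cigs_per_day": 5},
    {"id": "003", "q1_smoker": 1, "q2_cigs_per_day": None},
]

# Collect problems per ID; records that pass are left out of the report.
flagged = {}
for row in survey:
    problems = check_skip_pattern(row)
    if problems:
        flagged[row["id"]] = problems
```

This corresponds to the first suggestion (enter what is marked, resolve later); the same rule could instead be built into the entry form to follow the second.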
Data Management
• Keep the original/raw data & never overwrite
• Keep the data sets in each and every step
• Keep all relevant codes and programs
• Naming scheme for data files: e.g.,
dataname_R for the raw data
dataname_C for cleaned data
dataname_L for data after logic check
• Data Merge – keep the unique IDs in the same format (numeric or character) across all data sets
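The naming scheme and the ID rule above can be illustrated together. This is a sketch under assumptions: the suffixes R, C, and L come from the slide, while the function names, the choice of zero-padded four-character string IDs, and the example data sets are hypothetical.

```python
# Suffixes from the naming scheme above: raw, cleaned, after logic check.
STAGE_SUFFIX = {"raw": "R", "cleaned": "C", "logic_checked": "L"}


def stage_name(dataname, stage):
    """Build the data set name for one processing step, e.g. survey_C."""
    return f"{dataname}_{STAGE_SUFFIX[stage]}"


def normalize_id(value, width=4):
    """Store every ID as a zero-padded string, whatever type it arrived as."""
    return str(value).strip().zfill(width)


def with_normalized_ids(table):
    """Copy a {id: fields} table, rewriting each key with normalize_id."""
    return {normalize_id(key): dict(values) for key, values in table.items()}


# Hypothetical data sets: one stores IDs as numbers, the other as text.
baseline = {12: {"age": 34}, 105: {"age": 51}}
followup = {"0012": {"weight": 70.5}, "0105": {"weight": 82.0}}

left = with_normalized_ids(baseline)
right = with_normalized_ids(followup)

# Inner merge on the shared, normalized IDs.
merged = {rec_id: {**left[rec_id], **right[rec_id]}
          for rec_id in sorted(left.keys() & right.keys())}
```

Without the normalization step the numeric key 12 and the character key "0012" share no matches, so the merge would silently come back empty.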
Data Management
SAS Enterprise
• Import data by drag and drop in the Process Flow window
• While reading the data, can modify the variable names and the formats
• Merge data sets using the Query Builder (same as PROC SQL)
• Creating/recoding variables resembles SPSS
• Shows "process flows" including reading in and merging data and computing statistics
SAS Enterprise Process Flow
SAS Enterprise Guide
• Using the SAS Enterprise Guide (Stanford Univ)
http://www.stanford.edu/group/ssds/cgi-bin/drupal/files/Guides/Using_the_SAS_Enterprise_Guide_4.2_2011.pdf
• Intro to Enterprise Guide (Marasinghe, Iowa State Univ)
http://www.stat.iastate.edu/homepages/mervyn/SAS_Workshop/SAS%20Short%20Course_EG_1_2012.pdf
http://www.stat.iastate.edu/homepages/mervyn/SAS_Workshop/SAS%20Short%20Course_EG_2_2012.pdf
Data Management
4. Data Dictionary
• Set up a file containing the short descriptions for variables and labels for values
• SPSS, REDCap, Epi Info – create data dictionary
• SAS – PROC FORMAT
1. Permanent format – keep the format with the data (version conflict issue)
2. Temporary format – keep the program of PROC FORMAT
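For readers working outside the packages named above, a data dictionary can also be kept as a small structure in the analysis program. The sketch below is in Python and is not SAS syntax; the variables, descriptions, value labels, and the function label_value are all invented for illustration.

```python
# Hypothetical data dictionary: one entry per variable, with a short
# description and (for coded variables) a label for each value.
DATA_DICTIONARY = {
    "sex": {"description": "Sex of participant", "values": {0: "Male", 1: "Female"}},
    "smoker": {"description": "Current smoker", "values": {0: "No", 1: "Yes"}},
    "age": {"description": "Age in years at enrollment", "values": None},
}


def label_value(variable, value):
    """Translate a stored code into its label; pass other values through."""
    labels = DATA_DICTIONARY[variable]["values"]
    if labels is None or value is None:
        return value
    return labels.get(value, value)


row = {"sex": 1, "smoker": 0, "age": 34}
labelled = {name: label_value(name, value) for name, value in row.items()}
```

Keeping the labels in the program and the codes in the data is similar in spirit to the temporary-format option: the stored values never change, and the labels travel with the code that uses them.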
An Example of Data Dictionary
Folder Management
Folder Hierarchy
One example
Investigator
Project 1 Project 2
Data Programs Results Grant Documents
Literature
Key: Build your own structure and stick to it
Tip: Do not make too many sub-folders
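One way to stick to a structure is to create it from a script, so every project starts with the same folders. A minimal sketch, assuming the example hierarchy above is read as Investigator, then project, then six sub-folders; treating "Grant" and "Documents" as two separate folders is a guess at the diagram, and create_project_tree is a hypothetical name.

```python
from pathlib import Path
import tempfile

# Sub-folder names read off the example hierarchy. Splitting "Grant" and
# "Documents" into two folders is an assumption; adjust to your own scheme.
SUBFOLDERS = ["Data", "Programs", "Results", "Grant", "Documents", "Literature"]


def create_project_tree(root, investigator, projects):
    """Create investigator/project/sub-folder directories; return project paths."""
    created = []
    for project in projects:
        project_dir = Path(root) / investigator / project
        for name in SUBFOLDERS:
            # exist_ok=True makes the script safe to rerun on an existing tree.
            (project_dir / name).mkdir(parents=True, exist_ok=True)
        created.append(project_dir)
    return created


# Demonstrate in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    paths = create_project_tree(tmp, "Investigator", ["Project1", "Project2"])
    listing = sorted(p.name for p in paths[0].iterdir())
```

The tree is kept two levels deep below the investigator folder, in line with the tip about not making too many sub-folders.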
Organizing Files
Use a "header"
Organizing Files
Header for an analysis report
Organizing Files
• Build your own file naming scheme, or agree on one within a group
• Useful for writing/updating a grant proposal and manuscript
• Examples
Projectname_mm-dd-yy.docx
Projectname_yyyy-mm-dd.docx
Projectname_YK_mm_dd_yy.docx
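A naming scheme is easier to follow when the names are generated, not typed. The sketch below builds names in the Projectname_yyyy-mm-dd.docx pattern from the examples above, with optional author initials; the function dated_filename and the example date are assumptions for illustration.

```python
from datetime import date


def dated_filename(project, when, initials=None, ext="docx"):
    """Build Projectname[_INITIALS]_yyyy-mm-dd.ext."""
    parts = [project]
    if initials:
        parts.append(initials)
    parts.append(when.isoformat())  # date.isoformat() gives yyyy-mm-dd
    return "_".join(parts) + "." + ext


# Arbitrary example date, used only to show the pattern.
name = dated_filename("Projectname", date(2013, 2, 12))
name_with_initials = dated_filename("Projectname", date(2013, 2, 12), initials="YK")
print(name)
print(name_with_initials)
```

Of the three example patterns, yyyy-mm-dd has the practical advantage that an alphabetical file listing is also in chronological order, which is not true of mm-dd-yy.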
Organizing Files
• Keep a file that describes files
Organizing Files
Hyperlink to project websites
Hyperlink to the file
Organizing Articles
• EndNote, Reference Manager, Zotero (www.zotero.org), BibTeX (www.ctan.org)
• A group library is useful
– I-drive
– Online group library
– Dropbox (www.dropbox.com)
– Google Docs/Google Drive (drive.google.com, search docs with keywords)
Dissemination
• Digital repository – UIC Indigo (indigo.uic.edu)
Deposit (un)published research, working papers, technical reports, dissertations, etc.
http://indigo.uic.edu/handle/10027/7261
• Data
– PHI included? HIPAA-compliant storage
– De-identify before dissemination