Working with Statisticians
At some point, a statistician is likely to be asked to analyze your data. This
can lead to much unhappiness.
STATISTICIANS COME IN MANY SHAPES AND SIZES
BUT
Data formats
• Ideally, use a normalized database with validated data entry as part of a LIMS (laboratory information management system)…
• But 99% of the time, it's an Excel spreadsheet
• Some statisticians prefer to work with raw data (e.g. FCS files), but this is not common
  – Scott will cover consistent annotation for raw data in another lecture
Basic principle #1
• Statisticians do not like Excel
  – The first thing they will try to do is export to a CSV or other delimited file, for import into SAS or R
  – If this is difficult to do, they will not like you
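As a rough illustration of what "export to CSV, then import" looks like on the statistician's side, here is a minimal Python sketch using only the standard `csv` module. In practice the statistician would more likely use R or SAS; the column names and values below are hypothetical.

```python
import csv
import io

# Hypothetical CSV text, as produced by saving a clean worksheet as CSV.
CSV_TEXT = """Subject,Tube,Singlets/Lymphocytes
S01,T1,5230
S02,T1,4987
"""

def load_table(text):
    """Parse CSV text into (header, rows), with each row a dict keyed by the header."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    for row in rows:
        # DictReader collects surplus cells under the key None (its default restkey).
        if None in row:
            raise ValueError(f"row has more cells than the header: {row}")
        # Short rows are padded with None (its default restval).
        if None in row.values():
            raise ValueError(f"row has fewer cells than the header: {row}")
    return reader.fieldnames, rows

header, rows = load_table(CSV_TEXT)
print(header)     # ['Subject', 'Tube', 'Singlets/Lymphocytes']
print(len(rows))  # 2
```

If the worksheet follows the rules below, this import is trivial; every exception the loader raises is a sign of a worksheet that will make the statistician unhappy.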
Excel rules for happy statisticians
• 1 worksheet = 1 table
• 1 cell = 1 value
• Data?
• Metadata?
• Formatting?
• Validation?
1 worksheet = 1 table
• A table has column headers and a number of rows and nothing else – it is RECTANGULAR
• Do not put more than 1 table in a worksheet
• Do not use non-rectangular tables
• Example of a good worksheet
1 worksheet = 1 table
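One way to check the "1 worksheet = 1 table" rule after exporting a worksheet to CSV is a small helper like the one below. It is a hypothetical sketch, not a standard tool; it assumes the sheet has already been read into a list of rows of cell text (e.g. with `csv.reader`), header first.

```python
def check_rectangular(rows):
    """Return a list of problems that stop `rows` from being one rectangular table.

    `rows` is a list of lists of cell strings, with the header as the first row.
    An empty list means the sheet looks like a single rectangular table.
    """
    if not rows:
        return ["worksheet is empty"]
    problems = []
    header = rows[0]
    if any(name.strip() == "" for name in header):
        problems.append("header row has a blank column name")
    # Number rows as Excel shows them: the header is row 1.
    for i, row in enumerate(rows[1:], start=2):
        if all(cell.strip() == "" for cell in row):
            problems.append(f"row {i} is blank (a second table may start below it)")
        elif len(row) != len(header):
            problems.append(f"row {i} has {len(row)} cells, header has {len(header)}")
    return problems

good = [["Subject", "Tube"], ["S01", "T1"], ["S02", "T1"]]
bad = [["Subject", "Tube"], ["S01", "T1"], ["", ""], ["Panel", "Date"]]
print(check_rectangular(good))  # []
print(check_rectangular(bad))   # ['row 3 is blank (a second table may start below it)']
```

A blank row followed by a new set of headings is the usual signature of two tables sharing one worksheet.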
1 cell = 1 value
• Easy to filter by tube, sample or subject
• Easy to write validation rules or lookup tables
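When each cell holds one value, filtering is a plain equality test on a column. A minimal Python sketch, with hypothetical rows shaped like the output of a CSV import:

```python
# Hypothetical rows after CSV import: one value per cell.
rows = [
    {"Subject": "S01", "Sample": "PBMC", "Tube": "T1", "CD4+": "1520"},
    {"Subject": "S01", "Sample": "PBMC", "Tube": "T2", "CD4+": "1498"},
    {"Subject": "S02", "Sample": "Whole blood", "Tube": "T1", "CD4+": "1711"},
]

def select(rows, **criteria):
    """Keep rows whose columns equal every given value, e.g. select(rows, Subject="S01")."""
    return [r for r in rows if all(r.get(col) == val for col, val in criteria.items())]

print(len(select(rows, Subject="S01")))             # 2
print(len(select(rows, Subject="S01", Tube="T2")))  # 1
```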
1 cell = 1 value
• The ID column packs 3 different values into one cell
• Recovering the information requires text parsing, which is very error prone
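To see why packing several values into one ID cell is error prone, here is a minimal Python sketch of the parsing a statistician would have to write. The `subject_sample_tube` ID format is a hypothetical example, not taken from the slide.

```python
import re

# Hypothetical composite ID "subject_sample_tube", e.g. "S01_PBMC_T2".
ID_PATTERN = re.compile(r"^(?P<subject>S\d+)_(?P<sample>[A-Za-z]+)_(?P<tube>T\d+)$")

def split_id(composite_id):
    """Split one composite ID into its parts, or raise ValueError if it does not match."""
    match = ID_PATTERN.match(composite_id.strip())
    if match is None:
        raise ValueError(f"cannot parse ID {composite_id!r}")
    return match.groupdict()

print(split_id("S01_PBMC_T2"))  # {'subject': 'S01', 'sample': 'PBMC', 'tube': 'T2'}

# Typical hand-entry variations silently break the pattern.
for bad in ["S01-PBMC-T2", "S01_Whole blood_T1"]:
    try:
        split_id(bad)
    except ValueError as err:
        print(err)
```

Every separator change or embedded space needs another rule, whereas separate Subject, Sample and Tube columns need no parsing at all.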
Data: column names
• Use consistent column names across worksheets; software matches names exactly, so these are four different columns:
  – Singlets/Lymphocytes
  – Singlet/Lymphs
  – Singlets / Lymphocytes
  – Singlets/Lymphoctyes
• Use the full gating path as the column name
  – Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
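Inconsistent names can be caught before handing data over. The sketch below is hypothetical: it normalizes spacing around "/" and case, then uses the standard library's `difflib.get_close_matches` to flag likely misspellings against an agreed list of column names. The 0.75 cutoff is an arbitrary choice.

```python
import difflib
import re

def normalize(name):
    """Canonical form for comparison: trim spaces around '/' and lower-case."""
    return re.sub(r"\s*/\s*", "/", name.strip()).lower()

def find_inconsistent(reference, names, cutoff=0.75):
    """Map each name not in `reference` to its closest reference name(s)."""
    canonical = {normalize(r): r for r in reference}
    report = {}
    for name in names:
        if name in reference:
            continue  # exact match: consistent
        key = normalize(name)
        if key in canonical:
            report[name] = [canonical[key]]  # differs only in spacing or case
        else:
            close = difflib.get_close_matches(key, list(canonical), n=3, cutoff=cutoff)
            report[name] = [canonical[c] for c in close]  # empty list: no close match
    return report

names = ["Singlets/Lymphocytes", "Singlet/Lymphs", "Singlets / Lymphocytes", "Singlets/Lymphoctyes"]
for name, suggestion in find_inconsistent(["Singlets/Lymphocytes"], names).items():
    print(f"{name!r} -> {suggestion}")
```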
Data: What to record
• Better to have more data than less data
  – Sample type (PBMC, whole blood)
  – Recovery
  – Viability
• Better to have basic than derived data
  – Counts are better than relative frequencies
• Keep a link to the raw data for reproducibility
  – Path to the FCS file on the server
• Use a special indicator for missing data (e.g. NaN), not zero
• Can have an extra column for notes
  – Ideally codified, so "Error 23" rather than "Sample sat > 8 hours before processing"
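Two of these points fit in one small Python sketch: a relative frequency can always be derived from counts (the reverse is impossible, since 0.25 could be 1/4 or 250/1000), and an explicit missing marker keeps a missing measurement from turning into 0%. Using `float("nan")` as the marker is an assumption for illustration; the lab's SOP defines the actual convention.

```python
import math

MISSING = float("nan")  # explicit missing-value indicator, never 0

def relative_frequency(count, parent_count):
    """Frequency of a subset within its parent gate, derived from raw counts.

    Returns NaN (missing) when either count is missing or the parent gate is
    empty, so a missing measurement never silently becomes a frequency of 0.
    """
    if math.isnan(count) or math.isnan(parent_count) or parent_count == 0:
        return MISSING
    return count / parent_count

print(relative_frequency(250, 1000))      # 0.25
print(relative_frequency(MISSING, 1000))  # nan
```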
Data: Versioning
• Do not change the data in the worksheet once it has been handed to the statistician.
• If there are errors that must be corrected, make a new copy, label the filename with the date and version, and send that to the statistician
  – ArcticRatExperiment_07May2013_Version01.xlsx
  – ArcticRatExperiment_17May2013_Version02.xlsx
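The naming pattern in these examples is easy to generate rather than type by hand. A minimal, hypothetical Python helper that reproduces it:

```python
from datetime import date

# Fixed English month abbreviations, so the name does not depend on the locale
# (strftime("%b") would give e.g. "Mai" under a German locale).
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def versioned_filename(stem, day, version, ext="xlsx"):
    """Build a name like ArcticRatExperiment_07May2013_Version01.xlsx."""
    stamp = f"{day.day:02d}{MONTHS[day.month - 1]}{day.year}"
    return f"{stem}_{stamp}_Version{version:02d}.{ext}"

print(versioned_filename("ArcticRatExperiment", date(2013, 5, 17), 2))
# ArcticRatExperiment_17May2013_Version02.xlsx
```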
Metadata
• Should have an SOP (standard operating procedure) document for metadata
  – How missing data is represented (e.g. NA or blank)
  – Keys for interpretation, e.g. a table of error codes
  – Contact person: phone #, email
  – Metadata can be in a 2nd worksheet or a separate document
• Gating scheme with labeled gates matching the cell subsets used in column names (PDF or PPT)
• Panel information
  – Antibodies, clones, batches, fluorochromes, peptide mixes
• Path to the FlowJo .jo or .xml analysis file
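If column names are full gating paths, they can be checked against the gate labels in the gating scheme. This is a hypothetical sketch: it assumes the gate labels have been typed into a list, that only gating-path columns are passed in (not ID columns such as Subject), and that no gate label itself contains "/".

```python
def unknown_gates(column_names, gating_scheme_labels):
    """Return gate labels used in gating-path column names but absent from the gating scheme."""
    known = set(gating_scheme_labels)
    missing = set()
    for name in column_names:
        for gate in name.split("/"):
            gate = gate.strip()
            if gate not in known:
                missing.add(gate)
    return sorted(missing)

labels = ["Singlets", "Lymphocytes", "Viable", "CD4+", "CM", "IFN+"]
print(unknown_gates(["Singlets/Lymphocytes/Viable/CD4+/CM/IFN+", "Singlets/Lymphs"], labels))
# ['Lymphs']
```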
Metadata
• There are minimal information standards that should be followed
  – MIFlowCyt (Minimum Information about a Flow Cytometry Experiment)
  – MIATA (Minimal Information About T cell Assays)
• Look them up if you’re not familiar with them. Journals increasingly require them for publication, so it is worth making them part of your SOP for documenting results
Formatting
• Don’t do it.
• Avoid conveying information via:
  – Highlighting
  – Fancy spacing
  – Different fonts and font effects
  – Merging cells
  – Comments
• Will it survive a round trip from Excel to CSV and back again?
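The CSV half of this round-trip test can be sketched in Python as below. It does not touch Excel; it assumes a worksheet is represented as a list of rows of cell values, which is all a CSV file can hold in the first place.

```python
import csv
import io

def survives_csv_round_trip(rows):
    """True if writing `rows` to CSV and reading them back gives the same cells.

    Only cell text is ever written to CSV; highlighting, fonts, comments and
    merges have no representation at all, so they can never survive.
    """
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf)) == rows

print(survives_csv_round_trip([["Subject", "Note"], ["S01", "a, b"]]))  # True
print(survives_csv_round_trip([["Count"], [1520]]))  # False: 1520 comes back as "1520"
```

The second case is a reminder that even values come back as plain text; the statistician's import step has to convert them, which is only reliable when each column holds one kind of value.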
Formatting - Before
Formatting - After
• Comments are lost
• Highlighting is lost
• Bad cell formatting is lost
• Merged cells become missing information
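Merged cells turn into missing information because only the top-left cell of a merged range holds the value; the other cells export as blanks. The hypothetical sketch below shows the "fill down" repair a statistician then has to guess at, and why it is risky: a cell that was genuinely missing gets filled too.

```python
def fill_down(rows, column):
    """Copy the last non-blank value in `column` into blank cells below it.

    Returns new row dicts; the input rows are not modified.
    """
    last = None
    filled = []
    for row in rows:
        row = dict(row)
        if row[column] != "":
            last = row[column]
        elif last is not None:
            row[column] = last  # assumes the blank came from a merged cell
        filled.append(row)
    return filled

# Hypothetical export: "Subject" was merged over the first two rows in Excel.
exported = [
    {"Subject": "S01", "Tube": "T1"},
    {"Subject": "", "Tube": "T2"},
    {"Subject": "S02", "Tube": "T1"},
]
print([r["Subject"] for r in fill_down(exported, "Subject")])  # ['S01', 'S01', 'S02']
```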
Validation
• Can set up validation rules in Excel to minimize data entry errors
  – e.g. a number range of 0 to 10000000
• Can use lookup tables for the allowed codes
  – e.g. error codes with explanations
• If possible, once the data format is decided, get your local Excel wizard to create a template with validation and lookup rules
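The same two kinds of rules can be re-checked outside Excel before the data is handed over. A minimal Python sketch, under stated assumptions: the column names `count` and `error_code` are hypothetical, the range is treated as inclusive, and the lookup table contains only the one code that appears in these slides; a real table would come from the lab's metadata SOP.

```python
# Hypothetical lookup table; only code 23 appears in the slides.
ERROR_CODES = {
    23: "Sample sat > 8 hours before processing",
}

COUNT_RANGE = (0, 10_000_000)  # the slide's example range, treated as inclusive

def validate_row(row):
    """Return a list of validation errors for one data row (a dict)."""
    errors = []
    low, high = COUNT_RANGE
    count = row.get("count")
    if count is not None and not (low <= count <= high):  # None means missing, allowed
        errors.append(f"count {count} outside {low}..{high}")
    code = row.get("error_code")
    if code is not None and code not in ERROR_CODES:
        errors.append(f"unknown error code {code}")
    return errors

print(validate_row({"count": 5230, "error_code": 23}))  # []
print(validate_row({"count": -1}))                      # ['count -1 outside 0..10000000']
```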
Questions?
• If there are no questions and we need to kill time, watch the "Biologist talks to Statistician" video
  – http://www.youtube.com/watch?v=Hz1fyhVOjr4