Organizing Your Data
Jenny Holcombe, PhD
UT College of Medicine Nuts & Bolts Conference
August 16, 2013
Learning Objectives
Identify Different Types of Variables
Appropriately Naming Variables
Constructing a Variable Code Book
Developing Excel Spreadsheets & Effectively Entering Data
Identifying Differences between Descriptive & Inferential Statistics
Identifying Differences between Parametric & Nonparametric Statistics
Variables
A characteristic or condition that changes or has different values for different individuals
Anything that can be measured
Operational Definition – a definition of the variable in terms of how, specifically, it is to be measured
Qualitative Variables
Differ in kind rather than amount
Differ in quality, not quantity or magnitude
Also referred to as categorical or nominal
Examples – favorite color, treatment group, gender, race
Quantitative Variables
Assigned number values that represent differing quantities of the characteristics
Examples – medication dosage, # of doctor visits, annual income
Quantitative data can be either:
Discrete – a finite number of values (e.g., # of doctor visits last year)
Continuous – an infinite continuum of possible real number values (e.g., # of minutes it takes to finish a book)
Quantitative Variables
Three types of quantitative variables:
Ordinal – categorical scales that have a natural ordering of values (e.g., SES class – low, middle, high)
Interval – distances between adjacent scores are equal & consistent throughout the scale, with no absolute zero point (e.g., IQ scores, temperature in °C or °F)
Ratio – same as interval, with a true zero point (e.g., length, distance, time)
Variables – Final Points
It is possible to measure data on more than one scale
Variables should always be measured on the highest scale possible (listed from highest to lowest):
Ratio
Interval
Ordinal
Nominal
Fictitious Data
Four measurement levels for daily amount of sodium intake
Participant   Ratio         Interval             Ordinal        Nominal
              (actual mg)   (mg above 2500 mg)   (rank order)   (1 = not high, 2 = high)
Alan          4000          1500                 3              1
Nathan        7500          5000                 6              2
Chris         2500          0                    1              1
Mike          3500          1000                 2              1
Vadim         6000          3500                 5              2
Daniel        5500          3000                 4              2
Source: Polit, D. F. (2010). Statistics and data analysis for nursing research (2nd ed.). Boston: Pearson. ISBN: 978-0135085073
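The fictitious table can be rebuilt from the ratio-level values alone, which is one way to see why measuring on the highest scale keeps the lower scales available. The Python sketch below is only an illustration: the 5000 mg cutoff for "high" is an assumption chosen because it reproduces the nominal column (the slide does not state the cutoff), and the variable names are hypothetical.

```python
# Ratio-level data from the fictitious sodium table (actual mg per day).
sodium_mg = {
    "Alan": 4000, "Nathan": 7500, "Chris": 2500,
    "Mike": 3500, "Vadim": 6000, "Daniel": 5500,
}

BASELINE_MG = 2500     # reference point used by the interval column
HIGH_CUTOFF_MG = 5000  # ASSUMPTION: the source does not give the cutoff

# Interval: mg above the 2500 mg reference point.
interval = {name: mg - BASELINE_MG for name, mg in sodium_mg.items()}

# Ordinal: rank order, 1 = lowest intake (this data has no ties).
ordered_names = sorted(sodium_mg, key=sodium_mg.get)
ordinal = {name: rank for rank, name in enumerate(ordered_names, start=1)}

# Nominal: 1 = not high, 2 = high.
nominal = {name: 2 if mg > HIGH_CUTOFF_MG else 1
           for name, mg in sodium_mg.items()}

for name in sodium_mg:
    print(name, sodium_mg[name], interval[name], ordinal[name], nominal[name])
```

The reverse direction does not work: the nominal codes 1 and 2 cannot be converted back into milligrams.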
Naming Variables
The first row should include variable names; this makes transfer to other programs (e.g., SPSS, SAS) easier
Variable names can be up to 32 characters in length, but anything more than 8-12 becomes very cumbersome to manage
Each variable name must be unique; duplication is not allowed & names are not case sensitive
Naming Variables
Variable names should begin with a letter
Avoid periods, #, @, $, and only use underscores within the variable name (not at the beginning or end)
No spaces are allowed in variable names
Use meaningful names for variables; this makes variables more self-explanatory
Some exceptions – balance length/meaning
Naming Variables
Acceptable Names
Q1; Q_1
Question1; Question_1
Q1_food
Food
DRS1; DRS_1
Unacceptable Names
Q 1; 1Q; Q-1
Question 1; Question-1
Q1 food; Q1-food
_Food_
DiabetesRiskScale1
The main thing is to be consistent when naming variables
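The naming rules can be checked mechanically before a file is transferred to another program. The following is a minimal Python sketch, not a rule set from any statistics package: the function names are hypothetical, hyphens are rejected on the assumption that the slide's "Q-1" example is unacceptable for that reason, and 12 characters is used as the "cumbersome" threshold.

```python
import re

# Letters, digits, and underscores only; must start with a letter and
# must not end with an underscore. ASSUMPTION: hyphens are rejected too,
# because the slide lists Q-1 as unacceptable.
NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")


def name_problems(name, hard_max=32, soft_max=12):
    """Return a list of reasons why `name` breaks the naming rules."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("invalid character or invalid first/last character")
    if len(name) > hard_max:
        problems.append("longer than %d characters" % hard_max)
    elif len(name) > soft_max:
        problems.append("longer than %d characters (cumbersome)" % soft_max)
    return problems


def duplicate_names(names):
    """Return names that repeat an earlier name when case is ignored."""
    seen, duplicates = set(), []
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


for candidate in ["Q1", "Q_1", "Q1_food", "Q 1", "1Q", "Q-1", "_Food_",
                  "DiabetesRiskScale1"]:
    print(candidate, name_problems(candidate) or "ok")
```

Returning a list of problems instead of a single true/false value makes it easier to tell a hard violation (a space, a leading digit) from a style warning (a name that is allowed but long).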
Naming Variables
Examples…
http://www.ciser.cornell.edu/images/Excel2SASa.gif
What is wrong with this file?
Variable Code Books
Purpose:
To create a data entry system
To assist with data entry
For statistical analysis
When archiving data files for follow-up
Code Book Construction
Elements to include:
1) Description of the Study
2) Sampling Information
3) Technical Information
4) Structure of the Data
Variable Name
Variable Label
Value Labels
5) Text of the Questions/Survey Instrument
Code Book Construction
Word or Excel format is acceptable
A columned list or table is acceptable
All variables should be included with appropriate labeling information
Variable labels can be any length, but keeping them to 256 characters or fewer is recommended
The variable labels can contain spaces & characters not allowed in variable names
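A code book entry can also be kept in machine-readable form alongside the Word or Excel version. The Python sketch below shows one possible shape for the "structure of the data" element (variable name, variable label, value labels); the class, the field names, and the two entries are hypothetical and are not taken from any particular package.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    name: str    # variable name (short, no spaces)
    label: str   # variable label (free text, spaces allowed)
    value_labels: dict = field(default_factory=dict)  # code -> meaning


# Hypothetical entries, for illustration only.
codebook = [
    CodebookEntry("sodium_mg", "Daily sodium intake in mg"),
    CodebookEntry("sodium_hi", "High daily sodium intake?",
                  {1: "not high", 2: "high"}),
]


def decode(entry, code):
    """Translate a numeric code into its value label, if one is defined."""
    return entry.value_labels.get(code, code)


def label_too_long(entry, limit=256):
    """Flag variable labels beyond the recommended 256 characters."""
    return len(entry.label) > limit


print(decode(codebook[1], 2))
print([entry.name for entry in codebook if label_too_long(entry)])
```

Keeping the value labels next to the variable name means the data file itself can stay numeric while reports still show readable categories.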
Code Book – Final Points
Be consistent in your coding!
Update the code book as you enter your data – if you make a change while entering your data, make sure you update your code book as well
Check & double check – your code book acts as a form of communication between you and your data analyst (and possibly between you & your future self!)
Proper Data Layout
Allows you to:
Combine data
Separate data
Create charts that give insight into what the raw data has to say
When you enter your data without consideration of how you will use the data later, it becomes much more difficult to conduct any data analysis
Excel Basics
Each individual row of data is known as a record, an observation, or a case
Do not leave any blank rows
Information about an item cannot be spread across more than one row
Each column is a field labeled to identify the data it contains
All data in each column should be formatted the same
Do not leave blank columns in the table
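Blank rows and unlabeled columns are easy to miss by eye, so a quick check after exporting the worksheet can help. The Python sketch below assumes the sheet has been saved as CSV text; the sample data and the variable names are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of a small worksheet; the third row is blank.
raw = "id,age,visits\n1,34,2\n,,\n2,51,0\n"

rows = list(csv.reader(io.StringIO(raw)))
header, records = rows[0], rows[1:]

# Spreadsheet row numbers (the header is row 1) of records with no data.
blank_rows = [number for number, record in enumerate(records, start=2)
              if all(cell.strip() == "" for cell in record)]

# Column positions (1-based) whose header cell is empty.
blank_columns = [number for number, cell in enumerate(header, start=1)
                 if cell.strip() == ""]

print("blank rows:", blank_rows)
print("blank columns:", blank_columns)
```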
Excel Basics
Once a database is created you can use Excel tools to manage the data:
Sorting Data
Filtering Data
Missing Values
Should be entered consistently: use '9', '99', or '999'
The value should be something that cannot represent a real numeric value for the variable in question
Excel will recognize these 'missing' values as real values, so be careful if you are using Excel for analysis
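The risk is easy to demonstrate: if a missing-value code is left in a column, it is averaged like any other number. The Python sketch below uses a hypothetical column of doctor visits and assumes the code book defines 99 as the missing code.

```python
from statistics import mean

MISSING = 99  # ASSUMPTION: the code book defines 99 as "missing"

# Hypothetical column: number of doctor visits last year.
visits = [2, 0, 99, 5, 1]

# Treating 99 as a real count inflates the mean.
naive_mean = mean(visits)

# Drop the missing code before summarizing.
observed = [value for value in visits if value != MISSING]
clean_mean = mean(observed)

print(naive_mean, clean_mean)
```

With these five values the mean drops from 21.4 to 2 once the missing code is excluded, which is why the code must be filtered out (or declared as missing in the analysis program) before any statistics are run.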
Additional Points
Ensure rows below the data are not 'activated' so they are not mistaken for additional cases/observations during transfer
Numeric values are always best to use for data entry regardless of the type of variable (quantitative vs. qualitative)
Values/labels can always be assigned in a code book or data analysis program
Descriptive vs. Inferential
• Descriptive Statistics
  • Used to summarize, organize, and simplify data for better understanding
  • Means, standard deviations, percents, frequencies, proportions, etc.
• Inferential Statistics
  • Statistical procedures that allow researchers to study samples & then make generalizations about the population from which they were selected
  • Allow the researcher to draw conclusions
Descriptive Statistics in Excel
This is the status bar. It will display various information about a selected set of values in the spreadsheet. To change the information displayed you simply right click on the status bar.
Central Tendency in Excel
The AVERAGE function calculates the arithmetic mean
=AVERAGE(A1:A100)
The MEDIAN function calculates the median (center value)
=MEDIAN(A1:A100)
The MODE function calculates the most frequently occurring value
=MODE(A1:A100)
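For readers who want to cross-check a spreadsheet result outside Excel, the same three measures are available in Python's standard library. This is only a sketch with a hypothetical column of five values, not a replacement for the Excel functions.

```python
from statistics import mean, median, mode

# Hypothetical column of values (e.g., doctor visits per participant).
visits = [2, 3, 3, 5, 7]

print(mean(visits))    # arithmetic mean, like =AVERAGE(...)
print(median(visits))  # center value, like =MEDIAN(...)
print(mode(visits))    # most frequent value, like =MODE(...)
```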
Variability in Excel
There is no range function in Excel, but…
=MAX(A1:A100) - MIN(A1:A100)
The VAR function calculates sample variance
=VAR(A1:A100)
The STDEV function calculates sample standard deviation
=STDEV(A1:A100)
Remember: STDEV = sqrt(VAR); STDEV^2 = VAR
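The relationship between the sample variance and the sample standard deviation can be confirmed with a few lines of Python. The data are hypothetical; `variance` and `stdev` from the standard library use the n - 1 (sample) denominator.

```python
from math import isclose
from statistics import stdev, variance

# Hypothetical column of values.
visits = [2, 3, 3, 5, 7]

data_range = max(visits) - min(visits)  # range = maximum - minimum
sample_var = variance(visits)           # sample variance (n - 1 denominator)
sample_sd = stdev(visits)               # sample standard deviation

# The standard deviation squared equals the variance.
print(data_range, sample_var, sample_sd)
print(isclose(sample_sd ** 2, sample_var))
```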
Parametric Statistics
A class of inferential statistical tests that involves:
assumptions about the distribution of the variables,
the estimation of a parameter, and
usually the use of interval or ratio measures
Statistical tests designed to be used when data have certain characteristics – when they approximate a normal distribution & are measured with interval or ratio scales
Parametric Statistics
Bivariate
One-sample t test
Two-sample t test
Analysis of variance (ANOVA)
Repeated measures ANOVA
Pearson's product moment correlation (r)
Multivariate
Multiple correlation/regression
ANCOVA
MANOVA
MANCOVA
Mixed design RM-ANOVA
Canonical analysis
Discriminant analysis
Logistic regression
Factor analysis
Nonparametric Statistics
A general class of inferential statistical tests that does not involve rigorous assumptions about the distribution of the variables; most often used with small samples, when data are measured on the nominal or ordinal scales, or when a distribution is severely skewed
Statistical tests that are designed to be used when data being analyzed depart from the distributions that can be analyzed with parametric statistics
Nonparametric Statistics
Chi-square goodness-of-fit test
Chi-square test of independence
Fisher's exact test
McNemar test
Cochran's Q test
Mann-Whitney U test
Kruskal-Wallis test
Wilcoxon signed ranks test
Friedman test
Spearman's rank order correlation
Kendall's tau
Comparison of Parametric & Nonparametric Statistics
For each parametric test there is at least one nonparametric equivalent
These tests fall into several categories:
1. Tests of differences between groups (independent samples)
2. Tests of differences between variables (dependent samples)
3. Tests of relationships between variables
Differences Between Independent Groups
Two groups/samples – compare mean value for some variable of interest
Parametric: t-test for independent samples
Nonparametric: Wald-Wolfowitz runs test; Mann-Whitney U test; Kolmogorov-Smirnov two-sample test
Multiple groups
Parametric: Analysis of Variance (ANOVA/MANOVA)
Nonparametric: Kruskal-Wallis analysis of ranks; Median test
Differences Between Dependent Groups
Compare two variables measured in the same sample
Parametric: t-test for dependent samples
Nonparametric: Sign test; Wilcoxon's matched pairs test
If more than two variables are measured in the same sample
Parametric: Repeated measures ANOVA
Nonparametric: Friedman's two-way analysis of variance; Cochran Q
Relationships Between Variables
Parametric: Correlation coefficient
Nonparametric: Spearman R; Kendall Tau; Coefficient Gamma
Two variables of interest are categorical (nonparametric): Chi-Square; Phi coefficient; Fisher exact test
Kendall coefficient of concordance (nonparametric)
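The link between the parametric correlation coefficient and its rank-based counterpart can be sketched in plain Python: Spearman's coefficient is Pearson's r computed on the ranks of the data. The function names and the data below are hypothetical, and the code is a teaching sketch, not a substitute for a statistics package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x every time, but not in a straight line

print(round(pearson_r(x, y), 3), spearman_r(x, y))
```

Because y always increases with x, the rank-based coefficient is 1, while Pearson's r is slightly lower because the relationship is not linear.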
Parametric vs. Nonparametric
                         Parametric                  Nonparametric
Assumed Distribution     Normal                      Any
Assumed Variance         Homogeneous                 Any
Typical Data             Ratio or Interval           Ordinal or Nominal
Data Set Relationships   Independent                 Any
Usual Central Measure    Mean                        Median
Benefits                 Can draw more conclusions   Simplicity; less affected by outliers
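The "less affected by outliers" point is easiest to see by comparing the two usual central measures on data with one extreme value. The Python sketch below uses hypothetical values.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last value is an outlier.
stay_days = [1, 2, 3, 4, 100]
without_outlier = stay_days[:-1]

# The mean moves a great deal when the outlier is included...
print(mean(without_outlier), mean(stay_days))

# ...while the median barely changes.
print(median(without_outlier), median(stay_days))
```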
More Statistics Using Excel
To get more statistical functionality from Excel, you need to add in the Analysis ToolPak
Refer to the screenshots on the next few pages
I followed this process in my version of Excel 2007 and had no trouble adding the ToolPak
Analysis ToolPak
Allows you to conduct:
Summary descriptive statistics
Correlation
Histograms
Rank & Percentile
Regression
z-tests
t-tests
ANOVAs
Add in the Analysis ToolPak
Click the checkbox for the Analysis ToolPak, then 'OK'
Install it if it is not installed
When you have added it in, it will appear on the 'Data' tab, all the way on the right-hand side of your screen
Questions?
Jenny Holcombe, PhD
UTC School of Nursing
Jenny-Holcombe@utc.edu
(423) 425-5542