HRP 223 - 2008
HRP223 2008
Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.
HRP 223 - 2008
Topic 8 – Analysis of Means
                               | Normally Distributed | Not Normally Distributed
One sample vs. population      | One sample t-test    | Wilcoxon Signed Rank
Two paired samples             | Paired t-test        | Difference then Signed Rank
Two unpaired samples           | t-test               | Wilcoxon Rank-sum
Three or more unpaired samples | ANOVA                | Kruskal-Wallis
Three or more paired samples   | Mixed effects        | Transform then mixed model
                               | Normally Distributed     | Not Normally Distributed
One sample vs. population      | Describe > Distribution  | Describe > Distribution
Two paired samples             | Analyze > ANOVA > t-test | Describe > Distribution
Two unpaired samples           | Analyze > ANOVA > t-test | Describe > Distribution
Three or more unpaired samples | Analyze > ANOVA > Linear | Analyze > ANOVA > Nonpar.
Three or more paired samples   | Analyze > ANOVA > Mixed  |
One Categorical Predictor
Multiple Categorical Predictors
Unpaired samples
– ANOVA
Paired samples
– Mixed Effects Models
If the data are not normally distributed
– There are specialized statistics (Friedman’s test for 2 predictors).
– Try to transform the data so that they are normally distributed.
Mean vs. Expected BMI
It would be nice to see the actual Excel file.
Adds a link to source
Adds a link to source and runs import wizard
This gives instant access to the current state of the spreadsheet, but it is buggy if you mix character and numeric data.
Take a Look at the Data
Prior to analysis, do all 3 plots.
Histograms and box plots show outliers and bimodal data but are not ideal for assessing normality.
The formal tests for normality are not great. They will not find problems with small samples, and they will declare problems with large samples even when the departure from normality is small.
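To illustrate the point about formal tests, here is a minimal sketch in Python. It assumes a Jarque-Bera test as a stand-in for whatever normality test your software reports, and it uses a hypothetical, slightly skewed variable. The same mild skew is applied to a sample of 20 and a sample of 20,000. The large sample is flagged as non-normal, while the small one usually is not.

```python
import math
import random

random.seed(8)  # arbitrary seed so the sketch is repeatable


def jarque_bera_p(values):
    """Approximate p-value of the Jarque-Bera normality test.

    The statistic is compared to a chi-square distribution with 2 df,
    whose upper-tail probability is exactly exp(-x / 2). The chi-square
    approximation is rough for small samples.
    """
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    m4 = sum((v - center) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return math.exp(-jb / 2)


def slightly_skewed(n):
    """Hypothetical data: a normal variable plus a small squared term."""
    draws = [random.gauss(0, 1) for _ in range(n)]
    return [z + 0.05 * z * z for z in draws]


p_small = jarque_bera_p(slightly_skewed(20))
p_large = jarque_bera_p(slightly_skewed(20000))
print(f"n = 20:    p = {p_small:.3f}")
print(f"n = 20000: p = {p_large:.3g}")
```

The plots remain the better guide, because the p-value here mostly tracks sample size rather than how much the shape matters for the analysis.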
Figure: five example distribution shapes
1. normal distribution
2. skewed-to-the-right distribution
3. skewed-to-the-left distribution
4. heavy-tailed distribution
5. light-tailed distribution
Image from: Statistics I: Introduction to ANOVA, Regression, and Logistic Regression: Course Notes. SAS Press 2008.
Inference 101
You only have one sample but you want to make inferences to the world.
Given what you see in this sample, you can guess what the distribution of sample means looks like around the null value.
Figure: histograms of nine samples (x-axis 1 to 7, y-axis Frequency), with panels labeled 4.26, 4.06, 4.13, 3.8, 3.84, 3.9, 3.78, 4.13, and 4.01.
If the population you are sampling from has a mean of 4, you will not observe a mean of exactly 4 in your sample.
How do you compare this sample vs. another with a mean of 5?
Make a histogram of the means.
Figure: Distributions of the Means (three histograms, x-axis 1 to 7, y-axis Frequency)
1000 samples of size 1: Mean: 4.04 SD: 0.74 (.75/sqrt(1) = .75)
1000 samples of size 5: Mean: 3.99 SD: 0.34 (.75/sqrt(5) = .34)
1000 samples of size 25: Mean: 4 SD: 0.15 (.75/sqrt(25) = .15)
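The .75/sqrt(n) values can be checked by simulation. This is a minimal Python sketch, not the code behind the slide. It assumes a normally distributed population with mean 4 and SD 0.75; the SD is inferred from the formulas above, and the normal shape is an assumption. It draws 1000 samples at each size and reports the mean and SD of the sample means.

```python
import random
import statistics

random.seed(223)  # arbitrary seed so the sketch is repeatable

POP_MEAN = 4.0   # population mean used in the slides
POP_SD = 0.75    # assumed population SD, inferred from .75/sqrt(n)


def summarize_sample_means(sample_size, n_samples=1000):
    """Draw n_samples samples and return (mean, SD) of their means."""
    means = []
    for _ in range(n_samples):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)


results = {n: summarize_sample_means(n) for n in (1, 5, 25)}
for n, (mean_of_means, sd_of_means) in results.items():
    expected_sd = POP_SD / n ** 0.5
    print(f"size {n:>2}: mean {mean_of_means:.2f}, "
          f"SD {sd_of_means:.2f}, .75/sqrt(n) = {expected_sd:.2f}")
```

The SD of the means should land close to .75/sqrt(n) at each sample size, which is why larger samples give tighter estimates.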
Precision
Think of the “+/- something” imprecision in the estimates of the political polls.
You typically end up saying you are 95% sure you chose an interval that has the true value inside the range bracketed by the confidence limits (CLs).
Either the population value is or is not in the interval between the lower and upper confidence limit, and if you repeated the process on many samples, 95% of such intervals would include the population value.
The 99% CI is wider (more likely to include the true value) than the narrower, more precise 95% CI.
Confidence Intervals from 10 Samples
Axis with units showing your outcome
You want to set the width of the interval so that in 95% of the
experiments, the confidence interval includes the true value.
In theory, you tweak the interval and increase or decrease the width.
The unobservable truth
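The "95% of the experiments" idea can be sketched in Python. This example assumes a normal population with a known truth (mean 4, SD 0.75, both hypothetical) and samples of size 25. The critical values 2.064 and 2.797 are the two-sided 95% and 99% t values for 24 degrees of freedom. The sketch repeats the experiment 2000 times and counts how often each interval includes the true mean.

```python
import random
import statistics

random.seed(2008)  # arbitrary seed so the sketch is repeatable

TRUE_MEAN = 4.0   # the "unobservable truth" (hypothetical)
TRUE_SD = 0.75    # hypothetical population SD
N = 25            # observations per experiment

# Two-sided t critical values for N - 1 = 24 degrees of freedom.
T_CRIT = {95: 2.064, 99: 2.797}


def confidence_interval(sample, level):
    """Return (lower CL, upper CL) for the mean at the given level."""
    center = statistics.fmean(sample)
    half_width = T_CRIT[level] * statistics.stdev(sample) / len(sample) ** 0.5
    return center - half_width, center + half_width


n_experiments = 2000
hits = {95: 0, 99: 0}
for _ in range(n_experiments):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    for level in hits:
        lower, upper = confidence_interval(sample, level)
        if lower <= TRUE_MEAN <= upper:
            hits[level] += 1

coverage = {level: hits[level] / n_experiments for level in hits}
for level, share in coverage.items():
    print(f"{level}% CI included the true mean in {share:.1%} of experiments")
```

The 99% interval is wider on every sample, so it can never cover the truth less often than the 95% interval built from the same data.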
Benefits of CLs
You have information about the estimate's precision.
The width of the CI tells you about the degree of random error, for the confidence level you set.
Wide intervals indicate poor precision. Plausible values could be across a broad range.
Estimation vs. Hypothesis Testing
P-value < .05 corresponds to a 95% CL that does not include the null hypothesis value.
CLs show uncertainty, or lack of precision, in the estimate of interest and thus convey more useful information than the p-value.
CLs vs. p-values
Figure (null value, lower CL, upper CL, confidence interval): P > .05 and the null value is inside of the confidence limits (CLs).
Figure (null value, lower CL, upper CL, confidence interval): P < .05 and the null value is not inside of the confidence limits.
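The link between p < .05 and a 95% interval that excludes the null value can be checked with a small Python sketch. It is a simplification: it assumes a normal (z) approximation with a known standard error rather than the t-based procedures used elsewhere in these slides, and the estimates are hypothetical differences between groups.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()             # standard normal, mean 0 and SD 1
Z_CRIT = STD_NORMAL.inv_cdf(0.975)    # two-sided 95% critical value, about 1.96


def z_test_and_ci(estimate, std_error, null_value=0.0):
    """Return the two-sided p-value and the 95% confidence limits."""
    z = (estimate - null_value) / std_error
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    lower = estimate - Z_CRIT * std_error
    upper = estimate + Z_CRIT * std_error
    return p_value, (lower, upper)


NULL_VALUE = 0.0  # 0 difference between groups
checks = []
for estimate in (0.5, 1.5, 2.5, 3.5):  # hypothetical differences, SE = 1
    p_value, (lower, upper) = z_test_and_ci(estimate, 1.0, NULL_VALUE)
    null_inside = lower <= NULL_VALUE <= upper
    checks.append((estimate, p_value, null_inside))
    print(f"estimate {estimate}: p = {p_value:.4f}, "
          f"95% CL = ({lower:.2f}, {upper:.2f}), null inside: {null_inside}")
```

In every row, the null value sits inside the limits exactly when p is above .05, but only the limits show how wide the range of plausible values is.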
0 difference between groups or odds ratio of 1
null value
zone of clinical indifference
Not statistically significant and not clinically interesting
Not statistically significant, possibly clinically interesting
Statistically significant but not clinically interesting
Statistically significant and clinically interesting
Compare Two Teachers
Import the data
Describe the data
– Assign the method as a classification variable
Do an unpaired t-test
Do a one-way ANOVA with the predictor having only two levels
SS total is the sum of the squared distances between each point and the overall mean line.
SS error is the sum of the squared distances between each point and its group mean line.
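The two sums of squares, and the reason a two-level one-way ANOVA matches the unpaired t-test, can be sketched in Python. The scores below are hypothetical, not the class data. The t statistic uses the pooled (equal variance) form, and for two groups the ANOVA F statistic equals t squared.

```python
from math import isclose, sqrt
from statistics import mean

# Hypothetical scores for two teachers (illustrative values only).
scores = {
    "teacher_a": [72.0, 75.0, 78.0, 80.0, 85.0],
    "teacher_b": [65.0, 70.0, 71.0, 74.0, 75.0],
}

all_points = [y for group in scores.values() for y in group]
grand_mean = mean(all_points)

# SS total: squared distance from each point to the overall mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_points)

# SS error: squared distance from each point to its own group mean.
ss_error = 0.0
for group in scores.values():
    group_mean = mean(group)
    ss_error += sum((y - group_mean) ** 2 for y in group)

# SS model: the part of SS total explained by group membership.
ss_model = ss_total - ss_error

df_model = len(scores) - 1
df_error = len(all_points) - len(scores)
f_stat = (ss_model / df_model) / (ss_error / df_error)

# Pooled-variance unpaired t statistic for the same two groups.
a, b = scores["teacher_a"], scores["teacher_b"]
pooled_var = ss_error / df_error
t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

print(f"SS total = {ss_total:.2f}, SS error = {ss_error:.2f}")
print(f"F = {f_stat:.3f}, t squared = {t_stat ** 2:.3f}")
print("F equals t squared:", isclose(f_stat, t_stat ** 2))
```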
Psoriasis
Scores are arbitrary numbers: 0 = < 0% response, 5 = 26-50% response, etc.
If and only if you work on a fast machine!
Weight Gain
Postpartum Depression
After the Formats
Dementia
Wide to Long
You may have noticed that the data for these analyses are all set up as long, skinny files where there is a record for every observation on a patient. Some people store data as wide records with many variables with a single record for each person.
To convert from wide to long:
– Do data step processing with arrays.
– Use the Transpose option on the Data menu.
– Combine proc transpose and data step code.
– Use a macro I wrote. (It is brand new, so check it.)
Tolerance.sas7bdat is a dataset from the book.
Save a copy of the macro in a file after the fill-in-the-blanks are done.
The stuff in the blah.sas file:
The stuff in the macro file:
tol: all variables starting with the letters tol
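The wide-to-long reshaping itself can be sketched outside SAS. This Python example is not the macro from the slides. The records, the `male` variable, and the name `age` for the new column are hypothetical. It takes every variable starting with the letters tol, and it assumes the rest of each name is a number.

```python
# Hypothetical wide records: one row per person, one tol* column per occasion.
wide_rows = [
    {"id": 1, "male": 0, "tol11": 2.2, "tol12": 1.8, "tol13": 1.9},
    {"id": 2, "male": 1, "tol11": 1.1, "tol12": 1.3, "tol13": 1.6},
]


def wide_to_long(rows, prefix, time_var):
    """Turn each prefix* variable into its own record.

    Variables that do not start with the prefix are copied onto every
    new record. The text after the prefix (assumed numeric here) is
    stored in time_var, and the value is stored under the prefix name.
    """
    long_rows = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if not k.startswith(prefix)}
        for name, value in row.items():
            if name.startswith(prefix):
                suffix = name[len(prefix):]
                long_rows.append({**fixed, time_var: int(suffix), prefix: value})
    return long_rows


long_rows = wide_to_long(wide_rows, prefix="tol", time_var="age")
for record in long_rows:
    print(record)
```

Each person now contributes one record per tol variable, which is the long, skinny layout the analyses in these slides expect.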
Narrow to Wide
Of course you can transpose back to wide from narrow.
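The reverse direction follows the same logic. This is a minimal Python sketch with hypothetical records and variable names, not a translation of proc transpose. It collects the rows for each person and builds one variable per occasion.

```python
# Hypothetical long (narrow) records: one row per person per occasion.
long_rows = [
    {"id": 1, "age": 11, "tol": 2.2},
    {"id": 1, "age": 12, "tol": 1.8},
    {"id": 2, "age": 11, "tol": 1.1},
    {"id": 2, "age": 12, "tol": 1.3},
]


def long_to_wide(rows, id_var, time_var, value_var):
    """Build one record per id, with a value_var<time> column per occasion."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_var], {id_var: row[id_var]})
        record[f"{value_var}{row[time_var]}"] = row[value_var]
    return list(wide.values())


wide_rows = long_to_wide(long_rows, id_var="id", time_var="age", value_var="tol")
for record in wide_rows:
    print(record)
```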
If you download the keyboard macros today you will see that proc transpose now gives you a code template.