hrp 223 - 2008

HRP223 2008 Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law. HRP 223 - 2008 Topic 8 – Analysis of Means


Page 1: HRP 223 - 2008

HRP223 2008

Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.

HRP 223 - 2008

Topic 8 – Analysis of Means

Page 2: HRP 223 - 2008

One Categorical Predictor

                                 Normally Distributed     Not Normally Distributed
One sample vs. population        One sample t-test        Wilcoxon Signed Rank
Two paired samples               Paired t-test            Difference then Signed Rank
Two unpaired samples             T-test                   Wilcoxon Rank-sum
Three or more unpaired samples   ANOVA                    Kruskal-Wallis
Three or more paired samples     Mixed effects            Transform then mixed model

                                 Normally Distributed     Not Normally Distributed
One sample vs. population        Describe > Distribution  Describe > Distribution
Two paired samples               Analyze > ANOVA > t-test Describe > Distribution
Two unpaired samples             Analyze > ANOVA > t-test Describe > Distribution
Three or more unpaired samples   Analyze > ANOVA > Linear Analyze > ANOVA > Nonpar.
Three or more paired samples     Analyze > ANOVA > Mixed
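The first table boils down to three questions: how many samples there are, whether they are paired, and whether the outcome looks normally distributed. A minimal sketch of that lookup in Python follows; the function name `choose_test` and its arguments are illustrative and not part of the course materials.

```python
def choose_test(n_samples, paired, normal):
    """Return the test named in the table for one categorical predictor.

    n_samples: 1 (one sample vs. a population value), 2, or 3 and more
    paired:    True if the same subjects contribute to every sample
    normal:    True if the outcome looks normally distributed
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    if n_samples == 1:
        return "One sample t-test" if normal else "Wilcoxon Signed Rank"
    if n_samples == 2:
        if paired:
            return "Paired t-test" if normal else "Difference then Signed Rank"
        return "T-test" if normal else "Wilcoxon Rank-sum"
    # Three or more samples
    if paired:
        return "Mixed effects" if normal else "Transform then mixed model"
    return "ANOVA" if normal else "Kruskal-Wallis"


print(choose_test(2, paired=True, normal=False))
```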

Page 3: HRP 223 - 2008

Multiple Categorical Predictors

Unpaired samples
– ANOVA

Paired samples
– Mixed Effects Models

If data is not normally distributed
– There are specialized statistics (Friedman’s test for 2 predictors).
– Try to transform into normally distributed.

Page 4: HRP 223 - 2008

Mean vs. Expected BMI

It would be nice to see the actual Excel file.

Page 5: HRP 223 - 2008

Adds a link to source

Adds a link to source and runs import wizard

This gives instant access to the current state of the spreadsheet, but it is buggy if you mix character and numeric data.

Page 6: HRP 223 - 2008

Take a Look at the Data

Page 7: HRP 223 - 2008

Prior to analysis, do all 3 plots.

Histograms and box plots show outliers and bimodal data but are not ideal for assessing normality.

The formal tests for normality are not great. They will not find problems with small samples and will declare problems with large samples.
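Assuming the third plot is a normal probability (Q-Q) plot, its coordinates are simple to compute by hand: sort the data and pair each value with the standard normal quantile for its rank. The sketch below uses only the Python standard library. The plotting positions (i - 0.5)/n are one common convention, chosen here as an assumption, and the scores are made up.

```python
from statistics import NormalDist


def qq_points(values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    # (i - 0.5) / n keeps every probability strictly between 0 and 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


# Made-up scores for illustration only
scores = [3.1, 3.8, 4.0, 4.2, 4.4, 4.9, 5.3, 7.9]
for z, x in qq_points(scores):
    print(f"{z:6.2f}  {x:4.1f}")
```

Points that fall close to a straight line suggest a roughly normal distribution. A value far from the line at either end, such as the largest made-up score here, is the kind of pattern the plot is meant to expose.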

Page 8: HRP 223 - 2008
Page 9: HRP 223 - 2008

[Figure: five example plots, numbered 1 to 5]

1. normal distribution
2. skewed-to-the-right distribution
3. skewed-to-the-left distribution
4. heavy-tailed distribution
5. light-tailed distribution

Image from: Statistics I: Introduction to ANOVA, Regression, and Logistic Regression: Course Notes. SAS Press 2008.

Page 10: HRP 223 - 2008

Inference 101

You only have one sample but you want to make inferences to the world.

Given what you see in this sample, you can guess what the distribution of samples looks like around the null distribution.

Page 11: HRP 223 - 2008

[Figure: nine histograms (x-axis 1 to 7, y-axis Frequency), with panel titles 4.26, 4.06, 4.13, 3.8, 3.84, 3.9, 3.78, 4.13 and 4.01]

If the population you are sampling from has a mean of 4, a sample will not, in general, have a mean of exactly 4.

How do you compare this sample vs. another with a mean of 5?

Make a histogram of the means.

Page 12: HRP 223 - 2008

Distributions of the Means

[Figure: three histograms (x-axis 1 to 7, y-axis Frequency)]

1000 samples of size 1: Mean: 4.04, SD: 0.74 (.75/sqrt(1) = .75)

1000 samples of size 5: Mean: 3.99, SD: 0.34 (.75/sqrt(5) = .34)

1000 samples of size 25: Mean: 4, SD: 0.15 (.75/sqrt(25) = .15)
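The shrinking spread of the means can be reproduced with a short simulation. The sketch below assumes a normally distributed population with mean 4 and SD 0.75 (the slides give the mean and the .75 but do not state the shape). The function name, the seed, and the use of Python are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sd_of_sample_means(n, n_samples=1000, mu=4.0, sigma=0.75, seed=223):
    """Draw n_samples samples of size n and return the SD of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [mean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(n_samples)]
    return stdev(means)


for n in (1, 5, 25):
    print(f"n={n:2d}  simulated SD={sd_of_sample_means(n):.2f}  "
          f"sigma/sqrt(n)={0.75 / sqrt(n):.2f}")
```

The simulated SD of the means should land close to sigma/sqrt(n) for each sample size, which is the standard error of the mean.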

Page 13: HRP 223 - 2008

Precision

Think of the “+/- something” imprecision in the estimates of the political polls.

You typically end up saying you are 95% sure you chose an interval that has the true value inside the range bracketed by the confidence limits (CLs).

Either the population value is or is not in the interval between the lower and upper confidence limit, and if you repeated the process on many samples, 95% of such intervals would include the population value.

The 99% CI is wider (more likely to include the true value) than the narrower, more precise 95% CI.

Page 14: HRP 223 - 2008

Confidence Intervals from 10 Samples

Axis with units showing your outcome

You want to set the width of the interval so that in 95% of the experiments, the confidence interval includes the true value.

In theory, you tweak the interval and increase or decrease the width.

The unobservable truth
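The "95% of the experiments" idea can be checked by simulation, because in a simulation the truth is observable. The sketch below makes several assumptions that are not in the slides: a normal population with mean 4 and a known SD of 0.75, samples of size 25, and a z-based interval rather than a t-based one. All names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population SD is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(level, n=25, experiments=2000, mu=4.0, sigma=0.75, seed=223):
    """Fraction of simulated experiments whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        hits += low <= mu <= high  # True counts as 1
    return hits / experiments


print(coverage(0.95), coverage(0.99))
```

Raising the level from 95% to 99% widens every interval, so more of them capture the true value, at the cost of precision.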

Page 15: HRP 223 - 2008

Benefits of CLs

You have information about the estimate's precision.

The width of the CI tells you about the degree of random error, for the confidence level you set.

Wide intervals indicate poor precision. Plausible values could be across a broad range.

Page 16: HRP 223 - 2008

Estimation vs. Hypothesis Testing

P-value < .05 corresponds to a 95% CL that does not include the null hypothesis value.

CLs show uncertainty, or lack of precision, in the estimate of interest and thus convey more useful information than the p-value.
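The link between the p-value and the confidence limits can be seen by computing both from the same estimate and standard error. This is a minimal sketch using a normal (z) approximation, which is an assumption made to keep the example in the Python standard library. The function name and the two estimates are hypothetical.

```python
from statistics import NormalDist


def z_test_and_ci(estimate, std_error, null_value=0.0, level=0.95):
    """Two-sided z-test p-value plus the matching confidence limits."""
    std_normal = NormalDist()
    z = (estimate - null_value) / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)
    limits = (estimate - crit * std_error, estimate + crit * std_error)
    return p_value, limits


# Hypothetical differences between two group means, each with SE 0.4
for diff, se in [(1.0, 0.4), (0.5, 0.4)]:
    p, (low, high) = z_test_and_ci(diff, se)
    inside = low <= 0.0 <= high
    print(f"p={p:.3f}  CL=({low:.2f}, {high:.2f})  null inside: {inside}")
```

Whenever the p-value falls below .05 the null value sits outside the 95% limits, and the limits additionally show how wide the range of plausible values is.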

Page 17: HRP 223 - 2008

CLs vs. p-values

[Figure: two confidence intervals, each running from its lower CL to its upper CL, drawn against the null value]

P > .05 and the null value is inside of the confidence limits (CLs)

P < .05 and the null value is not inside of the confidence limits

Null value: 0 difference between groups or odds ratio of 1

Page 18: HRP 223 - 2008

null value

zone of clinical indifference

Not statistically significant and not clinically interesting

Not statistically significant, possibly clinically interesting

Statistically significant but not clinically interesting

Statistically significant and clinically interesting

Page 19: HRP 223 - 2008
Page 20: HRP 223 - 2008

Compare Two Teachers

Import the data

Describe the data
– Assign the method as a classification variable

Do an unpaired T-test

Do a one-way ANOVA with the predictor having only two levels

Page 21: HRP 223 - 2008
Page 22: HRP 223 - 2008
Page 23: HRP 223 - 2008
Page 24: HRP 223 - 2008
Page 25: HRP 223 - 2008

SS total is the sum of the squared distances between each point and the overall mean line.

SS error is the sum of the squared distances between each point and its group mean line.
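A small worked example makes the two sums of squares concrete. The sketch below computes them for made-up scores from two groups and forms the F statistic from the usual mean squares. The group labels, the scores, and the function name are all illustrative.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS total, SS model, SS error and F for a dict of group -> scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Squared distance of every point from the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Squared distance of every point from its own group mean
    ss_error = sum((x - mean(scores)) ** 2
                   for scores in groups.values() for x in scores)
    ss_model = ss_total - ss_error
    df_model = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return ss_total, ss_model, ss_error, f_stat


# Made-up scores for two teaching methods
scores = {"A": [70, 75, 80], "B": [80, 85, 90]}
print(one_way_anova(scores))
```

For these scores SS total is 250, SS error is 100, and F is 6. With only two groups, F equals the square of the pooled unpaired t statistic, which is why the t-test and the two-level one-way ANOVA give the same p-value.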

Page 26: HRP 223 - 2008

Page 27: HRP 223 - 2008
Page 28: HRP 223 - 2008
Page 29: HRP 223 - 2008

Psoriasis Scores are arbitrary numbers: 0 = < 0% response, 5 = 26-50% response, etc.

Page 30: HRP 223 - 2008
Page 31: HRP 223 - 2008
Page 32: HRP 223 - 2008

If and only if you work on a fast machine!

Page 33: HRP 223 - 2008

Weight Gain

Page 34: HRP 223 - 2008
Page 35: HRP 223 - 2008
Page 36: HRP 223 - 2008
Page 37: HRP 223 - 2008
Page 38: HRP 223 - 2008
Page 39: HRP 223 - 2008
Page 40: HRP 223 - 2008
Page 41: HRP 223 - 2008
Page 42: HRP 223 - 2008

Postpartum Depression

Page 43: HRP 223 - 2008

After the Formats

Page 44: HRP 223 - 2008
Page 45: HRP 223 - 2008
Page 46: HRP 223 - 2008
Page 47: HRP 223 - 2008
Page 48: HRP 223 - 2008
Page 49: HRP 223 - 2008

Dementia

Page 50: HRP 223 - 2008
Page 51: HRP 223 - 2008
Page 52: HRP 223 - 2008
Page 53: HRP 223 - 2008
Page 54: HRP 223 - 2008

Wide to Long

You may have noticed that the data for these analyses are all set up as long, skinny files where there is a record for every observation on a patient. Some people store data as wide records with many variables with a single record for each person.

To convert from wide to long:
– Do data step processing with arrays.
– Use the Transpose option on the Data menu.
– Combine proc transpose and data step code.
– Use a macro I wrote. (It is brand new, so check it.)
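The reshaping itself is independent of the tool. As a rough illustration in Python (not the SAS approaches listed above), the sketch below turns one wide record per person into one long record per measurement. The variable names `id`, `tol1` to `tol3`, and all values are hypothetical.

```python
def wide_to_long(records, id_var, prefix):
    """Turn one wide record per person into one long record per measurement."""
    long_rows = []
    for record in records:
        for name, value in record.items():
            # Keep only the repeated measures, picked out by their shared prefix
            if name != id_var and name.startswith(prefix):
                long_rows.append({
                    id_var: record[id_var],
                    "variable": name,
                    "value": value,
                })
    return long_rows


# Hypothetical wide data: one row per person, one column per measurement
wide = [
    {"id": 1, "tol1": 2.2, "tol2": 1.8, "tol3": 1.5},
    {"id": 2, "tol1": 1.1, "tol2": 1.4, "tol3": 1.9},
]
for row in wide_to_long(wide, id_var="id", prefix="tol"):
    print(row)
```

Two wide records with three measurements each become six long records, one per person and measurement.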

Page 55: HRP 223 - 2008

Tolerance.sas7bdat is a dataset from the book.

Save a copy of the macro in a file after the fill-in-the-blanks are done.

Page 56: HRP 223 - 2008

The stuff in the blah.sas file:

The stuff in the macro file:

Page 57: HRP 223 - 2008

tol: all variables starting with the letters tol

Page 58: HRP 223 - 2008

Narrow to Wide

Of course you can transpose back to wide from narrow.

If you download the keyboard macros today you will see that proc transpose now gives you a code template.
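Going from narrow back to wide is the mirror image: collect every long record for a person into a single record. The Python sketch below is only an illustration of the idea, not the proc transpose template mentioned above. The column names and values are hypothetical.

```python
def long_to_wide(rows, id_var, name_var, value_var):
    """Collect the long rows for each person back into a single wide record."""
    wide = {}
    for row in rows:
        # Start a new wide record the first time a person is seen
        record = wide.setdefault(row[id_var], {id_var: row[id_var]})
        record[row[name_var]] = row[value_var]
    return list(wide.values())


# Hypothetical long data: one row per person and measurement
long_rows = [
    {"id": 1, "variable": "tol1", "value": 2.2},
    {"id": 1, "variable": "tol2", "value": 1.8},
    {"id": 2, "variable": "tol1", "value": 1.1},
    {"id": 2, "variable": "tol2", "value": 1.4},
]
for record in long_to_wide(long_rows, "id", "variable", "value"):
    print(record)
```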