statistical package usage


Uploaded by marva on 06-Jan-2016


Page 1: Statistical Package Usage

Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.

Page 2: Statistical Package Usage

Statistical Package Usage

Topic: One Way ANOVA

By Dr. Kelly Fan, Cal State Univ, East Bay

Page 3: Statistical Package Usage

Statistical Tools vs. Variable Types

Response (output) vs. predictor (input):

• Numerical response, numerical predictors: Simple and Multiple Regression
• Numerical response, categorical or mixed predictors: Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA)
• Categorical response: Categorical data analysis

Page 4: Statistical Package Usage

Example: Broker Study

A financial firm would like to determine if brokers they use to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used.

$Y = 1000(A - P)/A$, where P = per-share price paid for the stock and A = average of the day's high and low price per share.

"The higher Y is, the better the trade."
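To make the index concrete, here is a minimal Python sketch of the formula. The function name `trade_index` and the prices in the example are hypothetical, chosen only to show the arithmetic: a day with high 52 and low 48 gives A = 50, so paying 49 per share yields Y = 1000 × 1/50 = 20, while paying more than A makes Y negative.

```python
def trade_index(high: float, low: float, price_paid: float) -> float:
    """Cost index Y = 1000 * (A - P) / A for one trade (hypothetical helper).

    A is the average of the day's high and low price per share;
    P (price_paid) is the per-share price the broker actually obtained.
    """
    a = (high + low) / 2.0          # A: midpoint of the day's price range
    return 1000.0 * (a - price_paid) / a


# Hypothetical trade: high 52, low 48 (A = 50), paid 49 per share.
print(trade_index(52.0, 48.0, 49.0))  # 20.0
```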

Page 5: Statistical Package Usage

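Col: broker; R = 6 trades per broker

Trade   Broker 1   Broker 2   Broker 3   Broker 4   Broker 5
1          12          7          8         21         24
2           3         17          1         10         13
3           5         13          7         15         14
4          -1         11          4         12         18
5          12          7          3         20         14
6           5         17          7          6         19
Mean        6         12          5         14         17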

Five brokers were in the study and six trades were randomly assigned to each broker.

Page 6: Statistical Package Usage

Statistical Model

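Columns are the "LEVELS" of broker (broker is, of course, represented as categorical); rows are the n observations at each level:

          Level 1   Level 2   • • •   Level C
Row 1     Y11       Y12       • • •   Y1C
Row 2     Y21       Y22       • • •   Y2C
  •         •         •                 •
Row n     Yn1       Yn2       • • •   YnC

$Y_{ij} = \mu_j + \varepsilon_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, C$

where $\mu_j$ is the mean response at level j and the errors $\varepsilon_{ij}$ are assumed independent and normally distributed with mean 0 and a common variance $\sigma^2$.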

Page 7: Statistical Package Usage

One-Way ANOVA F-Test:

H0: The level of X has no impact on Y
H1: The level of X does have an impact on Y

$H_0: \mu_1 = \mu_2 = \cdots = \mu_C$

$H_1$: not all $\mu_j$ are equal
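As a worked complement to these hypotheses, the usual one-way ANOVA F statistic can be written as follows, with $Y_{ij}$ the i-th observation at level j (i = 1, ..., n; j = 1, ..., C). These are the standard textbook formulas rather than anything shown on the slides; the last line plugs in the sums of squares reported in the Page 8 SAS output for the Broker data (C = 5, n = 6). H0 is rejected for large F; under H0, F follows an F distribution with C - 1 and nC - C degrees of freedom.

```latex
\begin{aligned}
\bar{Y}_{\cdot j} &= \frac{1}{n}\sum_{i=1}^{n} Y_{ij}, \qquad
\bar{Y}_{\cdot\cdot} = \frac{1}{nC}\sum_{j=1}^{C}\sum_{i=1}^{n} Y_{ij} \\
SSTR &= n\sum_{j=1}^{C}\bigl(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot}\bigr)^2, \qquad
SSE = \sum_{j=1}^{C}\sum_{i=1}^{n}\bigl(Y_{ij}-\bar{Y}_{\cdot j}\bigr)^2 \\
F &= \frac{SSTR/(C-1)}{SSE/(nC-C)}
   = \frac{640.8/4}{530/25} = \frac{160.2}{21.2} \approx 7.56
\end{aligned}
```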

Page 8: Statistical Package Usage

ONE WAY ANOVA

The GLM Procedure
Dependent Variable: TRADE

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       640.800000    160.200000      7.56   0.0004
Error             25       530.000000     21.200000
Corrected Total   29      1170.800000

R-Square   Coeff Var   Root MSE   TRADE Mean
0.547318    42.63283   4.604346     10.80000

Root MSE (4.604346) is the estimate of the common standard deviation $\sigma$.
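The entries of this table can be reproduced from the raw trades. The sketch below is plain Python (not SAS), assuming the six trades per broker as reconstructed from the Page 5 table; `BROKERS` and `one_way_anova` are illustrative names. It computes the Model and Error sums of squares, degrees of freedom, mean squares, F, R-Square, Root MSE and Coeff Var exactly as they are defined for a one-way layout.

```python
import math

# Broker data reconstructed from the Page 5 table: six trades per broker.
BROKERS = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    values = [y for ys in groups.values() for y in ys]
    n_total, c = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    # Model (between-broker) and Error (within-broker) sums of squares.
    ss_model = sum(len(ys) * (means[k] - grand_mean) ** 2 for k, ys in groups.items())
    ss_error = sum((y - means[k]) ** 2 for k, ys in groups.items() for y in ys)
    df_model, df_error = c - 1, n_total - c
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    root_mse = math.sqrt(ms_error)  # estimate of the common standard deviation
    return {
        "ss_model": ss_model,
        "ss_error": ss_error,
        "df_model": df_model,
        "df_error": df_error,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": ms_model / ms_error,
        "r_square": ss_model / (ss_model + ss_error),
        "root_mse": root_mse,
        "coeff_var": 100 * root_mse / grand_mean,
        "mean": grand_mean,
    }


table = one_way_anova(BROKERS)
for name, value in table.items():
    print(f"{name:>9}: {value:.6f}")
```

The p-value (Pr > F) needs the F distribution's upper tail, which the standard library does not provide; if SciPy is available, `scipy.stats.f.sf(table["f_value"], 4, 25)` computes it.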

Page 9: Statistical Package Usage

Diagnosis: Normality

• Check normality on the residuals pooled across all groups, not separately within each group; pooling gives one larger sample to examine.

• The points on the normality plot should more or less follow a straight line before we claim the residuals are normally distributed.

• There are statistical tests to check this formally.

• The ANOVA method we learn here is not very sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.

Normality plot: normal scores vs. residuals
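The points of such a plot can be computed directly. This Python sketch assumes the Broker data as reconstructed from the Page 5 table and uses Blom's plotting positions (i - 3/8)/(n + 1/4), which is one common convention and an assumption here, since the slides do not say which one the package used. It only produces the (normal score, residual) pairs; drawing the plot is left to any plotting tool.

```python
from statistics import NormalDist

# Broker data reconstructed from the Page 5 table.
BROKERS = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def residuals(groups):
    """Residual = observation minus the mean of its own group."""
    out = []
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        out.extend(y - group_mean for y in ys)
    return out


def normal_scores(res):
    """Pair each sorted residual with a standard normal quantile.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4), one common choice.
    """
    n = len(res)
    z = NormalDist()  # standard normal, mean 0 and sd 1
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), r)
            for i, r in enumerate(sorted(res), start=1)]


pairs = normal_scores(residuals(BROKERS))
for score, r in pairs:
    print(f"{score:7.3f}  {r:6.1f}")
```

If the residuals are roughly normal, these pairs should lie close to a straight line.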

Page 10: Statistical Package Usage

From the Broker data:

[Figure: normal probability plot of the residuals (RESIDUAL vs. Normal Quantiles)]

Page 11: Statistical Package Usage

Diagnosis: Equal Variances

• The points on the residual plot should stay more or less within a horizontal band before we claim the variances are constant.

• There are statistical tests to check this formally.

• The ANOVA method we learn here is not very sensitive to the constant-variance assumption. That is, slightly different variances across groups will not change our conclusions much.

Residual plot: predicted values vs. residuals
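As a numerical companion to the residual plot, one can compare the within-group variances and run one of the formal tests the slide alludes to. The Python sketch below assumes the Broker data as reconstructed from the Page 5 table and uses Levene's test (the one-way ANOVA F computed on absolute deviations from each group mean), which is a swapped-in choice, not necessarily the test the course uses. With equal group sizes, the average of the group variances equals the Error mean square of the ANOVA table.

```python
# Broker data reconstructed from the Page 5 table.
BROKERS = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def levene_f(groups):
    """Levene's statistic: the one-way ANOVA F computed on |Y_ij - group mean|."""
    absdev = {k: [abs(y - mean(ys)) for y in ys] for k, ys in groups.items()}
    pooled = [z for zs in absdev.values() for z in zs]
    grand, c, n = mean(pooled), len(absdev), len(pooled)
    ss_between = sum(len(zs) * (mean(zs) - grand) ** 2 for zs in absdev.values())
    ss_within = sum((z - mean(zs)) ** 2 for zs in absdev.values() for z in zs)
    return (ss_between / (c - 1)) / (ss_within / (n - c))


variances = {k: sample_variance(ys) for k, ys in BROKERS.items()}
pooled_variance = mean(list(variances.values()))  # equals MSE when group sizes are equal
print("group variances:", variances)
print("pooled variance:", pooled_variance)
print("Levene F (4 and 25 df):", levene_f(BROKERS))
```

A Levene F that is small relative to the F(4, 25) distribution gives no evidence against equal variances.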

Page 12: Statistical Package Usage

From the Broker data:

[Figure: residual plot (RESIDUAL vs. PREDICTED)]

Page 13: Statistical Package Usage

Multiple Comparison Procedures

Once we reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_C$ in favor of $H_1$: not all $\mu$'s are equal, we don't yet know in what way they differ, only that they are not all the same. If there are 4 columns, are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? And so on.

Page 14: Statistical Package Usage

Pairwise Comparison

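Goal: group the levels so that levels in the same group are not significantly different

Method: compare each pair of levels

The Student-Newman-Keuls (SNK) procedure is a popular choice and is the one introduced here.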

Page 15: Statistical Package Usage

SAS Output for SNK Procedure

Number of Means        2           3           4           5
Critical Range     5.4749249   6.6214244   7.3120942   7.8071501

Means with the same letter are not significantly different.

SNK Grouping     Mean    N   BROKER
     A         17.000    6   5
     A         14.000    6   4
     A         12.000    6   2
     B          6.000    6   1
     B          5.000    6   3
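The grouping can be checked by hand from the critical ranges in this output. The Python sketch below assumes the usual SNK step-down rule: compare the most widely separated means first, using the critical range for the number of means p that the comparison spans, and once a stretch of means is declared not significantly different, treat every pair inside it as not different too. Each critical range is a studentized-range quantile times sqrt(MSE/n), which the standard library cannot compute, so the values are copied from the SAS output; `MEANS`, `CRITICAL_RANGE` and `snk_significant_pairs` are illustrative names.

```python
# Broker means and SNK critical ranges (keyed by number of means spanned),
# both copied from the SAS output above.
MEANS = {5: 17.0, 4: 14.0, 2: 12.0, 1: 6.0, 3: 5.0}
CRITICAL_RANGE = {2: 5.4749249, 3: 6.6214244, 4: 7.3120942, 5: 7.8071501}


def snk_significant_pairs(means, critical_range):
    """Step-down SNK: test the widest spans first; pairs nested inside a
    non-significant span are declared non-significant without testing."""
    order = sorted(means, key=means.get, reverse=True)  # highest mean first
    k = len(order)
    not_sig = []             # (lo, hi) index spans already found non-significant
    significant = set()
    for p in range(k, 1, -1):                # span sizes k, k-1, ..., 2
        for lo in range(k - p + 1):
            hi = lo + p - 1
            if any(a <= lo and hi <= b for a, b in not_sig):
                continue                     # nested in a non-significant span
            diff = means[order[lo]] - means[order[hi]]
            if diff > critical_range[p]:
                significant.add(frozenset((order[lo], order[hi])))
            else:
                not_sig.append((lo, hi))
    return order, significant


order, sig = snk_significant_pairs(MEANS, CRITICAL_RANGE)
for i, a in enumerate(order):
    for b in order[i + 1:]:
        verdict = "different" if frozenset((a, b)) in sig else "not different"
        print(f"broker {a} vs broker {b}: {verdict}")
```

For example, brokers 5 and 2 span three means and differ by 5.0, which is below the 3-mean critical range of 6.62, so they land in the same group.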

Page 16: Statistical Package Usage

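Conclusion (brokers ordered by mean, highest first): 5 4 2 | 1 3, where brokers 5, 4 and 2 form group A and brokers 1 and 3 form group B.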

Page 17: Statistical Package Usage

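Brokers 1 and 3 are not significantly different from each other, but both are significantly different from the other three brokers.

Brokers 2, 4 and 5 share the letter A, so no two of them are significantly different. (Groups can also overlap: if broker 2 were grouped with broker 4, and broker 4 with broker 5, but broker 2 not with broker 5, then broker 2 would be significantly smaller than broker 5 even though neither differs significantly from broker 4.)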


Page 18: Statistical Package Usage

Comparisons to a Control: Dunnett Procedure

Designed specifically for comparing several "treatments" to a "control."

Example:

Broker (Col):   1    2    3    4    5
Mean:           6   12    5   14   17

R = 6 trades per broker; broker 1 is the CONTROL.

Page 19: Statistical Package Usage

- Cols 4 and 5 differ significantly from the control [1].
- Cols 2 and 3 are not significantly different from the control.

In our example: broker 1 (CONTROL) has mean 6; brokers 2, 3, 4, 5 have means 12, 5, 14, 17.

Comparisons significant at the 0.05 level are indicated by ***.

BROKER        Difference Between   Simultaneous 95%
Comparison    Means                Confidence Limits
5 - 1              11.000           4.070   17.930   ***
4 - 1               8.000           1.070   14.930   ***
2 - 1               6.000          -0.930   12.930
3 - 1              -1.000          -7.930    5.930
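Each Dunnett interval has the form (mean difference) ± d × sqrt(2·MSE/n), and a comparison is flagged when the interval excludes 0. The Dunnett critical value d comes from a multivariate t distribution that Python's standard library does not provide, so this sketch takes it as an input: `D_CRIT = 2.607` is an assumed value backed out of the SAS half-width (6.930 divided by the standard error of a difference), not a value read from a Dunnett table. MSE = 21.2 and n = 6 come from the ANOVA table and the design; `dunnett_intervals` is an illustrative name.

```python
import math

MEANS = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
MSE, N_PER_BROKER = 21.2, 6   # Error mean square and trades per broker
D_CRIT = 2.607                # assumption: implied by the SAS half-width 6.930


def dunnett_intervals(means, control, mse, n, d_crit):
    """Simultaneous intervals for (treatment mean - control mean), equal group sizes."""
    se_diff = math.sqrt(2 * mse / n)      # standard error of a difference of two means
    half = d_crit * se_diff
    rows = []
    for b in sorted(means, key=means.get, reverse=True):
        if b == control:
            continue
        diff = means[b] - means[control]
        lo, hi = diff - half, diff + half
        rows.append((b, diff, lo, hi, lo > 0 or hi < 0))  # significant if 0 is excluded
    return rows


for b, diff, lo, hi, sig in dunnett_intervals(MEANS, 1, MSE, N_PER_BROKER, D_CRIT):
    print(f"{b} - 1  {diff:7.3f}  {lo:7.3f}  {hi:7.3f}  {'***' if sig else ''}")
```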

Page 20: Statistical Package Usage

Contrast

Question 1: Broker 1 vs. the others

Question 2: Brokers 1 and 2 are more experienced than the others. Do experienced and less experienced brokers differ?

Page 21: Statistical Package Usage

SAS Output for Question 1

Contrast                 DF   Contrast SS    Mean Square    F Value   Pr > F
BROKER 1 VS THE OTHERS    1   172.8000000    172.8000000       8.15   0.0085
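A single-degree-of-freedom contrast with coefficients $c_j$ summing to zero has estimate $\hat{L} = \sum_j c_j \bar{Y}_{\cdot j}$, sum of squares $\hat{L}^2 / \sum_j (c_j^2/n)$, and F = SS / MSE. The Python sketch below assumes coefficients (4, -1, -1, -1, -1) for Question 1 and (3, 3, -2, -2, -2) for Question 2; the slides do not show how the contrasts were coded, but any nonzero multiple of these gives the same SS and F. The slides show SAS output only for Question 1, so the Question 2 numbers are this sketch's own arithmetic. Names are illustrative.

```python
MEANS = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}   # broker means
N_PER_BROKER, MSE = 6, 21.2                            # from the design and ANOVA table


def contrast(coefs, means=MEANS, n=N_PER_BROKER, mse=MSE):
    """Return (estimate, sum of squares, F) for a single-df contrast."""
    if abs(sum(coefs.values())) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    est = sum(c * means[k] for k, c in coefs.items())
    ss = est ** 2 / sum(c ** 2 / n for c in coefs.values())
    return est, ss, ss / mse


q1 = {1: 4, 2: -1, 3: -1, 4: -1, 5: -1}   # broker 1 vs. the average of the others
q2 = {1: 3, 2: 3, 3: -2, 4: -2, 5: -2}    # brokers 1, 2 vs. brokers 3, 4, 5
for label, coefs in [("Question 1", q1), ("Question 2", q2)]:
    est, ss, f = contrast(coefs)
    print(f"{label}: L = {est:.1f}, SS = {ss:.4f}, F = {f:.3f}")
```

The F value is compared with an F distribution on 1 and 25 degrees of freedom.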

Page 22: Statistical Package Usage

KRUSKAL-WALLIS TEST

(Nonparametric Alternative)

H0: The probability distributions are identical for each level of the factor

H1: Not all the distributions are the same

Page 23: Statistical Package Usage

Example: Life Insurance Amount

State:   1: CA   2: KA   3: CO
           90      80     165
          200     140     160
          225     150     140
          100     140     160
          170     150     175
          300     300     155
          250     280     180

Page 24: Statistical Package Usage

[Figure: residual plot (RESIDUAL vs. PREDICTED) for the life insurance data]

Page 25: Statistical Package Usage

KRUSKAL-WALLIS TEST

Kruskal-Wallis Test
Chi-Square          1.0791
DF                  2
Pr > Chi-Square     0.5830
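The statistic can be computed by ranking all 21 amounts together (averaging ranks over ties), then forming $H = \frac{12}{N(N+1)}\sum_j R_j^2/n_j - 3(N+1)$ and dividing by the tie correction $1 - \sum (t^3 - t)/(N^3 - N)$, where t is the size of each tied set. Under H0, H is approximately chi-square with (number of groups - 1) degrees of freedom, and with 2 df the upper tail has the closed form exp(-H/2). This Python sketch uses the insurance data from the SAS program on the next page; `midranks` and `kruskal_wallis` are illustrative names, and `scipy.stats.kruskal` offers the same test if SciPy is available. It should reproduce the Chi-Square and p-value shown above.

```python
import math
from collections import Counter

INSURANCE = {
    "CA": [90, 200, 225, 100, 170, 300, 250],
    "KA": [80, 140, 150, 140, 150, 300, 280],
    "CO": [165, 160, 140, 160, 175, 155, 180],
}


def midranks(values):
    """Map each distinct value to its average (mid) rank, starting from 1."""
    ranks, start = {}, 1
    for v, count in sorted(Counter(values).items()):
        ranks[v] = start + (count - 1) / 2   # average of ranks start..start+count-1
        start += count
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom."""
    values = [y for ys in groups.values() for y in ys]
    n = len(values)
    rank = midranks(values)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[y] for y in ys) ** 2 / len(ys) for ys in groups.values()
    ) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    h /= 1 - ties / (n ** 3 - n)             # correction for tied values
    return h, len(groups) - 1


h, df = kruskal_wallis(INSURANCE)
# The chi-square upper tail equals exp(-x / 2) only for 2 df.
p = math.exp(-h / 2) if df == 2 else float("nan")
print(f"Chi-Square {h:.4f}  DF {df}  Pr > Chi-Square {p:.4f}")
```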

Page 26: Statistical Package Usage

SAS Code

DATA INSURANCE;
   INPUT STATE $ AMOUNT @@;   /* @@ keeps reading STATE/AMOUNT pairs from the same line */
   DATALINES;
CA 90 CA 200 CA 225 CA 100 CA 170 CA 300 CA 250
KA 80 KA 140 KA 150 KA 140 KA 150 KA 300 KA 280
CO 165 CO 160 CO 140 CO 160 CO 175 CO 155 CO 180
;
RUN;

** NON-PARAMETRIC TEST;
PROC NPAR1WAY DATA=INSURANCE WILCOXON;   /* Wilcoxon scores give Kruskal-Wallis for 3+ groups */
   TITLE "NONPARAMETRIC TEST TO COMPARE STATES";
   CLASS STATE;
   VAR AMOUNT;
RUN;