session 7. applied regression -- prof. juran2 outline chi-square goodness-of-fit tests fit to a...


Upload: miles-elliott

Post on 22-Dec-2015


Session 7


Applied Regression -- Prof. Juran 2

Outline

• Chi-square Goodness-of-Fit Tests
  • Fit to a Normal
  • Simulation Modeling
• Autocorrelation, serial correlation
  • Runs test
  • Durbin-Watson
• Model Building
  • Variable Selection Methods
  • Minitab


Goodness-of-Fit Tests

• Determine whether a set of sample data has been drawn from a hypothesized population distribution

• Same four basic steps as other hypothesis tests we have learned

• An important tool for simulation modeling; used in defining random variable inputs


Example: Barkevious Mingo

Financial analyst Barkevious Mingo wants to run a simulation model that includes the assumption that the daily volume of a specific type of futures contract traded at U.S. commodities exchanges (represented by the random variable X) is normally distributed with a mean of 152 million contracts and a standard deviation of 32 million contracts. (This assumption is based on the conclusion of a study conducted in 2013.) Barkevious wants to determine whether this assumption is still valid.


He studies the trading volume of these contracts for 50 days, and observes the following results (in millions of contracts traded):

142.4 207.5 129.9 84.2 149.3 105.8 152.9 141.5 135.6 205.2 111.1 82.1 97.9 133.8 135.2 124.9 141.7 140.2 215.1 100.4 159.8 144.5 92.9 139.1 173.6 103.3 222.2 195.0 179.7 169.2 192.8 187.0 120.7 156.3 139.8 140.4 96.2 149.3 228.0 180.9 190.3 117.2 127.2 140.3 176.2 151.0 128.4 146.0 131.0 213.4
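The Observed Frequency column in the next table can be reproduced by sorting these 50 values into bins 25 million contracts wide. A minimal Python sketch, assuming each bin includes its lower limit and excludes its upper limit (none of the 50 values falls exactly on a bin edge, so the convention does not change the counts here):

```python
from collections import Counter

# Mingo's 50 daily volumes, in millions of contracts (copied from the data above)
volumes = [
    142.4, 207.5, 129.9, 84.2, 149.3, 105.8, 152.9, 141.5, 135.6, 205.2,
    111.1, 82.1, 97.9, 133.8, 135.2, 124.9, 141.7, 140.2, 215.1, 100.4,
    159.8, 144.5, 92.9, 139.1, 173.6, 103.3, 222.2, 195.0, 179.7, 169.2,
    192.8, 187.0, 120.7, 156.3, 139.8, 140.4, 96.2, 149.3, 228.0, 180.9,
    190.3, 117.2, 127.2, 140.3, 176.2, 151.0, 128.4, 146.0, 131.0, 213.4,
]

WIDTH = 25  # bin width used in the frequency table

def bin_lower_limit(x, width=WIDTH):
    """Lower limit of the [lower, lower + width) bin that contains x."""
    return int(x // width) * width

counts = Counter(bin_lower_limit(v) for v in volumes)
for lower in sorted(counts):
    print(f"{lower}-{lower + WIDTH}: {counts[lower]}")
```

Bins with no observations simply do not appear in the Counter; they have an observed frequency of zero.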


Bin | Observed Frequency | z-value at Bin Upper Limit | Area under Standard Normal Curve | Expected Frequency out of 50 Observations
0-25 | 0 | -3.969 | 0.0000 | 0.00
25-50 | 0 | -3.188 | 0.0007 | 0.03
50-75 | 0 | -2.406 | 0.0073 | 0.37
75-100 | 5 | -1.625 | 0.0440 | 2.20
100-125 | 7 | -0.844 | 0.1473 | 7.37
125-150 | 19 | -0.063 | 0.2757 | 13.78
150-175 | 6 | 0.719 | 0.2888 | 14.44
175-200 | 7 | 1.500 | 0.1693 | 8.47
200-225 | 5 | 2.281 | 0.0555 | 2.78
225-250 | 1 | 3.063 | 0.0102 | 0.51
250-275 | 0 | 3.844 | 0.0010 | 0.05
275-300 | 0 | 4.625 | 0.0001 | 0.00
300-325 | 0 | 5.406 | 0.0000 | 0.00
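The z-values, areas, and expected frequencies can be recomputed with the standard-library `statistics.NormalDist` class (Python 3.8+). In this sketch each bin's area is taken as the difference between the normal CDF at the bin's upper and lower limits, which is what the table's values correspond to; the small probability below 0 or above 325 is ignored, as in the table.

```python
from statistics import NormalDist

MU, SIGMA, N = 152, 32, 50          # hypothesized mean, std. dev., sample size
dist = NormalDist(MU, SIGMA)
edges = list(range(0, 326, 25))     # bin limits 0, 25, ..., 325

rows = []
for lower, upper in zip(edges, edges[1:]):
    z_upper = (upper - MU) / SIGMA               # z-value at the bin's upper limit
    area = dist.cdf(upper) - dist.cdf(lower)     # probability mass inside the bin
    rows.append((f"{lower}-{upper}", z_upper, area, N * area))

for label, z, area, expected in rows:
    print(f"{label:>8}  z={z:6.3f}  area={area:.4f}  expected={expected:5.2f}")
```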


Here is a histogram showing the theoretical distribution of 50 observations drawn from a normal distribution with μ = 152 and σ = 32, together with a histogram of Mingo’s sample data:

[Figure: two histograms, “Eyeball” Hypothesis Test: Expected Distribution and “Eyeball” Hypothesis Test: Observed Distribution; x-axis: Number of Contracts Traded (bins 0-25 through 300-325), y-axis: Frequency (0-20)]


The Chi-Square Statistic

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where

$f_o$ = the observed frequency of data in a specific range

$f_e$ = the expected frequency of data in a specific range
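The statistic translates directly into code. A minimal Python sketch; the example inputs are the collapsed-bin frequencies used later in this session. With the rounded expected frequencies shown there the sum comes to about 7.44; the 7.439 on the slides presumably reflects unrounded expected values.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [12, 19, 6, 13]               # bins 0-125, 125-150, 150-175, 175-325
expected = [9.97, 13.78, 14.44, 11.81]
print(round(chi_square_statistic(observed, expected), 3))
```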


Essentially, this statistic allows us to compare the distribution of a sample with some expected distribution, in standardized terms. It is a measure of how much a sample differs from some proposed distribution.

A large value of chi-square suggests that the two distributions are not very similar; a small value suggests that they “fit” each other quite well.


Like Student’s t, the distribution of chi-square depends on degrees of freedom.

In the case of chi-square, the number of degrees of freedom is equal to the number of classes (a.k.a. “bins” into which the data have been grouped) minus one, minus the number of estimated parameters.


Here are graphs showing the chi-square distribution for several different numbers of degrees of freedom:

[Figure: four panels, Chi-Square Distribution, d.f. = 5, 10, 15, and 20; x-axis: Chi-Square Statistic (0-50), y-axis: Probability (0.000-0.020)]


Note: The sample must be large enough that each class (bin) has an expected frequency of at least 5. Because several of our bins fall short of this, we “collapse” some of the bins, as shown here.

Bin | Observed Frequency | z-value at Bin Upper Limit | Area under Standard Normal Curve | Expected Frequency out of 50 Observations
0-125 | 12 | -0.844 | 0.1994 | 9.97
125-150 | 19 | -0.063 | 0.2757 | 13.78
150-175 | 6 | 0.719 | 0.2888 | 14.44
175-325 | 13 | 5.406 | 0.2361 | 11.81
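Collapsing bins just adds up the observed and expected frequencies of adjacent bins. A sketch, with the grouping chosen by hand to match the table above (an automatic merging rule would be a separate design choice):

```python
# Frequencies for the thirteen 25-wide bins, 0-25 through 300-325 (from the full table)
observed = [0, 0, 0, 5, 7, 19, 6, 7, 5, 1, 0, 0, 0]
expected = [0.00, 0.03, 0.37, 2.20, 7.37, 13.78, 14.44, 8.47, 2.78, 0.51, 0.05, 0.00, 0.00]

# Index ranges [start, stop) of the original bins that form each collapsed bin
groups = {"0-125": (0, 5), "125-150": (5, 6), "150-175": (6, 7), "175-325": (7, 13)}

collapsed = {
    label: (sum(observed[a:b]), round(sum(expected[a:b]), 2))
    for label, (a, b) in groups.items()
}
for label, (fo, fe) in collapsed.items():
    print(label, fo, fe)

# Rule of thumb from the slide: every expected frequency should be at least 5
assert all(fe >= 5 for _, fe in collapsed.values())
```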


The number of degrees of freedom is equal to the number of bins minus one, minus the number of estimated parameters. We have not estimated any parameters from the sample (μ = 152 and σ = 32 come from the 2013 study), so we have d.f. = 4 – 1 – 0 = 3.

The critical chi-square value can be found either by using a chi-square table or by using the Excel function:

=CHIINV(alpha, d.f.) = CHIINV(0.05, 3) = 7.815

We will reject the null hypothesis if the test statistic is greater than 7.815.


Bin | Observed Frequency | Expected Frequency out of 50 Observations | $(f_o - f_e)^2 / f_e$
0-125 | 12 | 9.97 | 0.413
125-150 | 19 | 13.78 | 1.974
150-175 | 6 | 14.44 | 4.932
175-325 | 13 | 11.81 | 0.120

Chi-Square = 7.439

Our test statistic (7.439) is not greater than the critical value (7.815), so we cannot reject the null hypothesis at the 0.05 level of significance.

It would appear that Barkevious is justified in using the normal distribution with μ = 152 and σ = 32 to model futures contract trading volume in his simulation.
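In the simulation itself, the validated input could be sampled along these lines. A minimal sketch using `statistics.NormalDist.samples` (Python 3.8+); the run length of 10,000 simulated days and the seed are arbitrary illustrative choices. A normal model can in principle generate negative volumes, although with μ = 152 and σ = 32 that is extremely unlikely.

```python
from statistics import NormalDist, fmean, stdev

daily_volume = NormalDist(mu=152, sigma=32)        # millions of contracts per day
simulated = daily_volume.samples(10_000, seed=7)   # hypothetical run length and seed

print(f"mean={fmean(simulated):.1f}, sd={stdev(simulated):.1f}")
```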


[Figure: chi-square distribution with d.f. = 3 (x-axis 0-20), marking Critical Value of Chi-Square = 7.815 and Test Statistic = 7.439]

The p-value of this test has the same interpretation as in any other hypothesis test, namely that it is the smallest level of alpha at which H0 could be rejected. In this case, we calculate the p-value using the Excel function:

= CHIDIST(test stat, d.f.) = CHIDIST(7.439,3) = 0.0591
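Outside Excel, CHIDIST and CHIINV (CHISQ.DIST.RT and CHISQ.INV.RT in newer Excel versions) can be reproduced without a statistics package when the degrees of freedom are a whole number: the chi-square upper-tail probability has a closed form, and the critical value can be found by bisection. A sketch under that integer-d.f. assumption; in practice a library routine such as SciPy's `scipy.stats.chi2` would be the usual choice.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df (like CHIDIST)."""
    if df < 1 or int(df) != df:
        raise ValueError("this sketch only handles positive integer degrees of freedom")
    df = int(df)
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (i + 0.5)
    return total

def chi2_inv_rt(alpha, df):
    """Critical value c with P(X > c) = alpha (like CHIINV)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:      # widen the bracket until it contains c
        hi *= 2
    for _ in range(100):                # bisection; chi2_sf decreases as x grows
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"critical value = {chi2_inv_rt(0.05, 3):.3f}")
print(f"p-value = {chi2_sf(7.439, 3):.4f}")
```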


Example: Catalog Company

If we want to simulate the queueing system at this company, what distributions should we use for the arrival and service processes?


Arrivals

Bin (minutes) | Lower Limit | Observed | Expected beyond Upper Limit | Expected beyond Lower Limit | Expected
0-2 | 0 | 20 | 58.43 | 80.00 | 21.57
2-4 | 2 | 17 | 42.68 | 58.43 | 15.75
4-6 | 4 | 10 | 31.18 | 42.68 | 11.51
6-8 | 6 | 12 | 22.77 | 31.18 | 8.40
8-10 | 8 | 2 | 16.63 | 22.77 | 6.14
10-12 | 10 | 6 | 12.15 | 16.63 | 4.48
12-14 | 12 | 5 | 8.87 | 12.15 | 3.28
14-16 | 14 | 1 | 6.48 | 8.87 | 2.39
16-18 | 16 | 2 | 4.73 | 6.48 | 1.75
18-20 | 18 | 2 | 3.46 | 4.73 | 1.28
20-22 | 20 | 1 | 2.53 | 3.46 | 0.93
22-24 | 22 | 1 | 1.84 | 2.53 | 0.68
24-26 | 24 | 0 | 1.35 | 1.84 | 0.50
26-28 | 26 | 0 | 0.98 | 1.35 | 0.36
28-30 | 28 | 0 | 0.72 | 0.98 | 0.27
30-32 | 30 | 1 | 0.53 | 0.72 | 0.19
32-34 | 32 | 0 | 0.38 | 0.53 | 0.14
34-36 | 34 | 0 | 0.28 | 0.38 | 0.10
36-38 | 36 | 0 | 0.20 | 0.28 | 0.08
38-40 | 38 | 0 | 0.15 | 0.20 | 0.06
40-42 | 40 | 0 | 0.11 | 0.15 | 0.04

Cell formulas shown: =80*EXP(-$J$5*G28) for the "expected beyond" columns and =M30-L30 for the expected count in each bin.
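Under an exponential model with rate λ, 80·e^(−λx) is the number of the 80 arrivals expected to have an interarrival time longer than x minutes, and each bin's expected count is the difference of two such values. The rate (cell J5) is not shown on the slide; this sketch back-computes λ ≈ 0.157 per minute from the 58.43 entry, which is an assumption rather than the slide's documented estimate.

```python
import math

N_ARRIVALS = 80
# Assumed rate: back-computed so that 80 * exp(-2 * lam) = 58.43374, as in the table
lam = math.log(N_ARRIVALS / 58.43374) / 2

def expected_beyond(limit):
    """Expected number of interarrival times longer than `limit` minutes."""
    return N_ARRIVALS * math.exp(-lam * limit)

def expected_in_bin(lower, upper):
    """Expected number of interarrival times between `lower` and `upper` minutes."""
    return expected_beyond(lower) - expected_beyond(upper)

for lower in range(0, 42, 2):
    print(f"{lower}-{lower + 2}: {expected_in_bin(lower, lower + 2):.2f}")
```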


[Figure: Arrival Rate Analysis, Observed vs. Expected; x-axis: Interarrival Times (Minutes), bins 0-2 through 30-32; y-axis: Frequency (80 Arrivals)]


Bin (minutes) | Observed | S | T | Expected | (Observed - Expected)^2 / Expected
0-2 | 20 | 58.43374 | 80 | 21.56626 | 0.113751
2-4 | 17 | 42.68127 | 58.43374 | 15.75247 | 0.0988
4-6 | 10 | 31.17533 | 42.68127 | 11.50594 | 0.197104
6-8 | 12 | 22.77113 | 31.17533 | 8.404191 | 1.538499
8-10 | 2 | 16.63253 | 22.77113 | 6.138604 | 2.790218
10-14 | 11 | 12.14876 | 16.63253 | 7.758812 | 1.353983
14-32 | 8 | 6.481557 | 8.873719 | 8.76454 | 0.066692

Chi-Square = 6.159046

Cell formulas shown: =(R26-V26)^2/V26 (each bin's contribution) and =SUM(W26:W32) (the chi-square statistic).
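The statistic for these collapsed bins can be checked against an exponential model. A sketch, assuming the rate λ ≈ 0.157 per minute back-computed from the slide's 58.43374 entry (cell J5 itself is not shown). The last bin's expected count, 8.76454, matches the exponential probability between 14 and 42 minutes rather than 14 and 32, so the sketch uses 42 as the final limit.

```python
import math

lam = math.log(80 / 58.43374) / 2        # assumed rate per minute, back-computed from 58.43374
edges = [0, 2, 4, 6, 8, 10, 14, 42]      # collapsed bin limits; the last bin runs to 42
observed = [20, 17, 10, 12, 2, 11, 8]

expected = [80 * (math.exp(-lam * a) - math.exp(-lam * b)) for a, b in zip(edges, edges[1:])]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
print(f"chi-square = {chi_square:.4f}")
```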


d.f. = 6
alpha = 0.05
critical value = 12.5916 (=CHIINV(S35,S34))
test stat = 6.1590 (=W33)
p-value = 0.4056 (=CHIDIST(S37,S34))


[Figure: Goodness of Fit Test for Arrivals, chi-square distribution (x-axis: Chi Square, 0-50; y-axis: Probability), marking Test Statistic = 6.159, Critical Value = 12.59, and Area Under the Curve > 6.159 = 0.4056]


Services

[Figure: Service Rate Analysis, Observed vs. Expected frequency by time bin in minutes, bins 0-2 through 40-42]


[Figure: Goodness of Fit Test for Services, chi-square distribution (x-axis: Chi Square, 0-50; y-axis: Probability), marking Test Statistic = 47.79, Critical Value = 11.07, and Area Under the Curve > 47.79 = 0.0000]


142.4 207.5 129.9 84.2 149.3 105.8 152.9 141.5 135.6 205.2

111.1 82.1 97.9 133.8 135.2 124.9 141.7 140.2 215.1 100.4

159.8 144.5 92.9 139.1 173.6 103.3 222.2 195.0 179.7 169.2

192.8 187.0 120.7 156.3 139.8 140.4 96.2 149.3 228.0 180.9

190.3 117.2 127.2 140.3 176.2 151.0 128.4 146.0 131.0 213.4


Decision Models -- Prof. Juran26


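Spreadsheet excerpt: the trading-volume data are laid out in columns (the first column begins 142.4, 111.1, 159.8, 192.8, 190.3; the next begins 207.5), and a cell displaying 159.9126 contains the formula

=PsiLogistic(0.0100000000000057,20.5266814579822, PsiShift(148.558))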
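The input fitted on this slide is a logistic distribution rather than a normal one. PsiLogistic and PsiShift are Psi functions from Frontline's Risk Solver add-in; this sketch assumes the two PsiLogistic arguments are the location (mean) and scale and that PsiShift adds 148.558 to every draw, and samples by inverting the logistic CDF. Check that parameterization against the Risk Solver documentation before relying on it.

```python
import math
import random

# Assumed reading of =PsiLogistic(0.0100000000000057, 20.5266814579822, PsiShift(148.558))
LOCATION = 0.0100000000000057
SCALE = 20.5266814579822
SHIFT = 148.558

def logistic_quantile(p, location=LOCATION, scale=SCALE, shift=SHIFT):
    """Inverse CDF of a shifted logistic distribution, for 0 < p < 1."""
    return shift + location + scale * math.log(p / (1 - p))

def draw_volume(rng):
    """One simulated daily volume via inverse-transform sampling."""
    p = rng.random()
    while p == 0.0:          # random() can return 0.0, where the log is undefined
        p = rng.random()
    return logistic_quantile(p)

rng = random.Random(2015)    # arbitrary seed for reproducibility
draws = [draw_volume(rng) for _ in range(20_000)]
print(f"median={logistic_quantile(0.5):.3f}, simulated mean={sum(draws) / len(draws):.2f}")
```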


Other uses for the Chi-Square statistic

The chi-square technique can often be employed for purposes of estimation or hypothesis testing when the z or t statistics are not appropriate. In addition to the goodness-of-fit application described above, there are at least three other important uses for chi-square:

• Tests of the independence of two qualitative population variables.

• Tests of the equality or inequality of more than two population proportions.

• Inferences about a population variance, including the estimation of a confidence interval for a population variance from sample data.


Serial Correlation (a.k.a. Autocorrelation)

Are the residuals independent of each other?

What if there’s evidence that sequential residuals have a positive correlation?


Correlation | Expenditure | Time | Stock
Expenditure | 1 | 0.9833 | 0.9784
Time | 0.9833 | 1 | 0.9941
Stock | 0.9784 | 0.9941 | 1


[Figure: Run Chart of Expenditure; x-axis: Time (1-20), y-axis: Expenditure (200-280)]


There seems to be a relationship between each observation and the ones around it. In other words, there is some positive correlation between the observations and their successors. If true, this suggests that a lot of the variability in observation $Y_i$ can be explained by observation $Y_{i-1}$.

In turn, this might suggest that the importance of Money Stock is being overstated by our original model.


Year | Quarter | Expenditure | Stock | Prev. Quarter
1952 | 2 | 217.7 | 161.2 | 214.6
1952 | 3 | 219.6 | 162.8 | 217.7
1952 | 4 | 227.2 | 164.6 | 219.6
1953 | 1 | 230.9 | 165.9 | 227.2
1953 | 2 | 233.3 | 167.9 | 230.9
1953 | 3 | 234.1 | 168.3 | 233.3
1953 | 4 | 232.3 | 169.7 | 234.1
1954 | 1 | 233.7 | 170.5 | 232.3
1954 | 2 | 236.5 | 171.6 | 233.7
1954 | 3 | 238.7 | 173.9 | 236.5
1954 | 4 | 243.2 | 176.1 | 238.7
1955 | 1 | 249.4 | 178.0 | 243.2
1955 | 2 | 254.3 | 179.1 | 249.4
1955 | 3 | 260.9 | 180.2 | 254.3
1955 | 4 | 263.3 | 181.2 | 260.9
1956 | 1 | 265.6 | 181.6 | 263.3
1956 | 2 | 268.2 | 182.5 | 265.6
1956 | 3 | 270.4 | 183.3 | 268.2
1956 | 4 | 275.6 | 184.3 | 270.4


Regression Statistics
Multiple R | 0.9938
R Square | 0.9876
Adjusted R Square | 0.9861
Standard Error | 2.1167
Observations | 19

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2 | 5731.9431 | 2865.9716 | 639.6692 | 0.0000
Residual | 16 | 71.6863 | 4.4804
Total | 18 | 5803.6295

Term | Coefficients | Standard Error | t Stat | P-value
Intercept | -39.0201 | 21.6317 | -1.8038 | 0.0901
Stock | 0.5342 | 0.2764 | 1.9325 | 0.0712
Prev. Quarter | 0.7906 | 0.1173 | 6.7383 | 0.0000


[Figure: two panels, each titled Histogram of Residuals; x-axis: Residual Error (-10 to 10), y-axis: Frequency]


[Figure: two panels, each titled Line Fit Plot; x-axis: Predicted Expenditure (200-280), y-axis: Expenditure (200-280)]


[Figure: two panels, each titled Normal Probability Plot; x-axis: Normal Score (-3 to 3), y-axis: Standardized Residual (-3 to 3)]


[Figure: two panels, each titled Residual Plot vs. Money Stock; x-axis: Stock (155-190), y-axis: Residuals (-8 to 8)]


[Figure: two panels, each titled Residual Plot vs. Time; x-axis: Time (0-20), y-axis: Residuals (-8 to 8)]


Regression Statistics
Multiple R | 0.9924
R Square | 0.9848
Adjusted R Square | 0.9839
Standard Error | 2.2806
Observations | 19

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 5715.2110 | 5715.2110 | 1098.8496 | 0.0000
Residual | 17 | 88.4185 | 5.2011
Total | 18 | 5803.6295

Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 0.6257 | 7.3904 | 0.0847 | 0.9335
Prev. Quarter | 1.0107 | 0.0305 | 33.1489 | 0.0000
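This one-predictor model (Expenditure regressed on the previous quarter's Expenditure) can be reproduced from the quarterly data table with the ordinary least-squares formulas. A plain-Python sketch; the lagged predictor is built by shifting the series one quarter, and the first value, 214.6, is the Prev. Quarter entry for 1952 Q2.

```python
# Quarterly expenditure: 1952 Q1 (from the Prev. Quarter column) through 1956 Q4
expenditure = [214.6, 217.7, 219.6, 227.2, 230.9, 233.3, 234.1, 232.3, 233.7, 236.5,
               238.7, 243.2, 249.4, 254.3, 260.9, 263.3, 265.6, 268.2, 270.4, 275.6]

y = expenditure[1:]    # current quarter, 1952 Q2 to 1956 Q4 (19 observations)
x = expenditure[:-1]   # previous quarter: the series lagged by one period

n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))   # residual SS
sst = sum((yi - y_bar) ** 2 for yi in y)                                # total SS
r_square = 1 - sse / sst

print(f"n={n}, intercept={intercept:.4f}, slope={slope:.4f}, R^2={r_square:.4f}")
```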


Statistic | Model 1 | Model 2 | Model 3
R-sq. | 0.9573 | 0.9876 | 0.9848
Adj. R-sq. | 0.9549 | 0.9861 | 0.9839
Std. Err. | 3.9827 | 2.1167 | 2.2806
Coeff. for Stock | 2.3004 | 0.5342 | -
Coeff. for Prev. Period | - | 0.7906 | 1.0107
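The adjusted R-square and standard error rows follow from each model's R-square, residual sum of squares (SSE), number of observations n, and number of predictors k, using the usual formulas. A short sketch; for Model 1 it assumes n = 20 and SSE = 285.51, the number of residuals and the sum of squared residuals in its Durbin-Watson worksheet.

```python
import math

def adjusted_r_square(r2, n, k):
    """1 - (1 - R^2)(n - 1)/(n - k - 1) for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def standard_error(sse, n, k):
    """Square root of the residual mean square, SSE / (n - k - 1)."""
    return math.sqrt(sse / (n - k - 1))

print(round(adjusted_r_square(0.9573, 20, 1), 4))   # Model 1
print(round(adjusted_r_square(0.9876, 19, 2), 4))   # Model 2
print(round(adjusted_r_square(0.9848, 19, 1), 4))   # Model 3
print(round(standard_error(285.51, 20, 1), 4))      # Model 1
print(round(standard_error(71.6863, 19, 2), 4))     # Model 2
print(round(standard_error(88.4185, 19, 1), 4))     # Model 3
```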


Runs Test

A “run” is an unbroken sequence of consecutive residuals with the same sign (all positive or all negative). For example:

++++++++-- has 2 runs

++--++--++ has 5 runs

--+--+--+- has 7 runs, and so forth.
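Counting runs amounts to counting the sign changes and adding one. A minimal Python sketch that checks the three examples, using `itertools.groupby` to collapse each run of identical symbols into a single group:

```python
from itertools import groupby

def count_runs(signs):
    """Number of maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(signs))

for pattern in ["++++++++--", "++--++--++", "--+--+--+-"]:
    print(pattern, count_runs(pattern))
```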


Let n1 be the number of positive residuals and n2 be the number of negative residuals. The total number of runs in a set of n = n1 + n2 uncorrelated residuals can be shown to have a mean of

$$\mu = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and a variance of

$$\sigma^2 = \frac{2 n_1 n_2 \, (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \, (n_1 + n_2 - 1)}$$
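Combining the run count with the two formulas gives a z-score for the observed number of runs. A sketch of this normal approximation; the example input is the list of Model 1 residuals from the Durbin-Watson worksheet later in this session. Treating every residual as strictly positive or negative (no zeros) is an assumption of the sketch.

```python
import math
from itertools import groupby

def runs_test(residuals):
    """Normal-approximation runs test on the signs of a residual series.

    Returns (observed runs, expected runs, standard deviation, z-score).
    """
    signs = [e > 0 for e in residuals]
    runs = sum(1 for _ in groupby(signs))
    n1 = sum(signs)              # number of positive residuals
    n2 = len(signs) - n1         # number of negative residuals
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    sd = math.sqrt(variance)
    return runs, mean, sd, (runs - mean) / sd

model1 = [2.870, 1.599, -0.181, 3.278, 3.988, 1.787, 1.667, -3.354, -3.794, -3.524,
          -6.615, -7.176, -5.347, -2.977, 1.092, 1.192, 2.572, 3.102, 3.461, 6.361]
runs, mean, sd, z = runs_test(model1)
print(f"runs={runs}, expected={mean:.2f}, sd={sd:.3f}, z={z:.2f}")
```

For these residuals n1 = 12 and n2 = 8, so the formulas give an expected 10.6 runs with a standard deviation of about 2.08; the 5 observed runs are roughly 2.7 standard deviations below the mean.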


In our Money Stock case (Model 1), the 20 residuals include n1 = 12 positive and n2 = 8 negative values, so the expected number of runs is 10.6 and the standard deviation is about 2.08.

[Figure: Residual Plot vs. Time; x-axis: Time (0-20), y-axis: Residuals (-8 to 8)]

Our Model 1 has 5 runs, which is about 2.69 standard deviations below the expected value: an unusually small number of runs.

This suggests that the residuals are not independent. (This is an approximation based on the central limit theorem; it doesn’t work well with small samples.)

Our Model 2 has 7 runs. With 8 positive and 11 negative residuals, its expected number of runs is about 10.26 (standard deviation about 2.06), so 7 runs is about 1.58 standard deviations below the expected value, a smaller shortfall than Model 1’s.


Durbin-Watson

Another popular hypothesis-testing procedure:

H0: Correlation = 0

HA: Correlation > 0

The test statistic is:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

where $e_t$ is the residual for period t.
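The statistic is a ratio of two sums over the residual series. A minimal Python sketch, checked against the Model 1 and Model 2 residuals listed (to three decimals) in the worksheets later in this session; because of that rounding, results agree with the worksheets' d-statistics to about three decimal places.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    numerator = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

model1 = [2.870, 1.599, -0.181, 3.278, 3.988, 1.787, 1.667, -3.354, -3.794, -3.524,
          -6.615, -7.176, -5.347, -2.977, 1.092, 1.192, 2.572, 3.102, 3.461, 6.361]
model2 = [0.943, -0.463, 4.673, 1.670, 0.076, -1.235, -4.415, -2.019, -0.914, -2.156,
          -0.571, 1.056, 0.467, 2.605, -0.747, -0.558, -0.258, -0.540, 2.386]

print(f"Model 1: d = {durbin_watson(model1):.4f}")
print(f"Model 2: d = {durbin_watson(model2):.4f}")
```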


In general, $d \approx 2(1 - \hat{\rho})$, where $\hat{\rho}$ is the sample correlation between successive residuals.

Values of d close to zero indicate strong positive correlation, and values of d close to 2 suggest weak correlation.

Precise definitions of “close to zero” and “close to 2” depend on the sample size and the number of independent variables; see p. 346 in RABE for a Durbin-Watson table.


The Durbin-Watson procedure will result in one of three possible decisions:

d < dL: Reject H0

dL < d < dU: Inconclusive

dU < d: Do not reject H0

From the Durbin-Watson table, we see that our Model 1 has upper and lower limits of 1.15 and 0.95, respectively. Model 2 has limits of 1.26 and 0.83. (These are the bounds for the 1% significance level.)
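The three-way decision rule fits in a small function. A sketch; treating a d exactly equal to dL or dU as inconclusive is an assumption, since the slide states the rule with strict inequalities. The d values are the ones computed in the Model 1 and Model 2 worksheets later in this session.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Decision for H0: no serial correlation vs. HA: positive serial correlation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

print(durbin_watson_decision(0.3282, 0.95, 1.15))   # Model 1
print(durbin_watson_decision(1.2603, 0.83, 1.26))   # Model 2
```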


Model 1

Observation | Residuals | $(e_t - e_{t-1})^2$ | $e_t^2$
1 | 2.870 | | 8.24
2 | 1.599 | 1.61 | 2.56
3 | -0.181 | 3.17 | 0.03
4 | 3.278 | 11.97 | 10.75
5 | 3.988 | 0.50 | 15.90
6 | 1.787 | 4.84 | 3.19
7 | 1.667 | 0.01 | 2.78
8 | -3.354 | 25.21 | 11.25
9 | -3.794 | 0.19 | 14.39
10 | -3.524 | 0.07 | 12.42
11 | -6.615 | 9.55 | 43.76
12 | -7.176 | 0.31 | 51.50
13 | -5.347 | 3.35 | 28.59
14 | -2.977 | 5.61 | 8.86
15 | 1.092 | 16.56 | 1.19
16 | 1.192 | 0.01 | 1.42
17 | 2.572 | 1.90 | 6.61
18 | 3.102 | 0.28 | 9.62
19 | 3.461 | 0.13 | 11.98
20 | 6.361 | 8.41 | 40.46
Sum | | 93.71 | 285.51

d-stat = 0.3282

Cell formulas shown: =(B5-B4)^2, =B7^2, =C23/D23


Model 2

Observation | Residuals | $(e_t - e_{t-1})^2$ | $e_t^2$
1 | 0.943 | | 0.89
2 | -0.463 | 1.98 | 0.21
3 | 4.673 | 26.38 | 21.84
4 | 1.670 | 9.02 | 2.79
5 | 0.076 | 2.54 | 0.01
6 | -1.235 | 1.72 | 1.52
7 | -4.415 | 10.11 | 19.49
8 | -2.019 | 5.74 | 4.08
9 | -0.914 | 1.22 | 0.84
10 | -2.156 | 1.54 | 4.65
11 | -0.571 | 2.51 | 0.33
12 | 1.056 | 2.65 | 1.12
13 | 0.467 | 0.35 | 0.22
14 | 2.605 | 4.57 | 6.79
15 | -0.747 | 11.24 | 0.56
16 | -0.558 | 0.04 | 0.31
17 | -0.258 | 0.09 | 0.07
18 | -0.540 | 0.08 | 0.29
19 | 2.386 | 8.56 | 5.69
Sum | | 90.35 | 71.69

d-stat = 1.2603


In Model 1, we reject the null hypothesis and conclude there is significant positive correlation between sequential residuals.

In Model 2, we do not reject the null hypothesis; the serial correlation is not significantly greater than zero.


Residual Analysis from the Tool-Wear Model

[Figure: Histogram of Residuals; x-axis: Residual Error (-8 to 8), y-axis: Frequency (0-7)]


[Figure: Normal Plot of Residuals; x-axis: Residual Values (-6 to 6), y-axis: Normal Score (-4 to 4)]


Normal score calculations:

Observation | Rank | Predicted Life | Residuals | Quantile | Normal Score
7 | 1 | 18.893 | -5.553 | 0.048 | -3.495
12 | 2 | 31.502 | -4.412 | 0.095 | -2.761
11 | 3 | 34.163 | -4.003 | 0.143 | -2.385
13 | 4 | 28.575 | -3.175 | 0.190 | -1.921
1 | 5 | 20.755 | -2.025 | 0.238 | -1.379
17 | 6 | 27.777 | -1.707 | 0.286 | -1.128
16 | 7 | 36.292 | -0.672 | 0.333 | -0.652
9 | 8 | 13.305 | -0.625 | 0.381 | -0.509
3 | 9 | 17.828 | -0.398 | 0.429 | -0.311
4 | 10 | 14.636 | -0.096 | 0.476 | -0.091
8 | 11 | 22.618 | 0.092 | 0.524 | 0.090
14 | 12 | 25.383 | 0.667 | 0.571 | 0.400
6 | 13 | 22.884 | 1.506 | 0.619 | 0.799
15 | 14 | 31.768 | 1.722 | 0.667 | 0.997
10 | 15 | 17.562 | 1.758 | 0.714 | 1.144
18 | 16 | 34.695 | 2.085 | 0.762 | 1.398
5 | 17 | 10.911 | 2.529 | 0.810 | 1.708
2 | 18 | 11.709 | 2.811 | 0.857 | 1.992
19 | 19 | 30.438 | 4.512 | 0.905 | 2.794
20 | 20 | 38.686 | 4.984 | 0.952 | 3.308

Quantile formula shown: =B73/($A$92+1)


Then, calculate the normal score, which is a measure of where we would expect the quantiles to be if we drew a sample of this size from a perfect standard normal distribution:


=D73/$B$30+NORMSINV(E73)
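The two spreadsheet steps (rank to quantile, quantile to normal score) can be mirrored in a short Python sketch. It assumes the Rank / (n + 1) quantile rule used above and relies on the standard library's `statistics.NormalDist().inv_cdf`, the counterpart of Excel's NORMSINV, to return the plain standard normal score described in the text. The hypothetical helper `normal_scores` is for illustration only. Note that the slide's Normal Score formula also adds a D73/$B$30 term (cell B30 is not shown in this excerpt), so these scores are not the values in column F.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Return (rank, residual, quantile, normal score) rows sorted by residual.

    quantile     = rank / (n + 1)
    normal score = inverse standard normal CDF of the quantile (like NORMSINV)
    """
    n = len(residuals)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    rows = []
    for rank, resid in enumerate(sorted(residuals), start=1):
        q = rank / (n + 1)
        rows.append((rank, resid, q, std_normal.inv_cdf(q)))
    return rows

# Residuals (column D) from the 20-observation example
residuals = [-5.553, -4.412, -4.003, -3.175, -2.025, -1.707, -0.672, -0.625,
             -0.398, -0.096, 0.092, 0.667, 1.506, 1.722, 1.758, 2.085,
             2.529, 2.811, 4.512, 4.984]

for rank, resid, q, z in normal_scores(residuals):
    print(f"{rank:2d}  {resid:7.3f}  {q:.3f}  {z:7.3f}")
```

Plotting each residual against its score gives a normal probability plot; a roughly straight-line pattern is consistent with normally distributed errors.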

Page 66

Residuals vs. Fitted Values

[Figure: residual errors (vertical axis) plotted against fitted values (horizontal axis)]

Page 67

Residuals vs. Machine Type

[Figure: residual errors for Machine Type A and Machine Type B]

Page 68

Residuals vs. Speed

[Figure: residual errors (vertical axis) plotted against speed in RPM (horizontal axis)]

Page 69

Model Building

Ideally, we build a model under clean, scientific conditions:

• Understand the phenomenon well

• Have an a priori theoretical model

• Have valid, reliable measures of the variables

• Have data in adequate quantities over an appropriate range

• Regression validates and calibrates the model rather than discovering it

Page 70

Unfortunately, we too often find ourselves Data Mining:

• Little understanding of the phenomenon

• No a priori theory or model

• Have data that may or may not cover all reasonable variables

• Have measures of some variables, but little sense of their validity or reliability

• Have data in small quantities over a restricted range

• We hope that regression uncovers some magical unexpected relationships

• This process has been referred to as Creative Regression Analytical Prospecting, or CRAP. “This room is filled with horseshit; there must be a pony in here somewhere.”

Page 71

The Model Building Problem

Suppose we have data available for p candidate variables. How do we pick the best sub-model from

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$

yielding, perhaps,

$Y = \beta_0 + \beta_3 X_3 + \beta_5 X_5 + \beta_8 X_8$?

There is no solution to this problem that is entirely satisfactory, but there are some reasonable heuristics.

Page 72

Why Reduce the Number of Variables?

• Scientific Ideology: In chemistry, physics, and biology, most good models are simple. The principle of parsimony carries over into social sciences, such as business analysis.

• Statistical Advantages: Even eliminating “significant” variables that don’t contribute much to the model can have advantages, especially for predicting the future. These advantages include less expensive data collection, smaller standard errors, and tighter confidence intervals.

Page 73

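Statistical Criteria for Comparing Models

$R_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}$

$RMS_p = \frac{SSE_p}{n - p}$

$\bar{R}_p^2 = 1 - \frac{SSE_p / (n - p)}{SST / (n - 1)} = 1 - \frac{\hat{\sigma}^2}{\hat{\sigma}_Y^2}$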
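The three criteria are simple arithmetic on SSE and SST, so a minimal Python sketch can check them against real output. It assumes, as the degrees of freedom in these formulas imply, that p counts every estimated coefficient including the intercept; the figures plugged in come from the Forward Selection output for the Rick Beck example in this session (Error SS 89.693, Total SS 129.591, Total DF 999, six predictors). The function names are illustrative, not from any library.

```python
def r_squared(sse, sst):
    """R^2_p = SSR_p / SST = 1 - SSE_p / SST."""
    return 1.0 - sse / sst

def rms(sse, n, p):
    """RMS_p = SSE_p / (n - p); p counts all coefficients, including the intercept."""
    return sse / (n - p)

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - [SSE_p / (n - p)] / [SST / (n - 1)]."""
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

# Forward Selection output: SSE = 89.693, SST = 129.591, n = 1000,
# six predictors plus the intercept, so p = 7 (error df = 993).
sse, sst, n, p = 89.693, 129.591, 1000, 7
print(f"R-sq      = {r_squared(sse, sst):.2%}")
print(f"R-sq(adj) = {adjusted_r_squared(sse, sst, n, p):.2%}")
print(f"RMS (MSE) = {rms(sse, n, p):.4f}, S = {rms(sse, n, p) ** 0.5:.6f}")
```

These reproduce Minitab's Model Summary for that model (R-sq 30.79%, R-sq(adj) 30.37%, S = 0.300542), which is a quick way to confirm how p is being counted.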

Page 74

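Mallows Cp

Taking into account the possible bias that comes from an under-specified model, this measure estimates the total mean squared error of the fitted values (scaled by $\sigma^2$), including both bias and variance:

$C_p = \frac{SSE_p}{\hat{\sigma}^2} - (n - 2p)$

Here $\hat{\sigma}^2$ is the mean squared error of the full model (all candidate variables), and p counts the estimated coefficients, including the intercept.

If the model is complete (it contains all of the p terms that matter), the expected value of Cp is approximately p, so we look for models with Cp close to p.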
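A minimal Python sketch of the Cp formula, checked against the Best Subsets output later in this session. Because Minitab reports S rather than SSE, the sketch recovers $SSE_p = S^2 (n - p)$. It assumes that run used the same n = 1000 observations as the Forward Selection output (Total DF = 999); the helper names are hypothetical.

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """C_p = SSE_p / sigma_hat^2 - (n - 2p).

    sigma2_full is the full-model MSE; p counts coefficients incl. the intercept.
    """
    return sse_p / sigma2_full - (n - 2 * p)

def sse_from_s(s, n, p):
    """Recover SSE_p from a reported S, since S^2 = SSE_p / (n - p)."""
    return s * s * (n - p)

# Assumption: n = 1000, as in the Forward Selection output.
n = 1000
sigma2_full = 0.30102 ** 2  # full 10-predictor model (p = 11)
cp_full = mallows_cp(sse_from_s(0.30102, n, 11), sigma2_full, n, 11)
cp_8 = mallows_cp(sse_from_s(0.30105, n, 9), sigma2_full, n, 9)  # best 8-predictor model
print(round(cp_full, 1), round(cp_8, 1))
```

The full model always scores Cp = p by construction (here 11.0), so Cp is informative only for the smaller models; the best 8-predictor model comes out near 9.2, close to its p = 9.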

Page 75

Variable Selection Algorithms

• All-Subsets

• Forward

• Backward

• Stepwise

• Best Subsets

Page 76

All-Subsets Regression

If there are p candidate independent variables, then there are $2^p$ possible models. Why not look at them all?

For a modest number of candidates this is not a major computational problem (10 candidates give $2^{10} = 1{,}024$ models), but it can be difficult to look through all of the output.

However, some reasonable schemes exist for looking at a relatively small subset of all the possible models.
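A minimal sketch of how fast the model list grows: the hypothetical helper `all_subsets` uses `itertools.combinations` to enumerate every subset of the candidates, including the intercept-only model. Fitting each model is left out here; only the counting is shown.

```python
from itertools import combinations

def all_subsets(candidates):
    """List every subset of the candidate predictors, from the empty model upward."""
    return [subset
            for size in range(len(candidates) + 1)
            for subset in combinations(candidates, size)]

models = all_subsets(["X1", "X2", "X3"])
print(len(models))  # 8, i.e. 2**3
for m in models:
    print(m)
```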

Page 77

Forward Regression

Start with one independent variable (the one with the strongest bivariate correlation with the dependent variable), and add variables one at a time until the next variable in line to enter fails to achieve a certain threshold value. The threshold can be a minimum F value in the full-model/reduced-model test, called FIN, or a cutoff on each candidate variable's last-in p-value.

Forward selection is basically the same thing as Stepwise, except that variables are never removed once they enter the model (in Stepwise terms, set “F to remove” to zero). The procedure ends when no variable outside the model has an F-statistic greater than FIN.
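A minimal sketch of F-based forward selection in plain Python. Everything here is an illustrative assumption rather than Minitab's implementation: the helper `sse` fits least squares by solving the normal equations, the data are a hypothetical $2^3$ factorial in which y depends on x1 and x2 but not x3, and FIN = 4.0 is an arbitrary threshold. At each step the candidate with the largest partial F (equivalently, the largest drop in SSE) enters if its F exceeds FIN.

```python
def sse(y, cols):
    """SSE from least squares of y on an intercept plus the columns in cols.

    Solves (X'X) b = X'y by Gaussian elimination with partial pivoting;
    assumes no exact collinearity among the columns.
    """
    X = [[1.0] + [c[r] for c in cols] for r in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    for piv in range(k):
        best = max(range(piv, k), key=lambda r: abs(A[r][piv]))
        A[piv], A[best], v[piv], v[best] = A[best], A[piv], v[best], v[piv]
        for r in range(piv + 1, k):
            f = A[r][piv] / A[piv][piv]
            A[r] = [a - f * c for a, c in zip(A[r], A[piv])]
            v[r] -= f * v[piv]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yr - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yr in zip(X, y))

def forward_selection(y, candidates, f_in=4.0):
    """Enter the candidate with the largest partial F until none exceeds f_in."""
    n = len(y)
    chosen = []
    while len(chosen) < len(candidates):
        base = sse(y, [candidates[c] for c in chosen])
        best_name, best_f = None, f_in
        for name in candidates:
            if name in chosen:
                continue
            new = sse(y, [candidates[c] for c in chosen + [name]])
            df_error = n - (len(chosen) + 1) - 1  # error df of the larger model
            f = (base - new) / (new / df_error)
            if f > best_f:
                best_name, best_f = name, f
        if best_name is None:
            break  # no remaining variable has F > FIN
        chosen.append(best_name)  # once in, never removed
    return chosen

# Hypothetical data: 2^3 factorial, y = 5 + 3*x1 - 2*x2 + 0.1*x1*x2*x3
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [5 + 3 * a - 2 * b + 0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
candidates = {"x1": x1, "x2": x2, "x3": x3}

print(forward_selection(y, candidates))  # ['x1', 'x2']
```

The small three-way term plays the role of noise; it is orthogonal to every main effect, so x3 never earns a positive partial F.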

Page 78

Backward Regression

Start with all of the independent variables, and eliminate them one by one (on the basis of having the weakest t-statistic) until every remaining variable meets a minimum threshold. This can be an F criterion, called FOUT, or a p-value criterion.

Backward elimination starts with all of the independent variables, then removes them one at a time following the stepwise procedure, except that no variable can re-enter once it has been removed. In Stepwise terms, set FIN to a very large number such as 100,000 and list all predictors in the Enter box. The procedure ends when no variable in the model has an F-statistic less than FOUT.
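A matching sketch of backward elimination, under the same assumptions as a forward-selection sketch would need: a hypothetical normal-equation helper `sse`, hypothetical $2^3$ factorial data with no real x3 effect, and an arbitrary FOUT = 4.0. Each pass computes every variable's partial F given the others and drops the smallest one if it falls below FOUT. Because a variable's partial F equals its squared t-statistic, this is the same as dropping the weakest t-stat.

```python
def sse(y, cols):
    """SSE from least squares of y on an intercept plus the columns in cols."""
    X = [[1.0] + [c[r] for c in cols] for r in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    for piv in range(k):  # Gaussian elimination with partial pivoting
        best = max(range(piv, k), key=lambda r: abs(A[r][piv]))
        A[piv], A[best], v[piv], v[best] = A[best], A[piv], v[best], v[piv]
        for r in range(piv + 1, k):
            f = A[r][piv] / A[piv][piv]
            A[r] = [a - f * c for a, c in zip(A[r], A[piv])]
            v[r] -= f * v[piv]
    b = [0.0] * k
    for i in reversed(range(k)):  # back-substitution
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yr - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yr in zip(X, y))

def backward_elimination(y, candidates, f_out=4.0):
    """Drop the variable with the smallest partial F while that F is below f_out."""
    n = len(y)
    kept = list(candidates)
    while kept:
        full = sse(y, [candidates[c] for c in kept])
        mse = full / (n - len(kept) - 1)
        worst_name, worst_f = None, f_out
        for name in kept:
            reduced = sse(y, [candidates[c] for c in kept if c != name])
            f = (reduced - full) / mse
            if f < worst_f:
                worst_name, worst_f = name, f
        if worst_name is None:
            break  # every remaining variable has F >= FOUT
        kept.remove(worst_name)  # removed variables never re-enter
    return kept

# Hypothetical data: 2^3 factorial, y = 5 + 3*x1 - 2*x2 + 0.1*x1*x2*x3
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [5 + 3 * a - 2 * b + 0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
candidates = {"x1": x1, "x2": x2, "x3": x3}

print(backward_elimination(y, candidates))  # ['x1', 'x2']
```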

Page 79

Stepwise Regression

An intelligent mixture of forward and backward ideas.

Variables can be entered or removed using FIN and FOUT criteria or p-value criteria.

Page 80

The F Criterion

The basic (default) method of stepwise regression calculates an F-statistic for each variable in the model.

Suppose the model contains X1, ..., Xp. Then the F-statistic for Xi is

$F_i = \frac{SSE(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_p) - SSE(X_1, \ldots, X_p)}{MSE(X_1, \ldots, X_p)}$

with 1 and n - p - 1 degrees of freedom. If the F-statistic for any variable is less than F to remove, the variable with the smallest F is removed from the model. The regression equation is calculated for this smaller model, the results are printed, and the procedure proceeds to a new step.
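The numerator of $F_i$ is the extra sum of squares that Xi contributes when it is added last, which is exactly what Minitab reports as Adj SS for a one-df term. A minimal sketch, assuming the Forward Selection output in this session (Error SS 89.693 with n = 1000 and p = 6 predictors, so 993 error df), recovers that table's F-values; `partial_f` is a hypothetical helper.

```python
def partial_f(extra_ss, sse_full, n, p):
    """F_i = [SSE(without X_i) - SSE(full)] / MSE(full), with 1 and n - p - 1 df.

    extra_ss is SSE(without X_i) - SSE(full); p is the number of predictors.
    """
    mse_full = sse_full / (n - p - 1)
    return extra_ss / mse_full

# Adj MS (= Adj SS for a one-df term) from the Forward Selection output
for term, adj_ss in [("Single", 1.6176), ("Credit E", 15.6911)]:
    print(term, round(partial_f(adj_ss, 89.693, 1000, 6), 2))
```

These match the printed F-values of 17.91 and 173.72, so each F-Value in that table is the F to remove for its term.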

Page 81

If no variable can be removed, the procedure attempts to add a variable. An F-statistic is calculated for each variable not yet in the model. Suppose the model, at this stage, contains X1, ..., Xp. Then the F-statistic for a new variable, Xp+1, is

$F_{p+1} = \frac{SSE(X_1, \ldots, X_p) - SSE(X_1, \ldots, X_p, X_{p+1})}{MSE(X_1, \ldots, X_p, X_{p+1})}$

The variable with the largest F-statistic is then added, provided its F-statistic is larger than F to enter. Adding this variable is equivalent to choosing the variable with the largest partial correlation or to choosing the variable that most effectively reduces SSE. The regression equation is then calculated, results are displayed, and the procedure goes to a new step. If no variable can enter, the stepwise procedure ends.

The p-value criterion is very similar, but compares each variable's p-value with a threshold alpha (alpha to enter or alpha to remove) instead of comparing its F-statistic with F to enter or F to remove.
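The two slides together define one stepwise step: first try to remove (smallest F below F to remove), and only if nothing leaves, try to add (largest F above F to enter). A minimal sketch under stated assumptions: F-based rather than Minitab's default p-value criterion, a hypothetical normal-equation helper `sse`, hypothetical $2^3$ factorial data, and illustrative thresholds of 4.0. The `max_steps` guard is there because a variable could otherwise cycle in and out when F to remove exceeds F to enter.

```python
def sse(y, cols):
    """SSE from least squares of y on an intercept plus the columns in cols."""
    X = [[1.0] + [c[r] for c in cols] for r in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    for piv in range(k):  # Gaussian elimination with partial pivoting
        best = max(range(piv, k), key=lambda r: abs(A[r][piv]))
        A[piv], A[best], v[piv], v[best] = A[best], A[piv], v[best], v[piv]
        for r in range(piv + 1, k):
            f = A[r][piv] / A[piv][piv]
            A[r] = [a - f * c for a, c in zip(A[r], A[piv])]
            v[r] -= f * v[piv]
    b = [0.0] * k
    for i in reversed(range(k)):  # back-substitution
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yr - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yr in zip(X, y))

def stepwise(y, candidates, f_in=4.0, f_out=4.0, max_steps=50):
    """F-based stepwise selection: try a removal first, then an addition."""
    n = len(y)
    model = []
    for _ in range(max_steps):
        if model:
            full = sse(y, [candidates[c] for c in model])
            mse = full / (n - len(model) - 1)
            f_remove = {c: (sse(y, [candidates[d] for d in model if d != c]) - full) / mse
                        for c in model}
            worst = min(f_remove, key=f_remove.get)
            if f_remove[worst] < f_out:
                model.remove(worst)  # smallest F leaves; start a new step
                continue
        base = sse(y, [candidates[c] for c in model])
        f_enter = {}
        for c in candidates:
            if c not in model:
                trial = model + [c]
                new = sse(y, [candidates[d] for d in trial])
                f_enter[c] = (base - new) / (new / (n - len(trial) - 1))
        if not f_enter:
            break
        best = max(f_enter, key=f_enter.get)
        if f_enter[best] <= f_in:
            break  # nothing can be removed or added: the procedure ends
        model.append(best)
    return model

# Hypothetical data: 2^3 factorial, y = 5 + 3*x1 - 2*x2 + 0.1*x1*x2*x3
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [5 + 3 * a - 2 * b + 0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
candidates = {"x1": x1, "x2": x2, "x3": x3}

print(stepwise(y, candidates))  # ['x1', 'x2']
```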

Page 82

Best Subsets

A handy procedure that reports, for each number of independent variables p, the models with the highest R-square (the Minitab output for the Rick Beck example lists the two best models of each size).

Best Subsets is an efficient way to select a group of "best subsets" for further analysis by selecting the smallest subset that fulfills certain statistical criteria. The subset model may actually estimate the regression coefficients and predict future responses with smaller variance than the full model using all predictors.
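A minimal Best Subsets sketch: for each model size it fits every combination (via `itertools.combinations`) and keeps the top models by R-square. The helper `sse`, the hypothetical $2^3$ factorial data, and `keep=2` (chosen to mirror the two-per-size Minitab listing) are illustrative assumptions; real software uses far more efficient search strategies than this brute-force loop.

```python
from itertools import combinations

def sse(y, cols):
    """SSE from least squares of y on an intercept plus the columns in cols."""
    X = [[1.0] + [c[r] for c in cols] for r in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    for piv in range(k):  # Gaussian elimination with partial pivoting
        best = max(range(piv, k), key=lambda r: abs(A[r][piv]))
        A[piv], A[best], v[piv], v[best] = A[best], A[piv], v[best], v[piv]
        for r in range(piv + 1, k):
            f = A[r][piv] / A[piv][piv]
            A[r] = [a - f * c for a, c in zip(A[r], A[piv])]
            v[r] -= f * v[piv]
    b = [0.0] * k
    for i in reversed(range(k)):  # back-substitution
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yr - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yr in zip(X, y))

def best_subsets(y, candidates, keep=2):
    """For each model size, return the `keep` subsets with the highest R^2."""
    ybar = sum(y) / len(y)
    sst = sum((v - ybar) ** 2 for v in y)
    names = list(candidates)
    report = {}
    for size in range(1, len(names) + 1):
        scored = [(1 - sse(y, [candidates[c] for c in combo]) / sst, combo)
                  for combo in combinations(names, size)]
        scored.sort(key=lambda t: t[0], reverse=True)
        report[size] = scored[:keep]
    return report

# Hypothetical data: 2^3 factorial, y = 5 + 3*x1 - 2*x2 + 0.1*x1*x2*x3
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [5 + 3 * a - 2 * b + 0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
candidates = {"x1": x1, "x2": x2, "x3": x3}

report = best_subsets(y, candidates)
for size, models in report.items():
    for r2, combo in models:
        print(size, combo, f"R-sq = {r2:.4f}")
```

Note that R-square never falls as variables are added, which is why the listing is read alongside adjusted R-square and Cp rather than R-square alone.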

Page 83

Using Minitab

Excel’s regression utility is not well suited to iterative procedures like these.

More stats-focused packages like Minitab offer a more user-friendly method.

Minitab treats Forward and Backward as special cases of Stepwise. (This makes sense: they are the cases where entered variables can’t leave, or removed variables can’t re-enter.)

Minitab uses the p-value criterion by default.

Page 84

Example: Rick Beck

Page 85

Page 86

Page 87

You need to select “Regression” several times.

Page 88

Page 89

Page 90

Page 91

Page 92

Page 93

Forward Selection of Terms

α to enter = 0.25

Analysis of Variance

Source         DF    Adj SS    Adj MS   F-Value   P-Value
Regression      6    39.898    6.6496     73.62     0.000
  Single        1     1.618    1.6176     17.91     0.000
  Divorced      1     0.141    0.1408      1.56     0.212
  Credit D      1    10.280   10.2796    113.81     0.000
  Credit E      1    15.691   15.6911    173.72     0.000
  Children      1     0.933    0.9332     10.33     0.001
  Debt          1     2.436    2.4360     26.97     0.000
Error         993    89.693    0.0903
Total         999   129.591

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.300542  30.79%     30.37%      29.39%

Page 94

Coefficients

Term            Coef   SE Coef  T-Value  P-Value   VIF
Constant      0.1762    0.0283     6.23    0.000
Single        0.1078    0.0255     4.23    0.000  1.47
Divorced      0.0429    0.0343     1.25    0.212  1.06
Credit D      0.3377    0.0317    10.67    0.000  1.10
Credit E      0.5480    0.0416    13.18    0.000  1.08
Children     -0.0750    0.0233    -3.21    0.001  1.46
Debt       -0.000001  0.000000    -5.19    0.000  1.16

Regression Equation

Default = 0.1762 + 0.1078 Single + 0.0429 Divorced + 0.3377 Credit D + 0.5480 Credit E - 0.0750 Children - 0.000001 Debt
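The coefficient table and the Analysis of Variance table carry the same test for each term: for a single-df term, the partial F-statistic equals the square of its t-statistic. A minimal sketch checks this on the printed Minitab values; the tolerance of 0.1 is an assumption chosen to absorb rounding of the two-decimal t-values, and the helper name is hypothetical.

```python
# (term, T-Value from Coefficients, F-Value from Analysis of Variance)
minitab_rows = [
    ("Single", 4.23, 17.91),
    ("Divorced", 1.25, 1.56),
    ("Credit D", 10.67, 113.81),
    ("Credit E", 13.18, 173.72),
    ("Children", -3.21, 10.33),
    ("Debt", -5.19, 26.97),
]

def t_squared_mismatches(rows, tol=0.1):
    """Return terms whose squared t-statistic differs from the F-value by more than tol.

    For a one-df term, the partial F-test is the square of the t-test.
    """
    return [term for term, t, f in rows if abs(t * t - f) > tol]

for term, t, f in minitab_rows:
    print(f"{term:9s} t^2 = {t * t:7.2f}   F = {f:7.2f}")
print(t_squared_mismatches(minitab_rows))  # []
```

This is also why Backward Regression can be described either as dropping the weakest t-stat or the smallest F: for one variable at a time, the two rankings are identical.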

Page 95

Regression – Regression – Best Subsets

Page 96

Page 97

Page 98

Page 99

Best Subsets Regression: Default versus Married, Divorced, ...

Response is Default

Vars  R-Sq  R-Sq(adj)  R-Sq(pred)  Mallows Cp        S
   1   9.4        9.3         8.9       299.3  0.34294
   1   6.3        6.2         6.0       344.1  0.34882
   2  13.0       12.8        12.4       250.0  0.33625
   2  12.0       11.8        11.2       265.1  0.33828
   3  25.1       24.8        24.4        79.9  0.31227
   3  15.7       15.4        14.9       214.2  0.33127
   4  28.0       27.7        27.1        40.2  0.30630
   4  27.5       27.3        26.7        46.2  0.30719
   5  29.3       29.0        28.1        22.8  0.30355
   5  28.9       28.5        27.7        29.0  0.30449
   6  30.0       29.6        28.7        15.1  0.30224
   6  29.7       29.2        28.4        20.0  0.30298
   7  30.5       30.0        29.0        10.4  0.30138
   7  30.3       29.8        28.9        12.6  0.30172
   8  30.7       30.1        29.1         9.2  0.30105
   8  30.6       30.1        29.0        10.1  0.30118
   9  30.8       30.2        29.1         9.5  0.30094
   9  30.8       30.1        29.0        10.1  0.30104
  10  30.8       30.1        29.0        11.0  0.30102

Page 100

Summary

• Chi-square Goodness-of-Fit Tests

• Fit to a Normal

• Simulation Modeling

• Autocorrelation, serial correlation

• Runs test

• Durbin-Watson

• Model Building

• Variable Selection Methods

• Minitab