Model selection and model building

Upload: pascale-frank

Post on 30-Dec-2015

Page 1: Model selection and  model building

Model selection and model building

Page 2: Model selection and  model building

Model selection

Selection of predictor variables

Page 3: Model selection and  model building

Statement of problem

• A common problem is that there is a large set of candidate predictor variables.

• The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability.

Page 4: Model selection and  model building

Example: Cement data

• Response y: heat evolved in calories during hardening of cement on a per gram basis

• Predictor x1: % of tricalcium aluminate

• Predictor x2: % of tricalcium silicate

• Predictor x3: % of tetracalcium alumino ferrite

• Predictor x4: % of dicalcium silicate

Page 5: Model selection and  model building

Example: Cement data

[Figure: scatterplot matrix of y, x1, x2, x3, and x4]

Page 6: Model selection and  model building

Two basic methods of selecting predictors

• Stepwise regression: Enter and remove variables, in a stepwise manner, until there is no justifiable reason to enter or remove any more.

• Best subsets regression: Select the subset of variables that does best at meeting some well-defined objective criterion.

Page 7: Model selection and  model building

Stepwise regression: the idea

• Start with no predictors in the model.

• At each step, enter or remove a variable based on partial F-tests.

• Stop when no more variables can be justifiably entered or removed.

Page 8: Model selection and  model building

Stepwise regression: the steps

• Specify an Alpha-to-Enter (0.15) and an Alpha-to-Remove (0.15).

• Start with no predictors in the model.

• Put the predictor with the smallest P-value, based on the partial F-statistic (equivalently, a t-statistic), in the model. If that P-value > 0.15, then stop: none of the predictors has good predictive ability. Otherwise …

Page 9: Model selection and  model building

Stepwise regression: the steps

• Add to the model the predictor with the smallest P-value (below 0.15), based on the partial F-statistic (equivalently, a t-statistic). If none of the remaining predictors yields a P-value < 0.15, stop.

• If the P-value > 0.15 for any of the partial F-statistics of predictors already in the model, then remove the violating predictor.

• Repeat the two steps above until no more predictors can be entered or removed.
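The entry and removal steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Minitab's implementation: the function names `coef_p_values` and `stepwise`, the `max_steps` guard, and the synthetic data are all hypothetical choices made for this example, and the code relies on NumPy and SciPy.

```python
import numpy as np
from scipy import stats


def coef_p_values(X, y):
    """Two-sided t-test P-values for each column of X (an intercept is added)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]                      # error degrees of freedom
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    p = 2 * stats.t.sf(np.abs(beta / se), df)
    return p[1:]                             # drop the intercept's P-value


def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15, max_steps=100):
    """Return the column indices chosen by a simple stepwise procedure."""
    selected = []
    for _ in range(max_steps):               # guard against endless cycling
        changed = False
        # Entry step: try each remaining predictor, keep the smallest P-value.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            pvals = {j: coef_p_values(X[:, selected + [j]], y)[-1]
                     for j in candidates}
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_enter:
                selected.append(best)
                changed = True
        # Removal step: drop the predictor with the largest P-value if too big.
        if selected:
            p = coef_p_values(X[:, selected], y)
            worst = int(np.argmax(p))
            if p[worst] > alpha_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected


# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(scale=0.1, size=50)
chosen = stepwise(X, y)
print(sorted(chosen))
```

With alpha values as loose as 0.15, a pure-noise column can occasionally enter, so the selected set should be read as a candidate model rather than a definitive answer.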

Page 10: Model selection and  model building

Example: Cement data

Predictor      Coef  SE Coef       T      P
Constant     81.479    4.927   16.54  0.000
x1           1.8687   0.5264    3.55  0.005

Predictor      Coef  SE Coef       T      P
Constant     57.424    8.491    6.76  0.000
x2           0.7891   0.1684    4.69  0.001

Predictor      Coef  SE Coef       T      P
Constant    110.203    7.948   13.87  0.000
x3          -1.2558   0.5984   -2.10  0.060

Predictor      Coef  SE Coef       T      P
Constant    117.568    5.262   22.34  0.000
x4          -0.7382   0.1546   -4.77  0.001

Page 11: Model selection and  model building

Predictor      Coef  SE Coef       T      P
Constant    103.097    2.124   48.54  0.000
x4         -0.61395  0.04864  -12.62  0.000
x1           1.4400   0.1384   10.40  0.000

Predictor      Coef  SE Coef       T      P
Constant      94.16    56.63    1.66  0.127
x4          -0.4569   0.6960   -0.66  0.526
x2           0.3109   0.7486    0.42  0.687

Predictor      Coef  SE Coef       T      P
Constant    131.282    3.275   40.09  0.000
x4         -0.72460  0.07233  -10.02  0.000
x3          -1.1999   0.1890   -6.35  0.000

Page 12: Model selection and  model building

Predictor      Coef  SE Coef       T      P
Constant      71.65    14.14    5.07  0.001
x4          -0.2365   0.1733   -1.37  0.205
x1           1.4519   0.1170   12.41  0.000
x2           0.4161   0.1856    2.24  0.052

Predictor      Coef  SE Coef       T      P
Constant    111.684    4.562   24.48  0.000
x4         -0.64280  0.04454  -14.43  0.000
x1           1.0519   0.2237    4.70  0.001
x3          -0.4100   0.1992   -2.06  0.070

Page 13: Model selection and  model building

Predictor      Coef  SE Coef       T      P
Constant     52.577    2.286   23.00  0.000
x1           1.4683   0.1213   12.10  0.000
x2          0.66225  0.04585   14.44  0.000

Page 14: Model selection and  model building

Predictor      Coef  SE Coef       T      P
Constant      71.65    14.14    5.07  0.001
x1           1.4519   0.1170   12.41  0.000
x2           0.4161   0.1856    2.24  0.052
x4          -0.2365   0.1733   -1.37  0.205

Predictor      Coef  SE Coef       T      P
Constant     48.194    3.913   12.32  0.000
x1           1.6959   0.2046    8.29  0.000
x2          0.65691  0.04423   14.85  0.000
x3           0.2500   0.1847    1.35  0.209

Page 15: Model selection and  model building

Predictor      Coef  SE Coef       T      P
Constant     52.577    2.286   23.00  0.000
x1           1.4683   0.1213   12.10  0.000
x2          0.66225  0.04585   14.44  0.000

Page 16: Model selection and  model building

Stepwise Regression: y versus x1, x2, x3, x4

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is y on 4 predictors, with N = 13

Step             1       2       3       4
Constant    117.57  103.10   71.65   52.58

x4          -0.738  -0.614  -0.237
T-Value      -4.77  -12.62   -1.37
P-Value      0.001   0.000   0.205

x1                    1.44    1.45    1.47
T-Value              10.40   12.41   12.10
P-Value              0.000   0.000   0.000

x2                           0.416   0.662
T-Value                       2.24   14.44
P-Value                      0.052   0.000

S             8.96    2.73    2.31    2.41
R-Sq         67.45   97.25   98.23   97.87
R-Sq(adj)    64.50   96.70   97.64   97.44
C-p          138.7     5.5     3.0     2.7

Page 17: Model selection and  model building

Drawbacks of stepwise regression

• The final model is not guaranteed to be optimal in any specified sense.

• The procedure yields a single final model, although in practice there are often several almost equally good models.

Page 18: Model selection and  model building

Best subsets regression

• If there are p-1 possible predictors, then there are $2^{p-1}$ possible regression models containing the predictors.

• For example, 10 predictors yield $2^{10} = 1024$ possible regression models.

• A best subsets algorithm determines the best subsets of each size, so that the choice of the final model can be made by the researcher.
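The idea of searching every subset can be sketched as a brute-force loop in Python. This is only an illustration: the helper names `sse` and `best_subsets`, the use of smallest SSE as the "best" criterion within each size, and the synthetic data are assumptions made for this example, and real best subsets routines use much more efficient search strategies.

```python
from itertools import combinations

import numpy as np


def sse(X, y, cols):
    """Error sum of squares for the model with an intercept and columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y):
    """For each subset size, return the column tuple with the smallest SSE."""
    k = X.shape[1]
    return {size: min(combinations(range(k), size),
                      key=lambda cols: sse(X, y, cols))
            for size in range(1, k + 1)}


# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(scale=0.1, size=50)

# Count every subset, including the intercept-only model: 2**4 = 16.
n_models = sum(1 for size in range(X.shape[1] + 1)
               for _ in combinations(range(X.shape[1]), size))
best = best_subsets(X, y)
print(n_models, best)
```

Because the number of models doubles with each added predictor, this exhaustive loop is practical only for a modest number of candidates.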

Page 19: Model selection and  model building

What is used to judge “best”?

• R-square

• Adjusted R-square

• MSE (or S = square root of MSE)

• Mallows' Cp

Page 20: Model selection and  model building

R-squared

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-squared.
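As a small worked example, the gain in R-squared from each extra predictor can be tabulated in Python. The values are the best R-Sq for each subset size in the cement data best subsets output that appears later in this deck; the variable names are just for illustration.

```python
# Best R-Sq (%) for each number of predictors, from the cement data output.
best_r2 = {1: 67.5, 2: 97.9, 3: 98.2, 4: 98.2}

# Increase in R-Sq gained by allowing one more predictor.
gains = {k: round(best_r2[k] - best_r2[k - 1], 1) for k in range(2, 5)}
print(gains)  # {2: 30.4, 3: 0.3, 4: 0.0}
```

Going from one to two predictors adds about 30 percentage points, while a third or fourth predictor adds almost nothing, which is the kind of levelling-off the rule above looks for.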

Page 21: Model selection and  model building

Adjusted R-squared or MSE

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right) MSE$$

Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information.

Find a few subsets for which MSE is smallest (or adjusted R-squared is largest) or so close to the smallest (largest) that adding more predictors is not worthwhile.
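The adjusted R-squared formula can be checked against the cement data stepwise output (N = 13) with a few lines of Python. The function name `adjusted_r2` is a hypothetical helper for this example; p counts the intercept along with the predictors.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: 1 - ((n - 1) / (n - p)) * (1 - R^2)."""
    return 1 - (n - 1) / (n - p) * (1 - r2)


# Model with x1 and x2 (p = 3 parameters): R-Sq = 97.87%.
adj_x1_x2 = adjusted_r2(0.9787, n=13, p=3)
# Model with x4 only (p = 2 parameters): R-Sq = 67.45%.
adj_x4 = adjusted_r2(0.6745, n=13, p=2)
print(round(100 * adj_x1_x2, 2), round(100 * adj_x4, 2))
```

Up to rounding, these reproduce the R-Sq(adj) values of 97.44 and 64.50 reported in the stepwise output.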

Page 22: Model selection and  model building

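Mallows' Cp criterion

Mallows' Cp statistic:

$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{p-1})} - (n - 2p)$$

is an estimator of total standardized mean square error of prediction:

$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\hat{Y}_{ip} - E(Y_i)\right]^2$$

which equals:

$$\Gamma_p = \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{n} \left[E(\hat{Y}_{ip}) - E(Y_i)\right]^2 + \sum_{i=1}^{n} Var(\hat{Y}_{ip}) \right\}$$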

Page 23: Model selection and  model building

Using the Cp criterion

• Subsets with small Cp values have a small total (standardized) mean square error of prediction.

• When the Cp value is also near p, the bias of the regression model is small.

Page 24: Model selection and  model building

Using the Cp criterion (cont’d)

• So, identify subsets of predictors for which:
  – the Cp value is smallest, and
  – the Cp value is near p (if possible)

• Note though that for the full model, Cp = p. So, the full model is always judged to be a good candidate model.
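The Cp formula can be checked numerically in Python using figures from the PIQ example later in this deck: N = 38, S = 19.794 for the full three-predictor model, and SSE = 13321.8 for the model with MRI and Height. The function name `mallows_cp` is a hypothetical helper for this example; p counts the intercept.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp = SSE_p / MSE(full model) - (n - 2p)."""
    return sse_p / mse_full - (n - 2 * p)


n = 38
mse_full = 19.794 ** 2  # MSE of the full model (MRI, Height, Weight)

# Subset model with MRI and Height: p = 3 parameters.
cp_mri_height = mallows_cp(13321.8, mse_full, n, p=3)

# Full model: SSE = MSE * (n - p) with p = 4, so Cp reduces to p.
cp_full = mallows_cp(mse_full * (n - 4), mse_full, n, p=4)
print(round(cp_mri_height, 1), round(cp_full, 1))
```

The first value agrees with the C-p of 2.0 reported for the MRI and Height model, and the second shows why the full model always has Cp = p.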

Page 25: Model selection and  model building

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  Predictors
   1  67.5       64.5  138.7  8.9639  x4
   1  66.6       63.6  142.5  9.0771  x2
   2  97.9       97.4    2.7  2.4063  x1, x2
   2  97.2       96.7    5.5  2.7343  x1, x4
   3  98.2       97.6    3.0  2.3087  x1, x2, x4
   3  98.2       97.6    3.0  2.3121  x1, x2, x3
   4  98.2       97.4    5.0  2.4460  x1, x2, x3, x4

Page 26: Model selection and  model building

Example: Modeling PIQ

[Figure: scatterplot matrix of PIQ, MRI, Height, and Weight]

Page 27: Model selection and  model building

Stepwise Regression: PIQ versus MRI, Height, Weight

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is PIQ on 3 predictors, with N = 38

Step             1       2
Constant     4.652 111.276

MRI           1.18    2.06
T-Value       2.45    3.77
P-Value      0.019   0.001

Height               -2.73
T-Value              -2.75
P-Value              0.009

S             21.2    19.5
R-Sq         14.27   29.49
R-Sq(adj)    11.89   25.46
C-p            7.3     2.0

Page 28: Model selection and  model building

Best Subsets Regression: PIQ versus MRI, Height, Weight

Response is PIQ

Vars  R-Sq  R-Sq(adj)   C-p       S  Predictors
   1  14.3       11.9   7.3  21.212  MRI
   1   0.9        0.0  13.8  22.810  (one predictor; column not recoverable)
   2  29.5       25.5   2.0  19.510  MRI, Height
   2  19.3       14.6   6.9  20.878  (two predictors; columns not recoverable)
   3  29.5       23.3   4.0  19.794  MRI, Height, Weight

Page 29: Model selection and  model building

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor      Coef  SE Coef       T      P
Constant     111.28    55.87    1.99  0.054
MRI          2.0606   0.5466    3.77  0.001
Height      -2.7299   0.9932   -2.75  0.009

S = 19.51  R-Sq = 29.5%  R-Sq(adj) = 25.5%

Analysis of Variance

Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

Source  DF  Seq SS
MRI      1  2697.1
Height   1  2875.6

Page 30: Model selection and  model building

Example: Modeling BP

[Figure: scatterplot matrix of BP, Age, Weight, BSA, Duration, Pulse, and Stress]

Page 31: Model selection and  model building

Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is BP on 6 predictors, with N = 20

Step             1       2       3
Constant     2.205 -16.579 -13.667

Weight       1.201   1.033   0.906
T-Value      12.92   33.15   18.49
P-Value      0.000   0.000   0.000

Age                  0.708   0.702
T-Value              13.23   15.96
P-Value              0.000   0.000

BSA                            4.6
T-Value                       3.04
P-Value                      0.008

S             1.74   0.533   0.437
R-Sq         90.26   99.14   99.45
R-Sq(adj)    89.72   99.04   99.35
C-p          312.8    15.1     6.4

Page 32: Model selection and  model building

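Best Subsets Regression: BP versus Age, Weight, ...

Response is BP

Candidate predictors: Age, Weight, BSA, Duration, Pulse, Stress

Vars  R-Sq  R-Sq(adj)    C-p        S  Predictors
   1  90.3       89.7  312.8   1.7405  Weight
   1  75.0       73.6  829.1   2.7903  (one predictor; column not recoverable)
   2  99.1       99.0   15.1  0.53269  Age, Weight
   2  92.0       91.0  256.6   1.6246  (two predictors; columns not recoverable)
   3  99.5       99.4    6.4  0.43705  Age, Weight, BSA
   3  99.2       99.1   14.1  0.52012  (three predictors; columns not recoverable)
   4  99.5       99.4    6.4  0.42591  (four predictors; columns not recoverable)
   4  99.5       99.4    7.1  0.43500  (four predictors; columns not recoverable)
   5  99.6       99.4    7.0  0.42142  (five predictors; columns not recoverable)
   5  99.5       99.4    7.7  0.43078  (five predictors; columns not recoverable)
   6  99.6       99.4    7.0  0.40723  Age, Weight, BSA, Duration, Pulse, Stress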

Page 33: Model selection and  model building

The regression equation is
BP = - 13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA

Predictor      Coef  SE Coef       T      P
Constant    -13.667    2.647   -5.16  0.000
Age         0.70162  0.04396   15.96  0.000
Weight      0.90582  0.04899   18.49  0.000
BSA           4.627    1.521    3.04  0.008

S = 0.4370  R-Sq = 99.5%  R-Sq(adj) = 99.4%

Analysis of Variance

Source      DF      SS      MS       F      P
Regression   3  556.94  185.65  971.93  0.000
Error       16    3.06    0.19
Total       19  560.00

Source  DF  Seq SS
Age      1  243.27
Weight   1  311.91
BSA      1    1.77

Page 34: Model selection and  model building

Stepwise regression in Minitab

• Stat >> Regression >> Stepwise …

• Specify response and all possible predictors.

• If desired, specify predictors that must be included in every model.

• Select OK. Results appear in session window.

Page 35: Model selection and  model building

Best subsets regression in Minitab

• Stat >> Regression >> Best subsets …

• Specify response and all possible predictors.

• If desired, specify predictors that must be included in every model.

• Select OK. Results appear in session window.

Page 36: Model selection and  model building

Model building strategy

Page 37: Model selection and  model building

The first step

• Decide on the type of model needed:
  – Predictive: model used to predict the response variable from a chosen set of predictors.
  – Theoretical: model based on theoretical relationship between response and predictors.
  – Control: model used to control a response variable by manipulating predictor variables.

Page 38: Model selection and  model building

The first step (cont’d)

• Decide on the type of model needed:
  – Inferential: model used to explore strength of relationships between response and predictors.
  – Data summary: model used merely as a way to summarize a large set of data by a single equation.

Page 39: Model selection and  model building

The second step

• Decide on which predictor variables and response variable to collect data.

• Collect the data.

Page 40: Model selection and  model building

The third step

• Explore the data:
  – Check for outliers, gross data errors, missing values on a univariate basis.
  – Study bivariate relationships to reveal other outliers, to suggest possible transformations, to identify possible multicollinearities.

Page 41: Model selection and  model building

The fourth step

• Randomly divide the data into a training set and a test set:
  – The training set, with at least 15-20 error d.f., is used to fit the model.
  – The test set is used for cross-validation of the fitted model.
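A random split of this kind can be sketched with Python's standard library. The helper names `train_test_split` and `error_df`, the 30% test fraction, and the fixed seed are assumptions chosen for illustration; the source only asks that the training set leave at least 15-20 error degrees of freedom.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly divide `rows` into (training, test) lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(round(test_fraction * len(rows)))
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def error_df(n_train, n_predictors):
    """Error d.f. of a model with an intercept fitted to the training set."""
    return n_train - (n_predictors + 1)


rows = list(range(40))                 # stand-in for 40 observations
train, test = train_test_split(rows)
print(len(train), len(test), error_df(len(train), n_predictors=3))
```

Checking `error_df` before fitting is a quick way to see whether the training set is large enough for the biggest candidate model under consideration.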

Page 42: Model selection and  model building

The fifth step

• Using the training set, fit several candidate models:
  – Use best subsets regression.
  – Use stepwise regression (which gives only one model unless you specify different alpha-to-remove and alpha-to-enter values).

Page 43: Model selection and  model building

The sixth step

• Select and evaluate a few “good” models:
  – Select based on adjusted R-squared, Mallows' Cp, and the number and nature of predictors.
  – Evaluate selected models for violation of model assumptions.
  – If none of the models provides a satisfactory fit, try something else, such as more data, different predictors, a different class of model …

Page 44: Model selection and  model building

The final step

• Select the final model:
  – Compare competing models by cross-validating them against the test data.
  – The model with the larger cross-validation R-squared is the better predictive model.
  – Consider residual plots, outliers, parsimony, relevance, and ease of measurement of predictors.
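The comparison on the test data can be sketched in Python with NumPy. The source does not define the cross-validation R-squared, so this example assumes one common choice: 1 - SSE/SSTO computed on the test set using coefficients fitted on the training set. The function names and the synthetic data are likewise assumptions for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) from the training data."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def cross_validation_r2(beta, X_test, y_test):
    """1 - SSE/SSTO on the test set, using coefficients fitted elsewhere."""
    A = np.column_stack([np.ones(len(y_test)), X_test])
    pred = A @ beta
    sse = np.sum((y_test - pred) ** 2)
    ssto = np.sum((y_test - y_test.mean()) ** 2)
    return float(1 - sse / ssto)


# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(scale=0.1, size=60)
X_train, X_test, y_train, y_test = X[:40], X[40:], y[:40], y[40:]


def cv_r2_for(cols):
    """Fit on the training rows, score on the test rows, for columns `cols`."""
    beta = fit_ols(X_train[:, cols], y_train)
    return cross_validation_r2(beta, X_test[:, cols], y_test)


r2_signal = cv_r2_for([0, 2])   # candidate model with the relevant predictors
r2_noise = cv_r2_for([1, 3])    # candidate model with irrelevant predictors
print(r2_signal > r2_noise)
```

Unlike the ordinary R-squared, this quantity can be negative when a model predicts the test data worse than the test-set mean does.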