test fit logit lecture
-
7/27/2019 Test fit logit Lecture
1/45
Lecture 5: ANOVA and Regression II: Model Selection and Model Checking
or, How to choose a model, and then find out it's wrong
Bob O'Hara
-
Model Selection
We could fit all effects into a model
But this would be difficult to understand:
which factors are important?
Instead, we want to remove the effects that are not important, leaving the interesting ones
How do we do this?
-
What's a Good Model?
Should fit the data
obvious?
Simple
easier to understand
Trade-off between model fit and complexity
Also: interpretability
scientific importance
depends on the purpose of the model
-
Criteria for Comparing Models
F-tests, from the ANOVA table
test individual effects
can have problems with the order of terms
Information Criteria
AIC, BIC
Made up of two terms: $\mathrm{xIC} = \text{Deviance} + \text{Complexity}$
Deviance $= -2 \log L$, where $L$ is the maximised likelihood: measures goodness of fit (lower is better)
Complexity: penalises the number of parameters
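A minimal sketch of an F-test for a single term, assuming ordinary least squares with Gaussian errors: it compares the residual sums of squares (RSS) of two nested fits. Python with NumPy is used only for illustration; the helper names (`rss`, `f_test_nested`) and the simulated data are hypothetical and do not reproduce the lecture's ANOVA table.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_test_nested(X_small, X_big, y):
    """F statistic for the extra columns of X_big (X_small must be nested in X_big)."""
    n = len(y)
    p_small, p_big = X_small.shape[1], X_big.shape[1]
    rss_small, rss_big = rss(X_small, y), rss(X_big, y)
    df1 = p_big - p_small          # number of extra parameters being tested
    df2 = n - p_big                # residual degrees of freedom of the bigger model
    F = ((rss_small - rss_big) / df1) / (rss_big / df2)
    return F, df1, df2

rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # x2 has no real effect on y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
# Test x1 against the constant-only model, then x2 given x1
F_x1, *_ = f_test_nested(np.column_stack([ones]), np.column_stack([ones, x1]), y)
F_x2, df1, df2 = f_test_nested(np.column_stack([ones, x1]),
                               np.column_stack([ones, x1, x2]), y)
print(f"F for adding x1: {F_x1:.1f}; F for adding x2 after x1: {F_x2:.2f} on ({df1}, {df2}) df")
```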
-
Information Criteria
Try to minimise xIC
Better model fit: lower deviance
More parameters: higher penalty
For $n$ observations and $p$ parameters:
$\mathrm{AIC} = \text{Deviance} + 2p$
tends to overestimate the number of parameters
$\mathrm{BIC} = \text{Deviance} + p \ln n$
leads to smaller models, perhaps too small?
can over-penalise factors with many levels
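For a least-squares fit with Gaussian errors the maximised log-likelihood has a closed form, so both criteria can be computed directly. This sketch assumes that model and counts the error variance as one of the $p$ parameters (a common convention, assumed here); the name `gaussian_ic` and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_ic(X, y):
    """Return (deviance, AIC, BIC) for a least-squares fit with Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximised Gaussian log-likelihood, with sigma^2 estimated as rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    deviance = -2 * loglik
    p = X.shape[1] + 1             # regression coefficients plus the error variance
    aic = deviance + 2 * p
    bic = deviance + np.log(n) * p
    return deviance, aic, bic

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
noise_cols = rng.normal(size=(n, 5))      # five predictors unrelated to y
y = 3.0 + 1.5 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, noise_cols])

d_s, aic_s, bic_s = gaussian_ic(X_small, y)
d_b, aic_b, bic_b = gaussian_ic(X_big, y)
print(f"small: dev={d_s:.1f} AIC={aic_s:.1f} BIC={bic_s:.1f}")
print(f"big:   dev={d_b:.1f} AIC={aic_b:.1f} BIC={bic_b:.1f}")
```

Both criteria share the same deviance, so they differ only in the penalty: $2p$ for AIC versus $p \ln n$ for BIC, and $\ln n > 2$ once $n \ge 8$, which is why BIC tends to pick smaller models.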
-
Selection
Forward selection
Start with no factors
add the best unselected factor until the current model is the best
use AIC, BIC or F-ratios to decide which is best
Backward selection
Start with all factors in the model
eliminate the worst covariates one by one until all remaining covariates are good
again, use AIC etc.
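The two procedures can be sketched as greedy searches over a set of candidate terms, scoring each model by its Gaussian AIC. This assumes numeric predictors fitted by least squares (no factors or interactions), and the names `aic_of`, `forward_select` and `backward_select` are hypothetical.

```python
import numpy as np

def aic_of(cols, data, y):
    """AIC of a least-squares fit of y on an intercept plus the named columns."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + [data[c] for c in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    deviance = n * (np.log(2 * np.pi) + np.log(rss / n) + 1)   # -2 x log-likelihood
    return deviance + 2 * (X.shape[1] + 1)                       # +1 for the error variance

def forward_select(data, y):
    """Start with no terms; add the best unselected term until no addition lowers AIC."""
    selected = []
    current = aic_of(selected, data, y)
    while True:
        candidates = [c for c in data if c not in selected]
        if not candidates:
            return selected, current
        scores = {c: aic_of(selected + [c], data, y) for c in candidates}
        best = min(scores, key=scores.get)
        if scores[best] >= current:        # the current model is the best
            return selected, current
        selected.append(best)
        current = scores[best]

def backward_select(data, y):
    """Start with all terms; drop the worst term until no removal lowers AIC."""
    selected = list(data)
    current = aic_of(selected, data, y)
    while selected:
        scores = {c: aic_of([s for s in selected if s != c], data, y) for c in selected}
        worst = min(scores, key=scores.get)  # the term whose removal gives the lowest AIC
        if scores[worst] >= current:
            break
        selected.remove(worst)
        current = scores[worst]
    return selected, current

rng = np.random.default_rng(2)
n = 200
data = {name: rng.normal(size=n) for name in ["a", "b", "c", "d"]}
y = 1.0 + 2.0 * data["a"] - 1.5 * data["b"] + rng.normal(size=n)   # only a and b matter
print("forward: ", forward_select(data, y))
print("backward:", backward_select(data, y))
```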
-
Stepwise Selection
Start with the full model
Use backward selection:
try to remove a term
Use forward selection:
try to add a term
Iterate, trying to remove and add terms
Stop when the model doesn't change
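A sketch of stepwise selection under the same simplifying assumptions (least squares, numeric predictors, Gaussian AIC, hypothetical names): at each step every single removal and every single addition is scored, and the search moves to the best one only if it lowers the AIC. Passing different `start` sets mimics trying different starting points.

```python
import numpy as np

def aic_of(terms, data, y):
    """AIC of a least-squares fit of y on an intercept plus the named columns."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + [data[t] for t in sorted(terms)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * (np.log(2 * np.pi) + np.log(rss / n) + 1) + 2 * (X.shape[1] + 1)

def stepwise(data, y, start):
    """From `start`, try dropping or adding one term; stop when nothing lowers AIC."""
    current = frozenset(start)
    current_aic = aic_of(current, data, y)
    while True:
        removals = [current - {t} for t in current]
        additions = [current | {t} for t in data if t not in current]
        moves = removals + additions
        if not moves:
            return current, current_aic
        best = min(moves, key=lambda m: aic_of(m, data, y))
        best_aic = aic_of(best, data, y)
        if best_aic >= current_aic:        # the model no longer changes
            return current, current_aic
        current, current_aic = best, best_aic

rng = np.random.default_rng(3)
n = 200
data = {name: rng.normal(size=n) for name in ["a", "b", "c", "d"]}
y = 0.5 + 1.0 * data["a"] + 1.0 * data["b"] + rng.normal(size=n)

from_full = stepwise(data, y, start=data.keys())   # start with the full model
from_empty = stepwise(data, y, start=[])           # start with just a constant
print(sorted(from_full[0]), round(from_full[1], 1))
print(sorted(from_empty[0]), round(from_empty[1], 1))
```

With only four unrelated predictors both starts will usually reach the same model; with correlated terms and interactions, as in the lecture's example, they need not.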
-
Then...
Use the more automatic methods:
Stepwise Selection
F-stats
If you use the ANOVA table:
be careful about the order of the effects
try different orders
Always keep main effects if you have an interaction
unless you have a good reason not to
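Why the order matters: an ANOVA table built by adding terms one at a time (sequential, or "type I", sums of squares) credits each term only with the improvement it makes over the terms already in. With correlated predictors the credit shifts with the order, although the total does not. A sketch on simulated data; the names `rss` and `sequential_ss` are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def sequential_ss(columns, y):
    """Sequential (type I) sums of squares: each term is credited with the
    drop in RSS when it is added after the terms listed before it."""
    n = len(y)
    X = np.ones((n, 1))                # start from the constant-only model
    prev = rss(X, y)
    out = {}
    for name, col in columns:
        X = np.column_stack([X, col])
        now = rss(X, y)
        out[name] = prev - now
        prev = now
    return out

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # x2 strongly correlated with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

order_a = sequential_ss([("x1", x1), ("x2", x2)], y)
order_b = sequential_ss([("x2", x2), ("x1", x1)], y)
print("x1 first:", order_a)
print("x2 first:", order_b)
```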
-
Automatic Model Selection
Use AIC as a criterion
Try two starting points:
just a constant
the full model (all terms and interactions)
This can be done automatically in R
-
Starting from Nothing
Initial AIC (just a constant): -43.66
Step 1:
  + Eth        -57.1
  + Age        -44.3
  (no change)  -43.7
  + Sex        -42.5
  + Lrn        -41.7
Add Eth to the model
Step 2:
  + Age        -57.4
  (no change)  -57.1
  + Sex        -56.0
  + Lrn        -55.1
  - Eth        -43.7
-
Carry on... Add Age to the model
Step 3:
  + Eth.Age    -61.1
  (no change)  -57.4
  - Age        -57.1
  + Lrn        -56.2
  + Sex        -55.9
  - Eth        -44.3
Add the Eth.Age interaction
Step 4:
  (no change)  -61.1
  + Lrn        -60.1
  + Sex        -59.6
  - Eth.Age    -57.4
Stop here: no single addition or removal lowers the AIC
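The choice at each step is simply the candidate with the lowest AIC, and the search stops when "(no change)" wins. This sketch replays that rule on the AIC values copied from the four step tables of this example; it does not refit any model. The step 4 table offers no removal of Eth or Age on their own, consistent with keeping main effects that appear in an interaction.

```python
# AIC of each candidate move, copied from the four step tables.
# "(no change)" is the current model with nothing added or removed.
steps = [
    {"+ Eth": -57.1, "+ Age": -44.3, "(no change)": -43.7, "+ Sex": -42.5, "+ Lrn": -41.7},
    {"+ Age": -57.4, "(no change)": -57.1, "+ Sex": -56.0, "+ Lrn": -55.1, "- Eth": -43.7},
    {"+ Eth.Age": -61.1, "(no change)": -57.4, "- Age": -57.1, "+ Lrn": -56.2,
     "+ Sex": -55.9, "- Eth": -44.3},
    {"(no change)": -61.1, "+ Lrn": -60.1, "+ Sex": -59.6, "- Eth.Age": -57.4},
]

chosen = []
for table in steps:
    best = min(table, key=table.get)      # the move giving the lowest AIC
    chosen.append(best)
    print(f"best move: {best:12s} AIC = {table[best]}")
    if best == "(no change)":             # no move lowers AIC, so stop
        break
```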
-
Try from different starting points
Start from a constant:
end with Eth + Age + Eth.Age
Start from all main effects:
end with all main effects + Eth.Age + Sex.Age + Age.Lrn
Start from the full model:
end with all main effects + all first-order (two-way) interactions + Eth.Sex.Lrn + Eth.Age.Lrn
The last one has the lowest AIC
-
A Good Fit
[Figure: y plotted against x; residuals plotted against predicted values]
-
An Outlier
[Figure: y plotted against x; residuals plotted against predicted values]
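One way to put a number on an outlier is the standardized residual, which divides each residual by its estimated standard error. A sketch assuming a straight-line least-squares fit; the simulated data, the planted outlier, the name `standardized_residuals` and the cut-off of 3 are illustrative choices, not from the lecture.

```python
import numpy as np

def standardized_residuals(X, y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii)) for a least-squares fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
    e = y - H @ y                             # raw residuals
    s2 = float(e @ e) / (n - p)               # residual variance estimate
    h = np.diag(H)                            # leverages
    return e / np.sqrt(s2 * (1 - h))

rng = np.random.default_rng(5)
n = 50
x = np.arange(n, dtype=float)
y = 10 + 2 * x + rng.normal(scale=5, size=n)
y[20] += 80                                   # plant one outlier

X = np.column_stack([np.ones(n), x])
r = standardized_residuals(X, y)
flagged = np.flatnonzero(np.abs(r) > 3)
print("flagged observations:", flagged)
```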
-
Curved Relationship
$y = a + bx^2 + e$
[Figure: y plotted against x; residuals plotted against predicted values]
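A sketch of why the residual plot is curved: data generated from $y = a + bx^2 + e$ but fitted with a straight line leave residuals that are positive at both ends of the x range and negative in the middle. The values $a = 5$, $b = 2$ and the noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.arange(0, 51, dtype=float)
y = 5 + 2 * x**2 + rng.normal(scale=50, size=x.size)   # true relationship: y = a + b x^2 + e

# Fit the (wrong) straight-line model y = a + b x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Average residual at the low end, in the middle, and at the high end of x
low, mid, high = resid[:5].mean(), resid[23:28].mean(), resid[-5:].mean()
print(f"low x: {low:.0f}, middle x: {mid:.0f}, high x: {high:.0f}")
```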
-
The Example (again)
We've already found a good model, but does it fit?
Look at some figures...
-
Cook's D
[Figure: Cook's distance plot, Cook's distance against observation number; observations 32, 14 and 98 are labelled]
Female, Aborigine, slow learner, primary age.
Only one (6 days off, mean 16.4)
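Cook's distance measures how much the fitted values change when an observation is deleted; it combines the size of the residual with the observation's leverage $h_{ii}$. This sketch uses the least-squares formula $D_i = r_i^2 h_{ii} / (p(1 - h_{ii}))$, where $r_i$ is the standardized residual and $p$ the number of coefficients. It assumes a linear model (the lecture's example model may differ), and the name `cooks_distance` and the simulated data with one planted high-leverage point are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i = r_i^2 h_ii / (p (1 - h_ii)), with r_i the standardized residual."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
    h = np.diag(H)                            # leverages
    e = y - H @ y                             # raw residuals
    s2 = float(e @ e) / (n - p)
    r = e / np.sqrt(s2 * (1 - h))             # standardized residuals
    return r**2 * h / (p * (1 - h))

rng = np.random.default_rng(7)
n = 40
x = rng.uniform(0, 10, size=n)
x[0] = 25.0                                   # a point far from the other x values (high leverage)
y = 1 + 0.5 * x + rng.normal(size=n)
y[0] += 8                                     # ...which also lies off the line

D = cooks_distance(np.column_stack([np.ones(n), x]), y)
print("largest Cook's distances at observations:", np.argsort(D)[::-1][:3])
```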