Lecture 12 Inference in MLR - Purdue University · ghobbs/stat_512/lecture_notes/... · 2011-02-16 · ...
12-1
Lecture 12
Inference in MLR
STAT 512
Spring 2011
Background Reading
KNNL: 6.6-6.7
12-2
Topic Overview
• Review MLR Model
• Inference about Regression Parameters
• Estimation of Mean Response
• Prediction
12-3
Multiple Regression Model
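$$\mathbf{Y}_{n\times 1} = \mathbf{X}_{n\times p}\,\boldsymbol{\beta}_{p\times 1} + \boldsymbol{\varepsilon}_{n\times 1}, \qquad \text{where } \boldsymbol{\varepsilon}_{n\times 1} \sim N\left(\mathbf{0}_{n\times 1},\ \sigma^2\mathbf{I}_{n\times n}\right)$$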
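• $\mathbf{X}$ is the design matrix. Its first column is all 1s for the intercept, and each column after the first corresponds to a predictor variable.
• $\mathbf{Y}$, $\boldsymbol{\beta}$, and $\boldsymbol{\varepsilon}$ are vectors holding the responses, parameters, and errors.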
12-4
Parameter Estimates
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
• The estimated variance-covariance matrix is
$$\mathbf{s}^2\{\mathbf{b}\}_{p\times p} = MSE \times (\mathbf{X}'\mathbf{X})^{-1}$$
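• To get $s^2\{b_k\}$, read the kth diagonal element of this matrix, counting from k = 0. That is, k = 0, 1, 2, …, p−1, and the intercept $b_0$ corresponds to the first diagonal element.
• Use the estimate $b_k$ and its standard error $s\{b_k\} = \sqrt{s^2\{b_k\}}$ to produce confidence intervals.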
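The two formulas above can be checked numerically. The following is a minimal sketch that uses only the Python standard library (no NumPy). It runs on a small hypothetical data set invented for illustration, and computes b, MSE, $\mathbf{s}^2\{\mathbf{b}\}$, and each $s\{b_k\}$ as the square root of a diagonal element.

```python
# Sketch: b = (X'X)^{-1} X'Y and s^2{b} = MSE (X'X)^{-1}, standard library only.
# The data below are hypothetical, for illustration only.
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # (rows of A) x (columns of B)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for a small p x p matrix)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, Y):
    """Return (b, MSE, s^2{b}) for Y = X beta + eps; X includes the column of 1s."""
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    b = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[y] for y in Y]))]
    resid = [y - sum(bj * xj for bj, xj in zip(b, x)) for x, y in zip(X, Y)]
    mse = sum(e * e for e in resid) / (n - p)            # SSE / (n - p)
    s2b = [[mse * v for v in row] for row in XtX_inv]    # p x p variance-covariance matrix
    return b, mse, s2b


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
Y = [5.1, 5.9, 9.2, 9.8, 13.1, 13.7]
X = [[1.0, a, c] for a, c in zip(x1, x2)]

b, mse, s2b = ols(X, Y)
se = [math.sqrt(s2b[k][k]) for k in range(len(b))]  # s{b_k} from the k-th diagonal, k = 0..p-1
for k, (bk, sk) in enumerate(zip(b, se)):
    print(f"b{k} = {bk:.4f}   s{{b{k}}} = {sk:.4f}")
```

Solving the normal equations through an explicit inverse mirrors the formula on the slide. For larger or ill-conditioned problems, a QR-based solver from a numerical library is the usual choice.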
12-5
Confidence Intervals for $\beta_k$
• The CI for $\beta_k$ is $b_k \pm t_{crit}\, s\{b_k\}$
• The critical value comes from the t-distribution with n − p degrees of freedom (DF for error): $t_{crit} = t(1-\alpha/2;\ n-p)$
• If the CI includes zero, then we cannot reject $H_0: \beta_k = 0$ at level α. That is, the variable is not significant when added to the model containing all of the other variables.
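The following is a stdlib-only sketch of this interval. It is applied to the hsm and hse estimates and standard errors from the CS example's parameter-estimate table later in this lecture.

The t quantile comes from the Cornish-Fisher expansion, an approximation chosen here to avoid SciPy. The sketch also assumes n = 224, which the appended output rows starting at Obs 225 suggest. With p = 3, that gives df = 221.

```python
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate t quantile via the Cornish-Fisher expansion
    (Abramowitz & Stegun 26.7.5); roughly 3-decimal accuracy for df >= 10."""
    x = NormalDist().inv_cdf(prob)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def coef_ci(b_k, se_k, df, alpha=0.05):
    """b_k +/- t(1 - alpha/2; df) * s{b_k}."""
    t_crit = t_quantile(1 - alpha / 2, df)
    return b_k - t_crit * se_k, b_k + t_crit * se_k


df = 224 - 3  # assumption: n = 224 observations, p = 3 parameters
cis = {}
for name, b_k, se_k in [("hsm", 0.18265, 0.03196), ("hse", 0.06067, 0.03473)]:
    lower, upper = coef_ci(b_k, se_k, df)
    cis[name] = (lower, upper)
    verdict = "cannot reject" if lower <= 0 <= upper else "reject"
    print(f"{name}: 95% CI ({lower:.5f}, {upper:.5f}) -> {verdict} H0: beta_k = 0")
```

The hse interval straddles zero and the hsm interval does not. This is consistent with the p-values in that table (0.0820 and < .0001).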
12-6
Interpretations
• Similar to SLR: a one-unit increase in $X_k$, while all other X's are held constant, results in a $b_k$-unit change in the mean response (an increase if $b_k > 0$, a decrease if $b_k < 0$).
• Example: $\hat{Y} = 20 + 3X_1 - 1.8X_2$
◦ As X1 increases one unit, X2 held constant, the mean response increases by 3 units.
◦ As X2 increases one unit, X1 held constant, the mean response decreases by 1.8 units.
12-7
Interpretations (2)
• Interpretation can be more complicated if the model includes interactions, squared terms, etc.
• Example: $\hat{Y} = 20 + 3X_1 - 1.8X_2 + 0.3X_1X_2$
◦ As X1 increases one unit, X2 held constant, the mean response increases by $3 + 0.3X_2$ units.
◦ The interpretation now depends on the value of X2 (that is the essence of interaction).
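The following sketch codes the two estimated response functions from these examples. It measures the effect of a one-unit increase in X1 with X2 held fixed. For the additive model the effect is always 3; with the interaction term it becomes 3 + 0.3·X2.

```python
# Estimated response functions from the two interpretation examples.
def yhat_additive(x1, x2):
    return 20 + 3 * x1 - 1.8 * x2


def yhat_interaction(x1, x2):
    return 20 + 3 * x1 - 1.8 * x2 + 0.3 * x1 * x2


def effect_of_x1(model, x1, x2):
    """Change in the estimated mean response when X1 rises by one unit, X2 held fixed."""
    return model(x1 + 1, x2) - model(x1, x2)


for x2 in (0, 5, 10):
    add = effect_of_x1(yhat_additive, 2, x2)
    inter = effect_of_x1(yhat_interaction, 2, x2)
    print(f"X2 = {x2:>2}: additive {add:.2f}, with interaction {inter:.2f} "
          f"(3 + 0.3*X2 = {3 + 0.3 * x2:.2f})")
```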
12-8
Joint Inferences
• The Bonferroni adjustment can be used to construct joint confidence intervals for several regression coefficients simultaneously.
• Instead of α, use α/g, where g is the number of CIs (or tests):
$$b_k \pm B\, s\{b_k\}, \qquad \text{where } B = t\left(1-\alpha/(2g);\ n-p\right)$$
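As a check, the following sketch rebuilds the CS example's joint intervals for the two slopes. Here g is the number of intervals. With α = 0.05 and g = 2, each interval uses α/g = 0.025, which is the `alpha=0.025` in the CS `proc reg` call later in this lecture.

As before, the t quantile is a Cornish-Fisher approximation. The sketch also assumes n = 224 (df = 221). The results should land close to the 97.5% limits in that SAS output.

```python
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate t quantile (Cornish-Fisher expansion, A&S 26.7.5)."""
    x = NormalDist().inv_cdf(prob)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def bonferroni_B(alpha, g, df):
    """B = t(1 - alpha/(2g); df) for g simultaneous intervals."""
    return t_quantile(1 - alpha / (2 * g), df)


B = bonferroni_B(0.05, 2, 224 - 3)  # g = 2 slopes; df = 221 assumes n = 224
joint = {}
for name, b_k, se_k in [("hsm", 0.18265, 0.03196), ("hse", 0.06067, 0.03473)]:
    joint[name] = (b_k - B * se_k, b_k + B * se_k)
    print(f"{name}: ({joint[name][0]:.5f}, {joint[name][1]:.5f})")
print(f"B = {B:.4f}")
```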
12-9
Multicollinearity
• Multicollinearity can drastically affect
parameter estimates.
• Example: Your grade is correlated with study time (X1). Grade is also correlated with the number of class hours you are taking (X2).
• We might get the individual regressions:
$$\hat{Y}_{M1} = 75 + 3X_1$$
$$\hat{Y}_{M2} = 95 - X_2$$
12-10
Multicollinearity (2)
• Model 1 suggests each additional study hour increases the mean score by 3%.
• Model 2 suggests each additional class hour decreases the mean score by 1%.
• For a model using both variables, we might get
$$\hat{Y}_{M3} = 85 + 1.5X_1 - 0.5X_2$$
• This suggests that each additional study hour increases the mean response by 1.5%, while each additional class hour decreases it by 0.5%.
12-11
Multicollinearity (3)
• The real situation (probably): additional study hours increase the mean grade by 3%. At the same time, additional class hours mean you won't have as much time to study for this class (X1 and X2 are probably highly correlated).
• We can interpret either variable on its own, but putting both in the model and then trying to interpret the parameters is difficult.
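The following sketch illustrates this with hypothetical, noise-free data in which the grade depends on study hours alone. Class hours are strongly negatively correlated with study hours.

Fitted alone, X2 picks up a clearly negative slope. Once X1 is in the model, the X2 coefficient collapses to zero. The two-predictor slopes come from the centered normal equations.

```python
def centered_cross(u, v):
    """Sum of (u_i - mean u)(v_i - mean v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slr_slope(x, y):
    return centered_cross(x, y) / centered_cross(x, x)


def mlr_slopes(x1, x2, y):
    """(b1, b2) from the centered normal equations of a two-predictor model."""
    s11, s22 = centered_cross(x1, x1), centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def corr(u, v):
    return centered_cross(u, v) / (centered_cross(u, u) * centered_cross(v, v)) ** 0.5


# Hypothetical data: grade depends only on study hours (X1).
study = [2, 4, 6, 8, 10, 12]         # X1
classes = [18, 17, 15, 14, 12, 10]   # X2: more class hours, less study time
grade = [60 + 3 * s for s in study]

r12 = corr(study, classes)
slope_classes_alone = slr_slope(classes, grade)
b1, b2 = mlr_slopes(study, classes, grade)
print(f"corr(X1, X2) = {r12:.3f}")
print(f"X2 alone: slope {slope_classes_alone:.3f}")
print(f"both: b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With real, noisy data the picture is messier. Highly correlated predictors also inflate the standard errors of the individual slopes, which is part of why the joint coefficients are hard to interpret.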
12-12
Another Example
• Electricity cost for a home during the
summer is related to the number of people
who live there (X1). It is also related to the
outdoor temperature (X2).
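• In this situation, X1 and X2 would not be correlated, so each slope stays essentially the same whether its variable is fitted alone or together with the other. The corresponding regression models might be:
$$\hat{Y}_{M1} = 35 + 0.5X_1$$
$$\hat{Y}_{M2} = 45 + 8X_2$$
$$\hat{Y}_{M3} = 25 + 0.5X_1 + 8X_2$$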
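For contrast, when the predictors are uncorrelated, their centered cross-product is zero. Each multiple-regression slope then equals the corresponding simple-regression slope exactly.

The following sketch checks this with a hypothetical balanced design, in which every household size appears at both temperature levels. The error values are made up.

```python
def centered_cross(u, v):
    """Sum of (u_i - mean u)(v_i - mean v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slr_slope(x, y):
    return centered_cross(x, y) / centered_cross(x, x)


def mlr_slopes(x1, x2, y):
    """(b1, b2) from the centered normal equations of a two-predictor model."""
    s11, s22 = centered_cross(x1, x1), centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


people = [1, 2, 3, 4, 1, 2, 3, 4]                     # X1
temp = [5, 5, 5, 5, 10, 10, 10, 10]                   # X2, hypothetical temperature units
noise = [0.3, -0.2, 0.1, -0.4, -0.1, 0.2, 0.3, -0.2]  # hypothetical errors
cost = [25 + 0.5 * a + 8 * t + e for a, t, e in zip(people, temp, noise)]

b1, b2 = mlr_slopes(people, temp, cost)
print("centered cross-product of X1, X2:", centered_cross(people, temp))
print(f"X1: alone {slr_slope(people, cost):.4f}, together {b1:.4f}")
print(f"X2: alone {slr_slope(temp, cost):.4f}, together {b2:.4f}")
```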
12-13
Mean Response Estimation
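Given any vector
$$\mathbf{X}_h' = \begin{pmatrix} 1 & X_{h1} & X_{h2} & \cdots & X_{h,p-1} \end{pmatrix}_{1\times p}$$
the mean response at $\mathbf{X}_h$ is given by
$$E\{Y_h\} = \mathbf{X}_h'\boldsymbol{\beta}$$
The estimated mean response is
$$\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$$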
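As a small worked case, the following sketch evaluates $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$ on the nine (hsm, hse) combinations used in the CS example later in this lecture. It uses the estimates from that example's parameter table. The results should match the pred column of the W-H output to within rounding of the printed estimates.

```python
# CS estimates (Int, hsm, hse) from the parameter-estimate table in this lecture.
b = [0.62423, 0.18265, 0.06067]


def mean_response(xh, b):
    """Yhat_h = X_h' b, where X_h = (1, X_h1, ..., X_h,p-1)."""
    return sum(x * coef for x, coef in zip(xh, b))


grid = {(hsm, hse): mean_response([1, hsm, hse], b)
        for hsm in (5, 8, 10) for hse in (5, 8, 10)}
for (hsm, hse), yhat in grid.items():
    print(hsm, hse, f"{yhat:.5f}")
```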
12-14
Mean Response Estimation (2)
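The variance may be estimated by
$$s^2\{\hat{Y}_h\} = \mathbf{X}_h'\,\mathbf{s}^2\{\mathbf{b}\}\,\mathbf{X}_h$$
This makes sense because $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$ is a linear combination of the estimates, and the variance of a linear combination $\mathbf{a}'\mathbf{b}$ is $\mathbf{a}'\,\boldsymbol{\sigma}^2\{\mathbf{b}\}\,\mathbf{a}$.
Substituting $\mathbf{s}^2\{\mathbf{b}\} = MSE\,(\mathbf{X}'\mathbf{X})^{-1}$ yields the estimated variance
$$s^2\{\hat{Y}_h\} = MSE\left(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)$$
which involves a "hat-like" product of matrices.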
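The following sketch confirms numerically that the two expressions agree. It also shows that, for p = 2, they reproduce the familiar SLR formula $MSE\left(1/n + (X_h-\bar{X})^2/\sum(X_i-\bar{X})^2\right)$. The predictor values and MSE are hypothetical.

```python
# Hypothetical SLR data (p = 2) and a hypothetical MSE.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
mse = 0.75
n = len(x)

# (X'X)^{-1} for the design with rows (1, x_i), via the 2 x 2 inverse formula.
a, off, d = n, sum(x), sum(v * v for v in x)
det = a * d - off * off
XtX_inv = [[d / det, -off / det], [-off / det, a / det]]
s2b = [[mse * v for v in row] for row in XtX_inv]


def quad_form(v, M):
    """v' M v"""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


xh = [1.0, 6.0]
var_via_s2b = quad_form(xh, s2b)             # X_h' s^2{b} X_h
var_via_hat = mse * quad_form(xh, XtX_inv)   # MSE * X_h'(X'X)^{-1} X_h
xbar = sum(x) / n
var_slr = mse * (1 / n + (xh[1] - xbar) ** 2 / sum((v - xbar) ** 2 for v in x))
print(var_via_s2b, var_via_hat, var_slr)
```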
12-15
CI for Mean Response
• As before, we can use the t statistic with n − p degrees of freedom to obtain confidence limits: $\hat{Y}_h \pm t(1-\alpha/2;\ n-p)\, s\{\hat{Y}_h\}$
• If CIs are needed for multiple $\mathbf{X}_h$, we should generally use some kind of family confidence coefficient adjustment.
• Bonferroni can be used as we have learned, but it can be more conservative than necessary when many intervals are needed.
12-16
CI for Mean Response (2)
• A Working-Hotelling confidence region can be produced, analogous to SLR, using the critical value W, where
$$W^2 = p\,F(1-\alpha;\ p,\ n-p)$$
• This produces a confidence region for the entire regression surface. If p = 2, it reduces to the Working-Hotelling band we had for SLR.
• It is a good idea to use W-H if you want CIs for the mean response at several different $\mathbf{X}_h$. Its critical value does not grow with the number of intervals, so once there are more than a few intervals it is less conservative than Bonferroni.
12-17
CI For Mean Response (3)
Table for n = 50 and p = 2.
# Intervals Bonferroni W-H
2 2.32 2.53
5 2.69 2.53
10 2.95 2.53
25 3.28 2.53
50 3.53 2.53
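The following sketch computes both columns for n = 50 and p = 2 using only the standard library. It inverts the F distribution by bisection on the regularized incomplete beta function. This is a hand-rolled substitute for a library routine such as `scipy.stats.f.ppf`.

The Bonferroni value uses the identity $t^2(1-\alpha/(2g);\ df) = F(1-\alpha/g;\ 1,\ df)$. The W-H value of 2.53 is reproduced, and the Bonferroni values come out close to (within a few hundredths of) the tabled ones.

```python
import math


def betai(a, b, x):
    """Regularized incomplete beta I_x(a, b), continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):  # symmetry keeps the fraction fast
        return 1.0 - betai(b, a, 1.0 - x)
    tiny = 1e-300
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_cdf(f, d1, d2):
    """P(F <= f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 0.0
    return betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_quantile(prob, d1, d2):
    """Invert f_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bonferroni_B(alpha, g, df):
    # t(1 - alpha/(2g); df) = sqrt(F(1 - alpha/g; 1, df)), since T^2 ~ F(1, df)
    return math.sqrt(f_quantile(1.0 - alpha / g, 1, df))


def working_hotelling_W(alpha, p, df):
    # W^2 = p * F(1 - alpha; p, n - p)
    return math.sqrt(p * f_quantile(1.0 - alpha, p, df))


n, p, alpha = 50, 2, 0.05
df = n - p
W = working_hotelling_W(alpha, p, df)
for g in (2, 5, 10, 25, 50):
    print(f"{g:>3}  Bonferroni {bonferroni_B(alpha, g, df):.2f}   W-H {W:.2f}")
```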
12-18
Prediction Intervals
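• $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$ is our point estimate
• The estimated variance for prediction is
$$s^2\{\text{pred}\} = MSE\left(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)$$
• Build the prediction interval as before using the t critical value: $\hat{Y}_h \pm t(1-\alpha/2;\ n-p)\, s\{\text{pred}\}$
• Again, we should make an adjustment if producing multiple intervals.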
12-19
Predictions for Multiple Xh
• Bonferroni: same as before. Use a t critical value and adjust α by dividing it by the number g of PIs you want. This is a reasonably conservative approach.
• Scheffe: use as the critical value S, where
$$S^2 = g\,F(1-\alpha;\ g,\ n-p)$$
Scheffe is a much more conservative approach (appropriate for unplanned hypothesis testing) and leads to wider intervals.
12-20
Bonferroni vs Scheffe
Table for n = 20 and p = 5.
# Intervals Bonferroni Scheffe
2 2.32 2.53
5 2.69 3.48
10 2.95 4.52
25 3.28 6.62
50 3.53 9.02
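The Bonferroni and Scheffe critical values for g prediction intervals can be sketched the same way, using only the standard library. The sketch inverts the F distribution numerically and uses $t^2(1-\alpha/(2g);\ df) = F(1-\alpha/g;\ 1,\ df)$ for the Bonferroni value. Here df = 45 is an arbitrary illustrative choice; set it to n − p for your own model.

```python
import math


def betai(a, b, x):
    """Regularized incomplete beta I_x(a, b), continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):  # symmetry keeps the fraction fast
        return 1.0 - betai(b, a, 1.0 - x)
    tiny = 1e-300
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_cdf(f, d1, d2):
    """P(F <= f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 0.0
    return betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_quantile(prob, d1, d2):
    """Invert f_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bonferroni_B(alpha, g, df):
    # B = t(1 - alpha/(2g); df) = sqrt(F(1 - alpha/g; 1, df))
    return math.sqrt(f_quantile(1.0 - alpha / g, 1, df))


def scheffe_S(alpha, g, df):
    # S^2 = g * F(1 - alpha; g, n - p)
    return math.sqrt(g * f_quantile(1.0 - alpha, g, df))


alpha, df = 0.05, 45  # df = n - p; 45 is an arbitrary illustrative value
for g in (2, 5, 10, 25, 50):
    print(f"{g:>3}  Bonferroni {bonferroni_B(alpha, g, df):.2f}   "
          f"Scheffe {scheffe_S(alpha, g, df):.2f}")
```

Scheffe's critical value grows roughly like the square root of g, while Bonferroni's grows only slowly. That is why the gap widens as more intervals are requested.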
12-21
Prediction of Mean of m New Obs.
• The variance associated with one new observation is $\sigma^2$
• The variance associated with the mean of m new observations is $\sigma^2/m$
• The variance for predicting such a mean is the sum of this variance and the variance of the estimated mean response, estimated by:
$$s^2\{\text{predmean}\} = MSE\left(\frac{1}{m} + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)$$
• We can use $s\{\text{predmean}\}$ to get prediction intervals, adjusting for multiple comparisons if needed.
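The following sketch evaluates this variance for several values of m. It uses a hypothetical MSE and a hypothetical leverage $h = \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h$. With m = 1 the variance is $s^2\{\text{pred}\}$. As m grows it shrinks toward $s^2\{\hat{Y}_h\} = MSE \cdot h$, the variance of the estimated mean response.

```python
def s2_predmean(mse, h, m):
    """MSE * (1/m + h), with h = X_h'(X'X)^{-1} X_h; m = 1 gives s^2{pred}."""
    return mse * (1.0 / m + h)


mse, h = 0.75, 1 / 3  # hypothetical values
for m in (1, 4, 25, 1000):
    print(m, round(s2_predmean(mse, h, m), 6))
print("limit s^2{Yhat_h} =", mse * h)
```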
12-22
Hidden Extrapolations
• The scope of the model is the region of values covered by all of the predictor variables jointly.
• The joint range is not simply the cross-product of the individual ranges.
• Example: X1 ranges from 0 to 10 and X2 ranges from 0 to 10. It may be that combinations with both variables near their endpoints are not represented in the data. For example, there may be no observations close (in 2 dimensions) to (0,0). Hence we should not make inferences in such regions.
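One common diagnostic for hidden extrapolation, not covered in this lecture, compares the leverage $h_{new} = \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h$ of a new point with the leverages $h_{ii}$ of the observed data. A value well above the largest $h_{ii}$ flags a point outside the joint region.

The following sketch uses hypothetical data in which X1 and X2 each cover 0 to 10 but move together. The point (0, 10) therefore lies inside both individual ranges yet far from the data cloud.

```python
def make_leverage(x1, x2):
    """Return h(u1, u2) = X_h'(X'X)^{-1} X_h for a model with intercept, X1, X2,
    computed in the equivalent centered form 1/n + d' S^{-1} d."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    det = s11 * s22 - s12 ** 2

    def h(u1, u2):
        d1, d2 = u1 - m1, u2 - m2
        return 1 / n + (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det

    return h


# Hypothetical data: each predictor spans 0-10, but the two move together.
x1 = [0, 2, 4, 6, 8, 10]
x2 = [1, 2, 5, 6, 9, 10]
h = make_leverage(x1, x2)
h_data = [h(u, v) for u, v in zip(x1, x2)]
h_max = max(h_data)

for point in [(5, 5.5), (0, 10)]:
    h_new = h(*point)
    status = "possible hidden extrapolation" if h_new > h_max else "within the data pattern"
    print(point, round(h_new, 3), status)
```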
12-23
12-24
CS Example (cs2.sas)
• Model GPA using
◦ High school math grades (HSM)
◦ High school English grades (HSE)
PARAMETER ESTIMATES
Var DF Estimate Std Error t Value Pr > |t|
Int 1 0.62423 0.29172 2.14 0.0335
hsm 1 0.18265 0.03196 5.72 <.0001
hse 1 0.06067 0.03473 1.75 0.0820
12-25
CS Example (2)
Joint 95% CIs for the slope parameters (Bonferroni with g = 2, so each interval uses α = 0.05/2 = 0.025)
proc reg data=cs;
  model gpa=hsm hse / clb alpha=0.025;
run;
Variable DF 97.5% Confidence Limits
Intercept 1 -0.03412 1.28258
hsm 1 0.11054 0.25477
hse 1 -0.01771 0.13905
12-26
CS Example (3)
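• Suppose we want confidence limits for several mean responses (perhaps all nine combinations of MATH = 5, 8, 10 and ENGLISH = 5, 8, 10)
• With this many intervals, W-H is a natural choice, since its critical value does not grow with the number of intervals. With p = 3 and nine intervals, however, the W-H and Bonferroni critical values are close, so it is worth computing both and using the smaller.
• We should check whether we are extrapolating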
12-27
SAS Coding
proc sort data=cs;
  by hsm hse;
run;
proc freq data=cs;
  tables hsm*hse / out=freqcount outcum;
run;
proc gplot data=freqcount;
  bubble hsm*hse=count / blabel;
  title 'Joint Region for HSM & HSE';
run;
quit;
12-28
12-29
CS Example (5)
• Scores of 5 or less are borderline extrapolations, but using them is probably OK.
• To get the CIs, compute W and determine the "effective alpha" that gives the same critical value from the t-distribution.
• In this case, the effective alpha is approximately 0.005 (see SAS code)
• We can use this in conjunction with the clm option to get the intervals:
proc reg data=cs;
  model gpa = hsm hse / clm alpha=0.005;
  output out=mr p=pred lclm=lower uclm=upper;
run;
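The notes do this step in SAS. The following is a standard-library Python sketch of the same calculation. It assumes n = 224 (the appended rows start at Obs 225), so with p = 3 the error df is 221.

The sketch computes $W^2 = 3F(0.95;\ 3,\ 221)$ and then the two-sided t tail area beyond W. It uses $P(|T| > w) = I_{df/(df+w^2)}(df/2,\ 1/2)$. The result should come out slightly above 0.005. Rounding it down to 0.005, as the notes do, gives a marginally larger, more conservative critical value.

```python
import math


def betai(a, b, x):
    """Regularized incomplete beta I_x(a, b), continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):  # symmetry keeps the fraction fast
        return 1.0 - betai(b, a, 1.0 - x)
    tiny = 1e-300
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def f_cdf(f, d1, d2):
    """P(F <= f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 0.0
    return betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_quantile(prob, d1, d2):
    """Invert f_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, p, alpha = 224, 3, 0.05  # assumption: n = 224 observations
df = n - p
W = math.sqrt(p * f_quantile(1 - alpha, p, df))       # W^2 = p F(1 - alpha; p, n - p)
alpha_eff = betai(df / 2, 0.5, df / (df + W * W))     # two-sided t tail area beyond W
print(f"W = {W:.4f}, effective alpha = {alpha_eff:.5f}")
```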
12-30
CS Example (6)
CIs for MEAN RESPONSE (using W-H adj)
Obs hsm hse pred lower upper
225 5 5 1.84085 1.49588 2.18583
226 8 5 2.38881 2.06741 2.71022
227 10 5 2.75412 2.33588 3.17237
228 5 8 2.02286 1.69768 2.34804
229 8 8 2.57083 2.43570 2.70595
230 10 8 2.93613 2.73110 3.14117
231 5 10 2.14420 1.70389 2.58451
232 8 10 2.69217 2.45022 2.93411
233 10 10 3.05747 2.83293 3.28202
12-31
CS Example (7)
• Prediction intervals for the same set of points can be obtained; use Bonferroni:
proc reg data=cs;
  model gpa = hsm hse / cli alpha=0.005;
  output out=predict p=pred lcl=lower ucl=upper;
run;
• These turn out to be virtually worthless for this data set; each interval spans about 4 GPA points:
Obs hsm hse lower upper
225 5 5 -0.17257 3.85427
228 5 8 0.01274 4.03299
232 8 10 0.69382 4.69051
233 10 10 1.06116 5.05379
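Because $s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\}$, the two SAS listings can be combined to see why the PIs are so wide. The following sketch backs out $s\{\hat{Y}_h\}$ and $s\{\text{pred}\}$ at Obs 225 from the reported limits. It assumes that both listings used the same t critical value (alpha = 0.005) and that n = 224, so df = 221. The t quantile is again a Cornish-Fisher approximation.

The implied MSE comes out near 0.49, far larger than $s^2\{\hat{Y}_h\}$ (roughly 0.015). Nearly all of the PI width therefore reflects student-to-student variation rather than uncertainty in the fitted surface.

```python
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate t quantile (Cornish-Fisher expansion, A&S 26.7.5)."""
    x = NormalDist().inv_cdf(prob)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


# Obs 225 (hsm = 5, hse = 5), copied from the two SAS listings.
pred = 1.84085
ci_lower, ci_upper = 1.49588, 2.18583    # clm, alpha = 0.005
pi_lower, pi_upper = -0.17257, 3.85427   # cli, alpha = 0.005

t_crit = t_quantile(1 - 0.005 / 2, 224 - 3)    # assumes n = 224, p = 3
s_mean = (ci_upper - ci_lower) / 2 / t_crit    # s{Yhat_h}
s_pred = (pi_upper - pi_lower) / 2 / t_crit    # s{pred}
mse_implied = s_pred ** 2 - s_mean ** 2        # s^2{pred} = MSE + s^2{Yhat_h}
print(f"s{{Yhat_h}} = {s_mean:.4f}, s{{pred}} = {s_pred:.4f}, implied MSE = {mse_implied:.4f}")
```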
12-32
Conclusions for CS data
• Predicting GPA from HSM and HSE in this data set does not lead to very useful prediction intervals.
• We could certainly set some admission cutoffs, but we would likely be eliminating students who would do reasonably well at the same time as we eliminated drop-outs.
• Based on the prediction bounds, we might accept students whose upper prediction bound reaches the 3-point range.
12-33
Upcoming in Lecture 13
• Extra Sums of Squares (7.1)
• Coefficients of Partial Determination (7.4)
• General Linear Tests for testing whether
“groups” of variables are important in a
model (7.2, 7.3)