lecture 12 inference in mlr - purdue universityghobbs/stat_512/lecture_notes/... · 2011-02-16 ·...

12-1

Lecture 12

Inference in MLR

STAT 512

Spring 2011

Background Reading

KNNL: 6.6-6.7

12-2

Topic Overview

• Review MLR Model

• Inference about Regression Parameters

• Estimation of Mean Response

• Prediction

12-3

Multiple Regression Model

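$$\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\;\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}, \quad \text{where } \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\ \sigma^2 \underset{n \times n}{\mathbf{I}}\right)$$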

• X is the design matrix (each column after

the first corresponds to a predictor

variable).

• $\mathbf{Y}$, $\boldsymbol{\beta}$, $\boldsymbol{\varepsilon}$ are vectors holding the responses, parameters, and errors

12-4

Parameter Estimates

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

• Variance-Covariance Matrix is

$$\underset{p \times p}{\mathbf{s}^2\{\mathbf{b}\}} = MSE \times (\mathbf{X}'\mathbf{X})^{-1}$$

• To get $s^2\{b_k\}$, read the appropriate (kth) diagonal element of the matrix (Note: k = 0, 1, 2, …, p − 1).

• Use Estimate, SE to produce confidence

interval(s)
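The matrix formulas on this slide can be sketched in PROC IML. This is a minimal illustration, not code from the course: the six data rows are hypothetical toy values, and the names XtXinv, covb, and se are chosen here for readability.

```sas
proc iml;
  /* Hypothetical toy data (not from the lecture): n = 6 cases, 2 predictors */
  X = {1 1 2,
       1 2 1,
       1 3 4,
       1 4 3,
       1 5 6,
       1 6 5};                     /* first column of 1s is the intercept */
  Y = {3, 4, 8, 8, 13, 12};

  n = nrow(X);                     /* number of cases */
  p = ncol(X);                     /* number of parameters, incl. intercept */

  XtXinv = inv(t(X) * X);          /* (X'X)^{-1} */
  b      = XtXinv * t(X) * Y;      /* least squares estimates b */
  e      = Y - X * b;              /* residuals */
  MSE    = ssq(e) / (n - p);       /* error mean square on n - p DF */
  covb   = MSE * XtXinv;           /* s^2{b}, a p x p matrix */
  se     = sqrt(vecdiag(covb));    /* s{b_k}: square roots of the diagonal */

  print b se;
quit;
```

With real data, the same estimates and standard errors are what PROC REG reports in its parameter estimates table.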

12-5

Confidence Intervals for $\beta_k$

• CI for $\beta_k$ is $b_k \pm t_{crit}\, s\{b_k\}$

• Critical value comes from t-distribution with $n - p$ degrees of freedom (DF for error)

• If CI includes zero, then we cannot reject $H_0: \beta_k = 0$ (i.e. that variable is not significant when added to the model containing all of the other variables.)
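As a hedged illustration, the data step below applies this formula to the hse row of the CS example output that appears later in these notes (estimate 0.06067, standard error 0.03473). The error DF of 221 is an assumption: it takes n = 224 students and p = 3 parameters, and n is inferred rather than stated on the slides.

```sas
data ci_hse;
  b     = 0.06067;             /* hse estimate from the CS output */
  se    = 0.03473;             /* its standard error */
  df    = 221;                 /* ASSUMED error DF: n = 224, p = 3 */
  alpha = 0.05;                /* unadjusted 95% interval */
  tcrit = tinv(1 - alpha/2, df);
  lower = b - tcrit * se;
  upper = b + tcrit * se;
run;

proc print data=ci_hse noobs;
  var b se tcrit lower upper;
run;
```

The two-sided p-value for hse in that output is 0.0820, which is above 0.05, so the unadjusted 95% interval must include zero.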

12-6

Interpretations

• As in SLR: a one-unit increase in $X_k$, with all other X’s held constant, changes the estimated mean response by $b_k$ units.

• Example: $\hat{Y} = 20 + 3X_1 - 1.8X_2$

◦ As $X_1$ increases one unit, $X_2$ held constant, the mean response is increased by 3 units.

◦ As $X_2$ increases one unit, $X_1$ held constant, the mean response is decreased by 1.8 units.

12-7

Interpretations (2)

• Can be more complicated if the model includes interactions, squares, etc.

• Example: $\hat{Y} = 20 + 3X_1 - 1.8X_2 + 0.3X_1X_2$

◦ As $X_1$ increases one unit, $X_2$ held constant, the mean response is increased by $3 + 0.3X_2$ units.

◦ Interpretation now depends on the value of $X_2$ (that is the essence of interaction)
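The dependence on $X_2$ follows directly from differencing the fitted equation in the example. The two values of $X_2$ plugged in at the end are arbitrary illustrations.

```latex
\begin{align*}
\hat{Y}(X_1 + 1,\, X_2) - \hat{Y}(X_1,\, X_2)
  &= 3\,[(X_1 + 1) - X_1] + 0.3\,[(X_1 + 1) - X_1]\,X_2 \\
  &= 3 + 0.3\,X_2 \\[4pt]
X_2 = 0:  &\quad 3 + 0.3(0) = 3 \\
X_2 = 10: &\quad 3 + 0.3(10) = 6
\end{align*}
```

The terms $20$ and $-1.8X_2$ cancel in the difference because they do not involve $X_1$.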

12-8

Joint Inferences

• The Bonferroni adjustment can be used to estimate several regression coefficients simultaneously.

• Instead of $\alpha$, use $\alpha/g$, where $g$ is the number of CIs (or tests):

$$b_k \pm B\, s\{b_k\}, \quad \text{where } B = t\!\left(1 - \frac{\alpha}{2g};\ n - p\right)$$
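A short worked case, writing $g$ for the number of intervals to keep it distinct from the coefficient index: two intervals at a family level of 95%.

```latex
\[
g = 2,\quad \alpha = 0.05:\qquad
\frac{\alpha}{g} = 0.025,\qquad
B = t\!\left(1 - \frac{0.05}{2 \cdot 2};\ n - p\right) = t(0.9875;\ n - p)
\]
```

Each interval is then an individual 97.5% interval, which is why the CS example later requests alpha = 0.025 for the two slopes.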

12-9

Multicollinearity

• Multicollinearity can drastically affect

parameter estimates.

• Example: Your grade is correlated with study time ($X_1$). Grade is also correlated with the number of class hours you are taking ($X_2$).

• Might get individual regressions:

$$\hat{Y}_{M1} = 75 + 3X_1$$

$$\hat{Y}_{M2} = 95 - X_2$$

12-10

Multicollinearity (2)

• Model 1 suggests each additional study hour

increases mean score by 3%.

• Model 2 suggests each additional class hour decreases mean score by 1%.

• For model using both variables, might get

$$\hat{Y}_{M3} = 85 + 1.5X_1 - 0.5X_2$$

• Suggests that study hours increase mean

response by 1.5% while additional class

hours decrease it by 0.5%.

12-11

Multicollinearity (3)

• The real situation (probably): each additional study hour increases mean grade by 3%. At the same time, additional class hours mean you won’t have as much time to study for this class ($X_1$ and $X_2$ are probably highly correlated).

• We can talk about either variable on its own, but putting both in the model and then trying to interpret the individual parameters is difficult.

12-12

Another Example

• Electricity cost for a home during the

summer is related to the number of people

who live there (X1). It is also related to the

outdoor temperature (X2).

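• In this situation, $X_1$ and $X_2$ would not be correlated. The corresponding regression models might be:

$$\hat{Y}_{M1} = 35 + 0.5X_1$$

$$\hat{Y}_{M2} = 45 + 8X_2$$

$$\hat{Y}_{M3} = 25 + 0.5X_1 + 8X_2$$

• Because the predictors are uncorrelated, each slope is the same whether or not the other variable is in the model.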
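The contrast with the study-time example can be checked numerically. The sketch below is illustrative only: the data are hypothetical, built so that every x1 value is paired with every x2 value (sample correlation exactly zero), and the module name ols is made up for this example.

```sas
proc iml;
  /* Hypothetical balanced data: corr(x1, x2) = 0 by construction */
  x1 = {1, 1, 2, 2, 3, 3};               /* e.g. number of people */
  x2 = {70, 90, 70, 90, 70, 90};         /* e.g. outdoor temperature */
  y  = {600, 765, 601, 759, 603, 762};   /* made-up responses */
  one = j(nrow(y), 1, 1);                /* intercept column */

  /* Helper module (hypothetical name): b = (X'X)^{-1} X'Y */
  start ols(X, y);
    return( inv(t(X) * X) * t(X) * y );
  finish ols;

  b_m1 = ols(one || x1, y);              /* model 1: x1 only */
  b_m2 = ols(one || x2, y);              /* model 2: x2 only */
  b_m3 = ols(one || x1 || x2, y);        /* model 3: both predictors */

  print b_m1, b_m2, b_m3;
quit;
```

With uncorrelated predictors, the x1 slope in b_m1 matches the x1 slope in b_m3, and likewise for x2; only the intercepts differ. Repeating the exercise with correlated columns would show the slopes shifting, as in the study-time example.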

12-13

Mean Response Estimation

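Given any vector

$$\mathbf{X}_h' = \begin{pmatrix} 1 & X_{h1} & X_{h2} & \cdots & X_{h,p-1} \end{pmatrix}$$

the mean response at $\mathbf{X}_h'$ is given by

$$E\{Y_h\} = \mathbf{X}_h'\boldsymbol{\beta}$$

The estimated mean response is

$$\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$$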

12-14

Mean Response Estimation (2)

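Variance may be written

$$s^2\{\hat{Y}_h\} = \mathbf{X}_h'\, \mathbf{s}^2\{\mathbf{b}\}\, \mathbf{X}_h$$

This makes sense because $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$.

More manipulation yields the estimated variance as

$$s^2\{\hat{Y}_h\} = MSE \left( \mathbf{X}_h' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}_h \right)$$

which involves a “hat-like” product of matrices.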
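The manipulation amounts to substituting the variance-covariance matrix of $\mathbf{b}$ from the Parameter Estimates slide and pulling the scalar $MSE$ out front:

```latex
\begin{align*}
s^2\{\hat{Y}_h\}
  &= \mathbf{X}_h'\,\mathbf{s}^2\{\mathbf{b}\}\,\mathbf{X}_h
     && \text{(variance of the linear combination } \mathbf{X}_h'\mathbf{b}\text{)} \\
  &= \mathbf{X}_h'\left[\, MSE\,(\mathbf{X}'\mathbf{X})^{-1} \right]\mathbf{X}_h
     && \text{(substitute } \mathbf{s}^2\{\mathbf{b}\} = MSE\,(\mathbf{X}'\mathbf{X})^{-1}\text{)} \\
  &= MSE \left( \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \right)
     && \text{(} MSE \text{ is a scalar)}
\end{align*}
```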

12-15

CI for Mean Response

• As before, can use the t statistic with $n - p$ degrees of freedom to obtain confidence limits: $\hat{Y}_h \pm t(1 - \alpha/2;\ n - p)\, s\{\hat{Y}_h\}$

• If producing CI’s for multiple $\mathbf{X}_h'$, should generally use some kind of family confidence coefficient adjustment

• Bonferroni can be used as we have learned, but with many intervals it is more conservative than necessary.

12-16

CI for Mean Response (2)

• A Working-Hotelling confidence region can be produced, analogous to SLR, using the critical value $W$ defined by

$$W^2 = p\,F(1 - \alpha;\ p,\ n - p)$$

• This produces a confidence region for the regression surface. Of course if p = 2 then it reduces to what we had for SLR.

• It is a good idea to use W-H if you want CI’s for the mean response at several different $\mathbf{X}_h'$: W does not grow with the number of intervals, so it is then less conservative than Bonferroni.

12-17

CI For Mean Response (3)

Table for n = 50 and p = 2.

# Intervals    Bonferroni    W-H
      2           2.32       2.53
      5           2.69       2.53
     10           2.95       2.53
     25           3.28       2.53
     50           3.53       2.53
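Critical values of this kind can be computed with the SAS quantile functions TINV and FINV. The sketch below assumes a 95% family confidence level; the slide does not state the level, so alpha = 0.05 is an assumption, and the dataset name crit is arbitrary.

```sas
data crit;
  n = 50;  p = 2;
  alpha = 0.05;                              /* ASSUMED family alpha */
  W = sqrt(p * finv(1 - alpha, p, n - p));   /* W-H: same for any g */
  do g = 2, 5, 10, 25, 50;                   /* number of intervals */
    B = tinv(1 - alpha/(2*g), n - p);        /* Bonferroni: grows with g */
    output;
  end;
run;

proc print data=crit noobs;
  var g B W;
  format B W 6.2;
run;
```

If the assumed level is the one used on the slide, the B and W columns should be close to the tabulated values. The design point is that W depends only on p and n − p, while B increases with every interval added.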

12-18

Prediction Intervals

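• $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$ is our point estimate

• Estimated variance for prediction is

$$s^2\{\hat{Y}_{h,new}\} = MSE \left( 1 + \mathbf{X}_h' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}_h \right)$$

and the standard error for prediction is its square root.

• Build the prediction interval as before using the t-critical value

• Again should make an adjustment if producing multiple intervals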

12-19

Predictions for Multiple Xh

• Bonferroni – same as before, use a t critical value and adjust alpha by dividing by the number of PIs you want; a reasonably conservative approach.

• Scheffe – use as critical value $S$, where g is the number of PIs:

$$S^2 = g\,F(1 - \alpha;\ g,\ n - p)$$

Scheffe is a much more conservative approach (appropriate for unplanned hypothesis testing) and leads to wider intervals.

12-20

Bonferroni vs Scheffe

Table for n = 50 and p = 2 (error DF = 48).

# Intervals    Bonferroni    Scheffe
      2           2.32        2.53
      5           2.69        3.48
     10           2.95        4.52
     25           3.28        6.62
     50           3.53        9.02
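The comparison can be sketched with TINV and FINV. Two assumptions are made here and are not stated on the slide: a 95% family level, and 48 error degrees of freedom, which is what the tabulated Bonferroni values appear to correspond to. The dataset name is arbitrary.

```sas
data scheffe_vs_bonf;
  df    = 48;                                /* ASSUMED error DF (n - p) */
  alpha = 0.05;                              /* ASSUMED family alpha */
  do g = 2, 5, 10, 25, 50;                   /* number of prediction intervals */
    B = tinv(1 - alpha/(2*g), df);           /* Bonferroni critical value */
    S = sqrt(g * finv(1 - alpha, g, df));    /* Scheffe critical value */
    output;
  end;
run;

proc print data=scheffe_vs_bonf noobs;
  var g B S;
  format B S 6.2;
run;
```

Unlike W in the Working-Hotelling procedure, S depends on g, so it grows as more prediction intervals are requested.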

12-21

Prediction of Mean of m New Obs.

• Variance associated with one new observation is $\sigma^2$

• Variance associated with the mean of m new observations is $\sigma^2/m$

• Variance for prediction of such a mean is the sum of this variance plus the variance in the estimated mean response:

$$s^2\{predmean\} = MSE \left( \frac{1}{m} + \mathbf{X}_h' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}_h \right)$$

• Can use $s\{predmean\}$ to get prediction intervals, adjusting for multiple comparisons if needed.
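Written out, the sum has two pieces, and the two limiting cases recover the earlier formulas for a single new observation and for the mean response:

```latex
\begin{align*}
s^2\{predmean\}
  &= \underbrace{\frac{MSE}{m}}_{\text{mean of } m \text{ new obs.}}
   + \underbrace{MSE\,\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h}_{s^2\{\hat{Y}_h\}}
   = MSE\left( \frac{1}{m} + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \right) \\[4pt]
m = 1:
  &\quad MSE\left( 1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \right)
   \quad \text{(single new observation)} \\
m \to \infty:
  &\quad MSE\,\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h = s^2\{\hat{Y}_h\}
   \quad \text{(mean response)}
\end{align*}
```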

12-22

Hidden Extrapolations

• Scope of model is region of values covered

by all of the predictor variables jointly.

• Joint range is not simply the cross-product

of individual ranges.

• Example: $X_1$ ranges from 0 to 10 and $X_2$ ranges from 0 to 10. It may be that combinations with both variables near their endpoints are not represented in the data; e.g. points close in two dimensions to (0,0). Hence we should not make inferences in such regions.

12-24

CS Example (cs2.sas)

• Model: GPA (gpa) regressed on

◦ High school Math grades (HSM)

◦ High school English grades (HSE)

PARAMETER ESTIMATES

Var    DF    Estimate    Std Error    t Value    Pr > |t|
Int     1     0.62423      0.29172       2.14      0.0335
hsm     1     0.18265      0.03196       5.72      <.0001
hse     1     0.06067      0.03473       1.75      0.0820

12-25

CS Example (2)

Joint 95% CIs for the two slope parameters (Bonferroni: alpha = 0.05/2 = 0.025 per interval)

proc reg data=cs;
  model gpa = hsm hse / clb alpha=0.025;
run;

Variable     DF    97.5% Confidence Limits
Intercept     1    -0.03412     1.28258
hsm           1     0.11054     0.25477
hse           1    -0.01771     0.13905

12-26

CS Example (3)

• Suppose we want confidence limits for several mean responses (perhaps all nine combinations of MATH = 5, 8, 10 and ENGLISH = 5, 8, 10)

• With enough intervals, we should use W-H since the CIs will be tighter

• Should first check to see if we are extrapolating

12-27

SAS Coding

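/* Count how many students fall at each (hsm, hse) combination */
proc sort data=cs; by hsm hse;
proc freq data=cs;
  tables hsm*hse / out=freqcount outcum;
/* Bubble plot of the counts shows the joint region covered by the data */
proc gplot data=freqcount;
  bubble hsm*hse=count / blabel;
  title 'Joint Region for HSM & HSE';
run; quit;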

12-29

CS Example (5)

• Scores of 5 or less are borderline extrapolations, but using them is probably OK.

• To get the CIs, compute W and determine the “effective alpha” at which the t-distribution gives the same critical value.

• In this case, effective alpha is 0.005 (see SAS code)

• Can use this in conjunction with ‘clm’ to get intervals

proc reg data=cs;
  model gpa = hsm hse / clm alpha=0.005;
  output out=mr p=pred lclm=lower uclm=upper;
run;
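The SAS code that produces the effective alpha is not reproduced in these notes. One way the calculation might look is sketched below; it assumes n = 224 students (inferred, not stated on the slides), p = 3 parameters, and a 95% family level. The dataset and variable names are made up for this example.

```sas
data wh_alpha;
  n = 224;  p = 3;                           /* ASSUMED: n inferred, p = int + 2 slopes */
  alpha = 0.05;                              /* ASSUMED family alpha */
  W = sqrt(p * finv(1 - alpha, p, n - p));   /* Working-Hotelling critical value */
  /* Two-sided t tail area beyond W: the alpha at which a t interval uses W */
  alpha_eff = 2 * (1 - probt(W, n - p));
run;

proc print data=wh_alpha noobs;
  var W alpha_eff;
run;
```

Passing alpha_eff (rounded) as the alpha= option makes PROC REG's ordinary t-based clm limits use a multiplier close to W.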

12-30

CS Example (6)

CI’s for MEAN RESPONSE (using W-H adj)

Obs    hsm    hse       pred      lower      upper
225      5      5    1.84085    1.49588    2.18583
226      8      5    2.38881    2.06741    2.71022
227     10      5    2.75412    2.33588    3.17237
228      5      8    2.02286    1.69768    2.34804
229      8      8    2.57083    2.43570    2.70595
230     10      8    2.93613    2.73110    3.14117
231      5     10    2.14420    1.70389    2.58451
232      8     10    2.69217    2.45022    2.93411
233     10     10    3.05747    2.83293    3.28202
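As a check on how the pred column arises, the fitted value for observation 229 (hsm = 8, hse = 8) can be recomputed from the parameter estimates reported earlier:

```latex
\begin{align*}
\hat{Y}_h &= b_0 + b_1\,\text{hsm} + b_2\,\text{hse} \\
          &= 0.62423 + 0.18265(8) + 0.06067(8) \\
          &= 0.62423 + 1.46120 + 0.48536 \\
          &= 2.57079
\end{align*}
```

This agrees with the tabulated 2.57083 up to rounding of the printed coefficients.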

12-31

CS Example (7)

• Prediction intervals for the same set of points can be obtained; use Bonferroni

proc reg data=cs;
  model gpa = hsm hse / cli alpha=0.005;
  output out=predict p=pred lcl=lower ucl=upper;
run;

• These turn out to be virtually worthless for this data set

Obs    hsm    hse       lower      upper
225      5      5    -0.17257    3.85427
228      5      8     0.01274    4.03299
232      8     10     0.69382    4.69051
233     10     10     1.06116    5.05379
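Two quick pieces of arithmetic help read this slide. The first assumes a 95% family level spread over the nine points, which the slide does not state explicitly; the second uses only the tabulated limits for observation 225.

```latex
\begin{align*}
\text{Bonferroni: } & \frac{\alpha}{g} = \frac{0.05}{9} \approx 0.0056
   \quad (\text{the code uses } 0.005,\ \text{slightly more conservative}) \\[4pt]
\text{Width for Obs 225: } & 3.85427 - (-0.17257) = 4.02684
\end{align*}
```

An interval roughly four grade points wide says very little about an individual student's GPA, which is the sense in which these intervals are of little use.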

12-32

Conclusions for CS data

• Predicting GPA based on this data does not

lead to very good prediction intervals.

• We may certainly set some cutoffs, but in doing so we would likely eliminate students who would do reasonably well at the same time as we eliminated drop-outs.

• Based on the prediction bounds, we might

accept students who have an upper

prediction bound that does get into the 3

point range.

12-33

Upcoming in Lecture 13

• Extra Sums of Squares (7.1)

• Coefficients of Partial Determination (7.4)

• General Linear Tests for testing whether

“groups” of variables are important in a

model (7.2, 7.3)
