1 3.3 hypothesis testing in multiple linear regression questions: –what is the overall adequacy of...

53
1 3.3 Hypothesis Testing in Multiple Linear Regression • Questions: – What is the overall adequacy of the model? – Which specific regressors seem important? Assume the errors are independent and follow a normal distribution with mean 0 and variance 2

Upload: harry-wiler

Post on 14-Dec-2015

229 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

1

3.3 Hypothesis Testing in Multiple Linear Regression

• Questions:– What is the overall adequacy of the model?– Which specific regressors seem important?

• Assume the errors are independent and follow a normal distribution with mean 0 and variance 2

Page 2: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

2

3.3.1 Test for Significance of Regression• Determine if there is a linear relationship between

y and xj, j = 1,2,…,k.

• The hypotheses are

H0: β1 = β2 =…= βk = 0

H1: βj 0 for at least one j

• ANOVA

• SST = SSR + SSRes

• SSR/2 ~ 2k, SSRes/2 ~ 2

n-k-1, and SSR and SSRes

are independent1,

ReRe0 ~

)1/(

/

knk

s

R

s

R FMS

MS

knSS

kSSF

Page 3: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

3

• Under H1, F0 follows F distribution with k and n-

k-1 and a noncentrality parameter of

knkn

kk

c

k

ccR

s

xxxx

xxxx

X

k

XXMSE

MSE

11

1111

1*

2

*'*'2

2Re

)',...,(

)(

)(

2

*'*'

ccXX

Page 4: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

4

• ANOVA table

Page 5: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

5

Page 6: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

6

• Example 3.3 The Delivery Time Data

Page 7: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

7

• R2 and Adjusted R2 – R2 always increase when a regressor is added to

the model, regardless of the value of the contribution of that variable.

– An adjusted R2:

– The adjusted R2 will only increase on adding a variable to the model if the addition of the variable reduces the residual mean squares.

)1/(

)/(1 Re2

nSS

pnSSR

T

sadj

Page 8: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

8

3.3.2 Tests on Individual Regression Coefficients• For the individual regression coefficient:

– H0: βj = 0 v.s. H1: βj 0

– Let Cjj be the j-th diagonal element of (X’X)-1.

The test statistic:

– This is a partial or marginal test because any estimate of the regression coefficient depends on all of the other regression variables.

– This test is a test of contribution of xj given the

other regressors in the model

120 ~)ˆ(

ˆ

ˆ

ˆ kn

j

j

jj

j tseC

t

Page 9: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

9

• Example 3.4 The Delivery Time Data

Page 10: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

10

• The subset of regressors:

Page 11: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

11

• For the full model, the regression sum of square

• Under the null hypothesis, the regression sum of squares for the reduce model

• The degree of freedom is p-r for the reduce model.

• The regression sum of square due to β2 given β1

• This is called the extra sum of squares due to β2

and the degree of freedom is p - (p - r) = r• The test statistic

yXSSR ''ˆ)(

yXSSR'1

'11

ˆ)(

)()()|( 112 RRR SSSSSS

pnrs

R FMS

rSSF ,

Re

120 ~

/)|(

Page 12: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

12

• If β2 0, F0 follows a noncentral F distribution

with

• Multicollinearity: this test actually has no power!

• This test has maximal power when X1 and X2 are

orthogonal to one another!

• Partial F test: Given the regressors in X1, measure

the contribution of the regressors in X2.

22'1

11

'11

'2

'22

])([1

XXXXXIX

Page 13: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

13

• Consider y = β0 + β1 x1 + β2 x2 + β3 x3 +

SSR(β1| β0 , β2, β3), SSR(β2| β0 , β1, β3) and SSR(β3|

β0 , β2, β1) are signal-degree-of –freedom sums of

squares.

• SSR(βj| β0 ,…, βj-1, βj, … βk) : the contribution of

xj as if it were the last variable added to the model.

• This F test is equivalent to the t test.

• SST = SSR(β1 ,β2, β3|β0) + SSRes

• SSR(β1 ,β2 , β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) +

SSR(β3 |β1, β2, β0)

Page 14: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

14

• Example 3.5 Delivery Time Data

Page 15: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

15

3.3.3 Special Case of Orthogonal Columns in X

• Model: y = Xβ + = X1β1+ X2β2 +

• Orthogonal: X1’X2 = 0

• Since the normal equation (X’X)β= X’y,

yX

yX

XX

XX'2

'1

2

1

2'2

1'1

ˆ

ˆ

0

0

yXXXyXXX '2

12

'22

'1

11

'11 )(ˆ and )(ˆ

Page 16: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

16

Page 17: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

17

3.3.4 Testing the General Linear Hypothesis• Let T be an m p matrix, and rank(T) = r• Full model: y = Xβ +

• Reduced model: y = Z + , Z is an n (p-r) matrix and is a (p-r) 1 vector. Then

• The difference: SSH = SSRes(RM) – SSRes(FM) with r degree of freedom. SSH is called the sum of squares due to the hypothesis H0: Tβ = 0

freedom) of degree p-(n ''ˆ')(Re yXyyFMSS s

freedom) of degreer p-(n ''ˆ')(

')'(ˆ

Re

1

yZyyRMSS

yZZZ

s

Page 18: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

18

• The test statistic:

pnrs

H FpnFMSS

rSSF

,Re

~)/()(

/

Page 19: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

19

Page 20: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

20

• Another form:

• H0: Tβ = c v.s. H1: Tβ c Then

)/()(

/ˆ]')'([''ˆ

Re

11

pnFMSS

rTTXXTTF

s

pnrs

FpnFMSS

rcTTXXTcTF

,Re

11

~)/()(

/)ˆ(]')'([)'ˆ(

Page 21: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

21

3.4 Confidence Intervals in Multiple Regression

3.4.1 Confidence Intervals on the Regression Coefficients

• Under the normality assumption,

))'(,(~ˆ 12 XXMN

Page 22: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

22

Page 23: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

23

3.4.2 Confidence Interval Estimation of the Mean Response

• A confidence interval on the mean response at a particular point.

• x0 = (1,x01,…,x0k)’

• The unbiased estimator of E(y|x0) :

2/10

1'0

2,2/0

01'

02

0

'000

))'((y

responsemean on the C.I. )-100(1 The

)'()ˆ(

)|()ˆ(

xXXxt

xXXxyVar

xxyEyE

pn

Page 24: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

24

• Example 3.9 The Delivery Time Data

Page 25: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

25

3.4.3 Simultaneous Confidence Intervals on Regression Coefficients

• An elliptically shaped region

Page 26: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

26

• Example 3.10 The Rocket Propellant Data

Page 27: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

27

Page 28: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

28

• Another approach:

is chosen so that a specified probability that all intervals are correct is obtained.

• Bonferroni method: Δ= tα/2p, n-p

• Scheffe S-method: Δ=(2Fα,p, n-p )1/2

• Maximum modulus t procedure: Δ= uα,p, n-2 is the

upper tail point of the distribution of the maximum absolute value of two independent student t r.v.’s each based on n-2 degree of freedom

k ..., 1, 0,j ),ˆ(ˆ jj se

Page 29: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

29

• Example 3.11 The Rocket Propellant Data

• Find 90% joint C.I. for β0 and β1 by constructing a 95% C.I. for each parameter.

Page 30: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

30

• The confidence ellipse is always a more efficient procedure than the Bonferroni method because the volume of the ellipse is always less than the volume of the space covere3d by the Bonferroni intervals.

• Bonferroni intervals are easier to construct.• The length of C.I.:

Maximum modulus t < Bonferroni method

< Scheffe S-method

Page 31: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

31

3.5 Prediction of New Observations

Page 32: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

32

3.6 Hidden Extrapolation in Multiple Regression

• Be careful about extrapolating beyond the region containing the original observations!

• Rectangle formed by ranges of regressors NOT data region.

• Regressor variable hull (RVH): the convex hull of the original n data points.

– Interpolation: x0 RVH

– Extrapolation: x0 RVH

Page 33: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

33

Page 34: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

34

• hii of the hat matrix H = X(XX)-1X’are useful in

detecting hidden extrapolation.

• hmax: the maximum of hii . The point xi that has the

largest value of hii will lie on the boundary of

RVH

• {x | x(XX)-1x h≦ max } is an ellipsoid enclosing all

points inside the RVH.

• Let h00 = x′(X′X)-1x0

– h00 hmax : inside the RVH and the boundary

of RVH

– h00 > hmax : outside the RVH

Page 35: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

35

• MCE : minimum covering ellipsoid (Weisberg, 1985).

Page 36: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

36

Page 37: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

37

3.7 Standardized Regression Coefficients

• Difficult to compare regression coefficients directly.

• Unit Normal Scaling: Standardize a Normal r.v.

Page 38: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

38

• New model:

– There is no intercept.– The least-square estimator of b is

nizbzby iikkii ,...,1 ,11*

*1 ')'(ˆ yZZZb

Page 39: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

39

• Unit Length Scaling:

Page 40: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

40

• New Model:

• The least-square estimator:

niwbwby iikkii ,...,1 ,110

01 ')'(ˆ yWWWb

Page 41: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

41

• It does not matter which scaling we use! They both produce the same set of dimensionless regression coefficient.

Page 42: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

42

Page 43: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

43

Page 44: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

44

3.8 Multicollinearity

• A serious problem: Multicollinearity or near-linear dependence among the regression variables.

• The regressors are the columns of X. So an exact linear dependence would result a singular X’X

Page 45: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

45

• Unit length scaling

1)ˆ()ˆ(

10

01)'( and

10

01'

22

21

1

bVarbVar

WWWW

Page 46: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

46

• Soft drink data:

• Off-diagonal elements are of W’W usually called the simple correlations between regressors.

12.3)ˆ()ˆ(

12.357.2

57.212.3)'( and

1824.0

824.01'

22

21

1

bVarbVar

WWWW

Page 47: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

47

• Variance inflation factors (VIFs):– The main diagonal elements of the inverse of

X’X ((W’W)-1 above)

– From above two cases:Soft drink: VIF1 = VIF2 =

3.12 and Figure 3.12: VIF1 = VIF2 = 1

– VIFj = 1/(1-Rj)

– Rj is the coefficient of multiple determination

obtained from regressing xj on the other regressor

variables.

– If xj is nearly linearly dependent on some of the

other regressors, then Rj 1 and VIFj will be

large.– Serious problems: VIFs > 10

Page 48: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

48

• Figure 3.13 (a): The plan is unstable and very sensitive to relatively small changes in the data points.

• Figure 3.13 (b): Orthogonal regressors.

Page 49: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

49

3.9 Why Do Regression Coefficients Have the Wrong Sign?

• The reasons of the wrong sign:

1. The range of some of the regressors is too small.

2. Important regressors have not been included in the model.

3. Multicollinearity is present.

4. Computational errors have been made.

Page 50: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

50

• For reason 1:

n

iixx xxSVar

1

221 ))(/(/)ˆ(

Page 51: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

51

• Although it is possible to decrease the variance of the regression coefficients by increase the range of the x’s, it may not be desirable to spread the levels of the regressors out too far:– The true response function may be nonlinear.– Impractical or impossible.

• For reason 2:

Page 52: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

52

• t.coefficien regression total"" a is ˆ here

, 463.0835.1ˆ 1

xy

given x xofeffect theis ˆ Here

649.3222.1036.1ˆ

211

21

xxy

Page 53: 1 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important?

53

• Fore reason 3: Multicollinearity inflates the variances of the coefficients, and this increases the probability that one or more regression coefficients will have the wrong sign.

• Different computer programs handle round-off or truncation problems in different ways, and some programs are more effective than the others in this regard.