V. Multivariate Linear Regression
A. The Basic Principle
We consider the multivariate extension of multiple linear regression: modeling the relationship between $m$ responses $Y_1, \ldots, Y_m$ and a single set of $r$ predictor variables $z_1, \ldots, z_r$. Each of the $m$ responses is assumed to follow its own regression model, i.e.,

$$
\begin{aligned}
Y_1 &= \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 + \cdots + \beta_{r1}z_r + \varepsilon_1 \\
Y_2 &= \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2 + \cdots + \beta_{r2}z_r + \varepsilon_2 \\
&\;\;\vdots \\
Y_m &= \beta_{0m} + \beta_{1m}z_1 + \beta_{2m}z_2 + \cdots + \beta_{rm}z_r + \varepsilon_m
\end{aligned}
$$

where the error vector $\boldsymbol{\varepsilon} = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m]'$ satisfies $E[\boldsymbol{\varepsilon}] = \mathbf{0}$ and $\operatorname{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma}$.
Conceptually, we can let

$$[z_{j0}, z_{j1}, \ldots, z_{jr}]$$

denote the values of the predictor variables for the $j$th trial, and let

$$\mathbf{Y}_j = \begin{bmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jm} \end{bmatrix}, \qquad \boldsymbol{\varepsilon}_j = \begin{bmatrix} \varepsilon_{j1} \\ \varepsilon_{j2} \\ \vdots \\ \varepsilon_{jm} \end{bmatrix}$$

be the responses and errors for the $j$th trial. Thus we have an $n \times (r+1)$ design matrix

$$\mathbf{Z} = \begin{bmatrix} z_{10} & z_{11} & \cdots & z_{1r} \\ z_{20} & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n0} & z_{n1} & \cdots & z_{nr} \end{bmatrix}$$
If we now set

$$\mathbf{Y} = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1m} \\ Y_{21} & Y_{22} & \cdots & Y_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nm} \end{bmatrix} = \big[\, \mathbf{Y}_{(1)} \;|\; \mathbf{Y}_{(2)} \;|\; \cdots \;|\; \mathbf{Y}_{(m)} \,\big]$$

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rm} \end{bmatrix} = \big[\, \boldsymbol{\beta}_{(1)} \;|\; \boldsymbol{\beta}_{(2)} \;|\; \cdots \;|\; \boldsymbol{\beta}_{(m)} \,\big]$$

and

$$\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix} = \big[\, \boldsymbol{\varepsilon}_{(1)} \;|\; \boldsymbol{\varepsilon}_{(2)} \;|\; \cdots \;|\; \boldsymbol{\varepsilon}_{(m)} \,\big]$$

the multivariate linear regression model is

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

with

$$E\big[\boldsymbol{\varepsilon}_{(i)}\big] = \mathbf{0}, \qquad \operatorname{Cov}\big(\boldsymbol{\varepsilon}_{(i)}, \boldsymbol{\varepsilon}_{(k)}\big) = \sigma_{ik}\mathbf{I}, \quad i, k = 1, \ldots, m$$

Note also that the $m$ observed responses on the $j$th trial have covariance matrix

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm} \end{bmatrix}$$
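For bookkeeping, the dimensions in this matrix form line up as

$$\underset{n \times m}{\mathbf{Y}} \;=\; \underset{n \times (r+1)}{\mathbf{Z}} \;\;\underset{(r+1) \times m}{\boldsymbol{\beta}} \;+\; \underset{n \times m}{\boldsymbol{\varepsilon}}$$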
The ordinary least squares estimates $\hat{\boldsymbol{\beta}}$ are found in a manner analogous to the univariate case: we begin by taking

$$\hat{\boldsymbol{\beta}}_{(i)} = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}$$

Collecting the univariate least squares estimates yields

$$\hat{\boldsymbol{\beta}} = \big[\, \hat{\boldsymbol{\beta}}_{(1)} \;|\; \hat{\boldsymbol{\beta}}_{(2)} \;|\; \cdots \;|\; \hat{\boldsymbol{\beta}}_{(m)} \,\big] = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\big[\, \mathbf{Y}_{(1)} \;|\; \mathbf{Y}_{(2)} \;|\; \cdots \;|\; \mathbf{Y}_{(m)} \,\big] = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{Y}$$

Now for any choice of parameters $\mathbf{B} = [\, \mathbf{b}_{(1)} \;|\; \mathbf{b}_{(2)} \;|\; \cdots \;|\; \mathbf{b}_{(m)} \,]$, the resulting matrix of errors is $\mathbf{Y} - \mathbf{Z}\mathbf{B}$.
The resulting Error Sums of Squares and Crossproducts matrix is

$$(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B}) = \begin{bmatrix} \big(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\big)'\big(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\big) & \cdots & \big(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\big)'\big(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\big) \\ \vdots & \ddots & \vdots \\ \big(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\big)'\big(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}\big) & \cdots & \big(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\big)'\big(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}\big) \end{bmatrix}$$

We can show that the selection $\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)}$ minimizes the $i$th diagonal sum of squares

$$\big(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\big)'\big(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\big)$$

i.e.,

$$\operatorname{tr}\big[(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\big] \qquad \text{and} \qquad \big|(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\big| \;\; \text{(the generalized variance)}$$

are both minimized.
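Column by column this is ordinary univariate least squares: differentiating the $i$th diagonal entry with respect to $\mathbf{b}_{(i)}$ and setting the derivative to zero gives the normal equations

$$-2\,\mathbf{Z}'\big(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\big) = \mathbf{0} \quad\Longrightarrow\quad \mathbf{Z}'\mathbf{Z}\,\mathbf{b}_{(i)} = \mathbf{Z}'\mathbf{Y}_{(i)}$$

whose solution is $\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}$.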
So we have the matrix of predicted values

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{Z}\big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{Y}$$

and a resulting matrix of residuals

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \big[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\big]\mathbf{Y}$$

Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case, because

$$\mathbf{Z}'\big[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\big] = \mathbf{Z}' - \mathbf{Z}' = \mathbf{0}$$
which means the residuals are perpendicular to the columns of the design matrix ($\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{0}$) and to the predicted values:

$$\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'\big[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\big]\mathbf{Y} = \mathbf{0}$$

Furthermore, because $\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}$ and the cross-product terms $\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}}$ and $\hat{\boldsymbol{\varepsilon}}'\hat{\mathbf{Y}}$ vanish, we have

$$\underbrace{\mathbf{Y}'\mathbf{Y}}_{\substack{\text{total sums of squares} \\ \text{and crossproducts}}} \;=\; \underbrace{\hat{\mathbf{Y}}'\hat{\mathbf{Y}}}_{\substack{\text{predicted sums of squares} \\ \text{and crossproducts}}} \;+\; \underbrace{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}_{\substack{\text{residual (error) sums of} \\ \text{squares and crossproducts}}}$$
Example: suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (purchase intent and overall quality):

Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77

Use these data to estimate the multivariate linear regression model for which palatability and texture are the independent variables while purchase intent and overall quality are the dependent variables.
We wish to estimate

$$Y_1 = \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2$$

and

$$Y_2 = \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2$$

jointly. The design matrix is

$$\mathbf{Z} = \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix}$$
so

$$\mathbf{Z}'\mathbf{Z} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix} = \begin{bmatrix} 6 & 436 & 462 \\ 436 & 31852 & 33591 \\ 462 & 33591 & 35728 \end{bmatrix}$$

and

$$\big(\mathbf{Z}'\mathbf{Z}\big)^{-1} = \begin{bmatrix} 6 & 436 & 462 \\ 436 & 31852 & 33591 \\ 462 & 33591 & 35728 \end{bmatrix}^{-1} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix}$$
and

$$\mathbf{Z}'\mathbf{y}_{(1)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 63 \\ 70 \\ 72 \\ 75 \\ 89 \\ 76 \end{bmatrix} = \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}$$

so

$$\hat{\boldsymbol{\beta}}_{(1)} = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{y}_{(1)} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix} = \begin{bmatrix} -37.501205460 \\ 1.134583728 \\ 0.379499410 \end{bmatrix}$$
and

$$\mathbf{Z}'\mathbf{y}_{(2)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 67 \\ 70 \\ 70 \\ 72 \\ 88 \\ 77 \end{bmatrix} = \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}$$

so

$$\hat{\boldsymbol{\beta}}_{(2)} = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{y}_{(2)} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix} = \begin{bmatrix} -21.432293350 \\ 0.940880634 \\ 0.351449792 \end{bmatrix}$$
Thus

$$\hat{\boldsymbol{\beta}} = \big[\, \hat{\boldsymbol{\beta}}_{(1)} \;|\; \hat{\boldsymbol{\beta}}_{(2)} \,\big] = \begin{bmatrix} -37.501205460 & -21.432293350 \\ 1.134583728 & 0.940880634 \\ 0.379499410 & 0.351449792 \end{bmatrix}$$

This gives us the matrix of estimated values

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix} \begin{bmatrix} -37.501205460 & -21.432293350 \\ 1.134583728 & 0.940880634 \\ 0.379499410 & 0.351449792 \end{bmatrix} = \begin{bmatrix} 63.19119 & 64.67788 \\ 73.41028 & 73.37275 \\ 77.56520 & 76.67135 \\ 69.25144 & 69.96067 \\ 83.24203 & 81.48922 \\ 78.33986 & 77.82812 \end{bmatrix}$$
and the matrix of residuals

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \begin{bmatrix} 63 & 67 \\ 70 & 70 \\ 72 & 70 \\ 75 & 72 \\ 89 & 88 \\ 76 & 77 \end{bmatrix} - \begin{bmatrix} 63.19119 & 64.67788 \\ 73.41028 & 73.37275 \\ 77.56520 & 76.67135 \\ 69.25144 & 69.96067 \\ 83.24203 & 81.48922 \\ 78.33986 & 77.82812 \end{bmatrix} = \begin{bmatrix} -0.191194960 & 2.322116943 \\ -3.410277515 & -3.372746244 \\ -5.565198512 & -6.671350244 \\ 5.748557985 & 2.039326498 \\ 5.757968347 & 6.510777845 \\ -2.339855345 & -0.828124797 \end{bmatrix}$$

Note that each column sums to zero!
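As a quick check, the entire fit can be reproduced in a few lines of SAS PROC IML; this sketch is an addition for illustration and is not part of the original slides:

  proc iml;
    /* design matrix: intercept, palatability (z1), texture (z2) */
    Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
    /* responses: overall quality (y1), purchase intent (y2) */
    Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};
    betahat = inv(Z`*Z) * Z` * Y;   /* (Z'Z)^{-1} Z'Y */
    Yhat    = Z * betahat;          /* fitted values */
    resid   = Y - Yhat;             /* residuals */
    colsums = resid[+, ];           /* each column should sum to zero */
    print betahat, Yhat, resid, colsums;
  quit;

The printed column sums confirm the zero-sum property of the residuals noted above.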
B. Inference in Multivariate Regression

The least squares estimators

$$\hat{\boldsymbol{\beta}} = \big[\, \hat{\boldsymbol{\beta}}_{(1)} \;|\; \hat{\boldsymbol{\beta}}_{(2)} \;|\; \cdots \;|\; \hat{\boldsymbol{\beta}}_{(m)} \,\big]$$

of the multivariate regression model have the following properties:

- $E\big[\hat{\boldsymbol{\beta}}_{(i)}\big] = \boldsymbol{\beta}_{(i)}$, i.e., $E\big[\hat{\boldsymbol{\beta}}\big] = \boldsymbol{\beta}$
- $\operatorname{Cov}\big(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\big) = \sigma_{ik}\big(\mathbf{Z}'\mathbf{Z}\big)^{-1}, \quad i, k = 1, \ldots, m$
- $E\big[\hat{\boldsymbol{\varepsilon}}\big] = \mathbf{0}$ and $E\Big[\dfrac{1}{n - r - 1}\,\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}\Big] = \boldsymbol{\Sigma}$

if the model is of full rank, i.e., rank($\mathbf{Z}$) $= r + 1 < n$. Note that $\hat{\boldsymbol{\varepsilon}}$ and $\hat{\boldsymbol{\beta}}$ are also uncorrelated.
This means that, for any observation $\mathbf{z}_0$,

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}} = \mathbf{z}_0'\big[\, \hat{\boldsymbol{\beta}}_{(1)} \;|\; \hat{\boldsymbol{\beta}}_{(2)} \;|\; \cdots \;|\; \hat{\boldsymbol{\beta}}_{(m)} \,\big] = \big[\, \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(1)} \;|\; \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(2)} \;|\; \cdots \;|\; \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(m)} \,\big]$$

is an unbiased estimator, i.e.,

$$E\big[\mathbf{z}_0'\hat{\boldsymbol{\beta}}\big] = \mathbf{z}_0'\boldsymbol{\beta}$$

We can also determine from these properties that the estimation errors $\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} - \mathbf{z}_0'\boldsymbol{\beta}_{(i)}$ have covariances

$$E\big[\mathbf{z}_0'\big(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)}\big)\big(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)}\big)'\mathbf{z}_0\big] = \mathbf{z}_0'\,E\big[\big(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)}\big)\big(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)}\big)'\big]\,\mathbf{z}_0 = \sigma_{ik}\,\mathbf{z}_0'\big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{z}_0$$
Furthermore, we can easily ascertain that

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}} = \hat{\mathbf{Y}}_0'$$

i.e., the forecasted vector $\hat{\mathbf{Y}}_0$ associated with the values of the predictor variables $\mathbf{z}_0$ is an unbiased estimator of $\mathbf{Y}_0$. The forecast errors have covariance

$$E\big[\big(Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)}\big)\big(Y_{0k} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(k)}\big)\big] = \sigma_{ik}\big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)$$
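This follows because the forecast error splits into the new trial's error and the estimation error, which are independent:

$$Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} = \varepsilon_{0i} + \big(\mathbf{z}_0'\boldsymbol{\beta}_{(i)} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)}\big)$$

so the covariances add: $\sigma_{ik} + \sigma_{ik}\,\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0 = \sigma_{ik}\big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)$.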
Thus, for the multivariate regression model with full rank($\mathbf{Z}$) $= r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\boldsymbol{\varepsilon}$,

$$\hat{\boldsymbol{\beta}} = \big(\mathbf{Z}'\mathbf{Z}\big)^{-1}\mathbf{Z}'\mathbf{Y}$$

is the maximum likelihood estimator of $\boldsymbol{\beta}$; it is normally distributed with mean $\boldsymbol{\beta}$ and covariances

$$\operatorname{Cov}\big(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\big) = \sigma_{ik}\big(\mathbf{Z}'\mathbf{Z}\big)^{-1}, \quad i, k = 1, \ldots, m$$
Also, the maximum likelihood estimator of $\boldsymbol{\beta}$ is independent of the maximum likelihood estimator of the positive definite matrix $\boldsymbol{\Sigma}$, given by

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\,\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = \frac{1}{n}\big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big)'\big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big)$$

and

$$n\hat{\boldsymbol{\Sigma}} \sim W_{p,\,n-r-1}\big(\boldsymbol{\Sigma}\big)$$

$\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\Sigma}}$ are the maximum likelihood estimators of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$, all of which provides additional support for using the least squares estimates when the errors are normally distributed.
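Note the link to the unbiased estimator listed among the properties above: since $E\big[\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}\big] = (n - r - 1)\boldsymbol{\Sigma}$,

$$E\big[\hat{\boldsymbol{\Sigma}}\big] = \frac{n - r - 1}{n}\,\boldsymbol{\Sigma}$$

so the maximum likelihood estimator is biased downward, and $\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}} = \frac{1}{n-r-1}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$ is the unbiased version.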
These results can be used to develop likelihood ratio tests for the multivariate regression parameters. The hypothesis that the responses do not depend on the predictor variables $z_{q+1}, z_{q+2}, \ldots, z_r$ is

$$H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0} \qquad \text{where} \qquad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix}$$

with $\boldsymbol{\beta}_{(1)}$ of dimension $(q+1) \times m$ and $\boldsymbol{\beta}_{(2)}$ of dimension $(r-q) \times m$. If we partition $\mathbf{Z}$ in a similar manner,

$$\mathbf{Z} = \big[\, \mathbf{Z}_1 \;|\; \mathbf{Z}_2 \,\big]$$

where $\mathbf{Z}_1$ is $n \times (q+1)$ and $\mathbf{Z}_2$ is $n \times (r-q)$,
we can write the general model as

$$E\big[\mathbf{Y}\big] = \mathbf{Z}\boldsymbol{\beta} = \big[\, \mathbf{Z}_1 \;|\; \mathbf{Z}_2 \,\big] \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix} = \mathbf{Z}_1\boldsymbol{\beta}_{(1)} + \mathbf{Z}_2\boldsymbol{\beta}_{(2)}$$

The extra sums of squares associated with $\boldsymbol{\beta}_{(2)}$ are

$$\big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big)'\big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big) - \big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big)'\big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big) = n\big(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\big)$$

where

$$\hat{\boldsymbol{\beta}}_{(1)} = \big(\mathbf{Z}_1'\mathbf{Z}_1\big)^{-1}\mathbf{Z}_1'\mathbf{Y}$$

and

$$n\hat{\boldsymbol{\Sigma}}_1 = \big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big)'\big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big)$$
The likelihood ratio for the test of the hypothesis $H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$ is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}\big)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}, \boldsymbol{\Sigma}\big)} = \left(\frac{\big|\hat{\boldsymbol{\Sigma}}\big|}{\big|\hat{\boldsymbol{\Sigma}}_1\big|}\right)^{n/2}$$

which is often converted to Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{\big|\hat{\boldsymbol{\Sigma}}\big|}{\big|\hat{\boldsymbol{\Sigma}}_1\big|}$$
Finally, for the multivariate regression model with full rank($\mathbf{Z}$) $= r + 1$, $n \geq r + 1 + m$, normally distributed errors $\boldsymbol{\varepsilon}$, and a true null hypothesis (so that $n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}) \sim W_{r-q}(\boldsymbol{\Sigma})$),

$$-\Big[n - r - 1 - \tfrac{1}{2}\big(m - r + q + 1\big)\Big] \ln\!\left(\frac{\big|\hat{\boldsymbol{\Sigma}}\big|}{\big|\hat{\boldsymbol{\Sigma}}_1\big|}\right) \;\sim\; \chi^2_{m(r-q)}$$

approximately, when $n - r$ and $n - m$ are both large.
If we again refer to the Error Sum of Squares and Crossproducts as

$$\mathbf{E} = n\hat{\boldsymbol{\Sigma}}$$

and the Hypothesis Sum of Squares and Crossproducts as

$$\mathbf{H} = n\big(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\big)$$

then we can define Wilks' lambda as

$$\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \prod_{i=1}^{s} \frac{1}{1 + \ell_i}$$

where $\ell_1 \geq \ell_2 \geq \cdots \geq \ell_s$ are the ordered eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ and $s = \min(p, r - q)$.
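The product form follows from a determinant identity: $\mathbf{E}^{-1}\mathbf{H}$ and $\mathbf{H}\mathbf{E}^{-1}$ share the same nonzero eigenvalues $\ell_i$, so

$$\frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{|\mathbf{E}|}{|\mathbf{E}|\,\big|\mathbf{I} + \mathbf{E}^{-1}\mathbf{H}\big|} = \prod_{i=1}^{s} \frac{1}{1 + \ell_i}$$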
There are other similar tests (as we have seen in our discussion of MANOVA):

Pillai's Trace:
$$\operatorname{tr}\big[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\big] = \sum_{i=1}^{s} \frac{\ell_i}{1 + \ell_i}$$

Hotelling-Lawley Trace:
$$\operatorname{tr}\big[\mathbf{H}\mathbf{E}^{-1}\big] = \sum_{i=1}^{s} \ell_i$$

Roy's Greatest Root:
$$\frac{\ell_1}{1 + \ell_1}$$

Each of these statistics is an alternative to Wilks' lambda, and they perform in a very similar manner (particularly for large sample sizes).
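All four criteria are simple functions of E, H, and the eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$. The PROC IML sketch below is an added illustration; it uses the palatability-test E and H matrices computed later in this section:

  proc iml;
    E = {114.31302415  99.335143683,
          99.335143683 108.5094298};       /* error SSCP */
    H = {214.96186763 178.26225891,
         178.26225891 147.82823253};       /* hypothesis SSCP */
    ev     = eigval(inv(E)*H);             /* eigenvalues l_i (real here) */
    l      = ev[,1];
    wilks  = det(E) / det(E+H);            /* prod 1/(1+l_i)     */
    pillai = trace(H * inv(H+E));          /* sum l_i/(1+l_i)    */
    hl     = trace(H * inv(E));            /* sum l_i            */
    roy    = max(l) / (1 + max(l));        /* l_1/(1+l_1)        */
    print wilks pillai hl roy;
  quit;

Note that SAS PROC GLM reports the largest eigenvalue $\ell_1$ itself (1.89573606 in the output below), rather than $\ell_1/(1+\ell_1)$, on its Roy's Greatest Root line.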
Example: for our previous data (six sample observations on two independent variables, palatability and texture, and two dependent variables, purchase intent and overall quality):

Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77

we wish to test the hypotheses that i) palatability has no joint relationship with purchase intent and overall quality, and ii) texture has no joint relationship with purchase intent and overall quality.
We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e.,

$$H_0: \boldsymbol{\beta}_{(1)} = \mathbf{0}$$

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(2)},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}_{(2)}, \boldsymbol{\Sigma}\big)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}, \boldsymbol{\Sigma}\big)} = \left(\frac{\big|\hat{\boldsymbol{\Sigma}}\big|}{\big|\hat{\boldsymbol{\Sigma}}_2\big|}\right)^{n/2}$$

For ease of computation, we'll use the Wilks' lambda statistic

$$\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$
The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 214.96186763 & 178.26225891 \\ 178.26225891 & 147.82823253 \end{bmatrix}$$
so the calculated value of the Wilks' lambda statistic is

$$\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 114.31302415 + 214.96186763 & 99.335143683 + 178.26225891 \\ 99.335143683 + 178.26225891 & 108.5094298 + 147.82823253 \end{vmatrix}} = \frac{2536.570299}{7345.238098} = 0.34533534$$
The transformation to a Chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\Big[n - r - 1 - \tfrac{1}{2}\big(m - r + q + 1\big)\Big] \ln \Lambda^* = -\Big[6 - 2 - 1 - \tfrac{1}{2}\big(2 - 2 + 1 + 1\big)\Big] \ln(0.34533534) = 2.12648$$

At $\alpha = 0.01$ and $m(r - q) = 2(1) = 2$ degrees of freedom, the critical value is 9.21034, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.3453; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
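This calculation is easy to script. The PROC IML sketch below is an added illustration (PROBCHI supplies the chi-square tail probability); substituting the texture-test H matrix used later reproduces the second test as well:

  proc iml;
    E = {114.31302415  99.335143683,
          99.335143683 108.5094298};       /* error SSCP */
    H = {214.96186763 178.26225891,
         178.26225891 147.82823253};       /* hypothesis SSCP for z1 */
    n = 6; r = 2; q = 1; m = 2;
    lambda = det(E) / det(E + H);          /* Wilks' lambda */
    chisq  = -(n - r - 1 - 0.5*(m - r + q + 1)) * log(lambda);
    df     = m * (r - q);
    p      = 1 - probchi(chisq, df);       /* approximate p-value */
    print lambda chisq df p;
  quit;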
We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e.,

$$H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$$

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}\big)}{\max_{\boldsymbol{\beta},\,\boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}, \boldsymbol{\Sigma}\big)} = \left(\frac{\big|\hat{\boldsymbol{\Sigma}}\big|}{\big|\hat{\boldsymbol{\Sigma}}_1\big|}\right)^{n/2}$$

For ease of computation, we'll again use the Wilks' lambda statistic

$$\Lambda^* = \Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$
The error sum of squares and crossproducts matrix is unchanged:

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 21.872015222 & 20.255407498 \\ 20.255407498 & 18.758286731 \end{bmatrix}$$
so the calculated value of the Wilks' lambda statistic is

$$\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 114.31302415 + 21.872015222 & 99.335143683 + 20.255407498 \\ 99.335143683 + 20.255407498 & 108.5094298 + 18.758286731 \end{vmatrix}} = \frac{2536.570299}{3030.059055} = 0.837135598$$
The transformation to a Chi-square distributed statistic (again, actually valid only when $n - r$ and $n - m$ are both large) is

$$-\Big[n - r - 1 - \tfrac{1}{2}\big(m - r + q + 1\big)\Big] \ln \Lambda^* = -\Big[6 - 2 - 1 - \tfrac{1}{2}\big(2 - 2 + 1 + 1\big)\Big] \ln(0.837135598) = 0.35553$$

At $\alpha = 0.01$ and $m(r - q) = 2(1) = 2$ degrees of freedom, the critical value is 9.21034, so we again have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.8371; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
  INPUT z1 z2 y1 y2;
  LABEL z1='Palatability Rating'
        z2='Texture Rating'
        y1='Overall Quality Rating'
        y2='Purchase Intent';
CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
  MODEL y1 y2 = z1 z2;
  MANOVA H=z1 z2 / PRINTE PRINTH;
  TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;

SAS code for a Multivariate Linear Regression Analysis.
Dependent Variable: y1   Overall Quality Rating

                                Sum of
Source            DF           Squares    Mean Square   F Value   Pr > F
Model              2       256.5203092    128.2601546      3.37   0.1711
Error              3       114.3130241     38.1043414
Corrected Total    5       370.8333333

R-Square   Coeff Var   Root MSE    y1 Mean
0.691740    8.322973   6.172871   74.16667

Source   DF     Type I SS   Mean Square   F Value   Pr > F
z1        1   234.6482940   234.6482940      6.16   0.0891
z2        1    21.8720152    21.8720152      0.57   0.5037

Source   DF   Type III SS   Mean Square   F Value   Pr > F
z1        1   214.9618676   214.9618676      5.64   0.0980
z2        1    21.8720152    21.8720152      0.57   0.5037

Dependent Variable: y1   Overall Quality Rating

                               Standard
Parameter        Estimate         Error   t Value   Pr > |t|
Intercept    -37.50120546   48.82448511     -0.77     0.4984
z1             1.13458373    0.47768661      2.38     0.0980
z2             0.37949941    0.50090335      0.76     0.5037

SAS output for a Multivariate Linear Regression Analysis.
Dependent Variable: y2   Purchase Intent

                                Sum of
Source            DF           Squares    Mean Square   F Value   Pr > F
Model              2       181.4905702     90.7452851      2.51   0.2289
Error              3       108.5094298     36.1698099
Corrected Total    5       290.0000000

R-Square   Coeff Var   Root MSE    y2 Mean
0.625830    8.127208   6.014134   74.00000

Source   DF     Type I SS   Mean Square   F Value   Pr > F
z1        1   162.7322835   162.7322835      4.50   0.1241
z2        1    18.7582867    18.7582867      0.52   0.5235

Source   DF   Type III SS   Mean Square   F Value   Pr > F
z1        1   147.8282325   147.8282325      4.09   0.1364
z2        1    18.7582867    18.7582867      0.52   0.5235

Dependent Variable: y2   Purchase Intent

                               Standard
Parameter        Estimate         Error   t Value   Pr > |t|
Intercept    -21.43229335   47.56894895     -0.45     0.6829
z1             0.94088063    0.46540276      2.02     0.1364
z2             0.35144979    0.48802247      0.72     0.5235

SAS output for a Multivariate Linear Regression Analysis.
The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix
              y1             y2
y1   114.31302415   99.335143683
y2   99.335143683    108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3         y1         y2
y1       1.000000   0.891911
                      0.1081
y2       0.891911   1.000000
           0.1081

SAS output for a Multivariate Linear Regression Analysis.
The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z1
              y1             y2
y1   214.96186763   178.26225891
y2   178.26225891   147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

Characteristic             Characteristic Vector  V'EV=1
          Root   Percent            y1             y2
    1.89573606    100.00    0.10970859    -0.01905206
    0.00000000      0.00   -0.17533407     0.21143084

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

S=1   M=0   N=0

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.34533534      1.90        2        2   0.3453
Pillai's Trace           0.65466466      1.90        2        2   0.3453
Hotelling-Lawley Trace   1.89573606      1.90        2        2   0.3453
Roy's Greatest Root      1.89573606      1.90        2        2   0.3453

SAS output for a Multivariate Linear Regression Analysis.
The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z2
              y1             y2
y1   21.872015222   20.255407498
y2   20.255407498   18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

Characteristic             Characteristic Vector  V'EV=1
          Root   Percent            y1             y2
    0.19454961    100.00    0.06903935     0.02729059
    0.00000000      0.00   -0.19496558     0.21052601

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

S=1   M=0   N=0

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.83713560      0.19        2        2   0.8371
Pillai's Trace           0.16286440      0.19        2        2   0.8371
Hotelling-Lawley Trace   0.19454961      0.19        2        2   0.8371
Roy's Greatest Root      0.19454961      0.19        2        2   0.8371

SAS output for a Multivariate Linear Regression Analysis.
We can also build confidence intervals for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. If the model

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

has normal errors, then

$$\hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\big(\boldsymbol{\beta}'\mathbf{z}_0,\; \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\,\boldsymbol{\Sigma}\big)$$

and, independently,

$$n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}\big(\boldsymbol{\Sigma}\big)$$

so

$$T^2 = \left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)' \left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1} \left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)$$
is distributed as $\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}$. Thus the 100(1 $-$ $\alpha$)% confidence region for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$ (that is, for $\boldsymbol{\beta}'\mathbf{z}_0$) is given by

$$\big(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\big)' \left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1} \big(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\big) \;\leq\; \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0 \left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)$$

and the 100(1 $-$ $\alpha$)% simultaneous confidence intervals for the mean values of $Y_i$ associated with $\mathbf{z}_0$ (that is, for $\mathbf{z}_0'\boldsymbol{\beta}_{(i)}$) are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\; \sqrt{\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}(\alpha)}\; \sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0 \left(\frac{n}{n-r-1}\right)\hat{\sigma}_{ii}}, \qquad i = 1, \ldots, m$$
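A PROC IML sketch of these simultaneous confidence intervals for the example data; this is an added illustration, and the evaluation point z0 = (1, 70, 75)' is a hypothetical choice used only to show the mechanics:

  proc iml;
    Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
    Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};
    n = nrow(Z); r = ncol(Z) - 1; m = ncol(Y);
    betahat  = inv(Z`*Z) * Z` * Y;
    Sigmahat = (Y - Z*betahat)` * (Y - Z*betahat) / n;  /* MLE of Sigma */
    z0    = {1, 70, 75};              /* illustrative new observation */
    alpha = 0.05;
    Fcrit = finv(1 - alpha, m, n - r - m);
    c     = sqrt( m*(n-r-1)/(n-r-m) * Fcrit );
    d     = z0` * inv(Z`*Z) * z0;     /* z0'(Z'Z)^{-1}z0 */
    center = z0` * betahat;           /* 1 x m row of fitted means */
    half   = c * sqrt( d * (n/(n-r-1)) * vecdiag(Sigmahat)` );
    print center half;                /* intervals are center +/- half */
  quit;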
Finally, we can build prediction intervals for the predicted value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. Here the prediction error satisfies

$$\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\big(\mathbf{0},\; \big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)\boldsymbol{\Sigma}\big)$$

and, independently,

$$n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}\big(\boldsymbol{\Sigma}\big)$$

so

$$T^2 = \left(\frac{\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0}{\sqrt{1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)' \left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1} \left(\frac{\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0}{\sqrt{1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)$$
is distributed as $\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}$. Thus the 100(1 $-$ $\alpha$)% prediction region for $\mathbf{Y}_0$ associated with $\mathbf{z}_0$ is given by

$$\big(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\big)' \left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1} \big(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\big) \;\leq\; \big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big) \left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)$$

and the 100(1 $-$ $\alpha$)% simultaneous prediction intervals for the individual responses $Y_{0i}$ associated with $\mathbf{z}_0$ are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \;\pm\; \sqrt{\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}(\alpha)}\; \sqrt{\big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big) \left(\frac{n}{n-r-1}\right)\hat{\sigma}_{ii}}, \qquad i = 1, \ldots, m$$
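The earlier interval sketch yields these simultaneous prediction intervals by adding the extra unit of variance for the new trial's error, i.e., replacing d with 1 + d in the half-width:

  half = c * sqrt( (1 + d) * (n/(n-r-1)) * vecdiag(Sigmahat)` );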