Linear regression models in matrix terms

Upload: florence-vaughn

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Linear regression models in matrix terms

Linear regression models in matrix terms

Page 2: Linear regression models in matrix terms

The regression function in matrix terms

Page 3: Linear regression models in matrix terms

Simple linear regression function

Y_i = β_0 + β_1 x_i + ε_i,  for i = 1, …, n

That is, written out:

Y_1 = β_0 + β_1 x_1 + ε_1
Y_2 = β_0 + β_1 x_2 + ε_2
⋮
Y_n = β_0 + β_1 x_n + ε_n

Page 4: Linear regression models in matrix terms

Simple linear regression function in matrix notation

Y = Xβ + ε, where:

Y = (Y_1, Y_2, …, Y_n)'
X = [1 x_1; 1 x_2; …; 1 x_n]
β = (β_0, β_1)'
ε = (ε_1, ε_2, …, ε_n)'
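As a quick supplement (not part of the original slides), the matrix form above is easy to assemble in code. This pure-Python sketch uses the soap data that appears later in the deck; the names x, beta, X, and mean_Y are my own, and beta holds the deck's fitted values standing in for the unknown parameters:

```python
# Build the n x 2 design matrix X with rows (1, x_i) and evaluate the
# mean response X*beta entry by entry, without any libraries.

x = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]   # predictor values (soap amounts)
beta = [-2.68, 9.50]                       # (beta_0, beta_1): fitted values from later slides

X = [[1.0, xi] for xi in x]                # each row is (1, x_i)

# (X beta)_i = beta_0 * 1 + beta_1 * x_i
mean_Y = [sum(Xij * bj for Xij, bj in zip(row, beta)) for row in X]
```

Adding an error vector ε of the same length would then give one realization of Y = Xβ + ε.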

Page 5: Linear regression models in matrix terms

Definition of a matrix

An r×c matrix is a rectangular array of symbols or numbers arranged in r rows and c columns.

A matrix is almost always denoted by a single capital letter in boldface type.

For example, each of the following is a matrix:

A = [1 2; 6 3]   (a 2×2 matrix)

B: a 5×3 matrix of numeric data values

X = [1 x_11 x_12; 1 x_21 x_22; …; 1 x_61 x_62]   (a 6×3 matrix whose first column is all 1's)

Page 6: Linear regression models in matrix terms

Definition of a vector and a scalar

A column vector is an r×1 matrix, that is, a matrix with only one column. For example:

q = (2, 5, 8)'

A row vector is a 1×c matrix, that is, a matrix with only one row. For example:

h = [21 46 32 90]

A 1×1 “matrix” is called a scalar, but it’s just an ordinary number, such as 29 or σ².

Page 7: Linear regression models in matrix terms

Matrix multiplication

• The Xβ in the regression function Y = Xβ + ε is an example of matrix multiplication.

• Two matrices can be multiplied together:

– Only if the number of columns of the first matrix equals the number of rows of the second matrix.

– The number of rows of the resulting matrix equals the number of rows of the first matrix.

– The number of columns of the resulting matrix equals the number of columns of the second matrix.

Page 8: Linear regression models in matrix terms

Matrix multiplication

• If A is a 2×3 matrix and B is a 3×5 matrix then matrix multiplication AB is possible. The resulting matrix C = AB has … rows and … columns.

• Is the matrix multiplication BA possible?

• If X is an n×p matrix and β is a p×1 column vector, then Xβ is …

Page 9: Linear regression models in matrix terms

Matrix multiplication

C = AB = [1 9 7; 8 1 2] [3 2 1 5; 5 4 7 3; 6 9 6 8] = [90 101 106 88; 41 38 27 59]

The entry in the ith row and jth column of C is the inner product (element-by-element products added together) of the ith row of A with the jth column of B. For example:

c_11 = 1(3) + 9(5) + 7(6) = 90
c_12 = 1(2) + 9(4) + 7(9) = 101
c_23 = 8(1) + 1(7) + 2(6) = 27
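The worked product above is easy to check in code. Here is a minimal pure-Python matrix multiply (a helper of my own, not from the slides) that implements exactly the inner-product rule just described:

```python
# C[i][j] = inner product of row i of A with column j of B.

def matmul(A, B):
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 9, 7],
     [8, 1, 2]]            # 2 x 3
B = [[3, 2, 1, 5],
     [5, 4, 7, 3],
     [6, 9, 6, 8]]         # 3 x 4

C = matmul(A, B)           # 2 x 4, as the slide's dimension rules predict
```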

Page 10: Linear regression models in matrix terms

The Xβ multiplication in simple linear regression setting

Xβ = [1 x_1; 1 x_2; …; 1 x_n] [β_0; β_1] = [β_0 + β_1 x_1; β_0 + β_1 x_2; …; β_0 + β_1 x_n]

Page 11: Linear regression models in matrix terms

Matrix addition

• The Xβ + ε in the regression function Y = Xβ + ε is an example of matrix addition.

• Simply add the corresponding elements of the two matrices.

– For example, add the entry in the first row, first column of the first matrix to the entry in the first row, first column of the second matrix, and so on.

• Two matrices can be added together only if they have the same number of rows and columns.

Page 12: Linear regression models in matrix terms

Matrix addition

For example:

C = A + B = [2 4 1; 1 8 7; 3 5 6] + [7 5 2; 9 3 1; 2 1 8] = [9 9 3; 10 11 8; 5 6 14]

since, entry by entry:

c_11 = 2 + 7 = 9
c_12 = 4 + 5 = 9
c_23 = 7 + 1 = 8
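Element-wise addition is even simpler to sketch in code. The helper below is my own, and the numeric matrices are my reading of the slide's example:

```python
# C[i][j] = A[i][j] + B[i][j]; defined only when A and B have equal dimensions.

def madd(A, B):
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "dimensions must match"
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

A = [[2, 4, 1],
     [1, 8, 7],
     [3, 5, 6]]
B = [[7, 5, 2],
     [9, 3, 1],
     [2, 1, 8]]

C = madd(A, B)
```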

Page 13: Linear regression models in matrix terms

The Xβ+ε addition in the simple linear regression setting

Y = [Y_1; Y_2; …; Y_n] = [β_0 + β_1 x_1; β_0 + β_1 x_2; …; β_0 + β_1 x_n] + [ε_1; ε_2; …; ε_n] = Xβ + ε

Page 14: Linear regression models in matrix terms

Multiple linear regression function in matrix notation

Y = Xβ + ε, where:

Y = (Y_1, Y_2, …, Y_n)'
X = [1 x_11 x_12 x_13; 1 x_21 x_22 x_23; …; 1 x_n1 x_n2 x_n3]
β = (β_0, β_1, β_2, β_3)'
ε = (ε_1, ε_2, …, ε_n)'

Page 15: Linear regression models in matrix terms

Least squares estimates of the parameters

Page 16: Linear regression models in matrix terms

Least squares estimates

The p×1 vector b = (b_0, b_1, …, b_{p−1})' containing the estimates of the p parameters can be shown to equal:

b = (X'X)⁻¹ X'Y

where (X'X)⁻¹ is the inverse of the X'X matrix and X' is the transpose of the X matrix.

Page 17: Linear regression models in matrix terms

Definition of the transpose of a matrix

The transpose of a matrix A is a matrix, denoted A' or AT, whose rows are the columns of A and whose columns are the rows of A … all in the same original order.

For example:

A = [1 5; 4 8; 7 9]   A' = A^T = [1 4 7; 5 8 9]
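A transpose helper is a one-liner in Python. This sketch is my own, using the slide's 3×2 example:

```python
# Rows of A become columns of A' (and vice versa), in the same order.

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 5],
     [4, 8],
     [7, 9]]          # 3 x 2
At = transpose(A)     # 2 x 3
```

Transposing twice returns the original matrix, which makes a handy sanity check.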

Page 18: Linear regression models in matrix terms

The X'X matrix in the simple linear regression setting

X'X = [1 1 … 1; x_1 x_2 … x_n] [1 x_1; 1 x_2; …; 1 x_n] = [n Σx_i; Σx_i Σx_i²]
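Because X has just a column of 1's and a column of x's, X'X collapses to four sums, which is quick to verify in code. This sketch (my own) uses the soap data introduced on a later slide:

```python
# X'X = [[n, sum(x_i)], [sum(x_i), sum(x_i^2)]] for simple linear regression.

x = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]   # soap amounts
n = len(x)

XtX = [[n,      sum(x)],
       [sum(x), sum(xi * xi for xi in x)]]
```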

Page 19: Linear regression models in matrix terms

Definition of the identity matrix

The (square) n×n identity matrix, denoted I_n, is a matrix with 1’s on the diagonal and 0’s elsewhere. For example:

I_2 = [1 0; 0 1]

The identity matrix plays the same role as the number 1 in ordinary arithmetic:

[1 0; 0 1] [9 7; 4 6] = [9 7; 4 6]

Page 20: Linear regression models in matrix terms

Definition of the inverse of a matrix

The inverse A⁻¹ of a square (!!) matrix A is the unique matrix such that:

A A⁻¹ = A⁻¹ A = I

Page 21: Linear regression models in matrix terms

Least squares estimates in simple linear regression setting

b = (b_0, b_1)' = (X'X)⁻¹ X'Y = ?

X'X = [1 1 … 1; x_1 x_2 … x_n] [1 x_1; 1 x_2; …; 1 x_n] = [n Σx_i; Σx_i Σx_i²]

soap (x_i)   suds (y_i)   x_i·y_i   x_i²
4.0          33           132.0     16.00
4.5          42           189.0     20.25
5.0          45           225.0     25.00
5.5          51           280.5     30.25
6.0          53           318.0     36.00
6.5          61           396.5     42.25
7.0          62           434.0     49.00
----         ---          ------    ------
38.5         347          1975.0    218.75

Find X'X.

Page 22: Linear regression models in matrix terms

Least squares estimates in simple linear regression setting

It’s very messy to determine inverses by hand, so we let computers find them for us. For the soap data:

X'X = [7 38.5; 38.5 218.75]

and its inverse is:

(X'X)⁻¹ = [4.4643 -0.78571; -0.78571 0.14286]
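For the 2×2 case there is a standard closed form, M⁻¹ = (1/(ad − bc)) [d -b; -c a], so the inverse the computer reports can be checked with a short sketch (the helper name inv2 is mine):

```python
# Closed-form inverse of a 2x2 matrix M = [[a, b], [c, d]].

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c          # must be nonzero for the inverse to exist
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

XtX = [[7.0, 38.5],
       [38.5, 218.75]]
XtX_inv = inv2(XtX)              # should match the slide's 4.4643, -0.78571, 0.14286
```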

Page 23: Linear regression models in matrix terms

Least squares estimates in simple linear regression setting

b = (b_0, b_1)' = (X'X)⁻¹ X'Y = ?

X'Y = [1 1 … 1; x_1 x_2 … x_n] [y_1; y_2; …; y_n] = [Σy_i; Σx_i y_i]

soap (x_i)   suds (y_i)   x_i·y_i   x_i²
4.0          33           132.0     16.00
4.5          42           189.0     20.25
5.0          45           225.0     25.00
5.5          51           280.5     30.25
6.0          53           318.0     36.00
6.5          61           396.5     42.25
7.0          62           434.0     49.00
----         ---          ------    ------
38.5         347          1975.0    218.75

Find X'Y.

Page 24: Linear regression models in matrix terms

Least squares estimates in simple linear regression setting

b = (X'X)⁻¹ X'Y = [4.4643 -0.78571; -0.78571 0.14286] [347; 1975]

= [4.4643(347) - 0.78571(1975); -0.78571(347) + 0.14286(1975)] = [b_0; b_1] = [-2.67; 9.51]

(The small differences from the equation below come from rounding in the printed inverse.)

The regression equation is
suds = -2.68 + 9.50 soap
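The whole calculation can be replayed end to end from the raw soap/suds data. This pure-Python sketch (variable names are mine) forms the summary sums and then applies b = (X'X)⁻¹ X'Y via the closed-form 2×2 inverse:

```python
x = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]    # soap
y = [33, 42, 45, 51, 53, 61, 62]           # suds
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))

# b = (X'X)^-1 X'Y with X'X = [[n, Sx], [Sx, Sxx]] and X'Y = (Sy, Sxy)'
det = n * Sxx - Sx * Sx                    # determinant of X'X
b0 = (Sxx * Sy - Sx * Sxy) / det           # intercept estimate
b1 = (n * Sxy - Sx * Sy) / det             # slope estimate
```

Without the rounding in the printed inverse, this gives b_0 ≈ -2.679 and b_1 = 9.5, i.e. the slide's suds = -2.68 + 9.50 soap.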

Page 25: Linear regression models in matrix terms

Linear dependence

The columns of the matrix:

A = [1 2 4 1; 2 1 8 6; 3 6 12 3]

are linearly dependent, since (at least) one of the columns can be written as a linear combination of another.

If none of the columns can be written as a linear combination of another, then we say the columns are linearly independent.

Page 26: Linear regression models in matrix terms

Linear dependence is not always obvious

123

132

141

A

Formally, the columns a1, a2, …, an of an n×n matrix are linearly dependent if there are constants c1, c2, …, cn, not all 0, such that:

c_1 a_1 + c_2 a_2 + … + c_n a_n = 0
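One mechanical way to detect the dependence is a determinant: a square matrix has linearly dependent columns exactly when its determinant is 0. A sketch (the helper det3 is mine) on the matrix above, for which column 1 + column 2 = 5 × column 3:

```python
# 3x3 determinant by cofactor expansion along the first row.

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 4, 1],
     [2, 3, 1],
     [3, 2, 1]]

# det3(A) is 0, confirming the hidden dependence:
# 1*col1 + 1*col2 - 5*col3 = 0, i.e. c = (1, 1, -5).
```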

Page 27: Linear regression models in matrix terms

Implications of linear dependence on regression

• The inverse of a square matrix exists only if the columns are linearly independent.

• Since the regression estimate b depends on (X'X)⁻¹, the parameter estimates b_0, b_1, …, cannot be (uniquely) determined if some of the columns of X are linearly dependent.

Page 28: Linear regression models in matrix terms

The main point about linear dependence

• If the columns of the X matrix (that is, if two or more of your predictor variables) are linearly dependent (or nearly so), you will run into trouble when trying to estimate the regression function.

Page 29: Linear regression models in matrix terms

Implications of linear dependence on regression

soap1   soap2   suds
4.0     8       33
4.5     9       42
5.0     10      45
5.5     11      51
6.0     12      53
6.5     13      61
7.0     14      62

* soap2 is highly correlated with other X variables
* soap2 has been removed from the equation

The regression equation is
suds = -2.68 + 9.50 soap1

Page 30: Linear regression models in matrix terms

Fitted values and residuals

Page 31: Linear regression models in matrix terms

Fitted values

ŷ = [ŷ_1; ŷ_2; …; ŷ_n] = [b_0 + b_1 x_1; b_0 + b_1 x_2; …; b_0 + b_1 x_n]

Page 32: Linear regression models in matrix terms

Fitted values

The vector of fitted values:

ŷ = Xb = X(X'X)⁻¹ X'y

is sometimes represented as a function of the hat matrix H:

H = X(X'X)⁻¹ X'

That is:

ŷ = X(X'X)⁻¹ X'y = Hy
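The hat matrix can be formed explicitly for the soap data. The helpers below are my own pure-Python stand-ins for library routines; two classic properties of H fall out numerically (ŷ = Hy reproduces the fitted line, and the trace of H equals the number of parameters p = 2):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):                     # closed-form inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

x = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
y = [[33], [42], [45], [51], [53], [61], [62]]    # n x 1 column vector

X = [[1.0, xi] for xi in x]
Xt = transpose(X)

H = matmul(matmul(X, inv2(matmul(Xt, X))), Xt)    # H = X (X'X)^-1 X', n x n
y_hat = matmul(H, y)                              # fitted values, y_hat = H y
```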

Page 33: Linear regression models in matrix terms

The residual vector

e_i = y_i − ŷ_i,  for i = 1, …, n

That is:

e = [e_1; e_2; …; e_n] = [y_1 − ŷ_1; y_2 − ŷ_2; …; y_n − ŷ_n]

Page 34: Linear regression models in matrix terms

The residual vector written as a function of the hat matrix

e = [e_1; e_2; …; e_n] = [y_1 − ŷ_1; y_2 − ŷ_2; …; y_n − ŷ_n] = y − ŷ = y − Hy = (I − H)y

Page 35: Linear regression models in matrix terms

Sum of squares and the analysis of variance table

Page 36: Linear regression models in matrix terms

Analysis of variance table in matrix terms

Source       DF     SS                          MS                F
Regression   p−1    SSR = b'X'Y − (1/n)Y'JY     MSR = SSR/(p−1)   MSR/MSE
Error        n−p    SSE = Y'Y − b'X'Y           MSE = SSE/(n−p)
Total        n−1    SSTO = Y'Y − (1/n)Y'JY

Page 37: Linear regression models in matrix terms

Sum of squares

In general, if you pre-multiply a vector by its transpose, you get a sum of squares.

y'y = [y_1 y_2 … y_n] [y_1; y_2; …; y_n] = y_1² + y_2² + … + y_n² = Σ_{i=1}^n y_i²

Page 38: Linear regression models in matrix terms

Error sum of squares

SSE = Σ_{i=1}^n (y_i − ŷ_i)²

Page 39: Linear regression models in matrix terms

Error sum of squares

SSE = (y − ŷ)'(y − ŷ)

Page 40: Linear regression models in matrix terms

Total sum of squares

Previously, we’d write:

SSTO = Σ_{i=1}^n (y_i − ȳ)² = Σ y_i² − (Σ y_i)²/n

But, it can be shown that equivalently:

SSTO = Y'Y − (1/n) Y'JY

where J is a (square) n×n matrix containing all 1’s.

Page 41: Linear regression models in matrix terms

An example oftotal sum of squares

If n = 2:

Σ_{i=1}^2 (Y_i − Ȳ)² = Y_1² + Y_2² − (Y_1 + Y_2)²/2

But, note that we get the same answer by:

Y'Y − (1/2) Y'JY = [Y_1 Y_2][Y_1; Y_2] − (1/2)[Y_1 Y_2][1 1; 1 1][Y_1; Y_2] = Y_1² + Y_2² − (1/2)(Y_1 + Y_2)²

Page 42: Linear regression models in matrix terms

Analysis of variance table in matrix terms

Source       DF     SS                          MS                F
Regression   p−1    SSR = b'X'Y − (1/n)Y'JY     MSR = SSR/(p−1)   MSR/MSE
Error        n−p    SSE = Y'Y − b'X'Y           MSE = SSE/(n−p)
Total        n−1    SSTO = Y'Y − (1/n)Y'JY

Page 43: Linear regression models in matrix terms

Model assumptions

Page 44: Linear regression models in matrix terms

Error term assumptions

• As always, the error terms εi are:

– independent
– normally distributed (with mean 0)
– with equal variances σ²

• Now, how can we say the same thing using matrices and vectors?

Page 45: Linear regression models in matrix terms

Error terms as a random vector

The n×1 random error term vector, denoted as ε, is:

ε = (ε_1, ε_2, …, ε_n)'

Page 46: Linear regression models in matrix terms

The mean (expectation) of the random error term vector

The n×1 mean error term vector, denoted as E(ε), is:

E(ε) = [E(ε_1); E(ε_2); …; E(ε_n)] = [0; 0; …; 0] = 0

Here the first equality is a definition, the middle step uses the assumption E(ε_i) = 0, and writing the result as the zero vector 0 is again a definition.

Page 47: Linear regression models in matrix terms

The variance of the random error term vector

The n×n variance matrix, denoted as σ2(ε), is defined as:

σ²(ε) = [σ²(ε_1) σ(ε_1, ε_2) … σ(ε_1, ε_n); σ(ε_2, ε_1) σ²(ε_2) … σ(ε_2, ε_n); … ; σ(ε_n, ε_1) σ(ε_n, ε_2) … σ²(ε_n)]

Diagonal elements are variances of the errors. Off-diagonal elements are covariances between errors.

Page 48: Linear regression models in matrix terms

The ASSUMED variance of the random error term vector

BUT, we assume error terms are independent (covariances are 0), and have equal variances (σ2).

σ²(ε) = [σ² 0 … 0; 0 σ² … 0; … ; 0 0 … σ²]

Page 49: Linear regression models in matrix terms

Scalar by matrix multiplication

Just multiply each element of the matrix by the scalar.

For example:

2 [1 4 0; 7 6 5; 1 3 2] = [2 8 0; 14 12 10; 2 6 4]

Page 50: Linear regression models in matrix terms

The ASSUMED variance of the random error term vector

σ²(ε) = σ² [1 0 … 0; 0 1 … 0; … ; 0 0 … 1] = σ² I

Page 51: Linear regression models in matrix terms

The general linear regression model

Putting the regression function and assumptions all together, we get:

Y = Xβ + ε

where:

• Y is a ( ) vector of response values

• β is a ( ) vector of unknown parameters

• X is an ( ) matrix of predictor values

• ε is an ( ) vector of independent, normal error terms with mean 0 and (equal) variance σ²I.