Matrix approach to simple linear regression · users.stat.ufl.edu/~winner/sta6208/matrix_reg.pdf ·...
TRANSCRIPT
Introduction to Matrices and Matrix Approach to Simple
Linear Regression
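Matrices
• Definition: A matrix is a rectangular array of numbers or symbolic elements
• In many applications, the rows of a matrix will represent individual cases (people, items, plants, animals, ...) and the columns will represent attributes or characteristics
• The dimension of a matrix is its number of rows and columns, often denoted as r x c (r rows by c columns)
• Can be represented in full form or abbreviated form:
$$
\mathbf{A}_{r \times c} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix}
= [a_{ij}], \quad i = 1, \ldots, r; \; j = 1, \ldots, c
$$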
Special Types of Matrices
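Square Matrix: Number of rows = number of columns
$$
\mathbf{A} = \begin{bmatrix} 20 & 32 & 50 \\ 12 & 28 & 42 \\ 28 & 46 & 60 \end{bmatrix}
\qquad
\mathbf{B}_{2 \times 2} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
$$
Vector: Matrix with one column (column vector) or one row (row vector)
$$
\mathbf{C} = \begin{bmatrix} 57 \\ 24 \\ 18 \end{bmatrix}
\qquad
\mathbf{D} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix}
\qquad
\mathbf{E}' = \begin{bmatrix} 17 & 31 \end{bmatrix}
\qquad
\mathbf{F}' = \begin{bmatrix} f_1 & f_2 & f_3 \end{bmatrix}
$$
Transpose: Matrix formed by interchanging rows and columns of a matrix (use "prime" to denote transpose)
$$
\mathbf{G}_{3 \times 2} = \begin{bmatrix} 6 & 8 \\ 15 & 13 \\ 22 & 25 \end{bmatrix}
\quad \Rightarrow \quad
\mathbf{G}'_{2 \times 3} = \begin{bmatrix} 6 & 15 & 22 \\ 8 & 13 & 25 \end{bmatrix}
$$
$$
\mathbf{H}_{r \times c} = \begin{bmatrix} h_{11} & \cdots & h_{1c} \\ \vdots & & \vdots \\ h_{r1} & \cdots & h_{rc} \end{bmatrix}
= [h_{ij}], \; i = 1, \ldots, r; \; j = 1, \ldots, c
\quad \Rightarrow \quad
\mathbf{H}'_{c \times r} = [h_{ji}], \; j = 1, \ldots, c; \; i = 1, \ldots, r
$$
Matrix Equality: Matrices of the same dimension, and corresponding elements in same cells are all equal:
$$
\mathbf{A} = \begin{bmatrix} 4 & 6 \\ 12 & 10 \end{bmatrix},
\quad
\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},
\quad
\mathbf{A} = \mathbf{B} \;\Rightarrow\; b_{11} = 4, \; b_{12} = 6, \; b_{21} = 12, \; b_{22} = 10
$$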
Regression Examples - Toluca Data
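Response Vector:
$$
\mathbf{Y}_{n \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\mathbf{Y}'_{1 \times n} = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
$$
Design Matrix:
$$
\mathbf{X}_{n \times 2} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\qquad
\mathbf{X}'_{2 \times n} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
$$
Toluca data (n = 25 cases). Each row below gives the two columns of X (the constant 1 and X_i), followed by Y_i: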
1 80 399
1 30 121
1 50 221
1 90 376
1 70 361
1 60 224
1 120 546
1 80 352
1 100 353
1 50 157
1 40 160
1 70 252
1 90 389
1 20 113
1 110 435
1 100 420
1 30 212
1 50 268
1 90 377
1 110 421
1 30 273
1 90 468
1 40 244
1 80 342
1 70 323
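The response vector and design matrix above can be sketched in code. This is a minimal illustration assuming NumPy; the variable names (`x`, `y`, `X`, `Y`) are chosen here and are not part of the slides.

```python
import numpy as np

# Toluca data as listed above: predictor X_i and response Y_i for n = 25 cases.
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
              20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70], dtype=float)
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
              113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323], dtype=float)

# Design matrix: a column of 1s next to the X column (n x 2).
X = np.column_stack([np.ones_like(x), x])
# Response as an n x 1 column vector.
Y = y.reshape(-1, 1)

print(X.shape, Y.shape)      # dimensions of X and Y
print(X.T.shape, Y.T.shape)  # dimensions of the transposes X' and Y'
```

Keeping `Y` as an explicit n x 1 array mirrors the slide notation; a flat 1-D array would work equally well for most NumPy operations.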
Matrix Addition and Subtraction
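$$
\mathbf{C}_{2 \times 2} = \begin{bmatrix} 4 & 7 \\ 10 & 12 \end{bmatrix}
\qquad
\mathbf{D}_{2 \times 2} = \begin{bmatrix} 2 & 0 \\ 14 & 6 \end{bmatrix}
$$
Addition and Subtraction of 2 Matrices of Common Dimension:
$$
\mathbf{C} + \mathbf{D} = \begin{bmatrix} 4+2 & 7+0 \\ 10+14 & 12+6 \end{bmatrix} = \begin{bmatrix} 6 & 7 \\ 24 & 18 \end{bmatrix}
\qquad
\mathbf{C} - \mathbf{D} = \begin{bmatrix} 4-2 & 7-0 \\ 10-14 & 12-6 \end{bmatrix} = \begin{bmatrix} 2 & 7 \\ -4 & 6 \end{bmatrix}
$$
$$
\mathbf{A}_{r \times c} = [a_{ij}], \quad \mathbf{B}_{r \times c} = [b_{ij}], \quad i = 1, \ldots, r; \; j = 1, \ldots, c
$$
$$
\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}]
\qquad
\mathbf{A} - \mathbf{B} = [a_{ij} - b_{ij}]
\qquad i = 1, \ldots, r; \; j = 1, \ldots, c
$$
Regression Example:
$$
\mathbf{Y}_{n \times 1} = E\{\mathbf{Y}\} + \boldsymbol{\varepsilon}
\quad \text{since} \quad
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= \begin{bmatrix} E\{Y_1\} + \varepsilon_1 \\ E\{Y_2\} + \varepsilon_2 \\ \vdots \\ E\{Y_n\} + \varepsilon_n \end{bmatrix}
$$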
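Matrix Multiplication
Multiplication of a Matrix by a Scalar (single number):
$$
k = 3, \quad \mathbf{A} = \begin{bmatrix} 2 & 1 \\ -2 & 7 \end{bmatrix}
\;\Rightarrow\;
k\mathbf{A} = \begin{bmatrix} 3(2) & 3(1) \\ 3(-2) & 3(7) \end{bmatrix} = \begin{bmatrix} 6 & 3 \\ -6 & 21 \end{bmatrix}
$$
Multiplication of a Matrix by a Matrix (#cols(A) = #rows(B)):
$$
\text{If } c_A = r_B: \quad \mathbf{A}_{r_A \times c_A} \mathbf{B}_{r_B \times c_B} = \mathbf{AB}_{r_A \times c_B} = [ab_{ij}], \quad i = 1, \ldots, r_A; \; j = 1, \ldots, c_B
$$
ab_ij = sum of the products of the elements of the i-th row of A and the j-th column of B:
$$
\mathbf{A}_{3 \times 2} = \begin{bmatrix} 2 & 5 \\ 3 & -1 \\ 0 & 7 \end{bmatrix},
\quad
\mathbf{B}_{2 \times 2} = \begin{bmatrix} 3 & -1 \\ 2 & 4 \end{bmatrix}
$$
$$
\mathbf{AB}_{3 \times 2} =
\begin{bmatrix}
2(3) + 5(2) & 2(-1) + 5(4) \\
3(3) + (-1)(2) & 3(-1) + (-1)(4) \\
0(3) + 7(2) & 0(-1) + 7(4)
\end{bmatrix}
= \begin{bmatrix} 16 & 18 \\ 7 & -7 \\ 14 & 28 \end{bmatrix}
$$
$$
\text{If } c_A = r_B = c: \quad ab_{ij} = \sum_{k=1}^{c} a_{ik} b_{kj}, \quad i = 1, \ldots, r_A; \; j = 1, \ldots, c_B
$$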
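The element-wise definition of the product can be checked with a short sketch. It assumes NumPy, and `matmul_by_definition` is a hypothetical helper written for illustration; in practice the built-in `@` operator does the same job.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Multiply A (r_A x c_A) by B (r_B x c_B) one element at a time."""
    r_a, c_a = A.shape
    r_b, c_b = B.shape
    if c_a != r_b:
        raise ValueError("not conformable: #cols(A) must equal #rows(B)")
    AB = np.zeros((r_a, c_b), dtype=np.result_type(A, B))
    for i in range(r_a):
        for j in range(c_b):
            # ab_ij = sum over k of a_ik * b_kj
            AB[i, j] = sum(A[i, k] * B[k, j] for k in range(c_a))
    return AB

A = np.array([[2, 5], [3, -1], [0, 7]])   # 3 x 2
B = np.array([[3, -1], [2, 4]])           # 2 x 2
AB = matmul_by_definition(A, B)

print(AB)                                  # product from the definition
print(A @ B)                               # NumPy's built-in matrix product
print(3 * np.array([[2, 1], [-2, 7]]))     # scalar multiple kA with k = 3
```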
Matrix Multiplication Examples - I
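Simultaneous Equations (2 equations; x_1, x_2 unknown):
$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &= y_1 \\
a_{21} x_1 + a_{22} x_2 &= y_2
\end{aligned}
\quad \Rightarrow \quad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\quad \Rightarrow \quad
\mathbf{AX} = \mathbf{Y}
$$
Sum of Squares:
$$
4^2 + 2^2 + 3^2 = \begin{bmatrix} 4 & 2 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \\ 3 \end{bmatrix} = 29
$$
Regression Equation (Expected Values):
$$
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
= \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta}
$$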
Matrix Multiplication Examples - II
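Matrices used in simple linear regression (that generalize to multiple regression):
$$
\mathbf{Y}'\mathbf{Y} = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sum_{i=1}^{n} Y_i^2
$$
$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
$$
$$
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
$$
$$
\mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
$$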
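As a quick numerical check of these products, the sketch below forms Y'Y, X'X, X'Y and Xβ and compares them with the corresponding sums. It assumes NumPy and, to stay short, uses only the first five Toluca cases; the `beta` values are arbitrary numbers picked for illustration.

```python
import numpy as np

# First five Toluca cases only (a subset used purely for illustration).
x = np.array([80.0, 30.0, 50.0, 90.0, 70.0])
y = np.array([399.0, 121.0, 221.0, 376.0, 361.0])
n = len(x)

X = np.column_stack([np.ones(n), x])   # n x 2 design matrix
Y = y.reshape(-1, 1)                   # n x 1 response vector

YtY = Y.T @ Y    # 1 x 1: sum of Y_i^2
XtX = X.T @ X    # 2 x 2: [[n, sum X], [sum X, sum X^2]]
XtY = X.T @ Y    # 2 x 1: [sum Y, sum X*Y]

beta = np.array([[10.0], [2.0]])       # arbitrary illustrative coefficients
Xbeta = X @ beta                       # n x 1: beta_0 + beta_1 * X_i

print(YtY, XtX, XtY, Xbeta, sep="\n")
```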
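Special Matrix Types
Symmetric Matrix: Square matrix with a transpose equal to itself: A = A':
$$
\mathbf{A} = \begin{bmatrix} 6 & 19 & 8 \\ 19 & 14 & 3 \\ 8 & 3 & 1 \end{bmatrix} = \mathbf{A}'
$$
Diagonal Matrix: Square matrix with all off-diagonal elements equal to 0:
$$
\mathbf{A} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} b_1 & 0 & 0 \\ 0 & b_2 & 0 \\ 0 & 0 & b_3 \end{bmatrix}
$$
Note: Diagonal matrices are symmetric (not vice versa)
Identity Matrix: Diagonal matrix with all diagonal elements equal to 1 (acts like multiplying a scalar by 1):
$$
\mathbf{I}_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
\mathbf{A}_{3 \times 3} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\;\Rightarrow\;
\mathbf{IA} = \mathbf{AI} = \mathbf{A}
$$
Scalar Matrix: Diagonal matrix with all diagonal elements equal to a single number:
$$
k\mathbf{I}_{4 \times 4} = \begin{bmatrix} k & 0 & 0 & 0 \\ 0 & k & 0 & 0 \\ 0 & 0 & k & 0 \\ 0 & 0 & 0 & k \end{bmatrix}
= k \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$
1-Vector, J matrix, and zero-vector:
$$
\mathbf{1}_{r \times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\qquad
\mathbf{J}_{r \times r} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
\qquad
\mathbf{0}_{r \times 1} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$
$$
\text{Note:} \quad \mathbf{1}'\mathbf{1} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = r
\qquad
\mathbf{1}\mathbf{1}' = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} = \mathbf{J}_{r \times r}
$$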
Linear Dependence and Rank of a Matrix
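• Linear Dependence: When a linear function of the columns (rows) of a matrix, with coefficients not all zero, produces a zero vector (one or more columns (rows) can be written as a linear function of the other columns (rows))
• Rank of a matrix: Number of linearly independent columns (rows) of the matrix. Rank cannot exceed the minimum of the number of rows or columns of the matrix: rank(A) ≤ min(r_A, c_A)
• A matrix is full rank if rank(A) = min(r_A, c_A)
$$
\mathbf{A}_{2 \times 2} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 4 & 12 \end{bmatrix}:
\quad 3\mathbf{A}_1 - \mathbf{A}_2 = \mathbf{0}
\;\Rightarrow\; \text{Columns of } \mathbf{A} \text{ are linearly dependent,} \; \text{rank}(\mathbf{A}) = 1
$$
$$
\mathbf{B}_{2 \times 2} = \begin{bmatrix} \mathbf{B}_1 & \mathbf{B}_2 \end{bmatrix} = \begin{bmatrix} 4 & 3 \\ 4 & 12 \end{bmatrix}:
\quad \text{only } 0\mathbf{B}_1 + 0\mathbf{B}_2 = \mathbf{0}
\;\Rightarrow\; \text{Columns of } \mathbf{B} \text{ are linearly independent,} \; \text{rank}(\mathbf{B}) = 2
$$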
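The two rank examples can be reproduced numerically. This sketch assumes NumPy and uses `np.linalg.matrix_rank`, which estimates rank from singular values, so it is a numerical check rather than a symbolic proof.

```python
import numpy as np

A = np.array([[1, 3], [4, 12]])   # second column is 3 times the first
B = np.array([[4, 3], [4, 12]])   # columns are not multiples of each other

rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)

# The linear dependence among the columns of A: 3*A_1 - A_2 is the zero vector.
dependence = 3 * A[:, 0] - A[:, 1]

print(rank_A, rank_B, dependence)
```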
Matrix Inverse
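• Note: For scalars (except 0), when we multiply a number by its reciprocal, we get 1: 2(1/2) = 1, x(1/x) = x(x^{-1}) = 1
• In matrix form, if A is a square matrix and full rank (all rows and columns are linearly independent), then A has an inverse A^{-1} such that: A^{-1}A = AA^{-1} = I
$$
\mathbf{A} = \begin{bmatrix} 2 & 8 \\ 4 & -2 \end{bmatrix},
\quad
\mathbf{A}^{-1} = \begin{bmatrix} \frac{2}{36} & \frac{8}{36} \\ \frac{4}{36} & \frac{-2}{36} \end{bmatrix}
\;\Rightarrow\;
\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} \frac{4}{36} + \frac{32}{36} & \frac{16}{36} - \frac{16}{36} \\ \frac{8}{36} - \frac{8}{36} & \frac{32}{36} + \frac{4}{36} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
$$
$$
\mathbf{B} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{bmatrix},
\quad
\mathbf{B}^{-1} = \begin{bmatrix} 1/4 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/6 \end{bmatrix}
\;\Rightarrow\;
\mathbf{B}\mathbf{B}^{-1} = \begin{bmatrix} 4(1/4) & 0 & 0 \\ 0 & 2(1/2) & 0 \\ 0 & 0 & 6(1/6) \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I}
$$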
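Computing an Inverse of 2x2 Matrix
$$
\mathbf{A}_{2 \times 2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{full rank (columns/rows are linearly independent)}
$$
$$
\text{Determinant of } \mathbf{A}: \quad |\mathbf{A}| = a_{11} a_{22} - a_{12} a_{21}
$$
Note: If A is not full rank (for some value k):
$$
a_{11} = k a_{21}, \; a_{12} = k a_{22}
\;\Rightarrow\;
|\mathbf{A}| = a_{11} a_{22} - a_{12} a_{21} = k a_{21} a_{22} - k a_{22} a_{21} = 0
$$
$$
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
$$
Thus A^{-1} does not exist if A is not full rank
While there are rules for general r x r matrices, we will use computers to solve them
Regression Example:
$$
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\;\Rightarrow\;
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
$$
$$
|\mathbf{X}'\mathbf{X}| = n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 = n \sum_{i=1}^{n} (X_i - \bar{X})^2
$$
$$
(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}
\begin{bmatrix} \sum_{i=1}^{n} X_i^2 & -\sum_{i=1}^{n} X_i \\ -\sum_{i=1}^{n} X_i & n \end{bmatrix}
$$
$$
\text{Note:} \quad \sum_{i=1}^{n} X_i = n\bar{X}, \qquad \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n\bar{X}^2
$$
$$
\Rightarrow \; (\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}
\begin{bmatrix} \sum_{i=1}^{n} (X_i - \bar{X})^2 + n\bar{X}^2 & -n\bar{X} \\ -n\bar{X} & n \end{bmatrix}
= \begin{bmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
\dfrac{-\bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{1}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{bmatrix}
$$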
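The 2x2 inverse rule and the closed form for the inverse of X'X can be compared numerically. The sketch below assumes NumPy; `inverse_2x2` is a hypothetical helper written for this illustration, and the data are the first five Toluca cases only.

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2 x 2 matrix with the determinant rule (illustrative helper)."""
    A = np.asarray(A, dtype=float)
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    if np.isclose(det, 0.0):
        raise ValueError("matrix is not full rank, so no inverse exists")
    return np.array([[A[1, 1], -A[0, 1]],
                     [-A[1, 0], A[0, 0]]]) / det

# First five Toluca X values (a subset used purely for illustration).
x = np.array([80.0, 30.0, 50.0, 90.0, 70.0])
n = len(x)
X = np.column_stack([np.ones(n), x])
XtX = X.T @ X

# Closed form in terms of n, the mean of X, and the sum of squared deviations.
xbar = x.mean()
sxx = np.sum((x - xbar) ** 2)
closed_form = np.array([[1.0 / n + xbar ** 2 / sxx, -xbar / sxx],
                        [-xbar / sxx, 1.0 / sxx]])

XtX_inv = inverse_2x2(XtX)
print(XtX_inv)
print(closed_form)
```

Calling `inverse_2x2` on a rank-deficient matrix such as [[1, 3], [4, 12]] raises the `ValueError`, matching the note that the inverse does not exist when the determinant is 0.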
Use of Inverse Matrix – Solving Simultaneous Equations
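$$
\mathbf{AY} = \mathbf{C} \quad \text{where } \mathbf{A} \text{ and } \mathbf{C} \text{ are matrices of constants, } \mathbf{Y} \text{ is a matrix of unknowns}
$$
$$
\mathbf{A}^{-1}\mathbf{AY} = \mathbf{A}^{-1}\mathbf{C} \;\Rightarrow\; \mathbf{Y} = \mathbf{A}^{-1}\mathbf{C} \quad \text{(assuming } \mathbf{A} \text{ is square and full rank)}
$$
$$
\text{Equation 1: } 12 y_1 + 6 y_2 = 48 \qquad \text{Equation 2: } 10 y_1 - 2 y_2 = 12
$$
$$
\mathbf{A} = \begin{bmatrix} 12 & 6 \\ 10 & -2 \end{bmatrix},
\quad
\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},
\quad
\mathbf{C} = \begin{bmatrix} 48 \\ 12 \end{bmatrix}
\;\Rightarrow\; \mathbf{Y} = \mathbf{A}^{-1}\mathbf{C}
$$
$$
\mathbf{A}^{-1} = \frac{1}{12(-2) - 6(10)} \begin{bmatrix} -2 & -6 \\ -10 & 12 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 2 & 6 \\ 10 & -12 \end{bmatrix}
$$
$$
\mathbf{Y} = \mathbf{A}^{-1}\mathbf{C} = \frac{1}{84} \begin{bmatrix} 2 & 6 \\ 10 & -12 \end{bmatrix} \begin{bmatrix} 48 \\ 12 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 96 + 72 \\ 480 - 144 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 168 \\ 336 \end{bmatrix}
= \begin{bmatrix} 2 \\ 4 \end{bmatrix}
$$
Note the wisdom of waiting to divide by |A| at end of calculation!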
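The worked example can be mirrored in a few lines. This is a sketch assuming NumPy; it follows the slide's approach of dividing by the determinant last, and also calls `np.linalg.solve`, which solves the system without forming the inverse explicitly.

```python
import numpy as np

A = np.array([[12.0, 6.0], [10.0, -2.0]])
C = np.array([48.0, 12.0])

# Determinant and the "swap and negate" matrix from the 2 x 2 inverse rule.
det_A = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
adj_A = np.array([[A[1, 1], -A[0, 1]],
                  [-A[1, 0], A[0, 0]]])

# Multiply first, divide by |A| at the end of the calculation.
Y = (adj_A @ C) / det_A

# Same system solved by a library routine for comparison.
Y_solve = np.linalg.solve(A, C)

print(det_A, Y, Y_solve)
```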
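Useful Matrix Results
All rules assume that the matrices are conformable to operations:
Addition Rules:
$$
\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}
\qquad
(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})
$$
Multiplication Rules:
$$
(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC})
\qquad
\mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB}
\qquad
k(\mathbf{A} + \mathbf{B}) = k\mathbf{A} + k\mathbf{B}, \quad k = \text{scalar}
$$
Transpose Rules:
$$
(\mathbf{A}')' = \mathbf{A}
\qquad
(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'
\qquad
(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'
\qquad
(\mathbf{ABC})' = \mathbf{C}'\mathbf{B}'\mathbf{A}'
$$
Inverse Rules (Full Rank, Square Matrices):
$$
(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}
\qquad
(\mathbf{ABC})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}
\qquad
(\mathbf{A}^{-1})^{-1} = \mathbf{A}
\qquad
(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'
$$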
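A few of these rules can be spot-checked numerically. The sketch below assumes NumPy, and the matrices `A`, `B`, `C` are arbitrary full-rank 2x2 examples chosen here; a numerical check on one set of matrices illustrates the rules but does not prove them.

```python
import numpy as np

# Arbitrary full-rank 2 x 2 matrices chosen for illustration.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 2.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [2.0, 1.0]])
k = 3.0
inv = np.linalg.inv

checks = {
    "(AB)C = A(BC)": np.allclose((A @ B) @ C, A @ (B @ C)),
    "C(A+B) = CA + CB": np.allclose(C @ (A + B), C @ A + C @ B),
    "k(A+B) = kA + kB": np.allclose(k * (A + B), k * A + k * B),
    "(AB)' = B'A'": np.allclose((A @ B).T, B.T @ A.T),
    "(ABC)' = C'B'A'": np.allclose((A @ B @ C).T, C.T @ B.T @ A.T),
    "(AB)^-1 = B^-1 A^-1": np.allclose(inv(A @ B), inv(B) @ inv(A)),
    "(ABC)^-1 = C^-1 B^-1 A^-1": np.allclose(inv(A @ B @ C), inv(C) @ inv(B) @ inv(A)),
    "(A^-1)^-1 = A": np.allclose(inv(inv(A)), A),
    "(A')^-1 = (A^-1)'": np.allclose(inv(A.T), inv(A).T),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```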
Random Vectors and Matrices
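Shown for case of n = 3, generalizes to any n:
$$
\text{Random variables: } Y_1, Y_2, Y_3
\qquad
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
$$
$$
\text{Expectation:} \quad E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ E\{Y_3\} \end{bmatrix}
\qquad
\text{In general:} \quad E\{\mathbf{Y}\}_{n \times p} = [E\{Y_{ij}\}], \quad i = 1, \ldots, n; \; j = 1, \ldots, p
$$
Variance-Covariance Matrix for a Random Vector:
$$
\sigma^2\{\mathbf{Y}\} = E\left\{ [\mathbf{Y} - E\{\mathbf{Y}\}] [\mathbf{Y} - E\{\mathbf{Y}\}]' \right\}
= E\left\{ \begin{bmatrix} Y_1 - E\{Y_1\} \\ Y_2 - E\{Y_2\} \\ Y_3 - E\{Y_3\} \end{bmatrix}
\begin{bmatrix} Y_1 - E\{Y_1\} & Y_2 - E\{Y_2\} & Y_3 - E\{Y_3\} \end{bmatrix} \right\}
$$
$$
= E \begin{bmatrix}
(Y_1 - E\{Y_1\})^2 & (Y_1 - E\{Y_1\})(Y_2 - E\{Y_2\}) & (Y_1 - E\{Y_1\})(Y_3 - E\{Y_3\}) \\
(Y_2 - E\{Y_2\})(Y_1 - E\{Y_1\}) & (Y_2 - E\{Y_2\})^2 & (Y_2 - E\{Y_2\})(Y_3 - E\{Y_3\}) \\
(Y_3 - E\{Y_3\})(Y_1 - E\{Y_1\}) & (Y_3 - E\{Y_3\})(Y_2 - E\{Y_2\}) & (Y_3 - E\{Y_3\})^2
\end{bmatrix}
= \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_2^2 & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2
\end{bmatrix}
$$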
Linear Regression Example (n=3)
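Error terms are assumed to be independent, with mean 0, constant variance σ²:
$$
E\{\varepsilon_i\} = 0, \qquad \sigma^2\{\varepsilon_i\} = \sigma^2, \qquad \sigma\{\varepsilon_i, \varepsilon_j\} = 0 \quad \forall \, i \neq j
$$
$$
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{bmatrix}
\qquad
E\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \mathbf{0}
\qquad
\sigma^2\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2 \mathbf{I}
$$
$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\;\Rightarrow\;
E\{\mathbf{Y}\} = E\{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\} = \mathbf{X}\boldsymbol{\beta} + E\{\boldsymbol{\varepsilon}\} = \mathbf{X}\boldsymbol{\beta}
$$
$$
\sigma^2\{\mathbf{Y}\} = \sigma^2\{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\} = \sigma^2\{\boldsymbol{\varepsilon}\}
= \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2 \mathbf{I}
$$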
Mean and Variance of Linear Functions of Y
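$$
\mathbf{A}_{k \times n} = \text{matrix of fixed constants}
\qquad
\mathbf{Y}_{n \times 1} = \text{random vector}
$$
$$
\mathbf{W}_{k \times 1} = \mathbf{AY} \text{ is a random vector:} \quad
\mathbf{W} = \begin{bmatrix} W_1 \\ \vdots \\ W_k \end{bmatrix}
= \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix}
\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} a_{11} Y_1 + \cdots + a_{1n} Y_n \\ \vdots \\ a_{k1} Y_1 + \cdots + a_{kn} Y_n \end{bmatrix}
$$
$$
E\{\mathbf{W}\} = \begin{bmatrix} E\{W_1\} \\ \vdots \\ E\{W_k\} \end{bmatrix}
= \begin{bmatrix} E\{a_{11} Y_1 + \cdots + a_{1n} Y_n\} \\ \vdots \\ E\{a_{k1} Y_1 + \cdots + a_{kn} Y_n\} \end{bmatrix}
= \begin{bmatrix} a_{11} E\{Y_1\} + \cdots + a_{1n} E\{Y_n\} \\ \vdots \\ a_{k1} E\{Y_1\} + \cdots + a_{kn} E\{Y_n\} \end{bmatrix}
= \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix}
\begin{bmatrix} E\{Y_1\} \\ \vdots \\ E\{Y_n\} \end{bmatrix}
= \mathbf{A} E\{\mathbf{Y}\}
$$
$$
\begin{aligned}
\sigma^2\{\mathbf{W}\}
&= E\left\{ [\mathbf{AY} - \mathbf{A}E\{\mathbf{Y}\}] [\mathbf{AY} - \mathbf{A}E\{\mathbf{Y}\}]' \right\}
= E\left\{ [\mathbf{A}(\mathbf{Y} - E\{\mathbf{Y}\})] [\mathbf{A}(\mathbf{Y} - E\{\mathbf{Y}\})]' \right\} \\
&= E\left\{ \mathbf{A} (\mathbf{Y} - E\{\mathbf{Y}\}) (\mathbf{Y} - E\{\mathbf{Y}\})' \mathbf{A}' \right\}
= \mathbf{A} \, E\left\{ (\mathbf{Y} - E\{\mathbf{Y}\}) (\mathbf{Y} - E\{\mathbf{Y}\})' \right\} \mathbf{A}'
= \mathbf{A} \, \sigma^2\{\mathbf{Y}\} \, \mathbf{A}'
\end{aligned}
$$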
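The two results, E{W} = A E{Y} and σ²{W} = A σ²{Y} A', have exact sample analogues: the sample mean and sample covariance of W = AY obey the same identities. The sketch below illustrates this with simulated draws, assuming NumPy; the matrix `A`, the seed, and the way the draws are generated are all arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable

# k x n matrix of fixed constants (k = 2, n = 3), chosen for illustration.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])

# 500 draws of a 3 x 1 random vector, stored as the columns of Y.
Y = rng.normal(loc=[[1.0], [2.0], [3.0]], scale=1.0, size=(3, 500))
Y[1] += 0.5 * Y[0]                      # introduce correlation between Y_1 and Y_2

W = A @ Y                               # each column is one draw of W = AY

mean_Y, cov_Y = Y.mean(axis=1), np.cov(Y)   # np.cov treats rows as variables
mean_W, cov_W = W.mean(axis=1), np.cov(W)

print(mean_W, A @ mean_Y)
print(cov_W)
print(A @ cov_Y @ A.T)
```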
Multivariate Normal Distribution
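$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\boldsymbol{\mu} = E\{\mathbf{Y}\} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}
\qquad
\boldsymbol{\Sigma} = \sigma^2\{\mathbf{Y}\} = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}
$$
Multivariate Normal Density function:
$$
f(\mathbf{Y}) = (2\pi)^{-n/2} \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left\{ -\frac{1}{2} (\mathbf{Y} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \boldsymbol{\mu}) \right\}
\qquad
\mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})
$$
$$
Y_i \sim N(\mu_i, \sigma_i^2), \quad i = 1, \ldots, n
\qquad
\sigma\{Y_i, Y_j\} = \sigma_{ij} \quad \forall \, i \neq j
$$
Note, if A is a (full rank) matrix of fixed constants:
$$
\mathbf{W} = \mathbf{AY} \sim N(\mathbf{A}\boldsymbol{\mu}, \, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')
$$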
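Simple Linear Regression in Matrix Form
$$
\text{Simple Linear Regression Model:} \quad Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n
$$
$$
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n
\end{aligned}
$$
Defining:
$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\quad \text{since:} \quad
\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
= \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix},
\qquad
\mathbf{X}\boldsymbol{\beta} = E\{\mathbf{Y}\}
$$
Assuming constant variance, and independence of error terms ε_i:
$$
\sigma^2\{\mathbf{Y}\} = \sigma^2\{\boldsymbol{\varepsilon}\}
= \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix}
= \sigma^2 \mathbf{I}_{n \times n}
$$
Further, assuming normal distribution for error terms ε_i:
$$
\mathbf{Y} \sim N(\mathbf{X}\boldsymbol{\beta}, \, \sigma^2 \mathbf{I})
$$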
Estimating Parameters by Least Squares
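$$
Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2
$$
Normal equations obtained from ∂Q/∂β_0 and ∂Q/∂β_1, setting each equal to 0:
$$
\text{(i)} \quad n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i
\qquad
\text{(ii)} \quad \hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i
$$
Note: In matrix form:
$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
\qquad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
$$
$$
\text{Defining } \hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}:
\qquad
\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}
\;\Rightarrow\;
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
$$
Based on matrix form:
$$
\begin{aligned}
Q &= (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})
= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
= \mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
&= \sum_{i=1}^{n} Y_i^2 - 2\left[ \beta_0 \sum_{i=1}^{n} Y_i + \beta_1 \sum_{i=1}^{n} X_i Y_i \right] + n\beta_0^2 + 2\beta_0 \beta_1 \sum_{i=1}^{n} X_i + \beta_1^2 \sum_{i=1}^{n} X_i^2
\end{aligned}
$$
$$
\frac{\partial Q}{\partial \boldsymbol{\beta}} = \begin{bmatrix} \partial Q / \partial \beta_0 \\ \partial Q / \partial \beta_1 \end{bmatrix}
= \begin{bmatrix}
-2 \sum_{i=1}^{n} Y_i + 2n\beta_0 + 2\beta_1 \sum_{i=1}^{n} X_i \\
-2 \sum_{i=1}^{n} X_i Y_i + 2\beta_0 \sum_{i=1}^{n} X_i + 2\beta_1 \sum_{i=1}^{n} X_i^2
\end{bmatrix}
= -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
$$
$$
-2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \overset{\text{set}}{=} \mathbf{0}
\;\Rightarrow\;
\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}
\;\Rightarrow\;
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
$$
General Result for fixed symmetric matrix A and variable vector w:
$$
\frac{\partial \, \mathbf{w}'\mathbf{A}\mathbf{w}}{\partial \mathbf{w}} = 2\mathbf{A}\mathbf{w}
$$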
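Applied to the Toluca data, the normal equations can be solved in a few lines. This sketch assumes NumPy; it solves X'Xb = X'Y with `np.linalg.solve` rather than forming the inverse explicitly (a numerical convenience, not something the slides prescribe), and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

# Toluca data (n = 25) as listed in the regression example.
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
              20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70], dtype=float)
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
              113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323], dtype=float)

X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X'X) b = X'Y.
XtX = X.T @ X
XtY = X.T @ y
b = np.linalg.solve(XtX, XtY)

# Library least squares fit for comparison.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)        # intercept and slope estimates, roughly 62.37 and 3.57
print(b_lstsq)
```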
Fitted Values and Residuals
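$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
\qquad
e_i = Y_i - \hat{Y}_i
$$
In Matrix form:
$$
\hat{\mathbf{Y}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}
= \begin{bmatrix} \hat{\beta}_0 + \hat{\beta}_1 X_1 \\ \hat{\beta}_0 + \hat{\beta}_1 X_2 \\ \vdots \\ \hat{\beta}_0 + \hat{\beta}_1 X_n \end{bmatrix}
= \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{PY}
\qquad
\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
$$
P is called the "projection" or "hat" matrix; note that P is idempotent (PP = P) and symmetric (P = P'):
$$
\mathbf{PP} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
= \mathbf{X}\,\mathbf{I}\,(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{P}
\qquad
\mathbf{P}' = \left( \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right)' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{P}
$$
$$
\mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} Y_1 - \hat{Y}_1 \\ Y_2 - \hat{Y}_2 \\ \vdots \\ Y_n - \hat{Y}_n \end{bmatrix}
= \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{Y} - \mathbf{PY} = (\mathbf{I} - \mathbf{P})\mathbf{Y}
$$
Note:
$$
E\{\hat{\mathbf{Y}}\} = E\{\mathbf{PY}\} = \mathbf{P}E\{\mathbf{Y}\} = \mathbf{PX}\boldsymbol{\beta} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\beta}
\qquad
\sigma^2\{\hat{\mathbf{Y}}\} = \mathbf{P}\sigma^2\mathbf{I}\mathbf{P}' = \sigma^2\mathbf{P}
$$
$$
E\{\mathbf{e}\} = E\{(\mathbf{I} - \mathbf{P})\mathbf{Y}\} = (\mathbf{I} - \mathbf{P})E\{\mathbf{Y}\} = (\mathbf{I} - \mathbf{P})\mathbf{X}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\beta} - \mathbf{X}\boldsymbol{\beta} = \mathbf{0}
\qquad
\sigma^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{P})\sigma^2\mathbf{I}(\mathbf{I} - \mathbf{P})' = \sigma^2(\mathbf{I} - \mathbf{P})
$$
$$
s^2\{\hat{\mathbf{Y}}\} = MSE \cdot \mathbf{P}
\qquad
s^2\{\mathbf{e}\} = MSE \cdot (\mathbf{I} - \mathbf{P})
$$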
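The properties of the hat matrix can be verified on a small example. This is a sketch assuming NumPy, with the first five Toluca cases standing in for the full data set; forming the n x n matrix P explicitly is fine at this size but is rarely done for large n.

```python
import numpy as np

# First five Toluca cases only (a subset used purely for illustration).
x = np.array([80.0, 30.0, 50.0, 90.0, 70.0])
y = np.array([399.0, 121.0, 221.0, 376.0, 361.0])
n = len(x)

X = np.column_stack([np.ones(n), x])
I = np.eye(n)

# Projection ("hat") matrix P = X (X'X)^{-1} X'.
P = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = P @ y          # fitted values, Y_hat = P Y
e = (I - P) @ y        # residuals, e = (I - P) Y

print(np.allclose(P @ P, P))   # idempotent
print(np.allclose(P, P.T))     # symmetric
print(Y_hat)
print(e)
```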
Analysis of Variance
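Total (Corrected) Sum of Squares:
$$
SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n}
$$
$$
\text{Note:} \quad \mathbf{Y}'\mathbf{Y} = \sum_{i=1}^{n} Y_i^2, \qquad \frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n} = \frac{1}{n} \mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y} = \frac{1}{n} \mathbf{Y}'\mathbf{JY}
\;\Rightarrow\;
SSTO = \mathbf{Y}'\mathbf{Y} - \frac{1}{n} \mathbf{Y}'\mathbf{JY} = \mathbf{Y}'\left[ \mathbf{I} - \frac{1}{n}\mathbf{J} \right]\mathbf{Y}
$$
Defining SSE as the Residual (Error) Sum of Squares:
$$
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \mathbf{e}'\mathbf{e}
= (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})
= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}}
= \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}
= \mathbf{Y}'(\mathbf{I} - \mathbf{P})\mathbf{Y}
$$
$$
\text{since} \quad \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{PY}
$$
Defining SSR as the Regression Sum of Squares:
$$
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = SSTO - SSE
= \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - \frac{\left( \sum_{i=1}^{n} Y_i \right)^2}{n}
= \mathbf{Y}'\mathbf{PY} - \frac{1}{n} \mathbf{Y}'\mathbf{JY}
= \mathbf{Y}'\left[ \mathbf{P} - \frac{1}{n}\mathbf{J} \right]\mathbf{Y}
$$
Note that SSTO, SSR, and SSE are all QUADRATIC FORMS: Y'AY for symmetric and idempotent matrices A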
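The three quadratic forms can be computed directly and compared with the familiar sums of squares. The sketch assumes NumPy and again uses the first five Toluca cases as a small stand-in; the names `SSTO`, `SSE`, `SSR` follow the slides, while the other variable names are chosen here.

```python
import numpy as np

# First five Toluca cases only (a subset used purely for illustration).
x = np.array([80.0, 30.0, 50.0, 90.0, 70.0])
y = np.array([399.0, 121.0, 221.0, 376.0, 361.0])
n = len(x)

X = np.column_stack([np.ones(n), x])
I = np.eye(n)
J = np.ones((n, n))                         # n x n matrix of 1s
P = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix

# Sums of squares as quadratic forms Y'AY.
SSTO = y @ (I - J / n) @ y
SSE = y @ (I - P) @ y
SSR = y @ (P - J / n) @ y

# The same quantities from their scalar definitions.
y_hat = P @ y
ssto_direct = np.sum((y - y.mean()) ** 2)
sse_direct = np.sum((y - y_hat) ** 2)
ssr_direct = np.sum((y_hat - y.mean()) ** 2)

print(SSTO, SSR, SSE)
print(ssto_direct, ssr_direct, sse_direct)
```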
Inferences in Linear Regression
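$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\;\Rightarrow\;
E\{\hat{\boldsymbol{\beta}}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E\{\mathbf{Y}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}
$$
$$
\sigma^2\{\hat{\boldsymbol{\beta}}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\sigma^2\{\mathbf{Y}\}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{I}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}
\qquad
s^2\{\hat{\boldsymbol{\beta}}\} = MSE \, (\mathbf{X}'\mathbf{X})^{-1}
$$
$$
\text{where} \quad s^2 = MSE = \frac{SSE}{n - 2}
$$
$$
\text{Recall:} \quad (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
\dfrac{-\bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{1}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{bmatrix}
\;\Rightarrow\;
s^2\{\hat{\boldsymbol{\beta}}\} = \begin{bmatrix}
\dfrac{MSE}{n} + \dfrac{\bar{X}^2 \, MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{-\bar{X} \, MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
\dfrac{-\bar{X} \, MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2} & \dfrac{MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{bmatrix}
$$
Estimated Mean Response at X = X_h:
$$
\hat{Y}_h = \hat{\beta}_0 + \hat{\beta}_1 X_h = \mathbf{X}_h'\hat{\boldsymbol{\beta}},
\quad \mathbf{X}_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix}
\qquad
s^2\{\hat{Y}_h\} = \mathbf{X}_h' \, s^2\{\hat{\boldsymbol{\beta}}\} \, \mathbf{X}_h = MSE \, \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h
$$
Predicted New Response at X = X_h:
$$
\hat{Y}_{h(new)} = \hat{\beta}_0 + \hat{\beta}_1 X_h = \mathbf{X}_h'\hat{\boldsymbol{\beta}}
\qquad
s^2\{\text{pred}\} = MSE \left[ 1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \right]
$$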
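These variance formulas translate directly into code. The sketch below assumes NumPy and uses the full Toluca data; the level X_h = 65 is a hypothetical value picked here for illustration, and all variable names are this sketch's own.

```python
import numpy as np

# Toluca data (n = 25) as listed in the regression example.
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
              20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70], dtype=float)
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
              113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323], dtype=float)
n = len(x)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # least squares estimates

e = y - X @ b                         # residuals
MSE = (e @ e) / (n - 2)               # s^2 = SSE / (n - 2)

s2_b = MSE * XtX_inv                  # estimated variance-covariance matrix of b

X_h = np.array([1.0, 65.0])           # hypothetical level X_h = 65
Y_hat_h = X_h @ b                     # estimated mean response at X_h
s2_Y_hat_h = MSE * (X_h @ XtX_inv @ X_h)
s2_pred = MSE * (1.0 + X_h @ XtX_inv @ X_h)

print(b, MSE)
print(np.sqrt(np.diag(s2_b)))         # standard errors of intercept and slope
print(Y_hat_h, np.sqrt(s2_Y_hat_h), np.sqrt(s2_pred))
```

The prediction variance exceeds the variance of the estimated mean by exactly MSE, which reflects the extra variability of a single new observation around its mean.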