
Page 1

Matrix Decomposition and its Application in Statistics

Nishith Kumar
Lecturer
Department of Statistics, Begum Rokeya University, Rangpur.
Email: nk.bru09@gmail.com

Page 2

Overview

• Introduction
• LU decomposition
• QR decomposition
• Cholesky decomposition
• Jordan decomposition
• Spectral decomposition
• Singular value decomposition
• Applications

Page 3

Introduction

Some of the most frequently used decompositions are the LU, QR, Cholesky, Jordan, spectral, and singular value decompositions.

This lecture covers these matrix decompositions, the basic numerical methods for computing them, and some of their applications.

Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [Wampler, 1970], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.

Page 4

Easy to solve system

Some linear systems can be solved easily. A diagonal system

$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

has the immediate solution

$$x_1 = b_1/a_{11},\quad x_2 = b_2/a_{22},\quad \ldots,\quad x_n = b_n/a_{nn}.$$

Page 5

Easy to solve system (Cont.)

Lower triangular matrix: for a lower triangular system $Lx = b$,

$$x_1 = b_1/l_{11}, \qquad x_i = \Big(b_i - \sum_{j=1}^{i-1} l_{ij}x_j\Big)\Big/\,l_{ii}, \quad i = 2, \ldots, n.$$

Solution: this system is solved using forward substitution.

Page 6

Easy to solve system (Cont.)

Upper triangular matrix: for an upper triangular system $Ux = b$,

$$x_n = b_n/u_{nn}, \qquad x_i = \Big(b_i - \sum_{j=i+1}^{n} u_{ij}x_j\Big)\Big/\,u_{ii}, \quad i = n-1, \ldots, 1.$$

Solution: this system is solved using backward substitution.
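As a sketch of both substitution schemes in R (R's built-in equivalents are forwardsolve() and backsolve(); the function names here are ours):

# Forward substitution: solve L x = b for lower triangular L
fsub <- function(L, b) {
  n <- length(b); x <- numeric(n)
  for (i in 1:n) {
    j <- seq_len(i - 1)                    # indices 1, ..., i-1 (empty when i = 1)
    x[i] <- (b[i] - sum(L[i, j] * x[j])) / L[i, i]
  }
  x
}
# Backward substitution: solve U x = b for upper triangular U
bsub <- function(U, b) {
  n <- length(b); x <- numeric(n)
  for (i in n:1) {
    j <- seq(i + 1, length.out = n - i)    # indices i+1, ..., n (empty when i = n)
    x[i] <- (b[i] - sum(U[i, j] * x[j])) / U[i, i]
  }
  x
}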

Page 7

LU Decomposition

LU decomposition was originally derived as a decomposition of quadratic and bilinear forms. Lagrange, in the very first paper of his collected works (1759), derives the algorithm we now call Gaussian elimination. Later, in 1948, Turing introduced the LU decomposition of a matrix, which is used to solve systems of linear equations.

Let A be an m × m nonsingular square matrix. Then there exist two matrices L and U such that A = LU, where L is a lower triangular matrix and U is an upper triangular matrix:

$$L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{m1} & l_{m2} & \cdots & l_{mm} \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1m} \\ 0 & u_{22} & \cdots & u_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{mm} \end{pmatrix}.$$

J.-L. Lagrange (1736-1813); A. M. Turing (1912-1954)

Page 8

Gaussian elimination reduces A to an upper triangular matrix U by a sequence of elementary matrices: U = E_k ⋯ E_1 A, so A = E_1^{-1} ⋯ E_k^{-1} U.

If each elementary matrix E_i is lower triangular, it can be proved that E_1^{-1}, ..., E_k^{-1} are lower triangular and that E_1^{-1} ⋯ E_k^{-1} is a lower triangular matrix. Let L = E_1^{-1} ⋯ E_k^{-1}; then A = LU.

How to decompose A = LU?

Now, with

$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix},$$

$$E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix},$$

$$E_2 E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = U.$$

Page 9

Calculation of L and U (cont.)

Now, reducing the first column, we have

$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix},$$

$$E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} A = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix}, \qquad E_2 E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} E_1 A = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = U.$$

Page 10

Calculation of L and U (cont.)

If A is a nonsingular matrix, then for each lower triangular matrix L the corresponding upper triangular matrix U is unique, but the LU decomposition itself is not unique: there can be more than one such LU decomposition for a matrix. For example, with

$$L = E_1^{-1}E_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix},$$

we have

$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU.$$

But also

$$A = \begin{pmatrix} 6 & 0 & 0 \\ 12 & 1 & 0 \\ 3 & 3 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1/3 & 1/3 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU,$$

a second LU decomposition of the same matrix.

Page 11

Calculation of L and U (cont.)

Thus the LU decomposition is not unique. Since we compute the LU decomposition by elementary transformations, if we change L then U changes accordingly, so that A = LU still holds.

To find a unique LU decomposition, it is necessary to put some restriction on the L and U matrices. For example, we can require the lower triangular matrix L to be a unit one (i.e., set all the entries of its main diagonal to ones).

LU decomposition in R:

library(Matrix)
x <- matrix(c(3, 2, 1, 9, 3, 4, 4, 2, 5), ncol = 3, nrow = 3)
expand(lu(x))

Page 12

• Note: there are also generalizations of LU to non-square and singular matrices, such as rank-revealing LU factorization.

• [Pan, C.T. (2000). On the existence and computation of rank-revealing LU factorizations. Linear Algebra and its Applications, 316: 199-222.

• Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations. Linear Algebra and its Applications, 367: 1-16.]

• Uses: The LU decomposition is most commonly used in the solution of systems of simultaneous linear equations. We can also find the determinant easily using the LU decomposition: it is the product of the diagonal elements of the upper and lower triangular matrices.
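For instance, a quick sketch with the Matrix package used above (the permutation factor P introduced by pivoting only contributes a sign, so we account for it explicitly):

library(Matrix)
x <- matrix(c(3, 2, 1, 9, 3, 4, 4, 2, 5), ncol = 3, nrow = 3)
f <- expand(lu(x))                      # x = P L U, with L unit lower triangular
prod(diag(f$U)) * det(as.matrix(f$P))   # det(x) = det(P) * det(L) * det(U)
det(x)                                  # check against the built-in determinant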


Page 13

Solving a system of linear equations using LU decomposition

Suppose we would like to solve an m × m system AX = b. We first find an LU decomposition of A; to solve AX = b it is then enough to solve the two triangular systems

LY = b and UX = Y.

The system LY = b can be solved by forward substitution and the system UX = Y by backward substitution. To illustrate, we give an example.

Consider the given system AX = b, where

$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}.$$

Page 14

Solving a system of linear equations using LU decomposition (cont.)

We have seen A = LU, where

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}.$$

Thus, to solve AX = b, we first solve LY = b by forward substitution:

$$\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}.$$

Then

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix}.$$

Page 15

Solving a system of linear equations using LU decomposition (cont.)

Now we solve UX = Y by backward substitution:

$$\begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix},$$

which gives

$$X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}.$$
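The same computation can be sketched in R with the Matrix package (lu() pivots rows, so its L and U need not equal the hand-computed factors above, but the solution x is the same; the A = P L U convention of expand() is checked explicitly):

library(Matrix)
A <- matrix(c(6, 2, 2,
              12, 8, 6,
              3, 13, 2), nrow = 3, byrow = TRUE)
b <- c(8, 14, -17)
f <- expand(lu(A))
stopifnot(isTRUE(all.equal(as.matrix(f$P %*% f$L %*% f$U), A,
                           check.attributes = FALSE)))          # A = P L U
y <- forwardsolve(as.matrix(f$L), crossprod(as.matrix(f$P), b)) # solve L y = P^T b
x <- backsolve(as.matrix(f$U), y)                               # solve U x = y
x                                                               # 1, -2, 3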

Page 16

QR Decomposition

If A is an m × n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m × n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.

QR decomposition originated with Gram (1883); Erhard Schmidt (1907) later proved the QR decomposition theorem.

Jørgen Pedersen Gram (1850-1916); Erhard Schmidt (1876-1959)

Page 17

QR-Decomposition (Cont.)

Theorem: If A is an m × n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m × n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.

Proof: Suppose A = [u1 | u2 | . . . | un] and rank(A) = n. Applying the Gram-Schmidt process to {u1, u2, . . . , un} gives the orthogonal vectors v1, v2, . . . , vn:

$$v_1 = u_1, \qquad v_i = u_i - \frac{\langle u_i, v_1\rangle}{\|v_1\|^2}\,v_1 - \frac{\langle u_i, v_2\rangle}{\|v_2\|^2}\,v_2 - \cdots - \frac{\langle u_i, v_{i-1}\rangle}{\|v_{i-1}\|^2}\,v_{i-1}, \quad i = 2, \ldots, n.$$

Let $q_i = v_i / \|v_i\|$ for i = 1, 2, . . . , n. Then q1, q2, . . . , qn form an orthonormal basis for the column space of A.

Page 18

QR-Decomposition (Cont.)

Now, rearranging the Gram-Schmidt recursion,

$$u_i = v_i + \frac{\langle u_i, v_1\rangle}{\|v_1\|^2}\,v_1 + \cdots + \frac{\langle u_i, v_{i-1}\rangle}{\|v_{i-1}\|^2}\,v_{i-1},$$

i.e.,

$$u_i = \|v_i\|\,q_i + \langle u_i, q_1\rangle q_1 + \langle u_i, q_2\rangle q_2 + \cdots + \langle u_i, q_{i-1}\rangle q_{i-1},$$

so that $u_i \in \mathrm{span}\{v_1, \ldots, v_i\} = \mathrm{span}\{q_1, \ldots, q_i\}$. Thus u_i is orthogonal to q_j for j > i, and

$$\begin{aligned} u_1 &= \|v_1\|\,q_1 \\ u_2 &= \langle u_2, q_1\rangle q_1 + \|v_2\|\,q_2 \\ u_3 &= \langle u_3, q_1\rangle q_1 + \langle u_3, q_2\rangle q_2 + \|v_3\|\,q_3 \\ &\;\;\vdots \\ u_n &= \langle u_n, q_1\rangle q_1 + \langle u_n, q_2\rangle q_2 + \cdots + \|v_n\|\,q_n. \end{aligned}$$

Page 19

QR-Decomposition (Cont.)

Let Q = [q1 q2 . . . qn], so Q is an m × n matrix whose columns form an orthonormal basis for the column space of A. Now,

$$A = [u_1 \mid u_2 \mid \cdots \mid u_n] = [q_1 \mid q_2 \mid \cdots \mid q_n]\begin{pmatrix} \|v_1\| & \langle u_2, q_1\rangle & \langle u_3, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ 0 & \|v_2\| & \langle u_3, q_2\rangle & \cdots & \langle u_n, q_2\rangle \\ 0 & 0 & \|v_3\| & \cdots & \langle u_n, q_3\rangle \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \|v_n\| \end{pmatrix},$$

i.e., A = QR, where

$$R = \begin{pmatrix} \|v_1\| & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ 0 & \|v_2\| & \cdots & \langle u_n, q_2\rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \|v_n\| \end{pmatrix}.$$

Thus A can be decomposed as A = QR, where R is an upper triangular and nonsingular matrix.

Page 20

QR Decomposition

Example: Find the QR decomposition of

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Page 21

Calculation of QR Decomposition

Applying the Gram-Schmidt process to compute the QR decomposition:

1st step:

$$r_{11} = \|a_1\| = \sqrt{3}, \qquad q_1 = \frac{a_1}{\|a_1\|} = \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \\ 0 \end{pmatrix}.$$

2nd step:

$$r_{12} = q_1^T a_2 = 2/\sqrt{3}.$$

3rd step:

$$\hat{q}_2 = a_2 - r_{12}\,q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{2}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/3 \\ -2/3 \\ 1/3 \\ 0 \end{pmatrix}, \qquad r_{22} = \|\hat{q}_2\| = \frac{2}{\sqrt{6}}, \qquad q_2 = \frac{\hat{q}_2}{\|\hat{q}_2\|} = \begin{pmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \\ 0 \end{pmatrix}.$$

Page 22

Calculation of QR Decomposition (cont.)

4th step:

$$r_{13} = q_1^T a_3 = 1/\sqrt{3}.$$

5th step:

$$r_{23} = q_2^T a_3 = 1/\sqrt{6}.$$

6th step:

$$\hat{q}_3 = a_3 - r_{13}\,q_1 - r_{23}\,q_2 = \begin{pmatrix} 1/2 \\ 0 \\ -1/2 \\ 1 \end{pmatrix}, \qquad r_{33} = \|\hat{q}_3\| = \frac{\sqrt{6}}{2}, \qquad q_3 = \frac{\hat{q}_3}{\|\hat{q}_3\|} = \begin{pmatrix} 1/\sqrt{6} \\ 0 \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}.$$

Page 23

Calculation of QR Decomposition (cont.)

Therefore, A = QR:

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{6} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 0 & 2/\sqrt{6} \end{pmatrix}\begin{pmatrix} \sqrt{3} & 2/\sqrt{3} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{6}/2 \end{pmatrix}.$$

R code for QR decomposition:

x <- matrix(c(1, 2, 3, 2, 5, 4, 3, 4, 9), ncol = 3, nrow = 3)
qrstr <- qr(x)
Q <- qr.Q(qrstr)
R <- qr.R(qrstr)

Uses: QR decomposition is widely used in computer codes to find the eigenvalues of a matrix, to solve linear systems, and to find least squares approximations.

Page 24

Least squares solution using QR decomposition

The least squares solution b̂ satisfies the normal equations

$$X^T X b = X^T Y.$$

Let X = QR. Then

$$X^T X b = R^T Q^T Q R\, b = R^T R\, b, \qquad X^T Y = R^T Q^T Y.$$

Therefore,

$$R^T R\, b = R^T Q^T Y \;\Longrightarrow\; R\, b = Q^T Y \;\Longrightarrow\; \hat{b} = R^{-1} Q^T Y.$$
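A sketch with simulated data (the design matrix X and response Y are illustrative; lm() fits the same model):

set.seed(1)
X <- cbind(1, rnorm(20), rnorm(20))                # illustrative design matrix
Y <- X %*% c(2, 1, -1) + rnorm(20)                 # illustrative response
qx <- qr(X)                                        # X = QR
b <- backsolve(qr.R(qx), crossprod(qr.Q(qx), Y))   # solve R b = Q^T Y
cbind(b, coef(lm(Y ~ X - 1)))                      # agrees with lm()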

Page 25

Cholesky Decomposition

Cholesky died from wounds received on the battlefield on 31 August 1918, at 5 o'clock in the morning, in the north of France. After his death, one of his fellow officers, Commandant Benoit, published Cholesky's method of computing solutions to the normal equations for some least squares data fitting problems in the Bulletin géodésique in 1924; it is known as the Cholesky decomposition.

Cholesky decomposition: If A is a real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix L with positive diagonal elements such that A = LL^T.

André-Louis Cholesky (1875-1918)

Page 26

Cholesky Decomposition

Theorem: If A is an n × n real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix G with positive diagonal elements such that A = GG^T.

Proof: Since A is n × n, real and positive definite, it has an LU decomposition A = LU, in which we may take the lower triangular matrix L to be a unit one (all entries of its main diagonal set to ones); with this restriction the LU decomposition is unique. Let D = diag(u_11, u_22, ..., u_nn) and observe that M^T = D^{-1}U is a unit upper triangular matrix. Thus A = LDM^T. Since A is symmetric, A = A^T, i.e., LDM^T = MDL^T. From the uniqueness we have L = M, so A = LDL^T. Since A is positive definite, all diagonal elements of D are positive. Letting

$$G = L\,\mathrm{diag}\big(\sqrt{d_{11}}, \sqrt{d_{22}}, \ldots, \sqrt{d_{nn}}\big),$$

where d_ii are the diagonal elements of D, we can write A = GG^T.

Page 27

Cholesky Decomposition (Cont.)

Procedure to find the Cholesky decomposition: suppose

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

We need to solve the equation A = LL^T:

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \underbrace{\begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix}}_{L}\;\underbrace{\begin{pmatrix} l_{11} & l_{21} & \cdots & l_{n1} \\ 0 & l_{22} & \cdots & l_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & l_{nn} \end{pmatrix}}_{L^T}.$$

Page 28

Example of Cholesky Decomposition

For k from 1 to n:

$$l_{kk} = \Big(a_{kk} - \sum_{s=1}^{k-1} l_{ks}^2\Big)^{1/2};$$

for j from k+1 to n:

$$l_{jk} = \Big(a_{jk} - \sum_{s=1}^{k-1} l_{js}\,l_{ks}\Big)\Big/\,l_{kk}.$$

Now suppose

$$A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{pmatrix}.$$

Then the Cholesky decomposition is

$$L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 1 & \sqrt{3} \end{pmatrix}.$$
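A direct transcription of this recursion into R (a sketch; the built-in chol() returns the upper triangular factor, i.e. t(L)):

mychol <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    s <- seq_len(k - 1)                           # s = 1, ..., k-1 (empty when k = 1)
    L[k, k] <- sqrt(A[k, k] - sum(L[k, s]^2))     # diagonal entry l_kk
    for (j in seq_len(n - k) + k)                 # j = k+1, ..., n
      L[j, k] <- (A[j, k] - sum(L[j, s] * L[k, s])) / L[k, k]
  }
  L
}
A <- matrix(c(4, 2, -2, 2, 10, 2, -2, 2, 5), nrow = 3)
mychol(A)       # the L given above, with l_33 = sqrt(3)
t(chol(A))      # matches the built-in result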

Page 29

R code for Cholesky decomposition:

x <- matrix(c(4, 2, -2, 2, 10, 2, -2, 2, 5), ncol = 3, nrow = 3)
cl <- chol(x)

If we decompose A as LDL^T, then

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 1/3 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

Page 30

Application of Cholesky Decomposition

Cholesky decomposition is used to solve the system of linear equations Ax = b when A is real, symmetric and positive definite.

In regression analysis it can be used to estimate the parameters when X^T X is positive definite.

In kernel principal component analysis, Cholesky decomposition is also used (Weiya Shi and Yue-Fei Guo, 2010).
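A sketch of the first use, with the positive definite matrix from the previous example (the right-hand side b is illustrative):

A <- matrix(c(4, 2, -2, 2, 10, 2, -2, 2, 5), nrow = 3)
b <- c(2, 4, 1)                      # illustrative right-hand side
R <- chol(A)                         # upper triangular factor, A = R^T R
y <- forwardsolve(t(R), b)           # solve R^T y = b
x <- backsolve(R, y)                 # solve R x = y
all.equal(as.vector(A %*% x), b)     # check: TRUE

The regression use is the same computation with A = X^T X and b = X^T Y.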

Page 31

Characteristic Roots and Characteristic Vectors

Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx, where A is a square matrix; λ is then said to be a characteristic root of the matrix A corresponding to the characteristic vector x.

The characteristic root belonging to a given vector is unique, but a characteristic vector is not unique (any nonzero multiple of x also works).

We calculate the characteristic roots λ from the characteristic equation |A − λI| = 0.

For λ = λi, the characteristic vector is the solution x of the homogeneous system of linear equations (A − λiI)x = 0.

Theorem: If A is a real symmetric matrix and λi and λj are two distinct latent roots of A, then the corresponding latent vectors xi and xj are orthogonal.

Page 32

Multiplicity

Algebraic multiplicity: the number of repetitions of a certain eigenvalue. If, for a certain matrix, λ = {3, 3, 4}, then the algebraic multiplicity of 3 is 2 (as it appears twice) and the algebraic multiplicity of 4 is 1 (as it appears once). This type of multiplicity is normally represented by the Greek letter α, where α(λi) denotes the algebraic multiplicity of λi.

Geometric multiplicity: the geometric multiplicity of an eigenvalue is the number of linearly independent eigenvectors associated with it.

Page 33

Jordan Decomposition — Camille Jordan (1870)

Let A be any n × n matrix. Then there exists a nonsingular matrix P such that

$$P^{-1} A P = \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_r}(\lambda_r) \end{pmatrix},$$

where J_k(λ) is the k × k Jordan block

$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}$$

and k1 + k2 + ... + kr = n. Here λi, i = 1, 2, . . . , r, are the characteristic roots and ki are the algebraic multiplicities of λi.

Jordan decomposition is used in differential equations and time series analysis.

Camille Jordan (1838-1921)

Page 34

Spectral Decomposition

Let A be an m × m real symmetric matrix. Then there exists an orthogonal matrix P such that

$$P^T A P = \Lambda \quad\text{or}\quad A = P \Lambda P^T,$$

where Λ is a diagonal matrix.

A. L. Cauchy (1789-1857) established the spectral decomposition in 1829.

Page 35

Spectral Decomposition and Principal Component Analysis (Cont.)

By using the spectral decomposition we can write A = PΛP^T.

In multivariate analysis our data form a matrix X. Suppose X is mean centered, i.e., X ← (X − X̄), and the variance-covariance matrix is Σ. The variance-covariance matrix Σ is real and symmetric.

Using the spectral decomposition we can write Σ = PΛP^T, where Λ = diag(λ1, λ2, ..., λn) is a diagonal matrix with λ1 ≥ λ2 ≥ ... ≥ λn. Also,

tr(Σ) = total variation of the data = tr(Λ).

Page 36

Spectral Decomposition and Principal Component Analysis (Cont.)

The principal component transformation is

Y = (X − µ)P,

where

E(Yi) = 0,
V(Yi) = λi,
Cov(Yi, Yj) = 0 if i ≠ j,
V(Y1) ≥ V(Y2) ≥ . . . ≥ V(Yn), and

$$\sum_{i=1}^{n} V(Y_i) = \mathrm{tr}(\Lambda) = \mathrm{tr}(\Sigma).$$
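A sketch of these properties on the iris measurements (built into R):

X <- scale(iris[, 1:4], center = TRUE, scale = FALSE)  # mean-centred data
S <- cov(X)                                            # variance-covariance matrix
e <- eigen(S)                                          # S = P Lambda P^T
Y <- X %*% e$vectors                                   # PC scores: Y = (X - mu) P
round(apply(Y, 2, var), 4)                             # V(Y_i) = lambda_i, decreasing
round(e$values, 4)
c(sum(e$values), sum(diag(S)))                         # tr(Lambda) = tr(Sigma)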

Page 37

R code for spectral decomposition:

x <- matrix(c(1, 2, 3, 2, 5, 4, 3, 4, 9), ncol = 3, nrow = 3)
eigen(x)

Applications:
• Data reduction
• Image processing and compression
• K-selection for K-means clustering
• Multivariate outlier detection
• Noise filtering
• Trend detection in the observations

Page 38

Historical background of SVD

There are five mathematicians who were responsible for establishing the existence of the singular value decomposition and developing its theory:

Eugenio Beltrami (1835-1899), Camille Jordan (1838-1921), James Joseph Sylvester (1814-1897), Erhard Schmidt (1876-1959), and Hermann Weyl (1885-1955).

The singular value decomposition was originally developed by two mathematicians in the mid-to-late 1800s: Eugenio Beltrami and Camille Jordan. Several other mathematicians took part in the final developments of the SVD, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s.

C. Eckart and G. Young proved the low-rank approximation property of the SVD (1936).

Page 39

What is SVD?

Any real m × n matrix X, where n ≤ m, can be decomposed as X = UΛV^T, where:

U is an m × n column-orthonormal matrix (U^T U = I) containing the eigenvectors of the symmetric matrix XX^T;

Λ is an n × n diagonal matrix containing the singular values of the matrix X; the number of nonzero diagonal elements of Λ equals the rank of X;

V^T is the transpose of an n × n orthonormal matrix V (V^T V = I) whose columns are the eigenvectors of the symmetric matrix X^T X.

Page 40

Singular Value Decomposition (Cont.)

Theorem (Singular Value Decomposition): Let X be m × n of rank r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal matrix Λ with positive diagonal elements such that X = UΛV^T.

Proof: Since X is m × n of rank r, XX^T and X^T X are both of rank r (by the properties of the Grammian matrix) and of dimension m × m and n × n, respectively. Since XX^T is a real symmetric matrix, we can write, by spectral decomposition,

XX^T = QDQ^T,

where Q and D are, respectively, the matrices of characteristic vectors and corresponding characteristic roots of XX^T. Again, since X^T X is a real symmetric matrix, we can write, by spectral decomposition,

X^T X = RMR^T,

Page 41

Singular Value Decomposition (Cont.)

where R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots.

Since XX^T and X^T X are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write

$$D = \begin{pmatrix} D_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad M = \begin{pmatrix} M_r & 0 \\ 0 & 0 \end{pmatrix}.$$

Page 42

Singular Value Decomposition (Cont.)

We know that the nonzero characteristic roots of XX^T and X^T X are equal, so D_r = M_r.

Partition Q and R conformably with D and M, respectively: Q = (Q_r, Q_*) and R = (R_r, R_*), such that Q_r is m × r, R_r is n × r, and they correspond to the nonzero characteristic roots of XX^T and X^T X. Now take

$$U = Q_r, \qquad V = R_r, \qquad \Lambda = D_r^{1/2} = \mathrm{diag}\big(d_1^{1/2}, d_2^{1/2}, \ldots, d_r^{1/2}\big),$$

where d_i, i = 1, 2, . . . , r, are the positive characteristic roots of XX^T and hence those of X^T X as well (by the properties of the Grammian matrix).

Page 43

Singular Value Decomposition (Cont.)

Now define

$$S = Q_r D_r^{1/2} R_r^T.$$

We shall show that S = X, thus completing the proof. First,

$$S^T S = R_r D_r^{1/2} Q_r^T Q_r D_r^{1/2} R_r^T = R_r D_r R_r^T = R_r M_r R_r^T = R M R^T = X^T X.$$

Similarly, SS^T = XX^T.

From the first relation we conclude that, for some orthogonal matrix P1, S = P1 X, while from the second we conclude that, for some orthogonal matrix P2, S = X P2.

Page 44

Singular Value Decomposition (Cont.)

The preceding, however, implies that the matrix X satisfies

$$X^T X = P_2^T X^T X P_2, \qquad X X^T = P_1 X X^T P_1^T,$$

which in turn implies that

$$P_1 = I_m, \qquad P_2 = I_n.$$

Thus

$$X = S = Q_r D_r^{1/2} R_r^T = U \Lambda V^T.$$

Page 45

R Code for Singular Value Decomposition

x<-matrix(c(1,2,3, 2,5,4, 3,4,9),ncol=3,nrow=3)

sv<-svd(x)

D<-sv$d

U<-sv$u

V<-sv$v

Page 46

Decomposition in Diagram

Matrix A:
• Rectangular, full column rank → QR decomposition
• Rectangular → SVD
• Square → LU decomposition (not always unique)
• Square, symmetric, positive definite (PD) → Cholesky decomposition
• Square, symmetric → spectral decomposition
• Square, asymmetric, AM = GM → similar diagonalization, P⁻¹AP = Λ
• Square, asymmetric, AM > GM → Jordan decomposition

Page 47

Properties of SVD

Rewriting the SVD:

$$A = U \Lambda V^T = \sum_{i=1}^{r} \lambda_i u_i v_i^T,$$

where r = rank of A, λi is the i-th diagonal element of Λ, and ui and vi are the i-th columns of U and V, respectively.

Page 48

Properties of SVD: Low Rank Approximation

Theorem: If A = UΛV^T is the SVD of A and the singular values are sorted as λ1 ≥ λ2 ≥ ... ≥ λn, then for any l < r the best rank-l approximation to A is

$$\tilde{A} = \sum_{i=1}^{l} \lambda_i u_i v_i^T, \qquad \|A - \tilde{A}\|_F^2 = \sum_{i=l+1}^{r} \lambda_i^2.$$

The low rank approximation technique is very important for data compression.

Page 49

Low-rank Approximation

• SVD can be used to compute optimal low-rank approximations.

• The rank-k approximation of A is the matrix Ã of rank k minimizing the Frobenius norm of the error:

$$\tilde{A} = \arg\min_{X:\ \mathrm{rank}(X) = k} \|A - X\|_F, \qquad \|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2}.$$

• If d1, d2, ..., dn are the characteristic roots of A^T A, then

$$\|A\|_F^2 = \sum_{i=1}^{n} d_i.$$

• Ã and X are both m × n matrices.

Page 50

Low-rank Approximation

• Solution via SVD: set the smallest r − k singular values to zero:

$$\tilde{A} = U\,\mathrm{diag}(\lambda_1, \ldots, \lambda_k, 0, \ldots, 0)\,V^T.$$

[Diagram: X = U Λ V^T with the trailing singular values zeroed out, illustrated for k = 2.]

• In column notation, as a sum of rank-1 matrices:

$$\tilde{A} = \sum_{i=1}^{k} \lambda_i u_i v_i^T.$$
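A sketch of the truncation step (the matrix A is illustrative):

A <- matrix(rnorm(20 * 10), 20, 10)       # illustrative 20 x 10 matrix
s <- svd(A)                               # A = U diag(d) V^T
k <- 2
Ak <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])  # rank-k approximation
sum((A - Ak)^2)                           # squared Frobenius error ...
sum(s$d[-(1:k)]^2)                        # ... equals the sum of discarded singular values squared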

Page 51

Approximation error

• How good (bad) is this approximation?

• It is the best possible, measured by the Frobenius norm of the error:

$$\min_{X:\ \mathrm{rank}(X) = k} \|A - X\|_F^2 = \|A - \tilde{A}\|_F^2 = \sum_{i=k+1}^{r} \lambda_i^2,$$

where the λi are ordered such that λi ≥ λi+1.

Page 52

Row approximation and column approximation

Suppose Ri and Cj represent the i-th row and the j-th column of A. The SVD of A is

$$A = U \Lambda V^T = \sum_{k=1}^{r} \lambda_k u_k v_k^T,$$

and its rank-l truncation is

$$\tilde{A} = U_l \Lambda_l V_l^T = \sum_{k=1}^{l} \lambda_k u_k v_k^T.$$

The SVD equation for Ri is

$$R_i = \sum_{k=1}^{r} \lambda_k u_{ik} v_k^T, \qquad i = 1, \ldots, m,$$

and we can approximate Ri by

$$R_i^l = \sum_{k=1}^{l} \lambda_k u_{ik} v_k^T, \qquad l < r.$$

Also, the SVD equation for Cj is

$$C_j = \sum_{k=1}^{r} \lambda_k v_{jk} u_k, \qquad j = 1, 2, \ldots, n,$$

and we can approximate Cj by

$$C_j^l = \sum_{k=1}^{l} \lambda_k v_{jk} u_k, \qquad l < r.$$

Page 53

Least squares solution of an inconsistent system

By using the SVD we can solve an inconsistent system; this gives the least squares solution

$$\min_{x} \|Ax - b\|^2,$$

whose solution is x = A^g b, where A^g is the Moore-Penrose (MP) inverse of A.

Page 54

The SVD of A^g is

$$A^g = V \Lambda^{+} U^T,$$

where Λ⁺ is obtained from Λ by replacing each nonzero singular value λi by 1/λi.
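A sketch (the system is illustrative and has full column rank, so Λ⁺ is simply diag(1/λi); MASS::ginv() computes the same pseudoinverse):

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # illustrative 3 x 2 matrix
b <- c(1, 0, 2)                              # inconsistent right-hand side
s <- svd(A)
Ag <- s$v %*% diag(1 / s$d, nrow = length(s$d)) %*% t(s$u)  # A^g = V Lambda^+ U^T
x <- Ag %*% b                                # least squares solution
cbind(x, qr.solve(A, b))                     # agrees with the QR least squares solution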

Page 55

Basic Results of SVD

Page 56

SVD based PCA

If we reduce variables by using the SVD, it performs like PCA.

Suppose X is a mean-centered data matrix. Then, using the SVD X = UΛV^T, we can write XV = UΛ. Let Y = XV = UΛ; then the first column of Y contains the first principal component scores, and so on.

• SVD-based PCs are more numerically stable.
• If the number of variables is greater than the number of observations, then SVD-based PCA will give an efficient result (Antti Niemistö, Statistical Analysis of Gene Expression Microarray Data, 2005).
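A sketch on the iris data (prcomp() is R's SVD-based PCA; the signs of individual columns may differ):

X <- scale(iris[, 1:4], center = TRUE, scale = FALSE)  # mean-centred data matrix
s <- svd(X)                                            # X = U Lambda V^T
Y <- s$u %*% diag(s$d)                                 # scores: Y = XV = U Lambda
head(Y[, 1])                                           # first PC scores
head(prcomp(iris[, 1:4])$x[, 1])                       # same values (up to sign)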

Page 57

Application of SVD

• Data reduction, in both variables and observations
• Solving linear least squares problems
• Image processing and compression
• K-selection for K-means clustering
• Multivariate outlier detection
• Noise filtering
• Trend detection in the observations and the variables

Page 58

Origin of biplot

• Gabriel (1971): one of the most important advances in data analysis in recent decades.
• Currently: more than 50,000 web pages; numerous academic publications; included in most statistical analysis packages.
• Still a very new technique to most scientists.

Prof. Ruben Gabriel, "the founder of the biplot." Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain.

Page 59

What is a biplot?

• "Biplot" = "bi" + "plot"
  - "plot": a scatter plot of two rows OR of two columns, or a scatter plot summarizing the rows OR the columns
  - "bi": BOTH rows AND columns
• 1 biplot >> 2 plots

Page 60

Practical definition of a biplot

"Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix." (Gabriel, 1971)

A G-by-E (genotype-by-environment) table is factored by matrix decomposition as P(4, 3) = G(4, 2) E(2, 3). (Now 3D-biplots are also possible.)

[Worked example: a 4 × 3 G-by-E table P, with genotypes g1-g4 and environments e1-e3, factored as P = GE into genotype scores (x, y) and environment scores, and displayed as a biplot of the points G1-G4 and E1-E3 on X-Y axes.]

Page 61

Singular Value Decomposition (SVD) and Singular Value Partitioning (SVP)

SVD:

$$X_{ij} = \sum_{k=1}^{r} \lambda_k u_{ik} v_{jk}.$$

SVP:

$$X_{ij} = \sum_{k=1}^{r} \big(\lambda_k^{f} u_{ik}\big)\big(\lambda_k^{1-f} v_{jk}\big),$$

where r is the "rank" of X, i.e., the minimum number of PCs required to fully represent X; U characterises the rows, V characterises the columns, and the λk are the singular values. The quantities λk^f u_ik are the row scores and λk^{1−f} v_jk the column scores; plotting them together gives the biplot. Common choices of the partitioning factor f are f = 1, f = 0 and f = 1/2.

Page 62

Biplot

• The simplest biplot shows the first two PCs together with the projections of the axes of the original variables.
• The x-axis represents the scores for the first principal component, the y-axis the scores for the second principal component.
• The original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components.
• The direction of the arrows indicates the relative loadings on the first and second principal components.
• Biplot analysis can help us understand multivariate data (i) graphically, (ii) effectively, (iii) conveniently.
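In R, such a plot (like the iris biplot on the next slide) is one line on top of a PCA fit:

pc <- prcomp(iris[, 1:4])   # PCA of the four iris measurements
biplot(pc)                  # points = observations, arrows = original variables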

Page 63

Biplot of Iris Data

[Biplot of the iris data: scores on Comp. 1 (x-axis) and Comp. 2 (y-axis), with observations labelled 1 = Setosa, 2 = Versicolor, 3 = Virginica, and arrows for the variables Sepal L., Sepal W., Petal L. and Petal W.]

Page 64

Image Compression Example

Pansy flower image, collected from http://www.ats.ucla.edu/stat/r/code/pansy.jpg

This image is 600 × 465 pixels.

Page 65

Singular values of the flower image

[Plot of the singular values.]

Page 66

Low rank Approximation to flowers image

Rank-1 approximation; rank-5 approximation

Page 67

Low rank Approximation to flowers image

Rank-20 approximation; rank-30 approximation

Page 68

Low rank Approximation to flowers image

Rank-50 approximation; rank-80 approximation

Page 69

Low rank Approximation to flowers image

Rank-100 approximation; rank-120 approximation

Page 70

Low rank Approximation to flowers image

Rank-150 approximation; true image

Page 71

Outlier Detection Using SVD

Nishith and Nasser (2007, M.Sc. thesis) propose a graphical method of outlier detection using SVD. It is suitable for both general multivariate data and regression data. We construct scatter plots of the first two PCs, and of the first and third PCs. We also draw a box in each scatter plot whose range is

median(1st PC) ± 3 × mad(1st PC) on the X-axis and median(2nd PC/3rd PC) ± 3 × mad(2nd PC/3rd PC) on the Y-axis,

where mad = median absolute deviation. Points outside the box can be considered extreme outliers; points outside one side of the box are termed outliers. Along with this box we may construct another, smaller box bounded by the 2.5/2 × MAD lines.
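A sketch of the 3-MAD box for the first two PCs (the data set here is illustrative):

X <- scale(as.matrix(mtcars), center = TRUE, scale = FALSE)  # illustrative data
s <- svd(X)
pc <- s$u %*% diag(s$d)                                # PC scores from the SVD
plot(pc[, 1], pc[, 2], xlab = "First PC", ylab = "Second PC")
abline(v = median(pc[, 1]) + c(-3, 3) * mad(pc[, 1]))  # vertical box edges
abline(h = median(pc[, 2]) + c(-3, 3) * mad(pc[, 2]))  # horizontal box edges; points outside are flagged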

Page 72

Outlier Detection Using SVD (Cont.)

Hawkins-Bradu-Kass (1984) data: a data set containing 75 observations with 14 influential observations, among them ten high leverage outliers (cases 1-10) and four high leverage points (cases 11-14); see Imon (2005).

Scatter plots of the Hawkins, Bradu and Kass data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.

Page 73

Outlier Detection Using SVD (Cont.)

Modified Brown data: a data set given by Brown (1980). Ryan (1997) pointed out that the original data on the 53 patients contain 1 outlier (observation number 24). Imon and Hadi (2005) modified this data set by putting in two more outliers, as cases 54 and 55. They also showed that observations 24, 54 and 55 are outliers by using the generalized standardized Pearson residual (GSPR).

Scatter plots of the modified Brown data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.

Page 74

Cluster Detection Using SVD

Singular value decomposition is also used for cluster detection (Nishith, Nasser and Suboron, 2011). The method for clustering data using the first three PCs is given below:

median(1st PC) ± k × mad(1st PC) on the X-axis and median(2nd PC/3rd PC) ± k × mad(2nd PC/3rd PC) on the Y-axis,

where mad = median absolute deviation and k = 1, 2, 3.

Page 75

Page 76

Principal stations in the climate data

Page 77

Climatic Variables

The climatic variables are:
1. Rainfall (RF), mm
2. Daily mean temperature (T-MEAN), °C
3. Maximum temperature (T-MAX), °C
4. Minimum temperature (T-MIN), °C
5. Day-time temperature (T-DAY), °C
6. Night-time temperature (T-NIGHT), °C
7. Daily mean water vapour pressure (VP), mbar
8. Daily mean wind speed (WS), m/sec
9. Hours of bright sunshine as percentage of maximum possible sunshine hours (MPS), %
10. Solar radiation (SR), cal/cm²/day

Page 78

Consequences of SVD

Generally, many missing values may be present in the data, and the data may also contain unusual observations. The classical singular value decomposition cannot handle either type of problem.

Robust singular value decomposition can solve both types of problems. It can be obtained by an alternating L1 regression approach (Douglas M. Hawkins, Li Liu, and S. Stanley Young, 2001).

Page 79

The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition

1. Initialize the leading left singular vector u1. (There is no obvious choice of initial values for u1.)
2. Fit the L1 regression coefficients cj by minimizing Σ_{i=1}^{n} |x_ij − c_j u_{i1}|, j = 1, 2, ..., p.
3. Calculate the right singular vector v1 = c/‖c‖, where ‖·‖ refers to the Euclidean norm.
4. Again fit the L1 regression coefficients di by minimizing Σ_{j=1}^{p} |x_ij − d_i v_{j1}|, i = 1, 2, ..., n.
5. Calculate the resulting estimate of the left singular vector u1 = d/‖d‖.
6. Iterate this process until it converges.

For the second and subsequent terms of the SVD, we replace X by a deflated matrix obtained by subtracting the most recently found term: X ← X − λ_k u_k v_k^T.
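A minimal, dependency-free sketch of one singular triple (each single-coefficient L1 fit is solved here by a one-dimensional optimize() call rather than a full L1 regression routine such as quantreg::rq(); the constant starting value for u is an arbitrary choice, as the algorithm itself notes):

l1fit0 <- function(x, z) {      # argmin over c of sum_i |x_i - c * z_i|
  r <- x[z != 0] / z[z != 0]    # the minimizer lies within the range of the ratios
  optimize(function(cc) sum(abs(x - cc * z)),
           interval = range(r) + c(-1e-8, 1e-8))$minimum
}
l1svd1 <- function(X, iters = 30) {
  n <- nrow(X)
  u <- rep(1 / sqrt(n), n)               # arbitrary starting value for u1
  for (it in 1:iters) {
    c1 <- apply(X, 2, l1fit0, z = u)     # L1-regress each column on u
    v  <- c1 / sqrt(sum(c1^2))           # right singular vector v1 = c / ||c||
    d1 <- apply(X, 1, l1fit0, z = v)     # L1-regress each row on v
    u  <- d1 / sqrt(sum(d1^2))           # left singular vector u1 = d / ||d||
  }
  list(u = u, v = v, lambda = sqrt(sum(d1^2)))  # ||d|| estimates the singular value
}
# Deflation for the next term: X <- X - fit$lambda * fit$u %*% t(fit$v)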

Page 80

Clustering weather stations on the map using RSVD

Page 81

References

• Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
• Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer-Verlag, New York.
• Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple-regression data using elemental sets. Technometrics, 26, 197-208.
• Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73-90.
• Kumar, N., Nasser, M., and Sarker, S.C. (2011). A new singular value decomposition based robust graphical clustering technique and its application in climatic data. Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227-238.
• Ryan, T.P. (1997). Modern Regression Methods, Wiley, New York.
• Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions, SIAM, Philadelphia.
• Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html

Page 82