
Neurocomputing 69 (2006) 1697–1701

www.elsevier.com/locate/neucom

Letters

Locally principal component learning for face representation and recognition

Jian Yang a,b,*, David Zhang a, Jing-yu Yang b

a Department of Computing, Biometric Research Centre, Hong Kong Polytechnic University, Kowloon, Hong Kong
b Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, PR China

*Corresponding author. Tel.: +852 2766 7280; fax: +852 2774 0842. E-mail addresses: [email protected] (J. Yang), [email protected] (D. Zhang), [email protected] (J.-y. Yang).

Received 6 October 2005; received in revised form 23 January 2006; accepted 24 January 2006

Communicated by R.W. Newcomb

Abstract

This paper develops a method called locally principal component analysis (LPCA) for data representation. LPCA is a linear and unsupervised subspace-learning technique, which focuses on the data points within local neighborhoods and seeks to discover the local structure of data. This local structure may contain useful information for discrimination. LPCA is tested and evaluated using the AT&T face database. The experimental results show that LPCA is effective for dimension reduction and more powerful than PCA for face recognition.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Principal component analysis (PCA); Locality-based learning; Dimensionality reduction; Feature extraction; Face recognition

1. Introduction

Principal component analysis (PCA) is a classical technique for linear dimension reduction. It has been successfully applied to face recognition [5]. PCA produces a compact representation and preserves the global geometric structure of data well in a low-dimensional space when the given data are linearly distributed. But when the data are distributed in a nonlinear way, PCA may fail to discover the intrinsic structure of the data because of its intrinsic linearity. In this respect, kernel PCA (KPCA) may perform well, since it can make the data structure as linear as possible by virtue of an implicit nonlinear mapping determined by a kernel. KPCA turns out to be effective in many real-world applications, but its computational complexity is still a problem, which restricts its applications to a certain extent.

As opposed to globality-based learning techniques like PCA and KPCA, locality-based learning methods, represented by locally linear embedding (LLE) [4] and the Laplacian eigenmap [1], have appeared in the last few years. Both methods seek to discover the global structure of data via locally linear fits, based on the fact that a global nonlinear structure can be viewed as locally linear. However, the mapping of LLE (or the Laplacian eigenmap) is always implicit and tied to the training data set, so it is difficult to obtain the image of a data point from the testing set. This makes LLE and the Laplacian eigenmap unsuitable for some recognition tasks. Recently, a locality preserving projection (LPP) technique [3] was proposed and applied to face recognition. LPP can preserve the intrinsic geometry of data and yield an explicit linear mapping suitable for both training and testing samples.

Motivated by the idea of LPP, we develop a locally principal component analysis (LPCA) technique for face representation. Differing from PCA, LPCA seeks to discover the local structure of data (not the global structure). When the data are distributed in a nonlinear way, linear methods like PCA may fail to capture the global geometric structure of data, but it is still possible to use a similar linear method to recover the local structure. This local structure may contain useful information for discrimination.


2. Methods

2.1. PCA

Given a set of $M$ training samples (pattern vectors) $x_1, x_2, \ldots, x_M$ in $\mathbb{R}^N$, PCA seeks to find a projection axis $w$ such that the mean square of the Euclidean distance between all pairs of the projected sample points $y_1, y_2, \ldots, y_M$ ($y_j = w^T x_j$, $j = 1, \ldots, M$) is maximized, i.e.,

$$J(w) = \frac{1}{2}\,\frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}(y_i - y_j)^2. \tag{1}$$

It follows that

$$J(w) = \frac{1}{2}\,\frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}(w^T x_i - w^T x_j)^2 = w^T\left[\frac{1}{2}\,\frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T\right]w. \tag{2}$$

Let us denote

$$S_t = \frac{1}{2}\,\frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T, \tag{3}$$

and the mean vector $\bar{x} = (1/M)\sum_{j=1}^{M} x_j$. Then it is easy to show that

$$
\begin{aligned}
S_t &= \frac{1}{2}\,\frac{1}{M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}\left(x_i x_i^T - 2 x_i x_j^T + x_j x_j^T\right) \\
    &= \frac{1}{2}\,\frac{1}{M^2}\left[2M\sum_{i=1}^{M}x_i x_i^T - 2\sum_{i=1}^{M}\sum_{j=1}^{M}x_i x_j^T\right] \\
    &= \frac{1}{M^2}\left[M\sum_{i=1}^{M}x_i x_i^T - \left(\sum_{i=1}^{M}x_i\right)\left(\sum_{j=1}^{M}x_j^T\right)\right] \\
    &= \frac{1}{M}\sum_{i=1}^{M}x_i x_i^T - \bar{x}\bar{x}^T
     = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})(x_i - \bar{x})^T.
\end{aligned} \tag{4}
$$

Eq. (4) indicates that $S_t$ is essentially the covariance matrix of the data. So, the projection axis $w$ that maximizes Eq. (1) can be selected as the eigenvector of $S_t$ corresponding to the largest eigenvalue. Similarly, we can obtain a set of projection axes of PCA by selecting the $d_{\mathrm{PCA}}$ eigenvectors of $S_t$ corresponding to the $d_{\mathrm{PCA}}$ largest eigenvalues.
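For concreteness, the covariance-eigendecomposition view of PCA in Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is our own generic illustration (the function and variable names are not from the paper), assuming the samples are stored as the columns of a data matrix, as in Section 2.3.

```python
import numpy as np

def pca_axes(X, d_pca):
    """Return the d_pca leading eigenvectors of the covariance matrix S_t (Eq. (4)).

    X : (N, M) array, one training sample per column.
    """
    x_bar = X.mean(axis=1, keepdims=True)      # mean vector
    Xc = X - x_bar                             # centered data
    S_t = (Xc @ Xc.T) / X.shape[1]             # covariance matrix, Eq. (4)
    evals, evecs = np.linalg.eigh(S_t)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d_pca]    # indices of the d_pca largest
    return evecs[:, order]                     # projection axes w_1, ..., w_dPCA

# Usage sketch: W = pca_axes(X, 40); Y = W.T @ X projects the samples.
```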

2.2. Basic idea of LPCA

For each sample point $x_i$, here we only consider its neighboring points $x_j$, for example, those within its local $\delta$-neighborhood ($\delta > 0$), i.e., $x_j \in U^{\delta}_{x_i} = \{x \mid \|x - x_i\|^2 < \delta\}$, where $\|\cdot\|$ denotes the Euclidean norm. Let us define $U^{\delta}_i = \{j \mid x_j \in U^{\delta}_{x_i}\}$. Obviously, $U^{\delta}_i$ is the set of indices (subscripts) of the sample points that belong to the local $\delta$-neighborhood of $x_i$. Based on this definition, the mean square of the Euclidean distances between all pairs of the projected sample points within local neighborhoods is given by

$$J_L(w) = \frac{1}{2}\,\frac{1}{M_L}\sum_{i=1}^{M}\sum_{j\in U^{\delta}_i}(y_i - y_j)^2, \tag{5}$$

where $M_L = \sum_{i=1}^{M} M_i$ and $M_i$ is the number of elements in $U^{\delta}_i$.

It follows from Eq. (5) that

$$J_L(w) = \frac{1}{2}\,\frac{1}{M_L}\sum_{i=1}^{M}\sum_{j\in U^{\delta}_i}(w^T x_i - w^T x_j)^2 = w^T\left[\frac{1}{2}\,\frac{1}{M_L}\sum_{i=1}^{M}\sum_{j\in U^{\delta}_i}(x_i - x_j)(x_i - x_j)^T\right]w. \tag{6}$$

Let us denote

$$S_L = \frac{1}{2}\,\frac{1}{M_L}\sum_{i=1}^{M}\sum_{j\in U^{\delta}_i}(x_i - x_j)(x_i - x_j)^T. \tag{7}$$

$S_L$ is called the local covariance matrix. The eigenvectors of $S_L$ corresponding to the $d$ largest eigenvalues form a coordinate system for LPCA.
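As a small sketch of the definition (our own illustration; the helper name is hypothetical), the local covariance matrix of Eq. (7) can be built directly from the pairwise differences inside each $\delta$-neighborhood. This naive construction is only meant to make the definition concrete; Section 2.3 gives the efficient formulation.

```python
import numpy as np

def local_covariance_delta(X, delta):
    """Naive construction of S_L (Eq. (7)) from delta-neighborhoods.

    X : (N, M) array, one sample per column.
    delta : squared-distance threshold defining the local neighborhood.
    """
    N, M = X.shape
    S_L = np.zeros((N, N))
    M_L = 0
    for i in range(M):
        d2 = np.sum((X - X[:, [i]]) ** 2, axis=0)   # squared distances to x_i
        # x_i itself is skipped here; including it only adds zero difference terms
        nbrs = [j for j in range(M) if j != i and d2[j] < delta]
        for j in nbrs:
            diff = (X[:, i] - X[:, j])[:, None]
            S_L += diff @ diff.T
        M_L += len(nbrs)                            # contributes M_i to M_L
    return S_L / (2 * M_L) if M_L > 0 else S_L
```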

2.3. Implementation of LPCA

For the convenience of implementing LPCA, two issues should be addressed. First, it is hard to choose a proper radius $\delta$ of the local neighborhood in practice, although it is geometrically intuitive to employ the $\delta$-neighborhood to characterize the locality. Second, it is inefficient to construct $S_L$ and to calculate its eigenvectors using the formula in Eq. (7), particularly in high-dimensional and small sample size cases. We address these issues and develop a feasible algorithm for LPCA as follows.

To avoid the difficulty of choosing the radius of the local neighborhood, the method of $K$-nearest neighbors is adopted to characterize the "locality", since it is easy to implement. Let us denote $U^K_i = \{j \mid x_j \text{ is among the } K\text{-nearest neighbors of } x_i\}$. Then $J_L(w)$ can be reformulated as

$$J_L(w) = \frac{1}{2}\,\frac{1}{MK}\sum_{i=1}^{M}\sum_{j\in U^K_i}(y_i - y_j)^2. \tag{8}$$

Actually, the $K$-nearest neighbor relationship between all pairs of training samples can be described by an adjacency matrix $H$, whose elements are given by

$$H_{ij} = \begin{cases} 1 & \text{if } x_j \text{ is among the } K\text{-nearest neighbors of } x_i, \\ 0 & \text{otherwise}. \end{cases} \tag{9}$$
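A minimal NumPy sketch of Eq. (9) (the function name is ours, not from the paper), computing the $K$-nearest-neighbor adjacency matrix from pairwise Euclidean distances:

```python
import numpy as np

def knn_adjacency(X, K):
    """Adjacency matrix H of Eq. (9): H[i, j] = 1 iff x_j is among the K nearest neighbors of x_i.

    X : (N, M) array, one sample per column.
    """
    M = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbor
    H = np.zeros((M, M))
    for i in range(M):
        H[i, np.argsort(d2[i])[:K]] = 1            # mark the K closest points
    return H
```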


Thus, Eq. (8) can be rewritten as

$$J_L(w) = \frac{1}{2}\,\frac{1}{MK}\sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}(y_i - y_j)^2. \tag{10}$$

Note that the adjacency matrix $H$ is not necessarily a symmetric matrix, since the $K$-nearest neighbor relationship between a pair of samples may be asymmetric. Specifically, it may happen that $x_j$ is among the $K$-nearest neighbors of $x_i$ ($H_{ij} = 1$) but $x_i$ is not among the $K$-nearest neighbors of $x_j$ ($H_{ji} = 0$). For the convenience of derivation, we make $H$ symmetric as follows:

$$H \leftarrow \tfrac{1}{2}\left(H + H^T\right), \quad \text{i.e.,} \quad H_{ij} \leftarrow \tfrac{1}{2}\left(H_{ij} + H_{ji}\right). \tag{11}$$

If $x_j$ is among the $K$-nearest neighbors of $x_i$ while $x_i$ is not among the $K$-nearest neighbors of $x_j$, then after symmetrization we have $H_{ij} = H_{ji} = \tfrac{1}{2}$. In this case, we can view a symmetric semi-"$K$-nearest neighbor relationship" as existing between $x_i$ and $x_j$. Finally, it should be stressed that the value of the criterion $J_L(w)$ in Eq. (10) remains invariant under the symmetrization in Eq. (11), because $(y_i - y_j)^2$ is symmetric in $i$ and $j$, so replacing $H_{ij}$ by $\tfrac{1}{2}(H_{ij} + H_{ji})$ leaves the double sum unchanged.
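Continuing the sketch above (variable names ours), the symmetrization in Eq. (11) is a one-liner:

```python
# Eq. (11): symmetrize the adjacency matrix; entries become 1, 1/2, or 0.
H_sym = 0.5 * (H + H.T)

# The criterion in Eq. (10) is unchanged because (y_i - y_j)**2 is symmetric in i and j:
# sum_ij H[i, j]     * (y[i] - y[j])**2
# == sum_ij H_sym[i, j] * (y[i] - y[j])**2
```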

After the symmetrization of $H$, it follows from Eq. (10) that

$$J_L(w) = \frac{1}{2}\,\frac{1}{MK}\sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}(w^T x_i - w^T x_j)^2 = w^T\left[\frac{1}{2}\,\frac{1}{MK}\sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}(x_i - x_j)(x_i - x_j)^T\right]w = w^T S_L w. \tag{12}$$

Due to the symmetry of H, we have

$$
\begin{aligned}
S_L &= \frac{1}{2}\,\frac{1}{MK}\left(\sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}x_i x_i^T + \sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}x_j x_j^T - 2\sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}x_i x_j^T\right) \\
    &= \frac{1}{MK}\left(\sum_{i=1}^{M}D_{ii}x_i x_i^T - \sum_{i=1}^{M}\sum_{j=1}^{M}H_{ij}x_i x_j^T\right) \\
    &= \frac{1}{MK}\left(XDX^T - XHX^T\right) = \frac{1}{MK}XLX^T,
\end{aligned} \tag{13}
$$

where $X = (x_1, x_2, \ldots, x_M)$, and $D$ is a diagonal matrix whose diagonal elements are the column (or row, since $H$ is symmetrized) sums of $H$, i.e., $D_{ii} = \sum_{j=1}^{M} H_{ij}$. $L = D - H$ is called the Laplacian matrix in [1].

It is obvious that $L$ and $S_L$ are both real symmetric matrices. From Eqs. (10) and (12), we know that $w^T S_L w \geq 0$ for any nonzero vector $w$. So the local covariance matrix $S_L$ must be a nonnegative definite matrix, that is, its nonzero eigenvalues are all positive.
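A short sketch of Eq. (13) (our own; the function name is hypothetical), building $S_L$ from the data matrix and the symmetrized adjacency matrix. One can check numerically on a small random data set that this matches the pairwise-sum form inside Eq. (12).

```python
import numpy as np

def local_covariance(X, H_sym, K):
    """Local covariance matrix S_L = X L X^T / (M K), Eq. (13).

    X : (N, M) data matrix (samples as columns).
    H_sym : symmetrized K-nearest-neighbor adjacency matrix from Eq. (11).
    """
    M = X.shape[1]
    D = np.diag(H_sym.sum(axis=1))   # degree matrix, D_ii = sum_j H_ij
    L = D - H_sym                    # Laplacian matrix
    return X @ L @ X.T / (M * K)
```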

The formulation of $S_L$ in Eq. (13) provides a more efficient way to construct the local covariance matrix and to calculate its eigenvectors in the small sample size case. Since $L$ is a real symmetric matrix, its eigenvalues are all real. We calculate all of its eigenvalues and the corresponding eigenvectors. Suppose $\Lambda$ is the diagonal matrix of eigenvalues of $L$ and $P$ is the full matrix whose columns are the corresponding eigenvectors. Then $L$ can be decomposed as

$$L = P\Lambda P^T = P_L P_L^T, \quad \text{where } P_L = P\Lambda^{1/2}. \tag{14}$$

Based on the decomposition of $L$, we have $S_L = \frac{1}{MK}(XP_L)(XP_L)^T$. Let us define $R = (XP_L)^T(XP_L)$, which is an $M \times M$ nonnegative definite matrix. When the training sample size $M$ is smaller than the dimension of the input space $N$, the size of $R$ is much smaller than that of $S_L$, so it is computationally easier to obtain its eigenvectors. Let us work out $R$'s orthonormal eigenvectors $v_1, v_2, \ldots, v_d$ that correspond to the $d$ largest eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d > 0$. Then, based on the theorem of singular value decomposition (SVD) [5], the orthonormal eigenvectors $w_1, w_2, \ldots, w_d$ of $S_L$ corresponding to the $d$ largest eigenvalues $\lambda_1/(MK), \lambda_2/(MK), \ldots, \lambda_d/(MK)$ are

$$w_j = \frac{1}{\sqrt{\lambda_j}}\, XP_L v_j, \quad j = 1, \ldots, d. \tag{15}$$

Let $W = (w_1, w_2, \ldots, w_d)$. The LPCA transform of a sample $x$ is

$$y = W^T x. \tag{16}$$
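A minimal NumPy sketch of the small-sample-size trick in Eqs. (14)-(16) (our own illustration; the function name and the guard against tiny negative round-off eigenvalues are ours):

```python
import numpy as np

def lpca_axes_small_sample(X, L, d):
    """Leading LPCA axes via the M x M matrix R, following Eqs. (14)-(15).

    X : (N, M) data matrix (samples as columns).
    L : Laplacian matrix D - H from Eq. (13).
    d : number of projection axes; assumed not to exceed the number of
        positive eigenvalues of R.
    """
    lam, P = np.linalg.eigh(L)                 # L = P Lambda P^T, Lambda real, >= 0
    lam = np.clip(lam, 0.0, None)              # guard tiny negative round-off values
    P_L = P * np.sqrt(lam)                     # P_L = P Lambda^{1/2} (scales columns)
    B = X @ P_L                                # so S_L = B B^T / (M K)
    R = B.T @ B                                # M x M matrix R = (X P_L)^T (X P_L)
    mu, V = np.linalg.eigh(R)
    order = np.argsort(mu)[::-1][:d]           # d largest eigenvalues of R
    W = B @ V[:, order] / np.sqrt(mu[order])   # Eq. (15): w_j = X P_L v_j / sqrt(lambda_j)
    return W                                   # columns are orthonormal eigenvectors of S_L
    # The corresponding eigenvalues of S_L are mu[order] / (M * K).

# Eq. (16): the LPCA features of the samples are Y = W.T @ X.
```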

In summary of the description above, the LPCA algorithm is given below:

Step 1: For the given training data set $\{x_i \mid i = 1, \ldots, M\}$, find the $K$-nearest neighbors of each data point and construct the adjacency matrix $H = (H_{ij})_{M\times M}$ using Eq. (9). Symmetrize $H$ using Eq. (11).

Step 2: Construct the $M \times M$ diagonal matrix $D$, whose diagonal elements are given by $D_{ii} = \sum_{j=1}^{M} H_{ij}$. Then construct the Laplacian matrix $L = D - H$.

Step 3: Perform the eigenvalue decomposition of $L$: $L = P\Lambda P^T = P_L P_L^T$, where $P_L = P\Lambda^{1/2}$.

Step 4: Construct the matrix $R = (XP_L)^T(XP_L)$, where $X = (x_1, x_2, \ldots, x_M)$. Calculate the orthonormal eigenvectors $v_1, v_2, \ldots, v_d$ of $R$ corresponding to the $d$ largest eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d > 0$. The $d$ projection axes of LPCA are $w_j = XP_L v_j / \sqrt{\lambda_j}$, $j = 1, \ldots, d$.

Step 5: Perform the linear transform of a sample $x$ using Eq. (16) to obtain the low-dimensional LPCA feature vector $y$; $y$ is used to represent $x$ for recognition.
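Putting Steps 1-5 together, an end-to-end sketch of the algorithm as described might look as follows. This is our own compact illustration (function names are hypothetical), not the authors' code; it assumes $d$ does not exceed the number of positive eigenvalues of $R$.

```python
import numpy as np

def lpca_fit(X, K, d):
    """LPCA projection matrix W following Steps 1-5; X is (N, M), one sample per column."""
    M = X.shape[1]
    # Step 1: K-nearest-neighbor adjacency matrix (Eq. (9)) and symmetrization (Eq. (11)).
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(d2, np.inf)
    H = np.zeros((M, M))
    for i in range(M):
        H[i, np.argsort(d2[i])[:K]] = 1
    H = 0.5 * (H + H.T)
    # Step 2: degree matrix D and Laplacian L = D - H.
    L = np.diag(H.sum(axis=1)) - H
    # Step 3: eigen-decomposition L = P Lambda P^T, P_L = P Lambda^{1/2}.
    lam, P = np.linalg.eigh(L)
    P_L = P * np.sqrt(np.clip(lam, 0.0, None))
    # Step 4: eigenvectors of R = (X P_L)^T (X P_L), mapped back to axes of S_L (Eq. (15)).
    B = X @ P_L
    mu, V = np.linalg.eigh(B.T @ B)
    order = np.argsort(mu)[::-1][:d]
    W = B @ V[:, order] / np.sqrt(mu[order])
    return W

def lpca_transform(W, X):
    """Step 5: low-dimensional LPCA features y = W^T x (Eq. (16))."""
    return W.T @ X
```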

3. Experiments

The proposed method is tested using the standard AT&T database (http://www.uk.research.att.com/facedatabase.html). This database contains images from 40 individuals, each providing 10 different images. In the first experiment, we use the first five images of each class for training and the remaining five for testing. PCA and the proposed LPCA are, respectively, used for feature extraction. In LPCA, the parameter $K$ is chosen as $K = 6$.


Table 1
The maximal recognition rates (%) of PCA and LPCA and their corresponding dimensions when the first five images of each class are used for training

                      Euclidean distance        Mahalanobis cosine distance
                      PCA        LPCA           PCA        LPCA
Recognition rate      93.5       95.5           92.0       96.5
Dimension             42         50             38         38

Table 2
The maximal average recognition rates (%) of PCA and LPCA and their corresponding dimensions (shown in brackets) when the number of training samples per class varies from 1 to 5, using 20-fold cross-validation tests

No. of training samples per class         1          2          3          4          5
Euclidean distance            PCA         67.7 [38]  80.7 [60]  87.1 [50]  91.1 [50]  94.0 [56]
                              LPCA        69.0 [30]  82.3 [42]  88.0 [50]  91.7 [46]  94.4 [46]
Mahalanobis cosine distance   PCA         68.1 [32]  81.3 [40]  87.1 [34]  90.9 [46]  93.5 [44]
                              LPCA        68.6 [38]  83.4 [44]  89.3 [44]  92.7 [38]  95.1 [56]


Finally, nearest-neighbor classifiers with the Euclidean distance and the Mahalanobis cosine distance [2] are, respectively, employed for classification. The maximal recognition rates of PCA and LPCA and their corresponding dimensions are given in Table 1. Table 1 shows that LPCA outperforms PCA under both distance metrics.

To alleviate the effect of the choice of training set on recognition performance, in the second experiment we perform a series of 20-fold cross-validation tests. In these tests, the number of training samples per class, $t$, is allowed to vary from 1 to 5. In LPCA, the parameter $K$ is chosen as $K = t + 2$. The maximal average recognition rates across 20 runs of PCA and LPCA under nearest-neighbor classifiers with the two distance metrics, together with their corresponding dimensions, are shown in Table 2. Table 2 indicates that LPCA consistently outperforms PCA irrespective of the variation in training sample size.
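For illustration only, the classification stage can be sketched as below. This is our own simplified stand-in: the CSU implementation of the Mahalanobis cosine distance [2] is not reproduced here, and plain cosine distance in the projected space is used instead; the function names and the protocol comment are assumptions, not the authors' code.

```python
import numpy as np

def nn_classify(train_feats, train_labels, test_feats, metric="euclidean"):
    """1-NN classification in the projected feature space (features stored as columns)."""
    train_labels = np.asarray(train_labels)
    preds = []
    for y in test_feats.T:
        if metric == "euclidean":
            dists = np.linalg.norm(train_feats - y[:, None], axis=0)
        else:  # cosine distance, used here as a simplified stand-in for Mahalanobis cosine
            num = train_feats.T @ y
            den = np.linalg.norm(train_feats, axis=0) * np.linalg.norm(y) + 1e-12
            dists = 1.0 - num / den
        preds.append(train_labels[np.argmin(dists)])
    return np.array(preds)

# Protocol sketch (hypothetical variable names): fit LPCA on the training images with
# K = t + 2 when t samples per class are used, project both sets, then classify with 1-NN.
# W = lpca_fit(X_train, K=t + 2, d=50)
# acc = np.mean(nn_classify(W.T @ X_train, y_train, W.T @ X_test) == y_test)
```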

4. Conclusions

A linear and unsupervised subspace-learning technique, locally principal component analysis (LPCA), is developed in this paper. Since a global nonlinear structure can be viewed as locally linear, it is possible to use a linear technique to recover the local structure of data, although it is almost impossible to use the same technique to recover the global geometric structure. Our experimental results indicate that the local structure does contain effective information for discrimination.

Acknowledgments

This research was supported by the CERG fund from the HKSAR Government and the central fund from the Hong Kong Polytechnic University, and by the National Science Foundation of China under Grant nos. 60503026, 60332010, 60472060, and 60473039. Dr. Yang was supported by China and Hong Kong Polytechnic University postdoctoral fellowships.

References

[1] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373-1396.

[2] R. Beveridge, D. Bolme, M. Teixeira, B. Draper, The CSU Face Identification Evaluation System User's Guide: Version 5.0, http://www.cs.colostate.edu/evalfacerec/.

[3] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328-340.

[4] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323-2326.

[5] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71-86.

Jian Yang was born in Jiangsu, China, in June 1973. He obtained his Bachelor of Science in Mathematics at Xuzhou Normal University in 1995. He then completed a Master of Science degree in Applied Mathematics at Changsha Railway University in 1998 and his Ph.D. at Nanjing University of Science and Technology (NUST) in the Department of Computer Science on the subject of Pattern Recognition and Intelligence Systems in 2002. From January to December 2003, he was a postdoctoral researcher at the University of Zaragoza, affiliated with the Division of Bioengineering of the Aragon Institute of Engineering Research (I3A). In the same year, he was awarded the RyC program Research Fellowship, sponsored by the Spanish Ministry of Science and Technology. He is now a professor in the Department of Computer Science of NUST and, at the same time, a Postdoctoral Research Fellow at the Biometrics Centre of Hong Kong Polytechnic University. He is the author of more than 30 scientific papers in pattern recognition and computer vision. His current research interests include pattern recognition, computer vision and machine learning.


David Zhang graduated in computer science from Peking University in 1974 and received his M.Sc. and Ph.D. degrees in computer science and engineering from the Harbin Institute of Technology (HIT) in 1983 and 1985, respectively. He received his second Ph.D. in electrical and computer engineering at the University of Waterloo, Ontario, Canada, in 1994. After that, he was an associate professor at the City University of Hong Kong and a chair professor at the Hong Kong Polytechnic University. Currently, he is a founder and director of the Biometrics Technology Centre supported by the UGC of the Government of the Hong Kong SAR. He is the Founder and Editor-in-Chief of the International Journal of Image and Graphics, and an Associate Editor of several international journals such as IEEE Transactions on SMC-C, Pattern Recognition, and the International Journal of Pattern Recognition and Artificial Intelligence. His research interests include automated biometrics-based identification, neural systems and applications, and image processing and pattern recognition. So far, he has published over 180 articles as well as 10 books, and has won numerous prizes.

Jing-yu Yang received the B.S. degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984 he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994 he was a visiting professor at the Department of Computer Science, Missouri University, and in 1998 he was a visiting professor at Concordia University in Canada. He is currently a professor and Chairman of the Department of Computer Science at NUST. He is the author of over 300 scientific papers in computer vision, pattern recognition, and artificial intelligence. He has won more than 20 provincial and national awards. His current research interests are in the areas of pattern recognition, robot vision, image processing, data fusion, and artificial intelligence.