
Globally-Preserving based Locally Linear Embedding

Kanghua Hui, Chunheng Wang, Baihua Xiao
The Key Laboratory of Complex Systems and Intelligence Science
Institute of Automation, Chinese Academy of Sciences
Beijing, China
{kanghua.hui, chunheng.wang, baihua.xiao}@ia.ac.cn

Abstract—The locally linear embedding (LLE) algorithm is considered a powerful method for nonlinear dimensionality reduction. In this paper, a new method called globally-preserving based LLE (GPLLE) is proposed. It not only preserves the local neighborhood, but also keeps distant samples far away, which addresses a problem LLE may encounter: LLE only preserves local neighborhoods and cannot prevent distant samples from being mapped close together. Moreover, GPLLE can estimate the intrinsic dimensionality d of the manifold structure. The experimental results show that GPLLE always achieves better classification performance than LLE based on the estimated d.

Keywords-dimensionality reduction; manifold learning; globally preserving; locally linear; dimensionality estimation

I. INTRODUCTION
Dimensionality reduction is useful for high dimensional data. The central goal is to obtain more compact representations of the original data that capture the information essential for higher level analysis. That is to say, part of the information is lost during dimensionality reduction, so it is crucial that the resulting low dimensional data preserve a meaningful structure of the original high dimensional space.

Up to now, many linear methods for dimensionality reduction have been proposed, such as principal component analysis (PCA) [1], multidimensional scaling (MDS) [2], and locality preserving projection (LPP) [3]. Meanwhile, several nonlinear methods have also been proposed to discover the nonlinear structure of a manifold. Isomap [4] is a global approach, which builds on classical MDS but seeks to preserve the intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points. Laplacian Eigenmap [5] is a local approach, which essentially tries to map nearby points on a manifold to nearby points in a low dimensional space.

In this paper, we focus on LLE [6], an unsupervised learning method that obtains low dimensional, neighborhood preserving embeddings of high dimensional data. It is regarded as one of the most effective algorithms for nonlinear dimensionality reduction, and it has the desirable property of mapping high dimensional data into a single global coordinate system of lower dimensionality.

LLE can be briefly summarized in 3 steps:

(1) Find the $k$ nearest neighbors $\Omega_i = \{x_i^1, x_i^2, \ldots, x_i^k\}$ of each sample $x_i$ in the original space, $i = 1, 2, \ldots, N$.

(2) Calculate the weights $W_{i,j}$ that best reconstruct each sample $x_i$ from its neighbors:

$$\min_W \Phi(W) = \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} W_{i,j} x_j \Big\|^2, \quad (1)$$

subject to $\sum_{j=1}^{N} W_{i,j} = 1$, and $W_{i,j} = 0$ if $x_j \notin \Omega_i$.

(3) Calculate the $d$ dimensional embeddings $y_i$ best reconstructed by the weights $W_{i,j}$:

$$\min_Y \Phi(Y) = \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} W_{i,j} y_j \Big\|^2, \quad (2)$$

subject to $\frac{1}{N} \sum_{i=1}^{N} y_i y_i^T = I$ and $\sum_{i=1}^{N} y_i = 0$.
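The three steps translate directly into a short numerical routine. The following is a minimal sketch in Python/NumPy, not the authors' code: the Euclidean neighbor search, the regularization of the local Gram matrix, and dropping the constant eigenvector are standard LLE practice rather than details taken from this paper.

import numpy as np

def lle(X, k, d, reg=1e-3):
    # Minimal LLE sketch following steps (1)-(3); X has shape (N, D).
    N = X.shape[0]
    # Step (1): k nearest neighbors by Euclidean distance (excluding the point itself).
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D2, np.inf)
    nbrs = np.argsort(D2, axis=1)[:, :k]
    # Step (2): reconstruction weights of Eq. (1) with the sum-to-one constraint.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[nbrs[i]] - X[i]                  # centered neighbors, (k, D)
        C = Z @ Z.T                            # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)     # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()            # enforce sum_j W_ij = 1
    # Step (3): Eq. (2) reduces to the bottom eigenvectors of M = (I - W)^T (I - W).
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                    # skip the constant eigenvector

Calling lle(X, k=10, d=2) on a point cloud returns the (N, 2) embedding.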

The rest of this paper is organized as follows. Section II introduces the proposed method, GPLLE, followed immediately by the estimation of the intrinsic dimensionality of high dimensional data by GPLLE. Section III presents the experimental results for the intrinsic dimensionality estimated by GPLLE, together with the classification performances of GPLLE and LLE, which serve as the evaluation criterion for the mapping quality of the proposed method. Finally, we conclude in Section IV.

II. GLOBALLY-PRESERVING BASED LLE
The key idea used by LLE for dimensionality reduction is to divide the whole nonlinear manifold into local patches that can be approximated by hyperplanes, and then to produce a single global coordinate system of low dimensionality. This low dimensional representation describes the true structure of the data because the linear reconstruction weights capture and preserve the information of the local neighborhood of the data according to Eq. (1). In the ideal circumstance, the neighbors of each sample in the high dimensional space are still its neighbors in the low dimensional embedded space. Although LLE has desirable advantages, it also has some shortcomings. One problem is that it constrains only the local neighborhood of each sample,


but it cannot guarantee that samples distant from a given sample in the high dimensional space remain far away from it in the embedded space; that is, distant samples may no longer stay away from the sample, but instead become nearby in the embedded space. Another problem of LLE is that it lacks an effective method to estimate the intrinsic dimensionality d to which the high dimensional data should be reduced (the selection of the neighborhood parameter k is also a shortcoming of LLE, but that problem is not discussed in this paper).

A. The GPLLE algorithm
To address the above two problems of LLE, we propose an effective method, GPLLE, which preserves the neighborhood of the samples as in LLE and, at the same time, keeps the distant samples of each sample far away from it in the low dimensional embedded space. Moreover, it can be used to estimate the intrinsic dimensionality. In detail, GPLLE constructs two weight matrices, $W$ and $G$. The matrix $W$ describes the local linear reconstruction weights, just as in LLE, and the matrix $G$ describes the globally preserving weights of each sample (i.e., for a sample, the weights between the distant samples and this sample). With $W$ and $G$, the next step is to design an objective function that simultaneously keeps nearby samples nearby and distant samples distant. The corresponding algorithm is as follows (a code sketch is given after the steps):

(1) Find the $k$ nearest neighbors $\Omega_i = \{x_i^1, x_i^2, \ldots, x_i^k\}$ of each sample $x_i$ in the original space, $i = 1, 2, \ldots, N$.

(2) Calculate the two weight matrices $W$ and $G$:

$$W_{i,j} = \begin{cases} \arg\min \sum_{i=1}^{N} \big\| x_i - \sum_{j=1}^{N} W_{i,j} x_j \big\|^2, & \text{if } x_j \in \Omega_i; \\ 0, & \text{else,} \end{cases}$$

subject to $\sum_{j=1}^{N} W_{i,j} = 1$, and

$$G_{i,j} = \begin{cases} e^{-\|x_i - x_j\|/t}, & \text{if } x_j \notin \Omega_i \text{ and } x_i \notin \Omega_j; \\ 0, & \text{else.} \end{cases}$$

(3) Solve the problem

$$\max_Y R(Y) = \frac{\operatorname{tr}(Y L Y^T)}{\operatorname{tr}(Y M Y^T)},$$

where $M = (I - W)^T (I - W)$, $I$ is the $N \times N$ identity matrix, $L = D - G$, and $D$ is a diagonal weight matrix with $D_{i,i} = \sum_{j=1}^{N} G_{i,j}$. Or, alternatively, calculate the generalized eigenvectors of $Lz = \lambda M z$.

(4) Estimate the intrinsic dimensionality $d$ according to

$$\Psi_d = \frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \geq \tau. \quad (3)$$

(5) The embedding is $Y = (z_1, z_2, \ldots, z_d)^T$, where the rows of $Y$ are the $d$ generalized eigenvectors corresponding to the first $d$ largest generalized eigenvalues.
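The following Python/NumPy sketch puts steps (1)-(5) together; it is an illustration, not the authors' implementation. In particular, the small ridge added to $M$ before the generalized eigensolve, the Euclidean distances, and the default values of t and tau are our own assumptions.

import numpy as np
from scipy.linalg import eigh

def gplle(X, k=10, t=1.0, tau=0.9, reg=1e-3):
    # Sketch of GPLLE; X has shape (N, D). Defaults for t and tau are assumptions.
    N = X.shape[0]
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]      # step (1): k nearest neighbors
    W = np.zeros((N, N))                        # step (2): reconstruction weights
    for i in range(N):
        Z = X[nbrs[i]] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)      # regularize the local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()             # enforce sum_j W_ij = 1
    is_nbr = np.zeros((N, N), dtype=bool)       # step (2): globally preserving G
    is_nbr[np.repeat(np.arange(N), k), nbrs.ravel()] = True
    far = ~(is_nbr | is_nbr.T)                  # pairs that are mutual non-neighbors
    np.fill_diagonal(far, False)
    G = np.where(far, np.exp(-dist / t), 0.0)
    L = np.diag(G.sum(axis=1)) - G              # step (3): L = D - G
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    vals, vecs = eigh(L, M + 1e-9 * np.eye(N))  # solve L z = lambda M z
    order = np.argsort(vals)[::-1]              # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    psi = np.cumsum(vals) / vals.sum()          # step (4): Psi_d from Eq. (3)
    d = int(np.searchsorted(psi, tau) + 1)
    return vecs[:, :d].T, d                     # step (5): rows of Y are z_1..z_d

On a point cloud X, gplle(X) returns both the embedding and the estimated intrinsic dimensionality in one pass.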

B. Justification
Recall that the goal of GPLLE is to preserve the local neighborhood of the samples while keeping distant samples distant, so it amounts to minimizing

$$\sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} W_{i,j} y_j \Big\|^2 \quad (4)$$

and, at the same time, maximizing

$$\sum_{i,j=1}^{N} \| y_i - y_j \|^2 G_{i,j}, \quad (5)$$

where Eq. (4) keeps the nearby samples nearby, and Eq. (5) keeps the distant samples distant. Equivalently, one solves the objective function

$$\max_Y R(Y) = \frac{\sum_{i,j=1}^{N} \| y_i - y_j \|^2 G_{i,j}}{\sum_{i=1}^{N} \big\| y_i - \sum_{j=1}^{N} W_{i,j} y_j \big\|^2}. \quad (6)$$
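The numerator of Eq. (6) reduces to a Laplacian quadratic form: for symmetric $G$ and $L = D - G$, one has $\sum_{i,j} \|y_i - y_j\|^2 G_{i,j} = 2\,\operatorname{tr}(Y L Y^T)$ when the columns of $Y$ are the $y_i$; the constant factor does not affect the maximizer. A quick numerical check of this identity on random data (our own illustration, not from the paper):

import numpy as np

rng = np.random.default_rng(1)
N, d = 30, 3
G = rng.random((N, N)); G = (G + G.T) / 2      # symmetric weights
np.fill_diagonal(G, 0.0)
Y = rng.standard_normal((d, N))                # columns of Y are the y_i
L = np.diag(G.sum(axis=1)) - G                 # L = D - G
lhs = sum(G[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
          for i in range(N) for j in range(N))
assert np.isclose(lhs, 2 * np.trace(Y @ L @ Y.T))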

Moreover, according to [5] and [6], Eq. (6) is equal to

$$\max_Y R(Y) = \frac{\operatorname{tr}(Y L Y^T)}{\operatorname{tr}(Y M Y^T)}.$$

Let $Y = [z_1, z_2, \ldots, z_d]^T$; then

$$R(Y) = \frac{\sum_{i=1}^{d} z_i^T L z_i}{\sum_{i=1}^{d} z_i^T M z_i}. \quad (7)$$

Or, alternatively,

$$R(Y) \times \sum_{i=1}^{d} z_i^T M z_i = \sum_{i=1}^{d} z_i^T L z_i. \quad (8)$$

Taking the derivative of Eq. (8) with respect to $z$ gives

$$\frac{\partial R(Y)}{\partial z} = \frac{2 \left( Lz - R(Y) M z \right)}{\sum_{i=1}^{d} z_i^T M z_i}. \quad (9)$$

Setting $\frac{\partial R(Y)}{\partial z} = 0$ yields

$$Lz = R(Y) M z, \quad (10)$$

which means that if $z$ is a critical point of $R(Y)$, then $z$ must be a generalized eigenvector of Eq. (10), and $R(Y)$ is the corresponding generalized eigenvalue. Consequently, $\max R(Y^*) = \sum_{i=1}^{d} \lambda_i$, where $\lambda_1, \lambda_2, \ldots, \lambda_d$ are the first $d$ largest generalized eigenvalues, and the rows of $Y^*$ are the corresponding generalized eigenvectors.

In addition, GPLLE can approximately estimate the intrinsic dimensionality of the data, preserving the most important information and reducing the influence of noise


and outliers. Note that $R(Y)$ equals the sum of the first $d$ largest eigenvalues. As $d$ increases, the value of $R(Y)$ also increases, and hence $\Psi_d$ in Eq. (3) increases, and vice versa. That is to say, the problem of estimating an appropriate $d$ turns into computing $\Psi_d$. In detail, just as the PCA approach finds the intrinsic dimensionality by computing the sum of the first $d$ largest eigenvalues of the data covariance matrix, which reflects the information preserved in the sense of reconstruction errors, GPLLE uses Eq. (3) to estimate the intrinsic dimensionality $d$, where $\Psi_d$ ranges from 0 to 1 and reflects the proportion of information preserved. This is practicable for the following reason: given a large value of $\tau$, say 0.9, $\Psi_d$ preserves 90% of the information of the manifold structure, i.e., the data embedded in the corresponding $d$ dimensional space preserve most of the information of the manifold structure in the high dimensional space. In other words, the $d$ determined by Eq. (3) with $\tau = 0.9$ is believable and can thereby be regarded as the dimensionality of the true manifold structure. In contrast, with a small value of $\tau$, say 0.1, the determined $d$ is unreliable, since most of the information of the manifold structure is lost.
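The selection rule of Eq. (3) is the same cumulative-share rule commonly used with PCA spectra; a minimal sketch (the function name is ours):

import numpy as np

def estimate_d(eigvals, tau=0.9):
    # Smallest d with Psi_d = (sum of top-d eigenvalues) / (total sum) >= tau,
    # as in Eq. (3).
    lam = np.sort(np.asarray(eigvals))[::-1]
    psi = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(psi, tau) + 1)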

III. EXPERIMENTAL RESULTS
The MNIST handwritten digit set [7] is selected to evaluate the performance of the proposed algorithm. The MNIST data set consists of handwritten digits 0-9 (60,000 training samples and 10,000 test samples). Each digit has been size-normalized and centered in a 28 x 28 pixel grayscale image.

A. Estimation of intrinsic dimensionality
In the experiments, the intrinsic dimensionality is calculated by GPLLE on the MNIST data set, with the number of training samples varying from 1000 to 10000. In addition, in order to observe the impact of the neighborhood parameter k in both LLE and GPLLE, k is set to 5, 10, and 15, respectively. As can be seen in Figure 1, d increases as the number of training samples increases, which reveals that the manifold structure becomes more and more complicated. Moreover, d differs for different k; in particular, with k=15, GPLLE obtains a larger d but achieves almost the same mapping quality (the mapping quality results are shown in Section III, Part B). In other words, k=15 is not suitable. From this perspective, the estimated d can be regarded as a new evaluation criterion for the selection of k. Notice that τ = 0.9 in the experiments.

Figure 1. The dimensionality d to which the high dimensional data are embedded, determined by GPLLE with the neighborhood parameter k set to 5, 10, and 15, respectively.


Figure 2. Recognition rates of the KNN classifier on the MNIST data set, with the dimensionality of the original digit images reduced by PCA, LLE, and GPLLE, respectively. (a)-(c) Recognition rates on the training set, with the neighborhood parameter k in LLE and GPLLE set to 5, 10, and 15, respectively; (d)-(f) recognition rates on the test set, corresponding to (a)-(c), respectively.


B. Classification performance
There are many different quantitative measures for estimating the performance of mapping methods. In this paper, the classification performance on the training set and the test set is adopted, using a simple but effective classifier, the KNN classifier, with K = 1, 3, 5, 7, 9 (the classification results shown in Figure 2 are the averages over the different K). Figure 2 shows the classification performances of KNN, LLE+KNN, GPLLE+KNN, and PCA+KNN (as a reference). Notice that the test samples are embedded into the same low dimensional space by the linear generation method [8] (sketched below). From the results in Figure 2, with different neighborhood parameters k (= 5, 10, 15), LLE and GPLLE achieve almost the same classification performances, while d varies visibly. Comparatively, k=10 is more suitable as the neighborhood parameter in LLE and GPLLE. Moreover, in most cases, GPLLE achieves better classification performances than LLE on both the training set and the test set. In addition, when the number of training samples varies from 4000 to 10000, GPLLE achieves about the same classification results, which reveals that, with sufficient samples, GPLLE produces stable results based on the estimated d, and thereby the estimation of d by GPLLE is convincing. Furthermore, it is clear that PCA always performs better than GPLLE and LLE on the training set, but worse on the test set, which indicates that the generalization of PCA is worse than that of GPLLE and LLE when d is small; that is, d principal components produce large reconstruction errors.
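A sketch of the evaluation protocol in Python, assuming scikit-learn for the neighbor search and the KNN classifier; the out-of-sample embedding follows the reconstruct-from-training-neighbors idea of [8], and the function names and the regularization term are ours:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

def embed_test(X_train, Y_train, X_test, k=10, reg=1e-3):
    # Embed test points by reusing their reconstruction weights over the k
    # nearest training points, in the spirit of the linear generation method
    # of [8]. Y_train holds the embedded training samples as rows.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(X_test, return_distance=False)
    Y_test = np.zeros((X_test.shape[0], Y_train.shape[1]))
    for i, nb in enumerate(idx):
        Z = X_train[nb] - X_test[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        Y_test[i] = (w / w.sum()) @ Y_train[nb]
    return Y_test

def mean_knn_accuracy(Y_train, t_train, Y_test, t_test):
    # Average KNN accuracy over K = 1, 3, 5, 7, 9, as in the protocol above.
    scores = [KNeighborsClassifier(n_neighbors=K).fit(Y_train, t_train)
                  .score(Y_test, t_test)
              for K in (1, 3, 5, 7, 9)]
    return float(np.mean(scores))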

IV. CONCLUSIONS
In this paper, a new method called GPLLE is proposed. It not only preserves the neighborhood of the samples, but also keeps distant samples far away. Moreover, it can estimate the intrinsic dimensionality d of the manifold structure. In most cases, GPLLE performs better than LLE, and with sufficient data, GPLLE achieves stable classification results based on the estimated d. In addition, d can be regarded as a new evaluation criterion for the selection of the neighborhood parameter k in LLE and GPLLE.

ACKNOWLEDGMENT
This work is supported by the National Natural Science Foundation of China under Grants No. 60802055 and No. 60835001.

REFERENCES
[1] I. T. Jolliffe, Principal Component Analysis. Springer, New York (1989).
[2] T. Cox, M. Cox, Multidimensional Scaling. Chapman & Hall, London (1994).
[3] X. He, P. Niyogi, Locality preserving projections. In NIPS 16, Vancouver, Canada (2003).
[4] J. B. Tenenbaum, V. de Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319-2323 (2000).
[5] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS 15, Vancouver, Canada (2001).
[6] S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290: 2323-2326 (2000).
[7] The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/.
[8] L. K. Saul, S. T. Roweis, Think globally, fit locally: unsupervised learning of nonlinear manifolds. Journal of Machine Learning Research 4: 119-155 (2003).
