Dimensionality Reduction of Face Order Problem Using Non-linear Embedding Methods

SUN, Jing; jsunav@connect.ust.hk, MAE, HKUST

LUO, Shuang; [email protected], MAE, HKUST

YU, Zhijie; [email protected], CIVL, HKUST

References

1. Tenenbaum J B, De Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.

2. Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.

3. De la Porte J, Herbst B M, Hereman W, Van Der Walt S J. An introduction to diffusion maps. 2008: 15-25.

Acknowledgements

YU Zhijie conducted the diffusion map analysis. SUN Jing and LUO Shuang ran the Isomap, MDS, and LLE algorithms and analyzed the data. All group members designed the poster and contributed to its contents.

Introduction

Exploratory data analysis and visualization are of great importance in many areas of science and engineering, where large amounts of data need to be analyzed. A solid understanding of dimensionality reduction is therefore needed in practical applications. Images of a person's face observed under different poses and lighting conditions can be considered as points in a high-dimensional vector space. Dimensionality reduction is required to judge the similarities and detect the differences among these images.

In this project, we order the faces using four algorithms: Diffusion map, MDS-embedding, ISOMAP-embedding, and LLE-embedding. All four algorithms effectively capture the Euclidean structure of the dataset. In addition, ISOMAP is capable of discovering the nonlinear degrees of freedom of the dataset.

Methods and Materials

ISOMAP-embedding: ISOMAP is an extension of MDS in which the pairwise Euclidean distances between data points are replaced by geodesic distances, computed as shortest-path distances on a neighborhood graph. In this algorithm, first, the graph shortest-path distances $d_{ij}$ are computed. Second, $K = -\frac{1}{2} H D H^T$ with $D = [d_{ij}^2]$ is computed, where $H$ is the centering matrix. Third, the eigenvalue decomposition of $K$ is computed. Last, the top $d$ nonzero eigenvalues and the corresponding eigenvectors are selected to form the embedding $Y_d = U_d \Lambda_d^{1/2}$.

LLE-embedding: This algorithm assumes that any data point in a high-dimensional ambient space can be written as a linear combination of the data points in its neighborhood. It is a local method because it involves only data points in local neighborhoods, and hence leads to a sparse eigenvector decomposition.
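The four Isomap steps listed above can be sketched in NumPy as follows. This is a minimal illustration, not the code used for the poster: the function name `isomap`, the k-nearest-neighbor graph construction, and the Floyd-Warshall shortest-path loop are assumptions made for the example.

```python
import numpy as np


def isomap(X, n_neighbors=5, d=2):
    """Embed the rows of X (n samples x p features) into d dimensions.

    Hypothetical helper for illustration; assumes distinct samples.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    E = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(E, np.inf)  # exclude each point from its own neighbor list
    nearest = np.argsort(E, axis=1)[:, :n_neighbors]

    # Weighted k-nearest-neighbor graph; np.inf marks "no edge".
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        G[i, nearest[i]] = E[i, nearest[i]]
    G = np.minimum(G, G.T)  # make the graph undirected

    # Step 1: graph shortest-path distances d_ij (Floyd-Warshall).
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if not np.all(np.isfinite(G)):
        raise ValueError("neighborhood graph is disconnected; try a larger n_neighbors")

    # Step 2: K = -1/2 * H * D * H^T with D = [d_ij^2].
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (G ** 2) @ H.T

    # Steps 3 and 4: eigendecomposition, then keep the top d eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:d]
    lam = np.maximum(eigvals[top], 0.0)  # guard against round-off negatives
    return eigvecs[:, top] * np.sqrt(lam)  # Y_d = U_d * Lambda_d^(1/2)
```

For the face data, `isomap(X, n_neighbors=5, d=2)` would be called on the 33 x 10304 data matrix. The Floyd-Warshall loop costs O(n^3), which is negligible for n = 33.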

Discussions & Conclusions

In this dimension reduction problem of ordering the faces, we explored a linear embedding method (MDS) and nonlinear embedding methods (Diffusion map, Isomap, LLE). As an extension of MDS, Isomap captures the geodesic distances and describes the high-dimensional problem well, which makes it superior to MDS, which captures only two intrinsic features. The LLE method uses the neighbouring data to reconstruct a low-dimensional embedding, and such methods can offer information about the global geometry. Diffusion map, another nonlinear method, maps the data to a diffusion space so as to preserve their diffusion distances. It distinguishes different images well with a relatively large time scale parameter. In this specific face order problem, the non-linear embedding algorithms appear to describe the intrinsic structures of the data better.

Youtube link: https://youtu.be/0B3ywTWL40g

Methods and Materials

Data description: The dataset contains 33 faces of the same person ($Y \in \mathbb{R}^{112 \times 92 \times 33}$) at different angles. The pictures of the 33 faces are shown in Fig. 1. For the data analysis, a data matrix $X \in \mathbb{R}^{n \times p}$, where $n = 33$ and $p = 10304$, is created.

Diffusion map: Diffusion map is a non-linear dimensionality reduction technique. The main idea is to map coordinates between the data space and the diffusion space, reorganizing the data according to the diffusion metric. In the diffusion space the Euclidean distance approximates the diffusion distance, so the dataset's intrinsic underlying geometry can be preserved while the dimensionality is reduced.

MDS embedding: MDS is used to translate information about the pairwise distances among a set of n individuals into a configuration of n points mapped into Cartesian space. This algorithm places each object into N-dimensional space. The goal of MDS is to find I vectors in $\mathbb{R}^N$ and thereby an embedding of the I objects into $\mathbb{R}^N$. In the classical MDS algorithm, first, $B = -\frac{1}{2} H D H^T$ is computed, where $H$ is a centering matrix and $D$ is the matrix of squared pairwise distances. Second, the eigenvalue decomposition of $B$ is computed. Third, the top $k$ nonzero eigenvalues and corresponding eigenvectors are chosen and $\tilde{X}_k = U_k \Lambda_k^{1/2}$ is computed, where $U_k = [u_1, \ldots, u_k] \in \mathbb{R}^{n \times k}$ and $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ with $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k > 0$.
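The data matrix construction and the three classical MDS steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `classical_mds` is hypothetical, and the array `Y` below is filled with random values as a stand-in for the real face images, which are not reproduced here.

```python
import numpy as np


def classical_mds(X, k=2):
    """Classical MDS of the rows of X (n samples x p features).

    Returns the n x k embedding and all eigenvalues in decreasing order.
    Hypothetical helper written for illustration.
    """
    n = X.shape[0]
    # D[i, j] = squared Euclidean distance between samples i and j.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Step 1: B = -1/2 * H * D * H^T with centering matrix H = I - (1/n) 1 1^T.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D @ H.T

    # Step 2: eigenvalue decomposition of the symmetric matrix B.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 3: keep the top k eigenpairs, X_k = U_k * Lambda_k^(1/2).
    lam_k = np.maximum(eigvals[:k], 0.0)  # guard against round-off negatives
    return eigvecs[:, :k] * np.sqrt(lam_k), eigvals


# Stand-in for the face array Y (112 x 92 x 33): random values, NOT the real data.
Y = np.random.default_rng(0).random((112, 92, 33))
X = Y.reshape(-1, Y.shape[2]).T  # data matrix with n = 33 rows, p = 10304 columns
embedding, spectrum = classical_mds(X, k=2)
```

Each row of `X` is one flattened 112 x 92 image, so `embedding` holds one 2-D point per face.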

Results

Diffusion map

Fig.3. MDS embedding (inset is eigenvalue plot)

MDS embedding

Isomap

Fig.4. 2-D Isomap embedding

LLE

Fig.5. 2-D LLE embedding

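Isomap equations:

$d_{ij} = \min_{P = (x_i, x_{t_1}, \ldots, x_{t_{k-1}}, x_j)} \left( \|x_i - x_{t_1}\| + \ldots + \|x_{t_{k-1}} - x_j\| \right)$

$K = -\frac{1}{2} H D H^T, \quad D := [d_{ij}^2]$

$Y_d = U_d \Lambda_d^{1/2}$

Data matrix: $X \in \mathbb{R}^{n \times p}$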

Fig.1. 33 faces of the same person in the dataset

MDS equations:

$B = -\frac{1}{2} H D H^T$

$\tilde{X}_k = U_k \Lambda_k^{1/2}$, where $U_k = [u_1, \ldots, u_k] \in \mathbb{R}^{n \times k}$ and $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ with $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k > 0$

In MDS, the first two components are used because they explain roughly 80% of the total variance, as indicated by the inset of Fig. 3. MDS effectively captures the Euclidean structure of the faces. However, it fails to detect the intrinsic three-dimensionality of the face dataset.
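The share of variance explained by the leading components can be read off the eigenvalue spectrum. The sketch below shows one common way to compute it; the function name and the example eigenvalues are hypothetical and are not the values from the poster's inset plot.

```python
import numpy as np


def explained_variance_ratio(eigvals, k):
    """Fraction of the total (non-negative) spectrum carried by the top k eigenvalues."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # ignore round-off negatives
    lam = np.sort(lam)[::-1]
    return lam[:k].sum() / lam.sum()


# Hypothetical spectrum chosen only to illustrate the calculation.
example_spectrum = [5.0, 3.0, 1.0, 1.0]
ratio = explained_variance_ratio(example_spectrum, k=2)
```

Choosing the smallest k whose ratio passes a threshold is a common heuristic for picking the embedding dimension. It measures only linear (Euclidean) structure, which is consistent with MDS missing the third degree of freedom.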

Fig. 4 shows that the Isomap embedding correctly detects the three degrees of freedom of the face images, while the MDS embedding in Fig. 3 cannot. In other words, Isomap has captured the data's perceptually relevant structure. The Isomap algorithm can therefore represent high-dimensional observations in terms of their intrinsic nonlinear degrees of freedom. Specifically, the residual variance decreases as the dimensionality increases, and the embedding obtained with the selected 5 neighbors is shown in Fig. 4. The selected faces of this embedding are also shown in Fig. 4.

The LLE algorithm embeds higher-dimensional data into lower dimensions by reconstructing each data point from its neighboring data. In this specific face order problem, we selected 5 neighbors. We obtain the embedded results by varying the maximum embedding dimensionality (dmax). For example, the embeddings for dmax = 2, 3, and 4 are plotted in Fig. 5. When dmax = 2, the embedding shows a linear trend and the faces are ordered mainly by angle. The plot with dmax = 3 shows a similar trend. When dmax = 4, the variation between the faces is maximized and similar pictures cluster together.
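The reconstruction-from-neighbors idea can be sketched as below. This is a minimal version of standard LLE, not the poster's code: the function name `lle`, the regularization constant `reg`, and the random test data are assumptions. The parameter `d` plays the role of dmax in the text.

```python
import numpy as np


def lle(X, n_neighbors=5, d=2, reg=1e-3):
    """Locally linear embedding of the rows of X; returns (embedding, weights).

    Hypothetical helper for illustration; assumes distinct samples.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist2, axis=1)[:, :n_neighbors]

    # Reconstruction weights: each x_i is approximated by its neighbors,
    # with the weights constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nearest[i]] - X[i]  # neighbors centered on x_i
        C = Z @ Z.T               # local Gram matrix (k x k)
        C = C + reg * np.trace(C) * np.eye(n_neighbors)  # assumed regularization
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nearest[i]] = w / w.sum()

    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    # skipping the first one (the constant vector, eigenvalue near zero).
    I_W = np.eye(n) - W
    eigvals, eigvecs = np.linalg.eigh(I_W.T @ I_W)
    return eigvecs[:, 1:d + 1], W


# Random stand-in data (NOT the face images), 40 samples in 50 dimensions.
demo = np.random.default_rng(1).normal(size=(40, 50))
coords, weights = lle(demo, n_neighbors=5, d=2)
```

Each row of `weights` has at most `n_neighbors` nonzero entries, which is the sparsity referred to in the method description. Sorting the faces by `coords[:, 0]` with `np.argsort` would give an ordering along the first LLE coordinate.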

Fig.2(a). Ordered faces with different time scales (t=1, t=5, t=30) along the first diffusion coordinate

Fig.2(b). Diffusion map embedding (t=30)

In the diffusion map, the time scale parameter t strongly influences the results. A large t can better explore the intrinsic similarities among different images but requires more computational cost. As shown in Fig. 2(a), for t=1 the faces are not reasonably ordered by angle. When t increases to 5, they are much better ordered. Along the second diffusion coordinate, no clear meaning of the ordering of the faces can be found yet.
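The role of the time scale t can be made concrete with a small sketch of a diffusion map. The poster does not state which kernel or bandwidth was used, so the Gaussian kernel, the median-based bandwidth `eps`, and the function name `diffusion_map` are all assumptions made for this example.

```python
import numpy as np


def diffusion_map(X, t=1, d=2, eps=None):
    """Diffusion coordinates of the rows of X at integer time scale t.

    Hypothetical helper for illustration; assumes distinct samples.
    """
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D2, 0.0)
    if eps is None:
        eps = np.median(D2[D2 > 0])  # assumed bandwidth heuristic

    Kmat = np.exp(-D2 / eps)  # assumed Gaussian kernel
    deg = Kmat.sum(axis=1)

    # P = D^-1 K is the diffusion (Markov) matrix. Its eigenpairs are obtained
    # from the symmetric matrix S = D^-1/2 K D^-1/2, which shares its eigenvalues.
    S = Kmat / np.sqrt(np.outer(deg, deg))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(deg)[:, None]  # right eigenvectors of P

    # Skip the trivial pair (eigenvalue 1, constant vector) and scale by lambda^t.
    return psi[:, 1:d + 1] * eigvals[1:d + 1] ** t


# Points on a straight line in 3-D (toy data, NOT the face images).
line = np.outer(np.linspace(0.0, 1.0, 20), np.ones(3))
coords_t1 = diffusion_map(line, t=1, d=2)
coords_t5 = diffusion_map(line, t=5, d=2)
face_order = np.argsort(coords_t1[:, 0])  # ordering along the first diffusion coordinate
```

Because every nontrivial eigenvalue is smaller than 1 in magnitude, raising it to a larger power t damps the higher coordinates faster than the leading ones, which is one way to see why the embedding changes with t.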