dimensionality reduction of face order problem using non ...€¦ · dimensionality reduction of...
TRANSCRIPT
Dimensionality Reduction of Face Order Problem Using Non-linear Embedding Methods
SUN, Jing ; [email protected], MAE, HKUST
LUO, Shuang; [email protected], MAE, HKUST
YU, Zhijie; [email protected], CIVL, HKUST
References
1.Tenenbaum J B, De Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. science, 2000, 290(5500): 2319-2323.
2. Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. science, 2000, 290(5500): 2323-2326.3. De la Porte J, Herbst BM, Hereman W, Van Der Walt, S J. An introduction to diffusion maps. . 2008:15-25.
YU Zhijie conducted the diffusion map analysis. SUN Jing and LUO Shuangconducted Isomap, MDS, and LLE algorithm and analyzed the data. All thegroup members designed poster and contributed to the contents.
Acknowlegements
The exploratory data analysis and visualization are of great importance inmany areas of science and engineering. While in these areas, large amountsof data requires to be analyzed. Therefore, a deep understanding of thedimensionality reduction is needed in practical applications. The images of aperson’s face observed under different pose and lighting conditions can beconsidered as points in a high-dimensional vector space. In order to judge thesimilarities and detect the differences of these images, the dimensionalityreduction is required.In this project, we order the faces by using the four algothrims, i.e., Diffusionmap, MDS-embedding , ISOMAP-embedding and LLE-embedding. It is foundthat all the four algothrims can effectively capture the Euclidean structure ofthe dataset. Besides, the ISOMAP is also capable of discovering the nonlineardegrees of freedom of the dataset.
Introduction
Methods and Materials
ISOMAP-embedding: ISOMAP is an extension of MDS, where pairwiseEuclidean distance data points are replaced by geodesic distances,computed by graph shortest path distances. In this algorithm, first, thegraph shortest path distances is computed.Second, is computed. Third, the eigenvaluedecomposition is computed. Last, the top d nonzero eigenvalues andcorresponding eigenvectors are computed.LLE-embedding: This algorithm assumes that any data point in a highdimensional ambient space can be a linear combination of data points inits neighborhood. It is a local method as it involves data points in localneighbors and hence a sparse eigenvector decomposition.
Methods and Materials
In this dimension reduction problem of ordering the faces, differentalgorithms including linear embedding (MDS) and nonlinear embeddingmethods(Diffsion map, isomap, LLE) are explored. As an extension of the MDSmethod, the Isomap captures the geometric distances and can well describethe high dimensional problem, which is superior to the MDS methods whichonly capture two intrinsic features. The LLE method uses the neighbouringdata to reconstruct low dimensional embedding and such methods can offerinformation of global geometry. Diffusion map as another nonlinear method,maps the data to a diffusion space to preserve their diffusion distance. It canwell distinguish different images with a relative large time scale parameter. Inthis specific face order problem, it seems that the non-linear embeddingalgorithms can better describe intrinsic structures of the data.
Youtube link: https://youtu.be/0B3ywTWL40g
Discussions &Conclusions
Data description: The dataset contains 33 faces of the same person(Y∈R112×92×33) in different angles. The pictures of the 33 faces areshown below. To do data analysis, a data matrix , where n=33 andp=10304 is created.Diffusion map: Diffusion map is a non-linear dimensionality reductiontechnique. The main idea is to map coordinates between data anddiffusion space to reorganize data according to the diffusion metric. In thediffusion space the Euclidean distance approximates the diffusion distancethus the dataset’s intrinsic underlying geometry can be preserved whilereducing dimensionality.MDS embedding: MDS is used to translate information about thepairwise distance among a set of n individuals into a configuration of npoints mapped into the Cartesian space. This algorithm places each objectinto N-dimensional space. The goal of MDS is to find I vectors ∈RN andthen find an embedding from the I objects into RN. In the classical MDSalgothrim, first, , where H is a centering matrix is computed.Second, Eigenvalue decomposition is computed. Third, choose top knonzero eigenvalues and corresponding eigenvector , where
with are computed.
Results
Diffusion map
Fig.3. MDS embedding (inset is eigenvalue plot)
MDS embedding
Isomap
Fig.4. 2-D Isomap embedding
LLE
Fig.5. 2-D LLE embedding
jtktixxP
ji xxxxdji
11)....(
...min
])[:( 2
ij
T dDHDHK
2/1
ddd UY
pnX R
Fig.1. 33 faces of the same person in the dataset
2/THDHB
2/1~
kkk UX
) ......, ,dig( and ],,.....,[ 11 kk
n
kkk RuuuU 0 ...21 kIn MDS, the first twocomponents are usedbecause the first twocomponents explain roughly80% of the total variance, asindicated by the inset. It isseen that MDS caneffectively capture theEuclidean structure of thefaces. However, it fails todetect the intrinsic threedimensionality of the facedataset.
It is seen from Fig. 4 that theIsomap embedding can correctlydetects the three degrees offreedom of the face images, whilethe MDS embedding shown in theprevious Fig. 3 can not. In theother words, Isomap has capturedthe data’s perceptually relevantstructure.Therefore, the Isomap algorithmcan manipulate high dimensionalobservations in terms of theirintrinsic nonlinear degrees offreedom.To be specific, the residualvariance decreased with theincrease of the dimensionality, andthe selected 5 neighbor can bedetected in Fig.4. The selectedfaced of this embedding is alsoshown in Fig.4.
The LLE algorithm embeddedhigher-dimensional data to lowerdimensions by reconstructing thedata from the neighboring data. Inthis specific face order problem,we selected 5 neighbors. We getthe embedded results by varyingthe max embeddingdimensionality(dmax). Forexample, dmax chosen as 2,3,4embedding dimensionalities areplotted in Fig .5. When dmax=2,the embedding the shows a lineartrend and the faces are orderedmainly based on the angles. Andthe plot with dmax=3 shows asimilar trend. When dmax=4, thevariation between these faces aremaximized and the similar picturesclustered.
t=1
t=5
t=30
Fig.2(a). Ordered faces with different time scale along first diffusion coordinate
Fig.2(b). Diffusion map embedding(t=30)
In diffusion map, time scaleparameter t highly influencesthe results of this method.Large t can better explorethe intrinsic similaritiesamong different images butrequire more computationalcost. As is shown in the leftfigure, for t=1 the faces arenot reasonably orderedbased on the angles. When tincreases to 5, they aremuch better ordered.Along the second diffusioncoordinate, no clear meaningof the ordered faces can befound yet.