
Spectral multidimensional scaling

Yonathan Aflalo^a,1 and Ron Kimmel^b

Departments of ^aElectrical Engineering and ^bComputer Science, Technion–Israel Institute of Technology, Haifa 32000, Israel

Edited* by Haim Brezis, Rutgers, The State University of New Jersey, Piscataway, NJ, and approved September 10, 2013 (received for review May 8, 2013)

An important tool in information analysis is dimensionality reduction. There are various approaches for large data simplification by scaling its dimensions down that play a significant role in recognition and classification tasks. The efficiency of dimension reduction tools is measured in terms of memory and computational complexity, which are usually a function of the number of the given data points. Sparse local operators that involve substantially less than quadratic complexity at one end, and faithful multiscale models with quadratic cost at the other end, make the design of dimension reduction procedures a delicate balance between modeling accuracy and efficiency. Here, we combine the benefits of both and propose a low-dimensional multiscale modeling of the data, at a modest computational cost. The idea is to project the classical multidimensional scaling problem into the data spectral domain extracted from its Laplace–Beltrami operator. There, embedding into a small dimensional Euclidean space is accomplished while optimizing for a small number of coefficients. We provide theoretical support and demonstrate that by working in the natural eigenspace of the data, one could reduce the process complexity while maintaining the model fidelity. As examples, we efficiently canonize nonrigid shapes by embedding their intrinsic metric into R^3, a method often used for matching and classifying almost isometric articulated objects. Finally, we demonstrate the method by exposing the style in which handwritten digits appear in a large collection of images. We also visualize clustering of digits by treating images as feature points that we map to a plane.

flat embedding | distance maps | big data | diffusion geometry

Manifold learning refers to the process of mapping given data into a simple low-dimensional domain that reveals properties of the data. When the target space is Euclidean, the procedure is also known as flattening. The question of how to efficiently and reliably flatten given data is a challenge that occupies the minds of numerous researchers. Distance-preserving data-flattening procedures can be found in the fields of geometry processing, the mapmaker problem through exploration of biological surfaces (1–3), texture mapping in computer graphics (4, 5), nonrigid shape analysis (6), image (7–9) and video understanding (7, 10), and computational biometry (11), to name just a few. The flat embedding is usually a simplification process that aims to preserve, as much as possible, distances between data points in the original space, while being efficient to compute. One family of flattening techniques is multidimensional scaling (MDS), which attempts to map all pairwise distances between data points into small dimensional Euclidean domains. A review of MDS applications in psychophysics can be found in ref. 12, which includes the computational realization that human color perception is 2D.

A proper way to explore the geometry of given data points involves computing all pairwise distances. Then, a flattening procedure should attempt to keep the distance between all couples of corresponding points in the low-dimensional Euclidean domain. Computing pairwise distances between points was addressed by extension of local distances on the data graph (1, 13), or by consistent numerical techniques for distance computation on surfaces (14). The complexity involved in storing all pairwise distances is quadratic in the number of data points, which is a limiting factor in large databases. Alternative models try to keep the size of the input as low as possible by limiting the input to distances of just nearby points.

One such example is locally linear embedding (15), which attempts to map data points into a flat domain where each feature coordinate is a linear combination of the coordinates of its neighbors. The minimization, in this case, is for keeping the combination similar in the given data and the target flat domain. Because only local distances are analyzed, the pairwise distances matrix is sparse, with an effective size of O(n), where n is the number of data points. Along the same line, the Hessian locally linear embedding (16) tries to better fit the local geometry of the data to the plane. Belkin and Niyogi (17) suggested embedding data points into the Laplace–Beltrami eigenspace for the purpose of data clustering. The idea is, yet again, to use distances of only nearby points by which a Laplace–Beltrami operator (LBO) is defined. The data can then be projected onto the first eigenvectors that correspond to the smallest eigenvalues of the LBO. The question of how to exploit the LBO decomposition to construct diffusion geometry was addressed by Coifman and coworkers (18, 19).

De Silva and Tenenbaum (20) recognized the computational difficulty of dealing with a full matrix of pairwise distances, and proposed working with a subset of landmarks, which is used for interpolation in the embedded space. Next, Bengio et al. (21) proposed to extend subsets by Nyström extrapolation, within a space that is an empirical Hilbert space of functions obtained through kernels that represent probability distributions; they did not incorporate the geometry of the original data. Finally, assuming n data points, Asano et al. (22) proposed reducing both memory and computational complexity by subdividing the problem into O(√n) subsets of O(√n) data points each. That reduction may be feasible when distances between data points can be computed in constant time, which is seldom the case.

The computational complexity of multidimensional scaling was addressed by a multigrid approach in ref. 23, and by vector extrapolation techniques in ref. 24. In both cases the acceleration, although effective, required all pairwise distances, an O(n²) input.

Significance

Scientific theories describe observations by equations with a small number of parameters or dimensions. Memory and computational efficiency of dimension reduction procedures is a function of the size of the observed data. Sparse local operators that involve almost linear complexity, and faithful multiscale models with quadratic cost, make the design of dimension reduction tools a delicate balance between modeling accuracy and efficiency. Here, we propose multiscale modeling of the data at a modest computational cost. We project the classical multidimensional scaling problem into the data spectral domain. Then, embedding into a low-dimensional space is efficiently accomplished. Theoretical support and empirical evidence demonstrate that working in the natural eigenspace of the data, one could reduce the complexity while maintaining model fidelity.

Author contributions: Y.A. and R.K. designed research and wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

See Commentary on page 18031.

^1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1308708110/-/DCSupplemental.


An alternative approach to deal with the memory complexity was needed. In ref. 25, geodesics that are suspected to distort the embedding due to topology effects were filtered out in an attempt to reduce distortions. Such filters eliminate some of the pairwise distances, yet not to the extent of substantial computational or memory savings, because the goal was mainly reduction of flattening distortions. In ref. 26, the eigenfunctions of the LBO on a surface were interpolated into the volume bounded by the surface. This procedure was designed to overcome the need to evaluate the LBO inside the volume, as proposed in ref. 27. Both models deal with either surface or volume LBO decomposition of objects in R^3, and were not designed to flatten and simplify structures in big data.

Another dimensionality reduction method, intimately related to MDS, is the principal component analysis (PCA) method. The PCA procedure projects the points to a low-dimensional space by minimizing the least-squares fitting error. In ref. 28, that relation was investigated through kernel PCA, which is related to the problem we plan to explore.

Here, we use the fact that the gradient magnitude of distance functions on the manifold is equal to 1 almost everywhere. Interpolation of such smooth functions could be efficiently obtained by projecting the distances into the LBO eigenspace. In fact, translating the problem into its natural spectral domain enables us to reduce the complexity of the flattening procedure. The efficiency is established by considering a small fraction of the pairwise distances that are projected onto the LBO leading eigenfunctions.

Roughly speaking, the differential structure of the manifold is captured by the eigenbasis of the LBO, whereas its multiscale structure is encapsulated by sparse sampling of the pairwise distances matrix. We exemplify this idea by extracting the R^4 structure of points in R^10,000, canonizing surfaces to cope with nonrigid deformations, and mapping images of handwritten digits to the plane.

The proposed framework operates on abstract data in any dimension. In the first section, we prove the asymptotic behavior of using the leading eigenvectors of the LBO for representing smooth functions on a given manifold. Next, we demonstrate how to interpolate geodesic distances by projecting a fraction of the distances between data points into the LBO eigenspace. We next formulate the classical MDS problem in the LBO eigenspace. A relation to diffusion geometry (18, 19, 29) is established. Experimental results demonstrate the complexity savings.

Spectral Projection

Let us consider a Riemannian manifold, $\mathcal{M}$, equipped with a metric $G$. The manifold and its metric induce an LBO, often denoted by $\Delta_G$, and here w.l.o.g. by $\Delta$. The LBO is self-adjoint and defines a set of functions called eigenfunctions, denoted $\phi_i$, such that $\Delta\phi_i = \lambda_i\phi_i$ and $\int_{\mathcal{M}} \phi_i(x)\phi_j(x)\,da(x) = \delta_{ij}$, where $da$ is an area element. These functions have been popular in the field of shape analysis; they have been used to construct descriptors (30) and to linearly relate between metric spaces (31). In particular, for problems involving functions defined on a triangulated surface, when the number of vertices of $\mathcal{M}$ is large, we can reduce the dimensionality of the functional space by considering smooth functions defined in a subspace spanned by just a couple of eigenfunctions of the associated LBO. To prove the efficiency of LBO eigenfunctions in representing smooth functions, consider a smooth function $f$ with bounded gradient magnitude $\|\nabla_G f\|_G$. Define the representation residual function $r_n = f - \sum_{i=1}^{n} \langle f, \phi_i\rangle_G\,\phi_i$. It is easy to see that $\forall i,\ 1\le i\le n,\ \langle r_n, \phi_i\rangle_G = 0$, and $\forall i,\ 1\le i\le n,\ \langle\nabla_G r_n, \nabla_G\phi_i\rangle_G = \langle r_n, \Delta_G\phi_i\rangle_G = \lambda_i\langle r_n, \phi_i\rangle_G = 0$. Using these properties and assuming the eigenvalues are ordered in ascending order, $\lambda_1 \le \lambda_2 \le \dots$, we readily have that $\|r_n\|_G^2 = \big\|\sum_{i=n+1}^{\infty} \langle r_n,\phi_i\rangle_G\,\phi_i\big\|_G^2 = \sum_{i=n+1}^{\infty} \langle r_n,\phi_i\rangle_G^2$ and $\|\nabla_G r_n\|_G^2 = \big\|\sum_{i=n+1}^{\infty} \langle r_n,\phi_i\rangle_G\,\nabla_G\phi_i\big\|_G^2 \ge \lambda_{n+1}\sum_{i=n+1}^{\infty} \langle r_n,\phi_i\rangle_G^2$. Then, $\frac{\|\nabla_G r_n\|_G^2}{\|r_n\|_G^2} \ge \lambda_{n+1}$. Moreover, as for $i\in\{1,\dots,n\}$ the inner products vanish, $\langle\nabla_G r_n, \nabla_G\phi_i\rangle_G = 0$, we have $\|\nabla_G f\|_G^2 = \big\|\nabla_G r_n + \sum_{i=1}^{n}\langle f,\phi_i\rangle_G\,\nabla_G\phi_i\big\|_G^2 = \|\nabla_G r_n\|_G^2 + \sum_{i=1}^{n}\langle f,\phi_i\rangle_G^2\,\lambda_i$. It follows that $\|r_n\|_G^2 \le \frac{\|\nabla_G r_n\|_G^2}{\lambda_{n+1}} \le \frac{\|\nabla_G f\|_G^2}{\lambda_{n+1}}$.

Finally, for $d$ dimensional manifolds, as shown in ref. 32, the spectrum has a linear behavior in $n^{2/d}$, that is, $\lambda_n \approx C_1 n^{2/d}$ as $n\to\infty$. Moreover, the residual function $r_n$ depends linearly on $\|\nabla_G f\|$, which is bounded by a constant. It follows that $r_n$ converges asymptotically to zero at a rate of $O\big(n^{-2/d}\big)$. Moreover, this convergence depends on $\|\nabla_G f\|_G^2$; i.e., $\exists C_2$ such that

$$\forall f:\mathcal{M}\to\mathbb{R},\qquad \|r_n\|_G^2 \le C_2\,\frac{\|\nabla_G f\|_G^2}{n^{2/d}}. \quad [1]$$
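The decay predicted by Eq. 1 is easy to observe numerically. Below is a minimal sketch (not part of the original paper), assuming a uniformly sampled unit circle and its combinatorial graph Laplacian as a stand-in for the LBO; all names are illustrative.

# Residual of projecting a smooth function onto the first n Laplacian
# eigenvectors; the relative residual should shrink as n grows (cf. Eq. 1).
import numpy as np

N = 400                                    # samples on the unit circle
t = 2 * np.pi * np.arange(N) / N
f = np.exp(np.cos(t)) * np.sin(2 * t)      # a smooth test function

W = np.zeros((N, N))                       # cycle-graph adjacency
idx = np.arange(N)
W[idx, (idx + 1) % N] = 1.0
W[idx, (idx - 1) % N] = 1.0
L = np.diag(W.sum(axis=1)) - W             # combinatorial graph Laplacian

lam, Phi = np.linalg.eigh(L)               # eigenvalues ascending, Phi orthonormal

for n in (5, 10, 20, 40, 80):
    r = f - Phi[:, :n] @ (Phi[:, :n].T @ f)    # representation residual r_n
    print(n, np.linalg.norm(r) / np.linalg.norm(f))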

Spectral Interpolation

Let us consider a $d$ dimensional manifold, $\mathcal{M}$, sampled at $n$ points $\{V_i\}$, and $\mathcal{J}$ a subset of $\{1, 2, \dots, n\}$ such that $|\mathcal{J}| = m_s \le n$. Given a smooth function $f$ defined on $V_{\mathcal{J}} = \{V_j,\ j\in\mathcal{J}\}$, our first question is how to interpolate $f$, that is, how to construct a continuous function $\tilde f$ such that $\tilde f(V_j) = f(V_j)\ \forall j\in\mathcal{J}$. One simple solution is linear interpolation. Next, assume the function $\tilde f$ is required to be as smooth as possible in the $L_2$ sense. We measure the smoothness of a function by $E_{\mathrm{smooth}}(f) = \int_{\mathcal{M}} \|\nabla f\|_2^2\, da$. The problem of smooth interpolation could be rewritten as $\min_{h:\mathcal{M}\to\mathbb{R}} E_{\mathrm{smooth}}(h)$ s.t. $\forall j\in\mathcal{J},\ h(V_j) = f(V_j)$. We readily have $\int_{\mathcal{M}} \|\nabla h\|_2^2\, da = \int_{\mathcal{M}} \langle\Delta h, h\rangle\, da$. Our interpolation problem could then be written as

$$\min_{h:\mathcal{M}\to\mathbb{R}} \int_{\mathcal{M}} \langle\Delta h, h\rangle\, da \quad \text{s.t.} \quad h(V_j) = f(V_j)\ \ \forall j\in\mathcal{J}. \quad [2]$$

Any discretization matrix of the LBO could be used for the eigenspace construction. Here, we adopt the general LBO form $L = A^{-1}W$, where $A^{-1}$ is a diagonal matrix whose entries are inversely proportional to the metric infinitesimal volume elements. One example is the cotangent weights approximation (33) for triangulated surfaces; another example is the generalized discrete LBO suggested in ref. 17. In the first case, $W$ is a matrix in which each entry is zero if the two vertices indicated by the two matrix indices do not share a triangle's edge, or the sum of the cotangents of the angles supported by the edge connecting the corresponding vertices. In the latter case, $W$ is the graph Laplacian, where the graph is constructed by connecting nearest neighbors. Like any Laplacian matrix, the diagonal is the negative sum of the off-diagonal elements of the corresponding row. $A$, in the triangulated surface case, is a diagonal matrix in which the $A_{ii}$ element is proportional to the area of the triangles about the vertex $V_i$. A similar normalization factor should apply in the more general case.

A discrete version of the smooth interpolation problem 2 can be rewritten as

$$\min_x\ x^T W x \quad \text{s.t.} \quad Bx = f, \quad [3]$$

where $B$ is the projection matrix onto the space spanned by the vectors $e_j,\ j\in\mathcal{J}$, where $e_j$ is the $j$th canonical basis vector, and $f$ now represents the sampled vector $f(V_{\mathcal{J}})$.

To reduce the dimensionality of the problem, we introduce $\hat f$, the spectral projection of $f$ onto the set of eigenfunctions of the LBO $\{\phi_i\}_{i=1}^{m_e}$, as $\hat f = \sum_{i=1}^{m_e} \langle f, \phi_i\rangle\,\phi_i$, where $\Delta\phi_i = \lambda_i\phi_i$. Then, denoting by $\Phi$ the matrix whose $i$th column is $\phi_i$, we have $\hat f = \Phi\alpha$, where $\alpha$ is a vector such that $\alpha_i = \langle f, \phi_i\rangle$. Problem 3 can now be approximated by


$$\min_{\alpha\in\mathbb{R}^{m_e}}\ \alpha^T\Phi^T W\Phi\,\alpha \quad \text{s.t.} \quad B\Phi\alpha = f. \quad [4]$$

By definition, $\Phi^T W\Phi = \Lambda$, where $\Lambda$ represents the diagonal matrix whose elements are the eigenvalues of $L$. We can alternatively incorporate the constraint as a penalty in the target function, and rewrite our problem as $\min_{\alpha\in\mathbb{R}^{m_e}} \big(\alpha^T\Lambda\alpha + \mu\|B\Phi\alpha - f\|^2\big) = \min_{\alpha\in\mathbb{R}^{m_e}} \big(\alpha^T(\Lambda + \mu\Phi^T B^T B\Phi)\alpha + 2\mu f^T B\Phi\alpha\big)$. The solution to this problem is given by

$$\alpha = 2\mu\big(\Lambda + \mu\Phi^T B^T B\Phi\big)^{-1}\Phi^T B^T f = Mf. \quad [5]$$

Next, we use the above interpolation expressions to formulate the pairwise geodesics matrix in a compact yet accurate manner. The smooth interpolation, problem 2, for a pairwise distances function can be defined as follows. In this setting, let $\mathcal{I} = \mathcal{J}\times\mathcal{J}$ be the set of pairs of indices of data points, and $F(V_i, V_j)$ a value defined for each pair of points (or vertices in the surface example) $(V_i, V_j)$, where $(i,j)\in\mathcal{I}$. We would like to interpolate the smooth function $D: \mathcal{M}\times\mathcal{M}\to\mathbb{R}$, whose values are given at $(V_i, V_j),\ (i,j)\in\mathcal{I}$, by $D(V_i,V_j) = F(V_i,V_j)\ \forall (i,j)\in\mathcal{I}$. For that goal, we first define a smooth-energy measure for such functions. We proceed as before by introducing

$$E_{\mathrm{smooth}}(D) = \iint_{\mathcal{M}} \|\nabla_x D(x,y)\|^2 + \|\nabla_y D(x,y)\|^2\, da(x)\, da(y).$$

The smooth interpolation problem can now be written as

$$\min_{D:\mathcal{M}\times\mathcal{M}\to\mathbb{R}} E_{\mathrm{smooth}}(D) \quad \text{s.t.} \quad D(V_i,V_j) = F(V_i,V_j)\ \ \forall (i,j)\in\mathcal{I}. \quad [6]$$

Given any matrix $D$, such that $D_{ij} = D(V_i,V_j)$, we have

$$\int_{\mathcal{M}} \|\nabla_x D(x, y_j)\|^2\, da(x) = \int_{\mathcal{M}} \langle\Delta_x D(x,y_j), D(x,y_j)\rangle\, da(x) \approx D_j^T W D_j = \mathrm{trace}\big(W D_j D_j^T\big),$$

where $D_j$ represents the $j$th column of $D$. Then,

$$\iint_{\mathcal{M}} \|\nabla_x D(x,y)\|^2\, da(x)\, da(y) \approx \sum_j A_{jj}\,\mathrm{trace}\big(W D_j D_j^T\big) = \mathrm{trace}\Big(W\sum_j D_j D_j^T A_{jj}\Big) = \mathrm{trace}\big(W D A D^T\big).$$

A similar result applies to $\iint_{\mathcal{M}} \|\nabla_y D(x,y)\|^2\, da(x)\, da(y)$. Then,

$$\iint_{\mathcal{M}} \|\nabla_x D(x,y)\|^2\, da(x)\, da(y) \approx \mathrm{trace}\big(D^T W D A\big),$$
$$\iint_{\mathcal{M}} \|\nabla_y D(x,y)\|^2\, da(x)\, da(y) \approx \mathrm{trace}\big(D W D^T A\big). \quad [7]$$

The smooth energy can now be discretized for a matrix $D$ by

$$E_{\mathrm{smooth}}(D) = \mathrm{trace}\big(D^T W D A\big) + \mathrm{trace}\big(D W D^T A\big). \quad [8]$$

The spectral projection of $D$ onto $\Phi$ is given by

$$\tilde D(x,y) = \sum_i \langle D(\cdot, y), \phi_i\rangle\,\phi_i(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} \underbrace{\big\langle \langle D(u,v), \phi_i(u)\rangle, \phi_j(v)\big\rangle}_{\alpha_{ij}}\ \phi_j(y)\,\phi_i(x),$$

where $\alpha_{ij} = \iint_{\mathcal{M}\times\mathcal{M}} D(x,y)\,\phi_i(x)\,\phi_j(y)\, da(x)\, da(y)$. In matrix notation, denoting by $D$ the matrix such that $D_{ij} = \tilde D(x_i, y_j)$, we can show that

$$D = \Phi\,\alpha\,\Phi^T. \quad [9]$$

Combining Eqs. 8 and 9, we can define the discrete smooth energy of the spectral projection of a function as

$$\tilde E_{\mathrm{smooth}}(D) = \mathrm{trace}\big(\Phi\alpha^T\Phi^T W\Phi\alpha\Phi^T A\big) + \mathrm{trace}\big(\Phi\alpha\Phi^T W\Phi\alpha^T\Phi^T A\big) = \mathrm{trace}\big(\Phi\alpha^T\Lambda\alpha\Phi^T A\big) + \mathrm{trace}\big(\Phi\alpha\Lambda\alpha^T\Phi^T A\big) = \mathrm{trace}\big(\alpha^T\Lambda\alpha\,\Phi^T A\Phi\big) + \mathrm{trace}\big(\alpha\Lambda\alpha^T\,\Phi^T A\Phi\big) = \mathrm{trace}\big(\alpha^T\Lambda\alpha\big) + \mathrm{trace}\big(\alpha\Lambda\alpha^T\big). \quad [10]$$

Using the spectral smooth representation introduced in Eq. 9, we can define the smooth spectral interpolation problem for a function from $\mathcal{M}\times\mathcal{M}$ to $\mathbb{R}$ as

$$\min_{\alpha\in\mathbb{R}^{m_e\times m_e}}\ \mathrm{trace}\big(\alpha^T\Lambda\alpha\big) + \mathrm{trace}\big(\alpha\Lambda\alpha^T\big) \quad \text{s.t.} \quad \big(\Phi\alpha\Phi^T\big)_{ij} = D(V_i,V_j)\ \ \forall (i,j)\in\mathcal{I}. \quad [11]$$

Expressing the constraint again as a penalty function, we end up with the following optimization problem:

$$\min_{\alpha\in\mathbb{R}^{m_e\times m_e}}\ \mathrm{trace}\big(\alpha^T\Lambda\alpha\big) + \mathrm{trace}\big(\alpha\Lambda\alpha^T\big) + \mu\sum_{(i,j)\in\mathcal{I}} \Big\|\big(\Phi\alpha\Phi^T\big)_{ij} - D(V_i,V_j)\Big\|_F^2, \quad [12]$$

where $\|\cdot\|_F$ represents the Frobenius norm and $m_e$ the number of eigenfunctions. Problem 12 is a minimization problem of a quadratic function of $\alpha$. Then, representing $\alpha$ as an $(m_e^2\times 1)$ vector $\alpha$, the problem can be rewritten as a quadratic programming problem. We can find a matrix $M$, similar to Eq. 5, such that $\alpha = MD$, where $D$ is the row-stack vector of the matrix $D(V_i,V_j)$.

Another, less accurate but more efficient, approach to obtain an approximation of the matrix $\alpha$ is to compute

$$\alpha = M F M^T, \quad [13]$$

where $F$ is the matrix defined by $F_{ij} = F(V_i,V_j)$ and $M$ is the matrix introduced in Eq. 5.

Notice that spectral projection is a natural choice for the spectral interpolation of distances because the eigenfunctions encode the manifold geometry, as do distance functions. Moreover, the eikonal equation, which models geodesic distance functions on the manifold, is defined by $\|\nabla_G D\| = 1$. Eq. 1, in this case, provides us with a clear asymptotic convergence rate by spectral projection of the function $D$, because $\|\nabla_G D\|$ is a constant equal to 1.

At this point, we have all of the ingredients to present the spectral MDS. For simplicity, we limit our discussion to classical scaling.
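Before moving to the spectral MDS formulation, the following hedged sketch ties Eqs. 5 and 13 together in code (not from the paper; function and variable names are illustrative, and the penalized least-squares form of Eq. 5 is used for M). It also shows how a single entry of the interpolated matrix ΦαΦ^T can be read off without ever forming it.

import numpy as np

def spectral_distance_coefficients(Phi, lam, J, F, mu=1e3):
    """alpha = M F M^T (Eq. 13): spectral coefficients of the interpolated
    pairwise squared-distance matrix, from distances F measured on J x J."""
    PhiJ = Phi[J, :]                                    # B Phi
    M = mu * np.linalg.solve(np.diag(lam) + mu * PhiJ.T @ PhiJ, PhiJ.T)
    return M @ F @ M.T

def interpolated_entry(Phi, alpha, i, j):
    """One entry of D_tilde = Phi alpha Phi^T, without forming the n x n matrix."""
    return Phi[i, :] @ alpha @ Phi[j, :]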

Spectral MDS

MDS is a family of dimensionality reduction techniques that attempt to find a simple representation for a dataset given by the distances between every two data points. Given any metric space $\mathcal{M}$ equipped with a metric $D: \mathcal{M}\times\mathcal{M}\to\mathbb{R}$, and $V = \{V_1, V_2, \dots, V_n\}$ a finite set of elements of $\mathcal{M}$, the multidimensional scaling of $V$ in $\mathbb{R}^k$ involves finding a set of points $X = \{X_1, X_2, \dots, X_n\}$ in $\mathbb{R}^k$ whose pairwise Euclidean distances $d(X_i, X_j) = \|X_i - X_j\|_2$ are as close as possible to $D(V_i, V_j)$ for all $(i,j)$.


Such an embedding, for the MDS member known as classical scaling, can be realized by the following minimization program: $\min_X \big\|XX^T + \tfrac{1}{2}JDJ\big\|_F$, where $D$ is a matrix defined by $D_{ij} = D(V_i,V_j)^2$ and $J = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$, or $J_{ij} = \delta_{ij} - 1/n$. Classical scaling finds the first $k$ singular vectors and corresponding singular values of the matrix $-\tfrac{1}{2}JDJ$. The classical scaling solver requires the computation of an $(n\times n)$ matrix of distances, which is a challenging task when dealing with more than, say, thousands of data points. The quadratic size of the input data imposes severe time and space limitations. Several acceleration methods for MDS were tested over the years (20–24). However, to the best of our knowledge, the multiscale geometry of the data was never used for lowering the space and time complexities during the reduction process. Here, we show how spectral interpolation, in the case of classical scaling, is used to overcome these complexity limitations.

In the spectral MDS framework we first select a subset of $m_s$ points, with indices $\mathcal{J}$, of the data. For efficient covering of the data manifold, this set can be selected using the farthest-point sampling strategy, which is known to be 2-optimal in the sense of covering. We then compute the geodesic distances between every two points $(V_i, V_j)\in\mathcal{M}\times\mathcal{M},\ (i,j)\in\mathcal{I} = \mathcal{J}\times\mathcal{J}$.

Now, we are ready to choose any Laplacian operator that is constructed from the local relations between data points. The LBO takes into consideration the differential geometry of the data manifold. We compute the Laplacian's first $m_e$ eigenvectors, which are arranged in an eigenbasis matrix denoted by $\Phi$. Finally, we extract the spectral interpolation matrix $\alpha$ from the computed geodesic distances and the eigenbasis $\Phi$ using Eq. 12 or 13.

The interpolated distance matrix $\tilde D$ between every two points of $\mathcal{M}$ can be computed as $\tilde D = \Phi\alpha\Phi^T$. It is important to note that there is no need to compute this matrix explicitly.
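A minimal sketch of the farthest-point sampling step mentioned above (hedged: Euclidean distances are used for clarity, whereas the paper's setting would use geodesic distances on the data manifold; names are illustrative).

import numpy as np

def farthest_point_sampling(X, m_s, seed=0):
    """Greedily pick m_s indices; each new point is the one farthest from the current set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    J = [int(rng.integers(n))]
    d = np.linalg.norm(X - X[J[0]], axis=1)       # distance to the current sample set
    for _ in range(m_s - 1):
        j = int(np.argmax(d))                     # farthest remaining point
        J.append(j)
        d = np.minimum(d, np.linalg.norm(X - X[j], axis=1))
    return np.array(J)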

At this point let us justify our choice of representation. Denote by $D_y: \mathcal{M}\to\mathbb{R}^+$ the geodesic distance function from a source point $y$ to the rest of the points in the manifold. Geodesic distance functions are characterized by the eikonal equation $\|\nabla_G D_y\|^2 = 1$. This property can be used in the bound provided by Eq. 1. Specifically, $\|\nabla_G D_y^2\|^2 = \|2 D_y\nabla_G D_y\|^2 \le 4\|D_y\|^2$. Plugging this relation into Eq. 1, the error of the squared geodesic distance function projected onto the spectral basis is given by $\frac{\|r_n\|_G^2}{\|D\|_G^2} \le \frac{4 C_2}{n^{2/d}}$, where $\|D\|_G$ is the diameter of the manifold. We thereby obtained a bound on the relative projection error that depends only on Weyl's constant and the number of eigenvectors used in the reconstruction by projection.

Next, the spectral MDS solution is given by $XX^T = -\tfrac{1}{2} J\tilde D J$. This decomposition can be approximated by two alternative methods. The first method considers the projection of $X$ onto the eigenbasis $\Phi$, which is equivalent to representing $X$ by $\tilde X = \Phi\beta$. To that end, we have to solve $\Phi\beta\beta^T\Phi^T = -\tfrac{1}{2} J\Phi\alpha\Phi^T J$. Because $\Phi^T A\Phi = I_{m_e}$, we have

$$\beta\beta^T = -\tfrac{1}{2}\,\underbrace{\Phi^T A J\Phi}_{H}\,\alpha\,\Phi^T J A\Phi = -\tfrac{1}{2} H\alpha H^T, \quad [14]$$

which leads to the singular value decomposition of the $(m_e\times m_e)$ matrix $-\tfrac{1}{2} H\alpha H^T$.
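A hedged sketch of this first option (Eq. 14) in code, assuming the eigenbasis Φ, the diagonal mass entries of A, the coefficient matrix α, and a target dimension k are available; negative eigenvalues of the small core are clipped, as in classical scaling. This is not the authors' implementation; names are illustrative.

import numpy as np

def spectral_mds_projected(Phi, A_diag, alpha, k=3):
    """Embed via X_tilde = Phi beta, with beta from the (m_e x m_e) core -1/2 H alpha H^T."""
    JPhi = Phi - Phi.mean(axis=0, keepdims=True)   # J Phi, with J = I - (1/n) 1 1^T
    H = Phi.T @ (A_diag[:, None] * JPhi)           # H = Phi^T A J Phi
    core = -0.5 * H @ alpha @ H.T
    core = 0.5 * (core + core.T)                   # symmetrize against round-off
    gamma, P = np.linalg.eigh(core)
    order = np.argsort(gamma)[::-1][:k]            # keep the k largest eigenvalues
    beta = P[:, order] * np.sqrt(np.clip(gamma[order], 0.0, None))
    return Phi @ beta                              # n x k flat embedding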

In the second method, to overcome potential distortions caused by ignoring the high-frequency components, one may prefer to decompose the matrix $\Phi\alpha\Phi^T$ using the algebraic trick presented in Algorithm 1.

Fig. 1. Canonical forms of Armadillo (Left) and Homer (Right). Within each box is (from left to right) the shape, followed by canonical forms obtained by regular MDS and spectral MDS.

Fig. 2. Spectral MDS compared with MDS with the same number of sampled points (same complexity). Each point in the plane on the central frames represents a shape. The input to the MDS producing the position of the points was distances between the corresponding canonical forms (MDS applied to the result of MDS). Red points represent dogs, and blue points represent lionesses. The spectral MDS result (Right) provides better clustering and separation between classes compared with the regular MDS (Left).


Algorithm 1: Spectral MDS

Require: a data manifold $\mathcal{M}$ densely sampled at $n$ points, a number of subsampled points $m_s$, and a number of eigenvectors $m_e$.
Compute $\mathcal{J}$, a subset of $m_s$ points sampled from $\mathcal{M}$.
Compute the matrix $D$ of squared geodesic distances between every two points $(p_i, p_j),\ i\in\mathcal{J},\ j\in\mathcal{J}$.
Compute the matrices $\Phi, \Lambda$ containing the first $m_e$ eigenvectors and eigenvalues of the LBO of $\mathcal{M}$.
Compute the matrix $\alpha$ according to Eq. 12 or 13.
Compute the SVD of the $(n\times m_e)$ matrix $J\Phi$, such that $J\Phi = SUV^T$, where $J = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$.
Compute the eigendecomposition of the $(m_e\times m_e)$ matrix $UV^T\alpha VU$, such that $UV^T\alpha VU = P\Gamma P^T$.
Compute the matrix $Q = SP\,\Gamma^{1/2}$ such that $QQ^T = J\Phi\alpha\Phi^T J$.
Return: the first $k$ columns of the matrix $Q$.
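The following is a hedged Python sketch of Algorithm 1 (not the authors' code; names and the penalty weight are illustrative). The $-\tfrac{1}{2}J(\cdot)J$ centering of classical scaling is folded into the small $m_e\times m_e$ core, so the full $n\times n$ distance matrix is never formed.

import numpy as np

def spectral_mds(Phi, lam, J, F, k=3, mu=1e3):
    """Spectral MDS: Phi (n x m_e) LBO eigenvectors, lam their eigenvalues,
    J sampled indices, F squared geodesic distances on J x J; returns n x k coords."""
    PhiJ = Phi[J, :]
    M = mu * np.linalg.solve(np.diag(lam) + mu * PhiJ.T @ PhiJ, PhiJ.T)   # cf. Eq. 5
    alpha = M @ F @ M.T                                                   # Eq. 13

    JPhi = Phi - Phi.mean(axis=0, keepdims=True)          # J Phi, J = I - (1/n) 1 1^T
    S, u, Vt = np.linalg.svd(JPhi, full_matrices=False)   # J Phi = S diag(u) Vt
    UVt = np.diag(u) @ Vt
    core = -0.5 * UVt @ alpha @ UVt.T                     # -1/2 of U V^T alpha V U
    core = 0.5 * (core + core.T)                          # symmetrize against round-off
    gamma, P = np.linalg.eigh(core)
    order = np.argsort(gamma)[::-1][:k]                   # keep the k largest eigenvalues
    Q = S @ P[:, order] * np.sqrt(np.clip(gamma[order], 0.0, None))
    return Q                                              # canonical form / flat embedding

In practice one would feed this with the eigenvectors of a Laplacian such as L = A^{-1}W and a farthest-point sample J, as in the earlier sketches.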

Relation to diffusion maps: An interesting flat-embedding option is known as diffusion maps (18, 29). Given a kernel function $K(\lambda)$, the diffusion distance between two points $(x,y)\in\mathcal{M}$ is defined as

$$D^2(x,y) = \sum_k \big(\phi_k(x) - \phi_k(y)\big)^2 K(\lambda_k), \quad [15]$$

where $\phi_k$ represents the $k$th eigenfunction of $\Delta_{\mathcal{M}}$, and $\lambda_k$ its associated eigenvalue. Note that $D_{ij} = D^2(x_i, y_j)$. In matrix form, Eq. 15 reads $D_{ij} = \sum_{k=1}^{m_e} (\Phi_{ik} - \Phi_{jk})^2 K_{kk}$, where $\Phi$ represents the eigendecomposition matrix of the LBO and $K$ is a diagonal matrix such that $K_{kk} = K(\lambda_k)$. Denoting by $\Psi$ the matrix such that $\Psi_{ij} = \Phi_{ij}^2$, it follows that $D = \Psi K\mathbf{1}_{m_e}\mathbf{1}_n^T + (\Psi K\mathbf{1}_{m_e}\mathbf{1}_n^T)^T - 2\Phi K\Phi^T$. Next, let us apply classical scaling to $D$, which by itself defines a flat domain. Because $\mathbf{1}_n^T J = 0$, then $\Psi K\mathbf{1}_{m_e}\mathbf{1}_n^T J = 0$, and we readily have $JDJ = -2 J\Phi K\Phi^T J = J\Phi(-2K)\Phi^T J$.

Applying MDS to diffusion distances turns out to be nothing but setting $\alpha = -2K$ in the proposed spectral MDS framework. The spectral MDS presented in Eq. 14 and Algorithm 1 can be directly applied to diffusion distances without explicit computation of these distances. Moreover, using data-trained optimal diffusion kernels, such as those introduced in ref. 34, we could obtain a discriminatively enhanced flat domain, in which robust and efficient classification between different classes is realized as part of the construction of the flat target space.
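As a small hedged illustration of this connection (not from the paper), with the heat kernel K(λ) = exp(−2tλ) chosen purely as an example, flattening diffusion distances amounts to handing α = −2K to the decomposition step of Algorithm 1 and skipping the distance-interpolation step entirely.

import numpy as np

def diffusion_alpha(lam, t=0.5):
    """alpha = -2K for a heat-kernel weighting K_kk = exp(-2 t lambda_k) of the LBO spectrum."""
    return -2.0 * np.diag(np.exp(-2.0 * t * np.asarray(lam)))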

Numerical Experiments

The embedding of the intrinsic geometry of a shape into a Euclidean space is known as a canonical form. When the input to an MDS procedure is the set of geodesic distances between every two surface points, the output would be such a form. These structures, introduced in ref. 6, are invariant to isometric deformations of the shape, as shown in SI Appendix.

In our first example, we explore two shapes containing 5,000 vertices each, for which we compute all pairwise geodesic distances. Then, we select a subset of just 50 vertices, using the farthest-point sampling strategy, and extract 50 eigenvectors of the corresponding LBO. Finally, we flatten the surfaces into their canonical forms using the spectral MDS procedure. A qualitative evaluation is presented in Fig. 1, demonstrating the similarity of the results of classical scaling on the full matrix and the spectral MDS operating on just 1% of the data.

The relation of within classes and between classes for lionesses and dogs from the TOSCA database (35) is shown in Fig. 2. Spectral MDS allows us to consider more points and can thus serve as a better clustering tool.

Next, we compute the spectral interpolation of the matrix $\tilde D$ for sampled points from Armadillo's surface. We use the same number of eigenvectors as that of sample points, and plot the average geodesic error as a function of the points we use. The mean relative error is computed by $\mathrm{mean}\big(|\tilde D - D|/\|D + e\|\big)$, and it is plotted as a function of the ratio between the number of sample points and the total number of vertices of the shape (Fig. 3, Left). Note that spectral interpolation of the geodesic distances works as predicted. In this example, 5% of the distances between the sample points provide us with a distance map that is more than 97% accurate.

In the next experiment, we measured the influence of the number of sampled points on the accuracy of the reconstruction. We first generated several triangulations of a sphere, each at a different resolution reflected by the number of triangles. Next, for each such triangulation, we used spectral interpolation for calculating the geodesic distances using just $\sqrt{n}$ points, where $n$ represents the number of vertices of the sphere at a given resolution. Fig. 3, Right shows the average geodesic error as a function of $n$. The average error is a decreasing function of the number of points, in the case where $\sqrt{n}$ points are used for the construction.

Next, we report on an attempt to extract an $\mathbb{R}^4$ manifold embedded in $\mathbb{R}^{10,000}$. We try to obtain the low-dimensional hyperplane structure from a given set of 100,000 points in $\mathbb{R}^{10,000}$. The spectral MDS could handle $10^5$ points with very little effort, whereas regular MDS had difficulties already with $10^4$ points. Being able to process a large number of samples is important when trying to deal with noisy data, as is often the case in big-data analysis. See the MATLAB code of this experiment in the Supporting Information.

Fig. 4 depicts the embedding of 5,851 images of the handwritten digit 8 into a plane. The data are taken from the Modified National Institute of Standards and Technology (MNIST) database (36). We used a naive metric by which the distance between two images is equal to integration over the pixel-wise differences between them. Even in this trivial setting, the digits are arranged in the plane such that those on the right tilt to the right, and those on the left slant to the left. Fat digits appear at the bottom, and thin ones at the top.

Fig. 3. (A) Average relative geodesic error (log scale) as a function of the relative number of sample points used for interpolating D. (B) Average relative geodesic error as a function of the number of vertices.

Fig. 4. Flat embedding of handwritten digit 8 from the MNIST database.


Fig. 5 presents two cases of flat embedding: ones vs. zeros, and fours vs. threes, from the MNIST dataset. Here, the pairwise distance is the difference between the distance functions from the white regions in each image. While keeping the metric simple, the separation between the classes of digits is obvious. The elongated classes are the result of slanted digits that appear at different orientations. This observation could help in further study toward automatic character recognition.

Finally, we computed the regular MDS and the spectral MDS on different numbers of samples of randomly chosen images, referred to as points. Two hundred eigenvectors of the LBO were evaluated on an Intel i7 computer with 16 GB memory. The following table shows the time it took to compute the MDS and spectral MDS embeddings.

Conclusions

By working in the natural spectral domain of the data, we showed how to reduce the complexity of dimensionality reduction procedures. The proposed method exploits the local geometry of the data manifold to construct a low-dimensional spectral Euclidean space in which geodesic distance functions are compactly represented. Then, reduction of the data into a low-dimensional Euclidean space can be executed with low computational and space complexities. Working in the spectral domain allowed us to account for large distances while reducing the size of the data we process in an intrinsically consistent manner. The leading functions of the LBO eigenbasis capture the semidifferential structure of the data and thus provide an optimal reconstruction domain for the geodesic distances on the smooth data manifold. The proposed framework allowed us to substantially reduce the space and time involved in manifold learning techniques, which are important in the analysis of big data.

ACKNOWLEDGMENTS. This research was supported by Office of Naval Research Award N00014-12-1-0517 and Israel Science Foundation Grant 1031/12.

1. Schwartz EL, Shaw A, Wolfson E (1989) A numerical solution to the generalized mapmaker's problem: Flattening nonconvex polyhedral surfaces. IEEE Trans PAMI 11(9):1005–1008.
2. Drury HA, et al. (1996) Computerized mappings of the cerebral cortex: A multiresolution flattening method and a surface-based coordinate system. J Cogn Neurosci 8(1):1–28.
3. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 9(2):179–194.
4. Zigelman G, Kimmel R, Kiryati N (2002) Texture mapping using surface flattening via multidimensional scaling. IEEE Trans Vis Comp Graph 8(2):198–207.
5. Grossmann R, Kiryati N, Kimmel R (2002) Computational surface flattening: A voxel-based approach. IEEE Trans PAMI 24(4):433–441.
6. Elad A, Kimmel R (2003) On bending invariant signatures for surfaces. IEEE Trans PAMI 25(10):1285–1295.
7. Schweitzer H (2001) Template matching approach to content based image indexing by low dimensional Euclidean embedding. ICCV 2001: Proceedings of the Eighth International Conference on Computer Vision (IEEE, New York), pp 566–571.
8. Rubner Y, Tomasi C (2001) Perceptual metrics for image database navigation. PhD dissertation (Stanford Univ, Stanford, CA).
9. Aharon M, Kimmel R (2006) Representation analysis and synthesis of lip images using dimensionality reduction. Int J Comput Vis 67(3):297–312.
10. Pless R (2003) Image spaces and video trajectories: Using Isomap to explore video sequences. Proceedings of the Ninth IEEE International Conference on Computer Vision (IEEE, New York), pp 1433–1440.
11. Bronstein AM, Bronstein MM, Kimmel R (2005) Three-dimensional face recognition. Int J Comput Vis 64(1):5–30.
12. Borg I, Groenen P (1997) Modern multidimensional scaling: Theory and applications. J Educ Meas 40(3):277–280.
13. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323.
14. Kimmel R, Sethian JA (1998) Computing geodesic paths on manifolds. Proc Natl Acad Sci USA 95(15):8431–8435.
15. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326.
16. Donoho DL, Grimes C (2003) Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci USA 100(10):5591–5596.
17. Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 585–591.
18. Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30.
19. Coifman RR, et al. (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc Natl Acad Sci USA 102(21):7426–7431.
20. de Silva V, Tenenbaum JB (2003) Global versus local methods in nonlinear dimensionality reduction. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 705–712.
21. Bengio Y, Paiement JF, Vincent P (2003) Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 177–184.
22. Asano T, et al. (2009) A linear-space algorithm for distance preserving graph embedding. Comput Geom Theory Appl 42(4):289–304.
23. Bronstein MM, Bronstein AM, Kimmel R, Yavneh I (2006) Multigrid multidimensional scaling. Numer Linear Algebra Appl 13(2-3):149–171.
24. Rosman G, Bronstein MM, Bronstein AM, Sidi A, Kimmel R (2008) Fast multidimensional scaling using vector extrapolation. Technical report CIS-2008-01 (Technion, Haifa, Israel).
25. Rosman G, Bronstein MM, Bronstein AM, Kimmel R (2010) Nonlinear dimensionality reduction by topologically constrained isometric embedding. Int J Comput Vis 89(1):56–68.
26. Rustamov RM (2011) Interpolated eigenfunctions for volumetric shape processing. Vis Comput 27(11):951–961.
27. Raviv D, Bronstein MM, Bronstein AM, Kimmel R (2010) Volumetric heat kernel signatures. Proc ACM Workshop on 3D Object Retrieval (ACM, New York), pp 39–44.
28. Williams CKI (2001) On a connection between kernel PCA and metric multidimensional scaling. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 675–681.
29. Bérard P, Besson G, Gallot S (1994) Embedding Riemannian manifolds by their heat kernel. Geom Funct Anal 4(4):373–398.
30. Sun J, Ovsjanikov M, Guibas L (2009) A concise and provably informative multi-scale signature based on heat diffusion. Comp Graph Forum 28(5):1383–1392.
31. Ovsjanikov M, Ben-Chen M, Solomon J, Butscher A, Guibas L (2012) Functional maps: A flexible representation of maps between shapes. ACM Trans Graph 31(4):30:1–30:11.
32. Weyl H (1950) Ramifications, old and new, of the eigenvalue problem. Bull Am Math Soc 56(2):115–139.
33. Pinkall U, Polthier K (1993) Computing discrete minimal surfaces and their conjugates. Exper Math 2(1):15–36.
34. Aflalo Y, Bronstein AM, Bronstein MM, Kimmel R (2012) Deformable shape retrieval by learning diffusion kernels. Proc Scale Space and Variational Methods in Computer Vision (Springer, Berlin), Vol 6667, pp 689–700.
35. Bronstein AM, Bronstein MM, Kimmel R (2008) Numerical Geometry of Non-Rigid Shapes (Springer, New York).
36. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324.

No. of points | MDS run time, s | Spectral MDS run time, s
1,000 | 3 | 0.1
2,000 | 5.3 | 0.4
4,000 | 46 | 2.3
10,000 | 366 | 3.2

Fig. 5. Flat embedding of zeros (blue) vs. ones (red) (A), and fours (red) vs. threes (blue) (B).
