harmonic analysis and big data: data dependent representations · harmonic analysis and big data:...
TRANSCRIPT
Harmonic Analysis and Big Data: DataDependent Representations
Wojciech Czaja
Institute for Mathematics and Its Applications 2015 PI SummerGraduate Program
College Park, August 5–7, 2015
Wojciech Czaja Harmonic Analysis and Big Data
Introduction
Yesterday we have considered transformations which have an apriori structure which is independent from the given input data.These types of transformations have some advantages in speed,or in provable accuracy for certain classes of data.Drawbacks arise when data complexity increases over time andis no longer covered by our theoretical guarantees.A possible solution is to develop methods of data analysis whichare data-dependent.
Wojciech Czaja Harmonic Analysis and Big Data
Data Organization and Manifold Learning
There are many techniques for Data Organization and ManifoldLearning, e.g., Principal Component Analysis (PCA), LocallyLinear Embedding (LLE), Isomap, genetic algorithms, and neuralnetworks.We are interested in a subfamily of these techniques known asKernel Eigenmap Methods. These include Kernel PCA, LLE,Hessian LLE (HLLE), and Laplacian Eigenmaps, Diffusion Maps,Diffusion Wavelets (all sometimes known as Output NormalizedMethods).Kernel eigenmap methods require two steps. Given data spaceX of N vectors in RD.
1 Construction of an N × N symmetric, positive semi-definite kernel,K , from these N data points in RD .
2 Diagonalization of K , and then choosing d ≤ D significanteigenmaps of K . These become our new coordinates, andaccomplish dimensionality reduction.
We are particularly interested in diffusion kernels K , which aredefined by means of transition matrices.
Wojciech Czaja Harmonic Analysis and Big Data
Kernel Eigenmap Methods for Dimension Reduction -Kernel Construction
Kernel eigenmap methods were introduced to addresscomplexities not resolvable by linear methods.The idea behind kernel methods is to express correlations orsimilarities between vectors in the data space X in terms of asymmetric, positive semi-definite kernel function K : X × X → R.Generally, there exists a Hilbert space K and a mappingΦ : X → K such that
K (x , y) = 〈Φ(x),Φ(y)〉.
Then, diagonalize by the spectral theorem and choose significanteigenmaps to obtain dimensionality reduction.Kernels can be constructed by many kernel eigenmap methods.These include Kernel PCA, LLE, HLLE, and LaplacianEigenmaps.
Wojciech Czaja Harmonic Analysis and Big Data
Kernel Eigenmap Methods for Dimension Reduction -Kernel Diagonalization
The second step in kernel eigenmap methods is thediagonalization of the kernel.Let ej , j = 1, . . . ,N, be the set of eigenvectors of the kernelmatrix K , with eigenvalues λj .Order the eigenvalues monotonically.Choose the top d << D significant eigenvectors to map theoriginal data points xi ∈ RD to (e1(i), . . . ,ed (i)) ∈ Rd ,i = 1, . . . ,N.
Wojciech Czaja Harmonic Analysis and Big Data
Data Organization
X
1 ''
++ Y
K2
GG
There are other alternative interpretations for the steps of ourdiagram:
1 Constructions of kernels K may be independent from data andbased on principles.
2 Redundant representations, such as frames, can be used toreplace orthonormal eigendecompositions.
We need not select the target dimensionality to be lower than thedimension of the input. This leads, to data expansion, or dataorganization, rather then dimensionality reduction.
Wojciech Czaja Harmonic Analysis and Big Data
Operator Theory on Graphs
Presented approach leads to analysis of operators ondata-dependent structures, such as graphs or manifolds.Locally Linear Embedding, Diffusion Maps, Diffusion Wavelets,Laplacian Eigenmaps, Schroedinger Eigenmaps.Mathematical core:
Pick a positive symmetric operator A as the infinitesimal generatorof a semigroup of operators, etA, t > 0.The semigroup can be identified with the Markov processes ofdiffusion or random walks, as is the case, e.g., with Diffusion Mapsand Diffusion Wavelets of R. R. Coifman, S. Lafon, and M.Maggioni.The infinitesimal generator and the semigroup share the commonrepresentation, e.g., eigenbasis.
A. Pazy, Semi-groups of Linear Operators And Applications to Partial Differential Equations, Lecture Notes #10, University of MarylandCollege Park, January 25, 1974
A. Pazy, Semi-groups of Linear Operators And Applications to Partial Differential Equations, in Applied Mathematical Sciences (1983),Springer
S. Lafon, Diffusion maps and geometric harmonics, Ph.D. Thesis, Yale University, 2004
Wojciech Czaja Harmonic Analysis and Big Data
PCA - Principal Component Analysis
Principal Component Analysis (PCA), also known asKarhunen–Loeve transform (KLT) or the Hotelling transform.
K. Pearson, On lines and planes of closest fit to systems of pointsin space, Philosophical Magazine, vol. 2 (1901), pp. 559–572H. Hoteling, Analysis of a complex of statistical variables intoprincipal components, Journal of Education Psychology, vol. 24(1933), pp. 417–44K. Karhunen, Zur Spektraltheorie stochastischer Prozesse, Ann.Acad. Sci. Fennicae, vol. 34 (1946)M. Loeve, Fonctions aleatoire du second ordre, in Processusstochastiques et mouvement Brownien, p. 299, Paris (1948)
Wojciech Czaja Harmonic Analysis and Big Data
PCA from Data perspective
We present a data-inspired model for PCA.Assume we have D observed (measured) variables:y = [y1, . . . , yD]T . This is our data.Assume we know that our data is obtained by a lineartransformation W from d unknown variables x = [x1, . . . , xd ]T :
y = W (x).
Typically we assume d < D.Assume moreover that the D × d matrix W is a change of acoordinate system, i.e., columns of W (or rows of W T ) areorthonormal to each other:
W T W = Idd .
Note that WW T need not be an identity.
Wojciech Czaja Harmonic Analysis and Big Data
PCA
Given the above assumptions the problem of PCA can be stated asfollows:
How can we find the transformation Wand the dimension d from a finite number of measurements y?
We shall need 2 additional assumptions:Assume that the unknown variables are Gaussian;Assume that both the unknown variables and the observationshave mean zero (this is easily guaranteed by subtracting themean, or the sample mean).
Wojciech Czaja Harmonic Analysis and Big Data
PCA from minimizing the reconstruction error
For a noninvertible matrix, we have its pseudoinverse defined as
W+ = (W T W )−1W T
In our case, W+ = W T , Thus, if y = Wx , we have
WW T y = WW T Wx = WIddx = y ,
or, equivalently,y −WW T y = 0.
With the presence of noise, we cannot assume anymore the perfectreconstruction, hence, we shall minimize the reconstruction errordefined as
Ey (‖y −WW T y‖22).
It is not difficult to see that
Ey (‖y −WW T y‖22) = Ey (yT y)− Ey (yT WW T y).
Wojciech Czaja Harmonic Analysis and Big Data
PCA from minimizing the reconstruction error
As Ey (yT y) is constant, our minimization of error reconstruction turnsinto a maximization of Ey (yT WW T y). In reality, we know little abouty , so we have to rely on the measurements y(k), k = 1, . . . ,N. Then,
Ey (yT WW T y) ∼ 1N
N∑n=1
(y(n))T WW T (y(n)) ∼ 1N
tr(Y T WW T Y ),
where Y is the matrix whose columns are the measurements y(n)(hence Y is a D × N matrix).Using SVD for Y : Y = V ΣUT , we obtain:
Ey (yT WW T y) ∼ 1N
tr(UΣT V T WW T V ΣUT ).
Therefore, after some computations we obtain:
argmaxW Ey (yT WW T y) ∼ V IdD×d ,
and so x ∼ Idd×DV T y .
Wojciech Czaja Harmonic Analysis and Big Data
New Example: Laplacian Eigenmaps
M. Belkin and P. Niyogi, 2001.Points close on the manifold should remain close in Rd .Use Laplace-Beltrami operator ∆M to control the embedding.Use discrete approximations for practical problems.Proven convergence (Belkin and Niyogi, 2003 – 2008).Generalized by Diffusion Maps and Diffusion Wavelets.
M. Belkin, P. Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, NIPS 2001.
M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation, vol.15 no. 6 (2003),pp. 1373–1396.
M. Belkin, Problems of Learning on Manifolds, Ph.D. Thesis, University of Chicago, 2003.
Wojciech Czaja Harmonic Analysis and Big Data
Laplacian Eigenmaps - Implementation
1 Put an edge between nodes i and j if xi and xj are close.Precisely, given a parameter k ∈ N, put an edge between nodes iand j if xi is among the k nearest neighbors of xj or vice versa.
2 Given a parameter t > 0, if nodes i and j are connected, set
[Wt,n]i,j = e−‖xi−xj‖
2
4t .3 Set [Dt,n]i,i =
∑j [Wt,n]i,j , and let Lt,n = Dt,n −Wt,n. Solve
Lt,nf = λDt,nf , under the constraint y>Dt,ny = Id . Let f0, f1, . . . , fdbe d + 1 eigenvector solutions corresponding to the firsteigenvalues 0 = λ0 ≤ λ1 ≤ · · · ≤ λd . Discard f0 and use the nextd eigenvectors to embed in d-dimensional Euclidean spaceusing the map xi → (f1(i), f2(i), . . . , fd (i)).
Wojciech Czaja Harmonic Analysis and Big Data
Swiss Roll
Figure : a) Original, b) PCA in R2
A. Halevy, Extensions of Laplacian Eigenmaps for Manifold Learning, PhD. Thesis, University of Maryland College Park, 2011
Wojciech Czaja Harmonic Analysis and Big Data
Swiss Roll
Figure : a) Original, b) LE outputs in R2 for different choices of parameters
Image source: M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation,
vol.15 no. 6 (2003), pp. 1373–1396.
Wojciech Czaja Harmonic Analysis and Big Data
Convergence of Laplacians
Given n data points x1, . . . , xn from a manifold MWe constructed an operator Lt,n which acts on a function fdefined on the data points as follows:
Lt,n(f )(xi ) =∑
j
f (xi )e−‖xi−xj‖
2
4t −∑
j
f (xj )e−‖xi−xj‖
2
4t
Our goal is to show that such defined operator is related in astrong sense to the Laplace-Beltrami operator on the manifold M.
Wojciech Czaja Harmonic Analysis and Big Data
Laplace operator and heat equation
Heat equation ∂∂t u(x , t) + ∆u(x , t) = 0, with u(x ,0) = f (x).
Solution given by u(x , t) = Ht (f )(x), where
Ht (f )(x) =
∫M
ht (x , y)f (y)dµ(y).
On Rk , we have ht (x , y) = (4πt)−k/2e−‖x−y‖2
4t .Key to approximation is the following equality
∆(f )(x) = − ∂
∂tu(x , t)
∣∣∣∣t=0
= − ∂
∂tHt (f )(x)
∣∣∣∣t=0
= limt→0
1t
(f (x)−Ht (f )(x)).
Wojciech Czaja Harmonic Analysis and Big Data
Key to approximation
∆(f )(x) = limt→0
1t
(4πt)−k/2(∫
Mf (x)e−
‖x−y‖2
4t dy −∫
Mf (y)e−
‖x−y‖2
4t dy)
∆(f )(x) = limt→0
1t
(4πt)−k/2
1n
∑j
f (xi )e−‖xi−xj‖
2
4t − 1n
∑j
f (xj )e−‖xi−xj‖
2
4t
=
1t
(4πt)−k/2 1n
Lt,n(f )(x).
Wojciech Czaja Harmonic Analysis and Big Data
From discrete to continuous
Define Lt,n : C(M)→ C(M):
Lt,n(f )(x) =1t
(4πt)−k/2
1n
∑j
f (xi )e−‖xi−xj‖
2
4t − 1n
∑j
f (xj )e−‖xi−xj‖
2
4t
Define Lt : L2(M)→ L2(M):
Lt (f )(x) =1t
(4πt)−k/2(∫
Mf (x)e−
‖x−y‖2
4t dy −∫
Mf (y)e−
‖x−y‖2
4t dy)
Wojciech Czaja Harmonic Analysis and Big Data
A few more notations
Laplace-Beltrami operator on the manifold M
∆M : C2(M)→ L2(M), ∆M(f ) = −div(∇f )
Heat kernel ht (x , y) : M ×M → RHeat operator Ht : L2(M)→ L2(M)
Ht (f )(x) =
∫M
ht (x , y)f (y)dy
Alternatively Ht (f ) = exp(−t∆M)(f )
Remainder Rt = 1t (Id − Ht )− Lt .
Wojciech Czaja Harmonic Analysis and Big Data
Pointwise convergence
Theorem (Pointwise Convergence - Belkin and Niyogi, 2007)
Let the data points x1, x2, . . . , xn be sampled independently from auniform distribution on a smooth compact manifold M ⊂ Rd . Letα > 0,and let tn = n−1/(k+2+α). For f ∈ C∞(M) we then have that:
limn→∞
Ltn,n(f )(x) =1
vol(M)∆M(f )(x)
in probability, where vol(M) is the volume of the manifold with respectto the canonical measure.M. Belkin, P. Niyogi, Towards a Theoretical Foundation for Laplacian-Based Manifold Methods, COLT 2005.M. Belkin, P. Niyogi, Towards a Theoretical Foundation for Laplacian-Based Manifold Methods, Journal of Computer and System Sciences,Vol. 74 no. 8 (2008), pp. 1289–308.
Wojciech Czaja Harmonic Analysis and Big Data
Overview of the proof
Usually we do not know the exact form of the heat kernel on themanifoldUsually we do not know the geodesic distance between points,only the distance in the ambient spaceHenceforth, our strategy is as follows:
Lt,n(f )(p) →n→∞ Lt (f )(p) →t→01
vol(M)∆M(f )(p).
Wojciech Czaja Harmonic Analysis and Big Data
Sample approximations
Fix f ∈ C∞(M), p ∈ M, t > 0.
Lt,n is nothing else but the sample average of n i.i.d. randomvariables with expectation E [Lt,n(f )(p)] = Lt (f )(p).
Lemma (Hoeffding’s Inequality)
Let X1,X2, . . . ,Xn be i.i.d. random variables such that |Xi | ≤ K . Then
P{∣∣∣∣∑i Xi
n− E [Xi ]
∣∣∣∣ > ε
}≤ 2 exp(−ε2n/2K 2)
Thus P[|Lt,n(f )(p)− Lt (f )(p)| > ε
]≤ 2 exp(−ε2nt2(4πt)k/2K 2)
Note that Hoeffding’s inequality yields a stronger result than,e.g., Chebyshev’s inequality.
W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association vol. 58 no. 301
(1963), pp. 13–30.
Wojciech Czaja Harmonic Analysis and Big Data
Proof of pointwise convergence
Let t = tn = n−1/(k+2+α), α > 0. Then for any ε > 0,
limn→∞
P[|Lt,n(f )(p)− Lt (f )(p)| > ε
]= 0.
Therefore, if we can show that the following holds:
Lt (f )(p) →t→01
vol(M)∆M(f )(p),
then we can finish the proof of the theorem.
Wojciech Czaja Harmonic Analysis and Big Data
Functional approximation
We can show the last approximation in 3 major steps:Reduce the integral to a ball B in M, i.e., show that
limt→0
Lt (f )(p) = limt→0
1t
(4πt)−k/2∫
B(f (p)− f (y))e−
‖p−y‖2
4y dµ(y).
Apply a change of coordinates using the exponential map torewrite the integral in Rk .Use a Taylor expansion to analyze the integral in Rk .
Wojciech Czaja Harmonic Analysis and Big Data
Step 1: Reduction to a ball
Let d = infx /∈B ‖p − x‖2 and m = µ(M \ B).Since B is open, we have that d > 0.Moreover,∣∣∣∣∫
Bf (y)e−
‖p−y‖2
4t dµ(y)−∫
Mf (y)e−
‖p−y‖2
4t dµ(y)
∣∣∣∣ ≤ m supx∈M|f (x)|e−d2/4t
Thus, the difference can be estimates as o(ta) for any a > 0.
Wojciech Czaja Harmonic Analysis and Big Data
Step 2: Change of coordinates
expp : TpM(∼ Rk )→ M, expp(0) = p.expp sends straight lines through the origin to geodesics
We can choose a small ball B ⊂ Rk such that expp is adiffeomorphism onto its image B ⊂ M.Rewrite f : M → Rk as f (x) = f (expp(x)).We thus obtain the following:
∆M(f )(p) = ∆(f )(0) = −∑
i
∂2 f∂x2
i(0).
Lemma (Beylkin, Niyogi, 2007)
There exists a constant C > 0 such that for any pair p, y ∈ M, wherey = expp(x), we have
0 ≤ ‖x‖2k − ‖y − p‖2
d = g(x) ≤ C‖x‖4k .
Wojciech Czaja Harmonic Analysis and Big Data
Step 2: Change of coordinates
1vol(M)
1t
(4πt)−k/2∫
B(f (0)− f (x))e−
‖p−expp (x)‖2
4t
√det(gi,j )dx
For the metric tensor g we have:
det(gi,j ) = 1 + O(‖x‖2)
1vol(M)
1t
(4πt)−k/2∫
B(f (0)− f (x))e−
‖x‖2−g(x)4t (1 + O(‖x‖2))dx
Wojciech Czaja Harmonic Analysis and Big Data
Analysis in Rk
The key component of the previous approximations is thefollowing term:
At =1
vol(M)
1t
(4πt)−k/2∫
B(f (0)− f (x))e−‖x‖
2/4tdx .
f (x)− f (0) = x∇f + 12 xT Hx + O(‖x‖3), where H denotes the
Hessian.Thus,At = − 1
vol(M)1t (4πt)−k/2
∫B(x∇f + 1
2 xT Hx + O(‖x‖3))e−‖x‖2/4tdx .
Note that∫
B xie−‖x‖2/4tdx = 0 and
∫B xixje−‖x‖
2/4tdx = 0 fori 6= j .Hence
vol(M) limt→0
At = −tr(H) = ∆M(f )(p),
which completes the proof of the point wise convergence.
Wojciech Czaja Harmonic Analysis and Big Data
Spectral convergence
We can also prove stronger results about spectral convergenceof Laplacian Eigenmaps.Eig(Lt,n) →n→∞ Eig(Lt ) →t→0 Eig(∆M)
Theorem (Spectral Convergence - von Luxburg, Bousquet, Belkin,2004)
For a fixed, sufficiently small t , let λit,n and λi
t denote the itheigenvalues of Lt,n and Lt , resp. Similarly, let ei
t,n and eit denote the
corresponding eigenvectors. If λit <
12t , then
limn→∞
λit,n = λi
t and limn→∞
‖eit,n − ei
t‖ = 0.
U. von Luxburg, O. Bousquet, M. Belkin, Limits of Spectral Clustering, NIPS 2004.
Wojciech Czaja Harmonic Analysis and Big Data
Spectral convergence
Theorem (Spectral Convergence - Belkin, Niyogi, 2008)
Let λi and λit denote the ith eigenvalues of ∆M and Lt , resp. Similarly,
let ei and eit denote their corresponding eigenvectors. Then
limt→0|λi − λi
t | = 0, limt→0‖ei − ei
t‖ = 0.
M. Belkin, P. Niyogi, Convergence of Laplacian Eigenmaps, NIPS 2008
Wojciech Czaja Harmonic Analysis and Big Data
Sketch of proof of spectral convergence
Denote the ith eigenvalue of Id−Htt by λi ((Id − Ht )/t)
Because Ht (f ) = exp(−t∆M)(f ), we have
λi ((Id − Ht )/t) =1− e−λ
i t
tand lim
t→0λi ((Id − Ht )/t) == λi
Eigenvectors are also equal: ei = ei ((Id − Ht )/t)We now need to show that for each fixed i and small t, the itheigenpair of Lt is close to that of Id−Ht
t
Wojciech Czaja Harmonic Analysis and Big Data
Sketch of proof of spectral convergence
Lemma (Belkin, Niyogi, 2008)
Let A,B be positive, self-adjoint operators with discrete spectrumarranged in increasing order. Let R = A− B and assume that for allf ∈ L2(X ), we have ∣∣∣∣ 〈R(f ), f 〉
〈A(f ), f 〉
∣∣∣∣ ≤ ε.Then, for all k, 1− ε ≤ λk (B)/λk (A) ≤ 1 + ε.
|〈A(f ), f 〉| ≤ |〈B(f ), f 〉|+ |〈R(f ), f 〉| ≤ |〈B(f ), f 〉|+ ε|〈A(f ), f 〉||〈A(f ), f 〉| ≥ |〈B(f ), f 〉| − |〈R(f ), f 〉| ≥ |〈B(f ), f 〉| − ε|〈A(f ), f 〉|(1− ε)|〈A(f ), f 〉| ≤ |〈B(f ), f 〉| ≤ (1 + ε)|〈A(f ), f 〉|If H is a k -dimensional subspace of L2(X ), then
(1− ε) maxH
minf∈H⊥
≤ maxH
minf∈H⊥
|〈B(f ), f 〉|
The result follows now from Courant-Fisher (min-max) theorem.
Wojciech Czaja Harmonic Analysis and Big Data
Sketch of proof of spectral convergence
Combining the previous lemma with the following theorem, completesthe proof of spectral convergence.
Theorem (Belkin, Niyogi, 2008)
For small enough t, there exists a constant C > 0, independent of t,such that
supf∈L2(M)
∣∣∣∣∣ 〈R(f ), f 〉|〈 Id−Ht
t (f ), f 〉
∣∣∣∣∣ ≤ Ct2
k+6 .
Wojciech Czaja Harmonic Analysis and Big Data
From Laplacian to Schroedinger Eigenmaps
For the following considerations we fix t and n, hence we can discardfor the moment the subscript t ,n, without loss of generality.Consider the following minimization problem, y ∈ Rd ,
miny>Dy=Id
12
∑i,j
‖yi − yj‖2Wi,j = miny>Dy=E
tr(y>Ly).
Its solution is given by the d minimal non-zero eigenvalue solutions ofLf = λDf under the constraint y>Dy = Id .
Similarly, for diagonal α · V , α > 0, consider the problem
miny>Dy=Id
12
∑i,j
‖yi−yj‖2Wi,j +α∑
i
‖yi‖2Vi,i = miny>Dy=E
tr(y>(L+α ·V )y),
(1)which leads to solving equation (L + αV )f = λDf .
Wojciech Czaja Harmonic Analysis and Big Data
Schroedinger Eigenmaps
With M. Ehler, 2011Want to go from un-supervised to semi-supervised learningIn SE, replace L by L + V , where V is, e.g., a barrier potentialSchroedinger Eigenmaps allow for the use of labeled dataEnforce certain relations between the nodesAllow to utilize expert input or templates in otherwise fullyautomated techniques such as LEHave proven convergence results (with A. Halevy)
W. Czaja, M. Ehler, Schroedinger type eigenmaps for the analysis and classification of multispectral data in biomedical imaging, IEEETPAMI, vol. 35(5) (2013), pp. 1274–1280
A. Halevy, Extensions of Laplacian Eigenmaps for Manifold Learning, PhD. Thesis, University of Maryland College Park, 2011
Wojciech Czaja Harmonic Analysis and Big Data
Properties of Schroedinger Eigenmaps
Let the data graph be connected and let V be a symmetric positivesemi-definite matrix.
Theorem (with M. Ehler)
Let the data graph be connected, let V be a symmetric positivesemi-definite, and let n ≤ dim(Null(V )). Then the minimizer of (1)satisfies:
‖y (α)‖2V = trace2(y (α)T
Vy (α)) ≤ C1α.
In particular, if V = diag(v1, . . . , vN), then
vi‖y (α)i ‖
2 ≤N∑
i=1
vi‖y (α)i ‖
2 ≤ C11α, for all i = 1, . . . ,N.
Wojciech Czaja Harmonic Analysis and Big Data
Pointwise Convergence of SE
Given n data points x1, x2, . . . , xn sampled independently from auniform distribution on a smooth, compact, d-dimensional manifoldM⊂ RD, define the operator Lt,n : C(M)→ C(M) by
Lt,n(f )(x) =1
(4πt)d/2t
1n
∑j
f (x)e−‖x−xj‖
2
4t − 1n
∑j
f (xj )e−‖x−xj‖
2
4t
.
Let v ∈ C(M) be a potential. For x ∈M, let yn(x) = arg minx1,x2,...,xn
‖x − xi‖
and define Vn : C(M)→ C(M) by Vnf (x) = v(yn(x))f (x).
Theorem (Pointwise Convergence, with A. Halevy)
Let α > 0, and set tn = ( 1n )
1d+2+α . For f ∈ C∞(M),
limn→∞
Ltn,nf (x) + Vnf (x) = C∆Mf (x) + v(x)f (x) in probability.
Wojciech Czaja Harmonic Analysis and Big Data
Spectral Convergence of SE - Theorem
Let Lt,n be the unnormalized discrete Laplacian.
Theorem (Spectral Convergence of SE, with A. Halevy)
Let λit,n and ei
t,n be the ith eigenvalue and correspondingeigenfunction of Lt,n + Vn. Let λi and ei be the ith eigenvalue andcorresponding eigenfunction of ∆M + V. Then there exists asequence tn → 0 such that, in probability,
limn→∞
λitn,n = λi and lim
n→∞‖ei
tn,n − ei‖ = 0.
Wojciech Czaja Harmonic Analysis and Big Data
SE as Semisupervised Method
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
The Schroedinger Eigenmaps with diagonal potentialV = diag(0, . . . ,0,1,0, . . . ,0) only acting in one point yi0 in the middleof the arc for α = 0.05,0.1,0.5,5. This point is pushed to zero.
Wojciech Czaja Harmonic Analysis and Big Data
SE as Semisupervised Method
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
−8 −6 −4 −2 0 2 4 6 8x 10−3
−8
−6
−4
−2
0
2
4
6
8x 10−3
By applying the potential to the end points of the arc forα = 0.01,0.05,0.1,1, we are able to control the dimension reductionsuch that we obtain an almost perfect circle.
Wojciech Czaja Harmonic Analysis and Big Data
Hyperspectral data
20 40 60 80 100 120 140
20
40
60
80
100
120
140
50 100 150
100
200
300
400
500
600
(left) The Indian Pines image is a 145× 145 pixel image with 224spectral bands. It was acquired using an AVIRIS spectrometer. (right)The Pavia University image is a 610× 340 pixel image that contains115 spectral bands. It was acquired using a ROSIS sensor.Data courtesy of:Prof. Landgrebe (Purdue University, USA): the Indian Pines dataset,
Prof. Paolo Gamba (Pavia University, Italy): the Pavia University dataset.
Wojciech Czaja Harmonic Analysis and Big Data
Cluster Potentials
Let ι = {i1, . . . , im} be a collection of nodes. A cluster potential over ιis defined by:
V [ι, ι] :=
1 −1−1 2 −1
. . . . . . . . .−1 2 −1
−1 1
.
For such a V , the minimization problem is equivalent to:
minyT Dy=I
12
∑i,j
‖yi − yj‖2Wi,j + α
m−1∑k=1
‖yik − yik+1‖2.
Wojciech Czaja Harmonic Analysis and Big Data
Schroedinger Eigenmaps with Cluster Potentials
Use k-means clustering to cluster the hyperspectral image. Theclustering was initialized randomly so that the construction of thepotential would be completely unsupervised. The Euclideanmetric was used.Define a cluster potential, Vk over each of the k = 1, . . . ,Kclusters. The order of the pixels in each cluster is determined bytheir index (smallest to largest).Aggregate into one potential by summing the individual clusterpotentials:
VK =K∑
k=1
Vk .
Wojciech Czaja Harmonic Analysis and Big Data
Impact of SE on Cluster analysis
Pavia University: Dimensions 4 and 5 of the LE and SE embeddingsfor classes 2 (meadows), 3 (gravel), and 7 (bitumen)
J. J. Benedetto, W. Czaja, J. Dobrosotskaya, T. Doster, K. Duke, D. Gillis, Semi-supervised learning of heterogeneous data in remote sensing imagery,in Independent Component Analyses, Compressive Sampling, Wavelets, Neural Net, Biosystems, and Nanoengineering X,Proc. SPIE Vol. 8401 (2012), no. 8401-03
Wojciech Czaja Harmonic Analysis and Big Data
Impact of SE on Cluster analysis
Indian Pines: Dimensions 17 and 22 of the LE and SE embeddingsfor classes 2 (corn 1), 3 (corn 2), and 10 (soybeen 1)
Wojciech Czaja Harmonic Analysis and Big Data
Summary and Conclusions
We have introduced several examples of data dependentrepresentation methods.These methods are well suited for the analysis of complex, noisy,high-dimensional data.As a drawback, we are going to encounter increasedcomputational requirements relative to fast a priori methods,such as FFT or DWT.Also, the nonlinear dimensionality reduction methods aretypically non-invertible.
Wojciech Czaja Harmonic Analysis and Big Data