Page 1

Harmonic Analysis and Big Data: Data Dependent Representations

Wojciech Czaja

Institute for Mathematics and Its Applications 2015 PI Summer Graduate Program

College Park, August 5–7, 2015

Page 2

Introduction

Yesterday we considered transformations with an a priori structure, independent of the given input data.
These types of transformations have advantages in speed, or in provable accuracy for certain classes of data.
Drawbacks arise when data complexity increases over time and is no longer covered by our theoretical guarantees.
A possible solution is to develop methods of data analysis which are data-dependent.

Page 3

Data Organization and Manifold Learning

There are many techniques for Data Organization and Manifold Learning, e.g., Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Isomap, genetic algorithms, and neural networks.
We are interested in a subfamily of these techniques known as Kernel Eigenmap Methods. These include Kernel PCA, LLE, Hessian LLE (HLLE), Laplacian Eigenmaps, Diffusion Maps, and Diffusion Wavelets (all sometimes known as Output Normalized Methods).
Kernel eigenmap methods require two steps. Given a data space X of N vectors in R^D:

1. Construction of an N × N symmetric, positive semi-definite kernel K from these N data points in R^D.

2. Diagonalization of K, and then choosing d ≤ D significant eigenmaps of K. These become our new coordinates, and accomplish dimensionality reduction.

We are particularly interested in diffusion kernels K, which are defined by means of transition matrices.
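As an illustration, here is a minimal sketch of the generic two-step pipeline in Python (the Gaussian affinity, the scale sigma, and the target dimension d are assumptions of the example, not a specific method from these slides):

import numpy as np

def kernel_eigenmap(X, d, sigma=1.0):
    # Step 1: N x N symmetric, positive semi-definite kernel from the
    # N points in R^D (here a Gaussian affinity, one possible choice).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    # Step 2: diagonalize K and keep the d most significant eigenvectors;
    # row i of the result gives the new coordinates of x_i.
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    return eigvecs[:, -d:][:, ::-1]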

Page 4

Kernel Eigenmap Methods for Dimension Reduction - Kernel Construction

Kernel eigenmap methods were introduced to address complexities not resolvable by linear methods.
The idea behind kernel methods is to express correlations or similarities between vectors in the data space X in terms of a symmetric, positive semi-definite kernel function K : X × X → R.
Generally, there exists a Hilbert space K and a mapping Φ : X → K such that

K(x, y) = 〈Φ(x), Φ(y)〉.

Then, diagonalize by the spectral theorem and choose significant eigenmaps to obtain dimensionality reduction.
Kernels can be constructed by many kernel eigenmap methods. These include Kernel PCA, LLE, HLLE, and Laplacian Eigenmaps.

Page 5

Kernel Eigenmap Methods for Dimension Reduction - Kernel Diagonalization

The second step in kernel eigenmap methods is the diagonalization of the kernel.
Let e_j, j = 1, . . . , N, be the set of eigenvectors of the kernel matrix K, with eigenvalues λ_j.
Order the eigenvalues monotonically.
Choose the top d << D significant eigenvectors to map the original data points x_i ∈ R^D to (e_1(i), . . . , e_d(i)) ∈ R^d, i = 1, . . . , N.

Page 6

Data Organization

[Diagram: X —(1)→ K —(2)→ Y: the kernel K is built from the data space X (step 1) and diagonalized to produce the representation Y (step 2).]

There are alternative interpretations for the steps of our diagram:

1. Constructions of kernels K may be independent of the data and based on a priori principles.

2. Redundant representations, such as frames, can be used to replace orthonormal eigendecompositions.

We need not select the target dimensionality to be lower than the dimension of the input. This leads to data expansion, or data organization, rather than dimensionality reduction.

Page 7

Operator Theory on Graphs

The presented approach leads to the analysis of operators on data-dependent structures, such as graphs or manifolds.
Examples: Locally Linear Embedding, Diffusion Maps, Diffusion Wavelets, Laplacian Eigenmaps, Schroedinger Eigenmaps.
Mathematical core:

Pick a positive symmetric operator A as the infinitesimal generator of a semigroup of operators, e^(tA), t > 0.
The semigroup can be identified with the Markov processes of diffusion or random walks, as is the case, e.g., with the Diffusion Maps and Diffusion Wavelets of R. R. Coifman, S. Lafon, and M. Maggioni.
The infinitesimal generator and the semigroup share a common representation, e.g., an eigenbasis.

A. Pazy, Semi-groups of Linear Operators and Applications to Partial Differential Equations, Lecture Notes #10, University of Maryland College Park, January 25, 1974.

A. Pazy, Semi-groups of Linear Operators and Applications to Partial Differential Equations, in Applied Mathematical Sciences (1983), Springer.

S. Lafon, Diffusion maps and geometric harmonics, Ph.D. Thesis, Yale University, 2004.

Page 8

PCA - Principal Component Analysis

Principal Component Analysis (PCA) is also known as the Karhunen–Loève transform (KLT) or the Hotelling transform.

K. Pearson, On lines and planes of closest fit to systems of points in space, Philosophical Magazine, vol. 2 (1901), pp. 559–572.

H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, vol. 24 (1933), pp. 417–441.

K. Karhunen, Zur Spektraltheorie stochastischer Prozesse, Ann. Acad. Sci. Fennicae, vol. 34 (1946).

M. Loève, Fonctions aléatoires du second ordre, in Processus stochastiques et mouvement Brownien, p. 299, Paris (1948).

Page 9

PCA from Data perspective

We present a data-inspired model for PCA.
Assume we have D observed (measured) variables: y = [y_1, . . . , y_D]^T. This is our data.
Assume we know that our data is obtained by a linear transformation W from d unknown variables x = [x_1, . . . , x_d]^T:

y = W(x).

Typically we assume d < D.
Assume moreover that the D × d matrix W is a change of coordinate system, i.e., the columns of W (or the rows of W^T) are orthonormal to each other:

W^T W = Id_d.

Note that WW^T need not be the identity.
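A quick numerical illustration of this asymmetry (a sketch; the dimensions D = 5, d = 2 and the random orthonormal W are assumptions chosen for the example):

import numpy as np

D, d = 5, 2
rng = np.random.default_rng(0)
# Orthonormal columns via a reduced QR factorization: W is D x d.
W, _ = np.linalg.qr(rng.standard_normal((D, d)))
print(np.allclose(W.T @ W, np.eye(d)))  # True:  W^T W = Id_d
print(np.allclose(W @ W.T, np.eye(D)))  # False: WW^T is only a rank-d projection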

Page 10

PCA

Given the above assumptions, the problem of PCA can be stated as follows:

How can we find the transformation W and the dimension d from a finite number of measurements y?

We shall need 2 additional assumptions:
Assume that the unknown variables are Gaussian;
Assume that both the unknown variables and the observations have mean zero (this is easily guaranteed by subtracting the mean, or the sample mean).

Page 11

PCA from minimizing the reconstruction error

For a noninvertible matrix W, we have its pseudoinverse, defined as

W^+ = (W^T W)^(−1) W^T.

In our case, W^+ = W^T. Thus, if y = Wx, we have

WW^T y = WW^T Wx = W Id_d x = Wx = y,

or, equivalently,

y − WW^T y = 0.

In the presence of noise, we can no longer assume perfect reconstruction; hence, we shall minimize the reconstruction error, defined as

E_y(‖y − WW^T y‖_2^2).

It is not difficult to see that

E_y(‖y − WW^T y‖_2^2) = E_y(y^T y) − E_y(y^T WW^T y).

Page 12

PCA from minimizing the reconstruction error

As E_y(y^T y) is constant, our minimization of the reconstruction error turns into a maximization of E_y(y^T WW^T y). In reality, we know little about y, so we have to rely on the measurements y(k), k = 1, . . . , N. Then,

E_y(y^T WW^T y) ∼ (1/N) Σ_{n=1}^N (y(n))^T WW^T (y(n)) ∼ (1/N) tr(Y^T WW^T Y),

where Y is the matrix whose columns are the measurements y(n) (hence Y is a D × N matrix).
Using the SVD of Y: Y = V Σ U^T, we obtain:

E_y(y^T WW^T y) ∼ (1/N) tr(U Σ^T V^T WW^T V Σ U^T).

Therefore, after some computations we obtain:

argmax_W E_y(y^T WW^T y) ∼ V Id_{D×d},

and so x ∼ Id_{d×D} V^T y.
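A compact numerical sketch of this recipe (the synthetic noise-free data and the choice d = 2 are assumptions of the example; numpy's first SVD factor plays the role of V in Y = V Σ U^T):

import numpy as np

rng = np.random.default_rng(1)
D, d, N = 10, 2, 500
# Synthetic centered measurements y = Wx, with orthonormal W and Gaussian x.
W_true, _ = np.linalg.qr(rng.standard_normal((D, d)))
Y = W_true @ rng.standard_normal((d, N))       # D x N matrix of measurements
V, S, Ut = np.linalg.svd(Y, full_matrices=False)
W_hat = V[:, :d]                # argmax_W of E_y(y^T W W^T y)
X_hat = W_hat.T @ Y             # recovered coordinates, x ~ Id_{d x D} V^T y
# Sanity check: W_hat spans the same principal subspace as W_true.
print(np.allclose(W_hat @ W_hat.T, W_true @ W_true.T))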

Page 13

New Example: Laplacian Eigenmaps

M. Belkin and P. Niyogi, 2001.
Points close on the manifold should remain close in R^d.
Use the Laplace-Beltrami operator ∆_M to control the embedding.
Use discrete approximations for practical problems.
Proven convergence (Belkin and Niyogi, 2003–2008).
Generalized by Diffusion Maps and Diffusion Wavelets.

M. Belkin, P. Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, NIPS 2001.

M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation, vol. 15, no. 6 (2003), pp. 1373–1396.

M. Belkin, Problems of Learning on Manifolds, Ph.D. Thesis, University of Chicago, 2003.

Page 14

Laplacian Eigenmaps - Implementation

1. Put an edge between nodes i and j if x_i and x_j are close. Precisely, given a parameter k ∈ N, put an edge between nodes i and j if x_i is among the k nearest neighbors of x_j, or vice versa.

2. Given a parameter t > 0, if nodes i and j are connected, set

[W_{t,n}]_{i,j} = e^(−‖x_i − x_j‖²/4t).

3. Set [D_{t,n}]_{i,i} = Σ_j [W_{t,n}]_{i,j}, and let L_{t,n} = D_{t,n} − W_{t,n}. Solve

L_{t,n} f = λ D_{t,n} f,

under the constraint y^T D_{t,n} y = Id. Let f_0, f_1, . . . , f_d be the d + 1 eigenvector solutions corresponding to the first eigenvalues 0 = λ_0 ≤ λ_1 ≤ · · · ≤ λ_d. Discard f_0 and use the next d eigenvectors to embed in d-dimensional Euclidean space using the map x_i → (f_1(i), f_2(i), . . . , f_d(i)).
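A minimal sketch of these three steps (a dense, unoptimized implementation; the use of scipy and the symmetric "or" rule for the kNN graph are assumptions of the sketch):

import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmaps(X, d, k=10, t=1.0):
    # Step 1: kNN graph (edge if x_i is among the k nearest neighbors
    # of x_j, or vice versa); assumes the graph has no isolated nodes.
    n = X.shape[0]
    sq = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    A = np.zeros((n, n), dtype=bool)
    A[np.arange(n)[:, None], idx] = True
    A = A | A.T
    # Step 2: heat-kernel weights on the edges.
    W = np.where(A, np.exp(-sq / (4 * t)), 0.0)
    # Step 3: solve L f = lambda D f (eigh normalizes so that y^T D y = Id),
    # discard f_0, and keep the next d eigenvectors.
    Dm = np.diag(W.sum(axis=1))
    vals, vecs = eigh(Dm - W, Dm)    # generalized eigenvalues, ascending
    return vecs[:, 1:d + 1]          # row i = embedding of x_i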

Page 15

Swiss Roll

Figure: a) Original, b) PCA in R^2

A. Halevy, Extensions of Laplacian Eigenmaps for Manifold Learning, Ph.D. Thesis, University of Maryland College Park, 2011.

Page 16

Swiss Roll

Figure: a) Original, b) LE outputs in R^2 for different choices of parameters

Image source: M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation, vol. 15, no. 6 (2003), pp. 1373–1396.

Page 17

Convergence of Laplacians

Given n data points x_1, . . . , x_n from a manifold M, we constructed an operator L_{t,n} which acts on a function f defined on the data points as follows:

L_{t,n}(f)(x_i) = Σ_j f(x_i) e^(−‖x_i−x_j‖²/4t) − Σ_j f(x_j) e^(−‖x_i−x_j‖²/4t).

Our goal is to show that the operator defined in this way is related in a strong sense to the Laplace-Beltrami operator on the manifold M.

Page 18

Laplace operator and heat equation

Heat equation: ∂/∂t u(x, t) + ∆u(x, t) = 0, with u(x, 0) = f(x).

The solution is given by u(x, t) = H_t(f)(x), where

H_t(f)(x) = ∫_M h_t(x, y) f(y) dµ(y).

On R^k, we have h_t(x, y) = (4πt)^(−k/2) e^(−‖x−y‖²/4t).
Key to the approximation is the following equality:

∆(f)(x) = −(∂/∂t) u(x, t)|_{t=0} = −(∂/∂t) H_t(f)(x)|_{t=0} = lim_{t→0} (1/t)(f(x) − H_t(f)(x)).
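A small numerical check of this limit on R (k = 1), with a Gaussian test function and the sign convention ∆ = −d²/dx²; the grid, the point x, and the value of t are assumptions of the sketch:

import numpy as np

f = lambda y: np.exp(-y ** 2)              # smooth, rapidly decaying test function
x, t = 0.3, 1e-4                           # evaluation point and a small time
y = np.linspace(x - 1.0, x + 1.0, 200001)  # quadrature grid around x
ht = (4 * np.pi * t) ** -0.5 * np.exp(-(x - y) ** 2 / (4 * t))
Htf = np.sum(ht * f(y)) * (y[1] - y[0])    # H_t(f)(x) by a Riemann sum
print((f(x) - Htf) / t)                    # approximates Delta(f)(x) = -f''(x)
print(-(4 * x ** 2 - 2) * np.exp(-x ** 2)) # exact -f''(x); the two nearly agree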

Page 19

Key to approximation

∆(f)(x) = lim_{t→0} (1/t)(4πt)^(−k/2) ( ∫_M f(x) e^(−‖x−y‖²/4t) dy − ∫_M f(y) e^(−‖x−y‖²/4t) dy )

Replacing the integrals by empirical averages over the sample points,

∆(f)(x) ≈ lim_{t→0} (1/t)(4πt)^(−k/2) ( (1/n) Σ_j f(x_i) e^(−‖x_i−x_j‖²/4t) − (1/n) Σ_j f(x_j) e^(−‖x_i−x_j‖²/4t) ),

where the expression under the limit equals (1/t)(4πt)^(−k/2) (1/n) L_{t,n}(f)(x_i).

Page 20

From discrete to continuous

Define L_{t,n} : C(M) → C(M):

L_{t,n}(f)(x) = (1/t)(4πt)^(−k/2) ( (1/n) Σ_j f(x) e^(−‖x−x_j‖²/4t) − (1/n) Σ_j f(x_j) e^(−‖x−x_j‖²/4t) )

Define L_t : L²(M) → L²(M):

L_t(f)(x) = (1/t)(4πt)^(−k/2) ( ∫_M f(x) e^(−‖x−y‖²/4t) dy − ∫_M f(y) e^(−‖x−y‖²/4t) dy )

Page 21

A few more notations

Laplace-Beltrami operator on the manifold M:

∆_M : C²(M) → L²(M), ∆_M(f) = −div(∇f)

Heat kernel h_t(x, y) : M × M → R.
Heat operator H_t : L²(M) → L²(M):

H_t(f)(x) = ∫_M h_t(x, y) f(y) dy

Alternatively, H_t(f) = exp(−t∆_M)(f).
Remainder: R_t = (1/t)(Id − H_t) − L_t.

Page 22

Pointwise convergence

Theorem (Pointwise Convergence - Belkin and Niyogi, 2007)

Let the data points x_1, x_2, . . . , x_n be sampled independently from a uniform distribution on a smooth compact manifold M ⊂ R^d. Let α > 0, and let t_n = n^(−1/(k+2+α)). For f ∈ C^∞(M) we then have that:

lim_{n→∞} L_{t_n,n}(f)(x) = (1/vol(M)) ∆_M(f)(x)

in probability, where vol(M) is the volume of the manifold with respect to the canonical measure.

M. Belkin, P. Niyogi, Towards a Theoretical Foundation for Laplacian-Based Manifold Methods, COLT 2005.

M. Belkin, P. Niyogi, Towards a Theoretical Foundation for Laplacian-Based Manifold Methods, Journal of Computer and System Sciences, vol. 74, no. 8 (2008), pp. 1289–1308.

Page 23

Overview of the proof

Usually we do not know the exact form of the heat kernel on the manifold.
Usually we do not know the geodesic distance between points, only the distance in the ambient space.
Hence, our strategy is as follows:

L_{t,n}(f)(p) →_{n→∞} L_t(f)(p) →_{t→0} (1/vol(M)) ∆_M(f)(p).

Page 24

Sample approximations

Fix f ∈ C∞(M), p ∈ M, t > 0.

L_{t,n}(f)(p) is nothing else but the sample average of n i.i.d. random variables with expectation E[L_{t,n}(f)(p)] = L_t(f)(p).

Lemma (Hoeffding's Inequality)

Let X_1, X_2, . . . , X_n be i.i.d. random variables such that |X_i| ≤ K. Then

P{ |(Σ_i X_i)/n − E[X_i]| > ε } ≤ 2 exp(−ε²n / 2K²).

Thus

P[ |L_{t,n}(f)(p) − L_t(f)(p)| > ε ] ≤ 2 exp(−ε²n t²(4πt)^k / 2K²).

Note that Hoeffding's inequality yields a stronger result than, e.g., Chebyshev's inequality.

W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol. 58, no. 301 (1963), pp. 13–30.
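A Monte Carlo illustration of this concentration (a sketch; the uniform bounded variables and the parameter values are assumptions of the example):

import numpy as np

rng = np.random.default_rng(2)
K, eps, n, trials = 1.0, 0.1, 500, 10000
X = rng.uniform(-K, K, size=(trials, n))          # i.i.d., |X_i| <= K, E[X_i] = 0
freq = np.mean(np.abs(X.mean(axis=1)) > eps)      # empirical deviation frequency
bound = 2 * np.exp(-eps ** 2 * n / (2 * K ** 2))  # Hoeffding bound
print(freq, bound)    # the empirical frequency stays below the bound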

Page 25

Proof of pointwise convergence

Let t = t_n = n^(−1/(k+2+α)), α > 0. Then for any ε > 0,

lim_{n→∞} P[ |L_{t,n}(f)(p) − L_t(f)(p)| > ε ] = 0.

Therefore, if we can show that the following holds:

L_t(f)(p) →_{t→0} (1/vol(M)) ∆_M(f)(p),

then we can finish the proof of the theorem.

Page 26

Functional approximation

We can show the last approximation in 3 major steps:
Reduce the integral to a ball B in M, i.e., show that

lim_{t→0} L_t(f)(p) = lim_{t→0} (1/t)(4πt)^(−k/2) ∫_B (f(p) − f(y)) e^(−‖p−y‖²/4t) dµ(y).

Apply a change of coordinates using the exponential map to rewrite the integral in R^k.
Use a Taylor expansion to analyze the integral in R^k.

Page 27

Step 1: Reduction to a ball

Let d = inf_{x∉B} ‖p − x‖_2 and m = µ(M \ B).
Since B is open, we have that d > 0.
Moreover,

| ∫_B f(y) e^(−‖p−y‖²/4t) dµ(y) − ∫_M f(y) e^(−‖p−y‖²/4t) dµ(y) | ≤ m sup_{x∈M} |f(x)| e^(−d²/4t).

Thus, the difference can be estimated as o(t^a) for any a > 0.

Page 28

Step 2: Change of coordinates

exp_p : T_p M (≅ R^k) → M, exp_p(0) = p.
exp_p sends straight lines through the origin to geodesics.
We can choose a small ball B̃ ⊂ R^k such that exp_p is a diffeomorphism onto its image B ⊂ M.
Rewrite f : M → R as f̃(x) = f(exp_p(x)).
We thus obtain the following:

∆_M(f)(p) = ∆(f̃)(0) = −Σ_i ∂²f̃/∂x_i²(0).

Lemma (Belkin, Niyogi, 2007)

There exists a constant C > 0 such that for any pair p, y ∈ M, where y = exp_p(x), we have

0 ≤ ‖x‖²_k − ‖y − p‖²_d = g(x) ≤ C ‖x‖⁴_k.

Page 29

Step 2: Change of coordinates

(1/vol(M)) (1/t)(4πt)^(−k/2) ∫_B̃ (f̃(0) − f̃(x)) e^(−‖p−exp_p(x)‖²/4t) √det(g_{i,j}) dx

For the metric tensor g we have:

det(g_{i,j}) = 1 + O(‖x‖²).

Hence the integral becomes

(1/vol(M)) (1/t)(4πt)^(−k/2) ∫_B̃ (f̃(0) − f̃(x)) e^(−(‖x‖²−g(x))/4t) (1 + O(‖x‖²)) dx.

Page 30

Analysis in Rk

The key component of the previous approximations is the following term:

A_t = (1/vol(M)) (1/t)(4πt)^(−k/2) ∫_B̃ (f̃(0) − f̃(x)) e^(−‖x‖²/4t) dx.

f̃(x) − f̃(0) = x·∇f̃ + (1/2) x^T H x + O(‖x‖³), where H denotes the Hessian.
Thus,

A_t = −(1/vol(M)) (1/t)(4πt)^(−k/2) ∫_B̃ (x·∇f̃ + (1/2) x^T H x + O(‖x‖³)) e^(−‖x‖²/4t) dx.

Note that ∫_B̃ x_i e^(−‖x‖²/4t) dx = 0 and ∫_B̃ x_i x_j e^(−‖x‖²/4t) dx = 0 for i ≠ j.
Hence

vol(M) lim_{t→0} A_t = −tr(H) = ∆_M(f)(p),

which completes the proof of the pointwise convergence.

Page 31

Spectral convergence

We can also prove stronger results about the spectral convergence of Laplacian Eigenmaps:

Eig(L_{t,n}) →_{n→∞} Eig(L_t) →_{t→0} Eig(∆_M)

Theorem (Spectral Convergence - von Luxburg, Bousquet, Belkin, 2004)

For a fixed, sufficiently small t, let λ^i_{t,n} and λ^i_t denote the ith eigenvalues of L_{t,n} and L_t, respectively. Similarly, let e^i_{t,n} and e^i_t denote the corresponding eigenvectors. If λ^i_t < 1/(2t), then

lim_{n→∞} λ^i_{t,n} = λ^i_t and lim_{n→∞} ‖e^i_{t,n} − e^i_t‖ = 0.

U. von Luxburg, O. Bousquet, M. Belkin, Limits of Spectral Clustering, NIPS 2004.

Page 32

Spectral convergence

Theorem (Spectral Convergence - Belkin, Niyogi, 2008)

Let λ^i and λ^i_t denote the ith eigenvalues of ∆_M and L_t, respectively. Similarly, let e^i and e^i_t denote their corresponding eigenvectors. Then

lim_{t→0} |λ^i − λ^i_t| = 0, lim_{t→0} ‖e^i − e^i_t‖ = 0.

M. Belkin, P. Niyogi, Convergence of Laplacian Eigenmaps, NIPS 2008

Page 33

Sketch of proof of spectral convergence

Denote the ith eigenvalue of (Id − H_t)/t by λ^i((Id − H_t)/t).
Because H_t(f) = exp(−t∆_M)(f), we have

λ^i((Id − H_t)/t) = (1 − e^(−λ^i t))/t and lim_{t→0} λ^i((Id − H_t)/t) = λ^i.

The eigenvectors are also equal: e^i = e^i((Id − H_t)/t).
We now need to show that for each fixed i and small t, the ith eigenpair of L_t is close to that of (Id − H_t)/t.

Page 34

Sketch of proof of spectral convergence

Lemma (Belkin, Niyogi, 2008)

Let A, B be positive, self-adjoint operators with discrete spectrum arranged in increasing order. Let R = A − B and assume that for all f ∈ L²(X), we have

|〈R(f), f〉 / 〈A(f), f〉| ≤ ε.

Then, for all k, 1 − ε ≤ λ_k(B)/λ_k(A) ≤ 1 + ε.

Indeed:

|〈A(f), f〉| ≤ |〈B(f), f〉| + |〈R(f), f〉| ≤ |〈B(f), f〉| + ε|〈A(f), f〉|
|〈A(f), f〉| ≥ |〈B(f), f〉| − |〈R(f), f〉| ≥ |〈B(f), f〉| − ε|〈A(f), f〉|
(1 − ε)|〈A(f), f〉| ≤ |〈B(f), f〉| ≤ (1 + ε)|〈A(f), f〉|

If H is a k-dimensional subspace of L²(X), then

(1 − ε) max_H min_{f∈H^⊥} |〈A(f), f〉| ≤ max_H min_{f∈H^⊥} |〈B(f), f〉|.

The result now follows from the Courant-Fischer (min-max) theorem.

Page 35

Sketch of proof of spectral convergence

Combining the previous lemma with the following theorem completes the proof of spectral convergence.

Theorem (Belkin, Niyogi, 2008)

For small enough t, there exists a constant C > 0, independent of t, such that

sup_{f∈L²(M)} | 〈R(f), f〉 / 〈((Id − H_t)/t)(f), f〉 | ≤ C t^(2/(k+6)).

Page 36

From Laplacian to Schroedinger Eigenmaps

For the following considerations we fix t and n; hence we can discard for the moment the subscript t, n, without loss of generality.
Consider the following minimization problem, y ∈ R^d:

min_{y^T Dy = Id} (1/2) Σ_{i,j} ‖y_i − y_j‖² W_{i,j} = min_{y^T Dy = Id} tr(y^T L y).

Its solution is given by the d minimal non-zero eigenvalue solutions of Lf = λDf under the constraint y^T Dy = Id.

Similarly, for a diagonal potential α·V, α > 0, consider the problem

min_{y^T Dy = Id} (1/2) Σ_{i,j} ‖y_i − y_j‖² W_{i,j} + α Σ_i ‖y_i‖² V_{i,i} = min_{y^T Dy = Id} tr(y^T (L + α·V) y),   (1)

which leads to solving the equation (L + αV)f = λDf.
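A minimal sketch of the resulting generalized eigenproblem (it reuses the graph construction of the Laplacian Eigenmaps sketch above; the diagonal potential v and the weight alpha are assumptions of the example):

import numpy as np
from scipy.linalg import eigh

def schroedinger_eigenmaps(W, v, d, alpha=1.0):
    # W: symmetric affinity matrix of the data graph (no isolated nodes);
    # v: nonnegative diagonal potential, e.g., a barrier on labeled nodes.
    Dm = np.diag(W.sum(axis=1))
    L = Dm - W
    # Solve (L + alpha V) f = lambda D f; here we mirror the LE convention
    # of discarding the first eigenvector, though with a nonzero potential
    # one may instead keep the lowest d eigenvectors.
    vals, vecs = eigh(L + alpha * np.diag(v), Dm)
    return vecs[:, 1:d + 1]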

Page 37

Schroedinger Eigenmaps

With M. Ehler, 2011.
We want to go from unsupervised to semi-supervised learning.
In SE, replace L by L + V, where V is, e.g., a barrier potential.
Schroedinger Eigenmaps allow for the use of labeled data.
They enforce certain relations between the nodes.
They allow us to utilize expert input or templates in otherwise fully automated techniques such as LE.
They have proven convergence results (with A. Halevy).

W. Czaja, M. Ehler, Schroedinger type eigenmaps for the analysis and classification of multispectral data in biomedical imaging, IEEE TPAMI, vol. 35, no. 5 (2013), pp. 1274–1280.

A. Halevy, Extensions of Laplacian Eigenmaps for Manifold Learning, Ph.D. Thesis, University of Maryland College Park, 2011.

Page 38

Properties of Schroedinger Eigenmaps

Let the data graph be connected and let V be a symmetric positive semi-definite matrix.

Theorem (with M. Ehler)

Let the data graph be connected, let V be symmetric positive semi-definite, and let n ≤ dim(Null(V)). Then the minimizer y^(α) of (1) satisfies:

‖y^(α)‖²_V = tr((y^(α))^T V y^(α)) ≤ C_1 (1/α).

In particular, if V = diag(v_1, . . . , v_N), then

v_i ‖y^(α)_i‖² ≤ Σ_{i=1}^N v_i ‖y^(α)_i‖² ≤ C_1 (1/α), for all i = 1, . . . , N.

Page 39

Pointwise Convergence of SE

Given n data points x_1, x_2, . . . , x_n sampled independently from a uniform distribution on a smooth, compact, d-dimensional manifold M ⊂ R^D, define the operator L_{t,n} : C(M) → C(M) by

L_{t,n}(f)(x) = (1/((4πt)^(d/2) t)) ( (1/n) Σ_j f(x) e^(−‖x−x_j‖²/4t) − (1/n) Σ_j f(x_j) e^(−‖x−x_j‖²/4t) ).

Let v ∈ C(M) be a potential. For x ∈ M, let y_n(x) = argmin_{x_1,x_2,...,x_n} ‖x − x_i‖, and define V_n : C(M) → C(M) by V_n f(x) = v(y_n(x)) f(x).

Theorem (Pointwise Convergence, with A. Halevy)

Let α > 0, and set t_n = (1/n)^(1/(d+2+α)). For f ∈ C^∞(M),

lim_{n→∞} L_{t_n,n} f(x) + V_n f(x) = C ∆_M f(x) + v(x) f(x) in probability.

Page 40

Spectral Convergence of SE - Theorem

Let L_{t,n} be the unnormalized discrete Laplacian.

Theorem (Spectral Convergence of SE, with A. Halevy)

Let λ^i_{t,n} and e^i_{t,n} be the ith eigenvalue and corresponding eigenfunction of L_{t,n} + V_n. Let λ^i and e^i be the ith eigenvalue and corresponding eigenfunction of ∆_M + V. Then there exists a sequence t_n → 0 such that, in probability,

lim_{n→∞} λ^i_{t_n,n} = λ^i and lim_{n→∞} ‖e^i_{t_n,n} − e^i‖ = 0.

Page 41

SE as Semisupervised Method

[Figure: four panels of the two-dimensional SE embedding of the arc; both axes range over ±8 × 10^−3.]

Schroedinger Eigenmaps with diagonal potential V = diag(0, . . . , 0, 1, 0, . . . , 0), acting only at one point y_{i_0} in the middle of the arc, for α = 0.05, 0.1, 0.5, 5. This point is pushed to zero.

Page 42

SE as Semisupervised Method

[Figure: four panels of the two-dimensional SE embedding of the arc; both axes range over ±8 × 10^−3.]

By applying the potential to the end points of the arc, for α = 0.01, 0.05, 0.1, 1, we are able to control the dimension reduction such that we obtain an almost perfect circle.

Page 43

Hyperspectral data

[Figure: (left) the Indian Pines image; (right) the Pavia University image.]

(left) The Indian Pines image is a 145 × 145 pixel image with 224 spectral bands. It was acquired using an AVIRIS spectrometer. (right) The Pavia University image is a 610 × 340 pixel image that contains 115 spectral bands. It was acquired using a ROSIS sensor.
Data courtesy of:
Prof. Landgrebe (Purdue University, USA): the Indian Pines dataset,
Prof. Paolo Gamba (Pavia University, Italy): the Pavia University dataset.

Page 44

Cluster Potentials

Let ι = {i_1, . . . , i_m} be a collection of nodes. A cluster potential over ι is defined by:

V[ι, ι] :=
[  1  −1                ]
[ −1   2  −1            ]
[      ·   ·   ·        ]
[          −1   2  −1   ]
[              −1   1   ].

For such a V, the minimization problem is equivalent to:

min_{y^T Dy = Id} (1/2) Σ_{i,j} ‖y_i − y_j‖² W_{i,j} + α Σ_{k=1}^{m−1} ‖y_{i_k} − y_{i_{k+1}}‖².
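A sketch of this construction, embedding the cluster potential into the full N × N matrix (the helper name cluster_potential is an assumption of the example):

import numpy as np

def cluster_potential(iota, N):
    # Build the N x N potential whose restriction V[iota, iota] is the
    # path-graph Laplacian over the listed cluster nodes.
    V = np.zeros((N, N))
    for k in range(len(iota) - 1):
        i, j = iota[k], iota[k + 1]
        # Each consecutive pair contributes ||y_{i_k} - y_{i_{k+1}}||^2,
        # i.e., a block [[1, -1], [-1, 1]]; summing these blocks produces
        # the 1, 2, ..., 2, 1 diagonal shown above.
        V[i, i] += 1; V[j, j] += 1
        V[i, j] -= 1; V[j, i] -= 1
    return V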

Page 45

Schroedinger Eigenmaps with Cluster Potentials

Use k-means clustering to cluster the hyperspectral image. The clustering was initialized randomly, so that the construction of the potential would be completely unsupervised. The Euclidean metric was used.
Define a cluster potential V_k over each of the k = 1, . . . , K clusters. The order of the pixels in each cluster is determined by their index (smallest to largest).
Aggregate into one potential by summing the individual cluster potentials, as sketched below:

V_K = Σ_{k=1}^K V_k.
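Continuing the sketch (scikit-learn's KMeans with random initialization is one possible backend, and cluster_potential is the hypothetical helper defined above):

import numpy as np
from sklearn.cluster import KMeans

def aggregated_cluster_potential(X, K):
    # X: N x D matrix of pixel spectra; cluster with k-means and sum the
    # per-cluster path potentials V_k into a single potential V_K.
    N = X.shape[0]
    labels = KMeans(n_clusters=K, init='random', n_init=10).fit_predict(X)
    V = np.zeros((N, N))
    for k in range(K):
        iota = np.flatnonzero(labels == k)  # pixel indices, smallest to largest
        V += cluster_potential(iota, N)
    return V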

Page 46

Impact of SE on Cluster analysis

Pavia University: Dimensions 4 and 5 of the LE and SE embeddingsfor classes 2 (meadows), 3 (gravel), and 7 (bitumen)

J. J. Benedetto, W. Czaja, J. Dobrosotskaya, T. Doster, K. Duke, D. Gillis, Semi-supervised learning of heterogeneous data in remote sensing imagery, in Independent Component Analyses, Compressive Sampling, Wavelets, Neural Net, Biosystems, and Nanoengineering X, Proc. SPIE vol. 8401 (2012), no. 8401-03.

Page 47

Impact of SE on Cluster analysis

Indian Pines: Dimensions 17 and 22 of the LE and SE embeddings for classes 2 (corn 1), 3 (corn 2), and 10 (soybean 1).

Page 48

Summary and Conclusions

We have introduced several examples of data-dependent representation methods.
These methods are well suited for the analysis of complex, noisy, high-dimensional data.
As a drawback, we are going to encounter increased computational requirements, relative to fast a priori methods such as the FFT or DWT.
Also, the nonlinear dimensionality reduction methods are typically non-invertible.
