a nonlinear approach to dimension reduction

A Nonlinear Approach to Dimension Reduction. Lee-Ad Gottlieb, Weizmann Institute of Science. Joint work with Robert Krauthgamer.

Upload: mason

Post on 07-Feb-2016

TRANSCRIPT

Page 1: A Nonlinear Approach to Dimension Reduction

A Nonlinear Approach to Dimension Reduction

Lee-Ad Gottlieb

Weizmann Institute of Science

Joint work with Robert Krauthgamer

Page 2: A Nonlinear Approach to Dimension Reduction

Data as High-Dimensional Vectors

Data is often represented by vectors in $\mathbb{R}^m$. For images: color or intensity. For documents: word frequency.

A typical goal is Nearest Neighbor Search: preprocess the data so that, given a query vector, we can quickly find the closest vector in the data set. This is common in many data analysis tasks, such as classification, learning, and clustering.
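As a baseline for the Nearest Neighbor Search task above, here is a minimal brute-force sketch in Python with NumPy. The function name `nearest_neighbor` is hypothetical. Its per-query cost grows linearly with both the number of points $n$ and the dimension $m$, which is the kind of scan that reducing $m$ makes cheaper.

```python
import numpy as np

def nearest_neighbor(data: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the row of `data` (an n x m array) closest to `query` in Euclidean distance."""
    # Brute force: compute all n distances, O(n * m) work per query.
    dists = np.linalg.norm(data - query, axis=1)
    return int(np.argmin(dists))

# Usage: three stored vectors and one query.
data = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
query = np.array([0.9, 1.2])
idx = nearest_neighbor(data, query)  # index of [1.0, 1.0], i.e. 2
```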

Page 3: A Nonlinear Approach to Dimension Reduction

Curse of Dimensionality

The cost of many useful operations is exponential in the dimension.

First noted by Bellman [Bel-61] in the context of PDFs; it also arises in Nearest Neighbor Search [Cla-94].

Dimension reduction: represent high-dimensional data in a low-dimensional space.

Specifically: map the given vectors into a low-dimensional space while preserving most of the data’s “structure”.

Trade off accuracy for computational efficiency.

Page 4: A Nonlinear Approach to Dimension Reduction

The JL Lemma

Theorem (Johnson-Lindenstrauss, 1984):

For every $n$-point Euclidean set $X$ of dimension $d$, there is a linear map $\varphi: X \to Y$ ($Y$ Euclidean) with interpoint distortion $1 \pm \varepsilon$ and $\dim(Y) = k = O(\varepsilon^{-2} \log n)$.

It can be realized by a simple linear transformation: multiply the $d \times n$ point matrix by a $k \times d$ matrix of random entries from $\{-1, 0, 1\}$ [Ach-01].

A nearly matching lower bound was given by [Alon-03].

Applications to a host of problems in computational geometry.

But can we do better?
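The random $\{-1, 0, 1\}$ projection attributed above to [Ach-01] can be sketched in a few lines of NumPy. The $\sqrt{3}$ scaling, the probabilities $1/6, 2/3, 1/6$, the $1/\sqrt{k}$ normalization, and the choice of $k$ in the usage are assumptions of this sketch (they make each entry mean 0, variance 1), not details taken from the slides. The function names are hypothetical.

```python
import numpy as np

def achlioptas_matrix(k: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """k x d matrix with entries sqrt(3) * {+1, 0, -1} drawn with probabilities {1/6, 2/3, 1/6}.
    Each entry has mean 0 and variance 1."""
    signs = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0) * signs

def jl_project(points: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Map the rows of `points` (n x d) to R^k by a random linear map scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = achlioptas_matrix(k, points.shape[1], rng)
    # The slide multiplies the d x n point matrix by R on the left;
    # with points stored as rows, that is points @ R.T.
    return points @ R.T / np.sqrt(k)

def distance_ratios(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Ratios ||y_i - y_j|| / ||x_i - x_j|| over all pairs i < j."""
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(Y[i] - Y[j], axis=1) / np.linalg.norm(X[i] - X[j], axis=1)

# Usage: 50 random points in R^2000 projected to R^500.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2000))
Y = jl_project(X, k=500, seed=2)
ratios = distance_ratios(X, Y)  # all pairwise ratios should cluster around 1
```

Note that the map is linear and oblivious to the data: the same matrix works for any point set, which is exactly the property that [IN-07] shows cannot give $O(\dim(X))$ dimensions.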

Page 5: A Nonlinear Approach to Dimension Reduction

Doubling Dimension

Definition: the ball $B(x,r)$ is the set of all points within distance $r$ of $x$.

The doubling constant $\lambda(M)$ of a metric $M$ is the minimum value $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius. First used by [Ass-83], and algorithmically by [Cla-97]. The doubling dimension is $\dim(M) = \log_2 \lambda(M)$ [GKL-03].

Applications: approximate nearest neighbor search [KL-04, CG-06], distance oracles [HM-06], spanners [GR-08a, GR-08b], embeddings [ABN-08, BRS-07].

In the illustrated example, $\lambda \le 7$.
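For a finite point set, the doubling constant can be estimated greedily. The sketch below (hypothetical function names, centers restricted to data points, radii restricted to interpoint distances) covers each ball $B(x, r) \cap X$ by greedily chosen balls of radius $r/2$. The greedy centers are pairwise more than $r/2$ apart, so the returned value lies between the doubling constant of $X$ (as a metric space in its own right) and its square; its $\log_2$ therefore estimates $\dim(X)$ up to a factor of 2.

```python
import numpy as np

def greedy_cover_count(points: np.ndarray, center: int, r: float) -> int:
    """Greedily cover B(x, r) ∩ X, with x = points[center], by balls of radius r/2 centered at data points."""
    dists = np.linalg.norm(points - points[center], axis=1)
    uncovered = np.flatnonzero(dists <= r)
    count = 0
    while uncovered.size > 0:
        c = uncovered[0]  # next center: any still-uncovered point
        d_c = np.linalg.norm(points[uncovered] - points[c], axis=1)
        uncovered = uncovered[d_c > r / 2]  # drop everything within r/2 of c (including c)
        count += 1
    return count

def doubling_constant_estimate(points: np.ndarray) -> int:
    """Max greedy cover count over all balls B(x, r) with x in X and r an interpoint distance."""
    best = 1
    for i in range(len(points)):
        dists = np.linalg.norm(points - points[i], axis=1)
        for r in np.unique(dists[dists > 0]):
            best = max(best, greedy_cover_count(points, i, r))
    return best

# Usage: n equidistant points (standard basis of R^8) versus 8 points on a line.
simplex_estimate = doubling_constant_estimate(np.eye(8))
line_estimate = doubling_constant_estimate(np.arange(8.0).reshape(-1, 1))
```

For the $n$ equidistant points the estimate is exactly $n$ (each half-radius ball captures a single point), matching the observation that such a set has $\dim(X) = \log n$. For points on a line it stays at most 4, independent of $n$.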

Page 6: A Nonlinear Approach to Dimension Reduction

The JL Lemma

Theorem (Johnson-Lindenstrauss, 1984):

For every $n$-point Euclidean set $X$ of dimension $d$, there is a linear map $\varphi: X \to Y$ with interpoint distortion $1 \pm \varepsilon$ and $\dim(Y) = O(\varepsilon^{-2} \log n)$.

An almost matching lower bound was given by [Alon-03]. This lower bound considered $n$ roughly equidistant points.

Such a set has $\dim(X) = \log n$, so in fact the lower bound is $\Omega(\varepsilon^{-2} \dim(X))$.

Page 7: A Nonlinear Approach to Dimension Reduction

A Stronger Version of JL?

Open questions:

Can the JL $\log n$ lower bound be strengthened to apply to spaces with low doubling dimension ($\dim(X) \ll \log n$)?

Does there exist a JL-like embedding into $O(\dim(X))$ dimensions [LP-01, GKL-03]? Even constant distortion would be interesting. A linear transformation cannot attain this result [IN-07].

Here, we present a partial resolution to these questions: two embeddings that use $\tilde{O}(\dim^2(X))$ dimensions.

Result I: a $(1 \pm \varepsilon)$ embedding for a single scale, i.e., for interpoint distances close to some $r$.

Result II: a $(1 \pm \varepsilon)$ global embedding into the snowflake metric, where every interpoint distance $s$ is replaced by $s^{1/2}$.

Page 8: A Nonlinear Approach to Dimension Reduction

Result I: Embedding for a Single Scale

Theorem 1 [GK-09]:

Fix scale $r > 0$ and range $0 < \delta < 1$. Every finite $X \subset \ell_2$ admits an embedding $f: X \to \ell_2^k$ for $k = \tilde{O}(\log(1/\delta)\,(\dim X)^2)$ such that:

1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$.

2. Bi-Lipschitz at scale $r$: $\|f(x)-f(y)\| \ge \Omega(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$.

3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$.

We’ll illustrate the proof for constant range and distortion.

Page 9: A Nonlinear Approach to Dimension Reduction

Result I: The Construction

We begin by considering the entire point set. Take, for example, scale $r = 20$ and range $\delta = 1/2$. Assume the minimum interpoint distance is 1.

Page 10: A Nonlinear Approach to Dimension Reduction

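Step 1: Net Extraction

From the point set, we extract a net, for example a 4-net. Net properties:

Covering: every point is within distance 4 of some net point (covering radius 4).

Packing: any two net points are at distance at least 4 from each other (packing distance 4).

A consequence of the packing property is that a ball of radius $s$ contains $O(s^{\dim(X)})$ points.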
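A net with both properties can be built greedily: scan the points and keep a point only if it is at distance at least $r$ from every point kept so far. The sketch below (hypothetical name `greedy_net`, Python with NumPy, quadratic time) is one standard way to do this, not necessarily the construction used in [GK-09].

```python
import numpy as np

def greedy_net(points: np.ndarray, r: float) -> list[int]:
    """Indices of an r-net of `points`, built greedily in input order."""
    net: list[int] = []
    for i, p in enumerate(points):
        # Packing: keep p only if it is at distance >= r from every net point chosen so far.
        if all(np.linalg.norm(p - points[j]) >= r for j in net):
            net.append(i)
    # Covering: every rejected point was within distance < r of some net point.
    return net

# Usage: a 4-net of 300 random points in a 40 x 40 square.
rng = np.random.default_rng(0)
P = rng.uniform(0, 40, size=(300, 2))
net = greedy_net(P, 4.0)
```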

Page 11: A Nonlinear Approach to Dimension Reduction

Step 1: Net Extraction

We want a good embedding for just the net points.

From here on, our embedding will ignore non-net points. Why is this valid?

Page 12: A Nonlinear Approach to Dimension Reduction

Step 1: Net Extraction

Kirszbraun theorem (Lipschitz extension, 1934):

Given an embedding $f: X \to Y$ with $X \subset S$, where $S$ and $Y$ are Euclidean spaces, there exists an extension $f': S \to Y$ such that the restriction of $f'$ to $X$ is equal to $f$, and, if $f$ is contractive, $f'$ is contractive as well (in particular on $S \setminus X$).

Therefore, a good embedding for just the net points suffices. A smaller net radius means less distortion for the non-net points.

Page 13: A Nonlinear Approach to Dimension Reduction

Step 2: Padded Decomposition

Decompose the space into probabilistic padded clusters.

Page 14: A Nonlinear Approach to Dimension Reduction

Step 2: Padded Decomposition

Decompose the space into probabilistic padded clusters.

Cluster properties for a given random partition [GKL03, ABN08]:

Diameter: bounded by $20 \dim(X)$.

Size: by the doubling property, bounded by $(20 \dim(X))^{\dim(X)}$.

Padding: a point is 20-padded (the ball of radius 20 around it lies entirely inside its cluster) with probability $1 - c$, say $9/10$.

Support: $O(\dim(X))$ partitions.
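To make "random partition" and "padded" concrete, here is a simplified random ball-carving sketch in Python with NumPy: draw one random radius $R \in [\Delta/4, \Delta/2]$ and a random order of candidate centers, and assign each point to the first center within distance $R$. This is in the spirit of the probabilistic partitions cited above, but it is an assumption of this sketch, not the exact construction of [GKL03, ABN08]. The names are hypothetical, and padding is checked only against data points.

```python
import numpy as np

def random_partition(points: np.ndarray, diameter: float, rng: np.random.Generator) -> np.ndarray:
    """Label each point with the first center (in a random order) within a random radius R.
    Every cluster lies in a ball of radius R <= diameter / 2, so its diameter is at most `diameter`."""
    n = len(points)
    radius = rng.uniform(diameter / 4, diameter / 2)
    labels = np.full(n, -1)
    for c in rng.permutation(n):  # random priority order over candidate centers
        close = np.linalg.norm(points - points[c], axis=1) <= radius
        labels[(labels == -1) & close] = c  # claim only still-unassigned points
    return labels

def padded_fraction(points: np.ndarray, labels: np.ndarray, pad: float) -> float:
    """Fraction of points whose data points within distance `pad` all lie in their own cluster."""
    padded = 0
    for i in range(len(points)):
        near = np.linalg.norm(points - points[i], axis=1) <= pad
        padded += bool(np.all(labels[near] == labels[i]))
    return padded / len(points)

# Usage: one random partition of 300 points with diameter bound 40, padding radius 2.
rng = np.random.default_rng(0)
P = rng.uniform(0, 100, size=(300, 2))
labels = random_partition(P, 40.0, rng)
frac = padded_fraction(P, labels, 2.0)
```

Every point is eventually assigned, since each point is itself a candidate center at distance 0. Repeating the draw with fresh randomness gives a family of partitions, and the padded fraction can be estimated empirically; making the diameter bound large relative to the padding radius raises it.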

Page 15: A Nonlinear Approach to Dimension Reduction

Step 3: JL on Individual Clusters

For each partition, consider each individual cluster.

Page 16: A Nonlinear Approach to Dimension Reduction

Step 3: JL on Individual Clusters

For each partition, consider each individual cluster.

Reduce the dimension using the JL Lemma, with constant distortion. The target dimension is logarithmic in the cluster size: $O(\log((20 \dim(X))^{\dim(X)})) = O(\dim(X) \log(20 \dim(X))) = \tilde{O}(\dim(X))$. Then translate some point to the origin.
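A sketch of this step for a single partition, in Python with NumPy: each cluster gets its own random JL matrix, and its image is then translated so that one cluster point sits at the origin. The sketch uses Gaussian entries instead of the $\{-1, 0, 1\}$ entries mentioned for [Ach-01]; both are standard JL maps. The function name and the choice of "first point of the cluster" as the translated point are hypothetical.

```python
import numpy as np

def embed_partition(points: np.ndarray, labels: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Embed one partition: an independent Gaussian JL map per cluster, then translate
    each cluster's image so that its first point sits at the origin."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    out = np.zeros((len(points), k))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        G = rng.normal(size=(k, d)) / np.sqrt(k)  # fresh JL matrix for this cluster
        image = points[idx] @ G.T
        out[idx] = image - image[0]  # translate a cluster point to the origin
    return out

# Usage: 60 points in R^500 split into three clusters of 20.
rng = np.random.default_rng(0)
P = rng.normal(size=(60, 500))
labels = np.repeat([0, 1, 2], 20)
E = embed_partition(P, labels, k=400, seed=1)
```

Intracluster distances are preserved up to the JL distortion, while intercluster distances are essentially arbitrary, since every cluster is translated near the origin independently.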

Page 17: A Nonlinear Approach to Dimension Reduction

The Story So Far

To review:

Step 1: Extract the net points.
Step 2: Build a family of partitions.
Step 3: For each partition, apply JL to each cluster, and translate a cluster point to the origin.

Embedding guarantees for a single partition:

Intracluster distance: constant distortion.
Intercluster distance: min distance 0, max distance $20 \dim(X)$.

Not good enough. Let’s backtrack…

Page 18: A Nonlinear Approach to Dimension Reduction

The Story So Far

To review:

Step 1: Extract the net points.
Step 2: Build a family of partitions.
Step 3: For each partition, apply the Gaussian transform to each cluster.
Step 4: For each partition, apply JL to each cluster, and translate a cluster point to the origin.

Embedding guarantees for a single partition:

Intracluster distance: constant distortion.
Intercluster distance: min distance 0, max distance $20 \dim(X)$.

Not good enough. Let’s backtrack…

Page 19: A Nonlinear Approach to Dimension Reduction

Step 3: Gaussian Transform

For each partition, apply the Gaussian transform to distances within each cluster (Schoenberg’s theorem, 1938): $f(t) = (1 - e^{-t^2})^{1/2}$.

Threshold at $s$:

$f_s(t) = s\,(1 - e^{-t^2/s^2})^{1/2}$

Properties for $s = 20$:

Threshold: the cluster diameter is at most 20 (instead of $20 \dim(X)$).
Distortion: small distortion of distances in the relevant range.

The transform can increase the dimension… but JL is the next step.
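Schoenberg’s theorem guarantees that distances $f_s(\|x-y\|)$ are realized exactly by some Euclidean embedding, possibly of very high dimension. As a finite-dimensional stand-in, the sketch below uses random Fourier features (Rahimi and Recht), a technique not mentioned in the slides: with frequencies drawn so that $\mathbb{E}\langle\phi(x), \phi(y)\rangle = e^{-\|x-y\|^2/s^2}$, the scaled map $\tfrac{s}{\sqrt{2}}\phi$ has squared distances close to $s^2(1 - e^{-\|x-y\|^2/s^2}) = f_s(\|x-y\|)^2$. The function names and `num_features` are hypothetical; the 20000-dimensional output illustrates why JL must follow.

```python
import numpy as np

def f_s(t, s):
    """Gaussian transform thresholded at s: f_s(t) = s * (1 - exp(-t^2 / s^2))^(1/2)."""
    return s * np.sqrt(1.0 - np.exp(-(t / s) ** 2))

def gaussian_feature_map(points: np.ndarray, s: float, num_features: int = 20000, seed: int = 0) -> np.ndarray:
    """Random Fourier features whose pairwise distances approximate f_s(||x - y||)."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    # exp(-||x-y||^2 / s^2) is the Gaussian kernel with bandwidth sigma = s / sqrt(2),
    # so frequencies are drawn from N(0, (2 / s^2) I), i.e. standard deviation sqrt(2) / s.
    W = rng.normal(scale=np.sqrt(2.0) / s, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    phi = np.sqrt(2.0 / num_features) * np.cos(points @ W + b)
    # <phi(x), phi(y)> ~ exp(-t^2/s^2) and ||phi(x)||^2 ~ 1, so ||phi(x) - phi(y)||^2 ~ 2 (1 - exp(-t^2/s^2)).
    # Multiplying by s / sqrt(2) turns this into ~ s^2 (1 - exp(-t^2/s^2)) = f_s(t)^2.
    return (s / np.sqrt(2.0)) * phi

# Usage: four points, threshold s = 20 as on the slide.
s = 20.0
P = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 20.0], [60.0, 60.0]])
E = gaussian_feature_map(P, s)
```

The two listed properties are visible directly in `f_s`: it never exceeds $s$ (threshold), and for $t \ll s$ it is close to $t$ (small distortion).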

Page 20: A Nonlinear Approach to Dimension Reduction

Step 4: JL on Individual Clusters

Steps 3 & 4 together give new embedding guarantees:

Intracluster: constant distortion.
Intercluster: min distance 0, max distance 20 (instead of $20 \dim(X)$).

Caveat: also smooth the edges.

(Figure: Gaussian transform, then JL; smaller diameter gives smaller dimension.)

Page 21: A Nonlinear Approach to Dimension Reduction

Step 5: Glue Partitions

We have an embedding for a single partition.

For padded points, the guarantees are perfect; for non-padded points, they are weak.

“Glue” together the embeddings for all $O(\dim(X))$ partitions: concatenate the images (and scale down).

The non-padded case occurs 1/10 of the time, so it gets “averaged away”.

Final dimension: $O(\dim(X))$ partitions, each embedding of dimension $\tilde{O}(\dim(X))$, for a total of $\tilde{O}(\dim^2(X))$.

Example: $f_1(x) = (1,7,2)$, $f_2(x) = (5,2,3)$, $f_3(x) = (4,8,5)$

$F(x) = f_1(x) \oplus f_2(x) \oplus f_3(x) = (1,7,2,5,2,3,4,8,5)$
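The gluing step is a concatenation followed by a scale-down. The slides do not specify the scale factor; the $1/\sqrt{m}$ factor for $m$ partitions used in this NumPy sketch is an assumption (the name `glue` is hypothetical). With it, the squared glued distance is the average of the squared per-partition distances. So contractive pieces glue to a contractive map, and if, say, 9/10 of the partitions preserve a pair’s distance up to a constant, the glued map does too, with a slightly worse constant.

```python
import numpy as np

def glue(embeddings: list[np.ndarray]) -> np.ndarray:
    """Concatenate m per-partition embeddings (each n x k_i) along the coordinate axis and scale by 1/sqrt(m)."""
    m = len(embeddings)
    return np.concatenate(embeddings, axis=1) / np.sqrt(m)

# Usage: the slide's example for a single point x (one row per embedding).
f1 = np.array([[1.0, 7.0, 2.0]])
f2 = np.array([[5.0, 2.0, 3.0]])
f3 = np.array([[4.0, 8.0, 5.0]])
F = glue([f1, f2, f3])  # (1,7,2,5,2,3,4,8,5) / sqrt(3)
```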

Page 22: A Nonlinear Approach to Dimension Reduction

Step 6: Kirszbraun Extension Theorem

Kirszbraun’s theorem extends the embedding to the non-net points without increasing the dimension.

(Figure panels: “Embedding” and “Embedding + K.”)

Page 23: A Nonlinear Approach to Dimension Reduction

Result I: Review

Steps: net extraction, padded decomposition, Gaussian transform, JL, glue partitions, extension theorem.

Theorem 1 [GK-09]: Every finite $X \subset \ell_2$ admits an embedding $f: X \to \ell_2^k$ for $k = \tilde{O}((\dim X)^2)$ such that:

1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$.

2. Bi-Lipschitz at scale $r$: $\|f(x)-f(y)\| \ge \Omega(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$.

3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$.

Page 24: A Nonlinear Approach to Dimension Reduction

Result I: Extension

Steps, with the changes needed for $1 \pm \varepsilon$ distortion: net extraction (nets); padded decomposition (larger padding, probabilistic guarantees); Gaussian transform; JL (already $1 \pm \varepsilon$); glue partitions (higher percentage of padded points); extension theorem.

Theorem 1 [GK-09]: Every finite $X \subset \ell_2$ admits an embedding $f: X \to \ell_2^k$ for $k = \tilde{O}((\dim X)^2)$ such that:

1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$.

2. Gaussian at scale $r$: $\|f(x)-f(y)\| = (1 \pm \varepsilon)\,G(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$.

3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$.

Page 25: A Nonlinear Approach to Dimension Reduction

Result II: Snowflake Embedding

Theorem 2 [GK-09]:

For $0 < \varepsilon < 1$, every finite subset $X \subset \ell_2$ admits an embedding $F: X \to \ell_2^k$ for $k = \tilde{O}(\varepsilon^{-4} (\dim X)^2)$ with distortion $1 \pm \varepsilon$ to the snowflake $s \mapsto s^{1/2}$.

We’ll illustrate the construction for constant distortion. The constant-distortion construction is due to [Ass-83] (for non-Euclidean metrics). In the paper, we implement the same construction with $1 \pm \varepsilon$ distortion.

Page 26: A Nonlinear Approach to Dimension Reduction

Snowflake Embedding

Basic idea:

Fix points $x, y \in X$, and suppose $\|x-y\| \approx s$. Now consider many single-scale embeddings, at scales $r = 16s, 8s, 4s, 2s, s, s/2, s/4, s/8, s/16$, each satisfying:

Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$

Gaussian: $\|f(x)-f(y)\| = (1 \pm \varepsilon)\,G(\|x-y\|)$

Boundedness: $\|f(x)\| \le r$

Page 27: A Nonlinear Approach to Dimension Reduction

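Snowflake Embedding

Now scale down each embedding by $r^{1/2}$ (snowflake). For the pair $x, y$, the level-$r$ distance is about $\min(s, r)$; after scaling:

$r = 16s$: $s \to s^{1/2}/4$
$r = 8s$: $s \to s^{1/2}/8^{1/2}$
$r = 4s$: $s \to s^{1/2}/2$
$r = 2s$: $s \to s^{1/2}/2^{1/2}$
$r = s$: $s \to s^{1/2}$
$r = s/2$: $s/2 \to s^{1/2}/2^{1/2}$
$r = s/4$: $s/4 \to s^{1/2}/2$
$r = s/8$: $s/8 \to s^{1/2}/8^{1/2}$
$r = s/16$: $s/16 \to s^{1/2}/4$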

Page 28: A Nonlinear Approach to Dimension Reduction

Snowflake Embedding

Join levels by concatenation and addition of coordinates.


Page 29: A Nonlinear Approach to Dimension Reduction

Result II: Review

Steps:

Take a collection of single-scale embeddings. Scale down the embedding at scale $r$ by $r^{1/2}$.

Join the embeddings by concatenation and addition.

By taking more refined scales (jumping by a factor of $1 + \varepsilon$ instead of 2), we can achieve $1 \pm \varepsilon$ distortion to the snowflake.

Theorem 2 [GK-09]: For $0 < \varepsilon < 1$, every finite subset $X \subset \ell_2$ admits an embedding $F: X \to \ell_2^k$ for $k = \tilde{O}(\varepsilon^{-4} (\dim X)^2)$ with distortion $1 \pm \varepsilon$ to the snowflake $s \mapsto s^{1/2}$.
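The review steps can be checked numerically in an idealized model, which is an assumption of this sketch: the level-$r$ embedding places the pair $x, y$ at distance exactly $\min(\|x-y\|, r)$, as on the scaling slide. Scaling each level by $r^{-1/2}$ and concatenating (so squared distances add) over dyadic scales $r = 2^j$ yields a distance within a constant factor of $\|x-y\|^{1/2}$. The sketch models concatenation only; the “addition of coordinates”, which keeps the dimension from growing with the number of scales, is not modeled. Function names are hypothetical.

```python
import numpy as np

def level_distance(t: float, r: np.ndarray) -> np.ndarray:
    """Idealized single-scale embedding at scale r: the pair ends up at distance min(t, r) (assumption)."""
    return np.minimum(t, r)

def snowflake_distance(t: float, scales: np.ndarray) -> float:
    """Scale each level by r**-0.5 and concatenate the levels, so squared distances add up."""
    squared_terms = level_distance(t, scales) ** 2 / scales
    return float(np.sqrt(squared_terms.sum()))

# Dyadic scales r = 2^j for j = -30, ..., 29.
scales = 2.0 ** np.arange(-30, 30)
distances = [2.0 ** -5, 0.3, 1.0, 7.0, 32.0]
ratios = [snowflake_distance(t, scales) / np.sqrt(t) for t in distances]
```

In this model the levels with $r \ge s$ contribute a geometric series summing to about $2s$ in squared distance and the levels with $r < s$ about $s$, so the ratio to $s^{1/2}$ is $\sqrt{3}$ for dyadic $s$ and stays between roughly 1.68 and $\sqrt{3} \approx 1.73$ otherwise.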

Page 30: A Nonlinear Approach to Dimension Reduction

Conclusion

We gave two $(1 \pm \varepsilon)$-distortion low-dimensional embeddings for doubling spaces: single scale and snowflake.

This framework can be extended to $\ell_1$ and $\ell_\infty$, where:

Dimension reduction: can’t use JL. Extension: can’t use Kirszbraun. Threshold: can’t use the Gaussian.

Thank you!