Algorithmic High-Dimensional Geometry 2
![Page 1: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/1.jpg)
Algorithmic High-Dimensional Geometry 2
Alex Andoni(Microsoft Research SVC)
![Page 2: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/2.jpg)
The NNS prism
High-dimensional geometry, seen through NNS:
  dimension reduction
  space partitions
  embedding
  small dimension
  sketching
  …
![Page 3: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/3.jpg)
Small Dimension
![Page 4: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/4.jpg)
What if $d$ is small?
Can solve approximate NNS with $O(dn)$ space and $(d/\epsilon)^{O(d)} \cdot \log n$ query time [AMNSW’98,…]
OK if, say, $d$ is a small constant!
Usually, $d$ is not small…
![Page 5: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/5.jpg)
What if $d$ is “effectively” small?
E.g.: a $k$-dimensional subspace of $\mathbb{R}^d$, with $k \ll d$
  Obviously, extract the subspace and solve NNS there!
  Not a robust definition…
More robust definitions:
  KR-dimension [KR’02]
  Doubling dimension [Assouad’83, Cla’99, GKL’03, KL’04]
  Smooth manifold [BW’06, Cla’08]
  other [CNBYM’01, FK97, IN’07, Cla’06…]
![Page 6: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/6.jpg)
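Doubling dimension
Definition: a pointset $S$ has doubling dimension $\lambda$ if:
  for any point $p$ and radius $r$, consider the ball $B(p, r)$ of points within distance $r$ of $p$;
  $B(p, r)$ can be covered by $2^\lambda$ balls $B(p_1, r/2), \ldots, B(p_{2^\lambda}, r/2)$.
Sanity check:
  a $k$-dimensional subspace has $\lambda = O(k)$
  $n$ points always have dimension at most $\log n$
Can be defined for any metric space!
[Figure: a ball of radius $r$ around $p$, with $2^\lambda$ balls of radius $r/2$ to cover it.]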
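The covering condition in the definition can be probed numerically on a finite pointset. The following is a minimal sketch, not part of the original slides: the helper names `half_radius_cover` and `doubling_dimension_estimate` are hypothetical, and the greedy cover only upper-bounds the number of half-radius balls needed (with centers restricted to the pointset), so the result is an estimate rather than the exact doubling dimension.

```python
import math

def half_radius_cover(points, center, r, dist):
    """Greedily choose centers so that balls of radius r/2 cover B(center, r)."""
    ball = [p for p in points if dist(p, center) <= r]
    centers = []
    for p in ball:
        # p becomes a new center only if no chosen center already covers it
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return centers

def doubling_dimension_estimate(points, dist):
    """log2 of the largest greedy cover found over all centers and radii."""
    radii = {dist(p, q) for p in points for q in points if p != q}
    worst = 1
    for c in points:
        for r in radii:
            worst = max(worst, len(half_radius_cover(points, c, r, dist)))
    return math.log2(worst)

if __name__ == "__main__":
    line = list(range(16))  # 16 points on a line
    print(doubling_dimension_estimate(line, lambda a, b: abs(a - b)))
```

On points of a line the estimate stays bounded by a small constant regardless of how many points there are, which matches the sanity check that low-dimensional sets have low doubling dimension.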
![Page 7: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/7.jpg)
NNS for small doubling dimension
Euclidean space [Indyk-Naor’07]:
  JL into dimension $O(\lambda)$, where $\lambda$ is the doubling dimension, “works”!
  Contraction of any pair happens with very small probability
  Expansion of some pair happens with constant probability
  Good enough for NNS!
Arbitrary metric:
  Navigating nets/cover trees [Krauthgamer-Lee’04, Har-Peled-Mendel’05, Beygelzimer-Kakade-Langford’06,…]
  Algorithm:
  A data-dependent tree: recursive space partition using balls
  At query $q$, follow all paths that intersect the query ball around $q$
![Page 8: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/8.jpg)
Embeddings
![Page 9: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/9.jpg)
General Theory: embeddings
General motivation: given a distance (metric) $M$, solve a computational problem $P$ under $M$.
Metrics:
  Euclidean distance ($\ell_2$)
  Hamming distance
  Edit distance between two strings
  Earth-Mover (transportation) Distance
Problems:
  Compute distance between two points
  Diameter/Close-pair of set $S$
  Clustering, MST, etc.
  Nearest Neighbor Search
Via a map $f$: reduce problem <$P$ under hard metric> to <$P$ under simpler metric>
![Page 10: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/10.jpg)
Embeddings: landscape
Definition: an embedding is a map $f: M \to H$ of a metric $(M, d_M)$ into a host metric $(H, \rho_H)$ such that for any $x, y \in M$:
$$d_M(x, y) \le \rho_H(f(x), f(y)) \le D \cdot d_M(x, y)$$
where $D$ is the distortion (approximation) of the embedding $f$.
Embeddings come in all shapes and colors:
  Source/host spaces $M, H$
  Distortion $D$
  Can be randomized: $\rho_H(f(x), f(y)) \approx d_M(x, y)$ with probability $1 - \delta$
  Time to compute $f(x)$
Types of embeddings:
  From a norm to the same norm but of lower dimension (dimension reduction)
  From one norm ($\ell_2$) into another norm ($\ell_1$)
  From non-norms (edit distance, Earth-Mover Distance) into a norm ($\ell_1$)
  From a given finite metric (shortest path on a planar graph) into a norm ($\ell_1$)
  Into something that is not a metric but a computational procedure: sketches
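To make the notion of distortion concrete, here is a minimal sketch that measures it on a finite pointset. The function name `distortion` and the toy example are illustrative assumptions, not from the slides; the ratio of largest to smallest stretch equals the distortion $D$ after rescaling the map so that it never contracts.

```python
import itertools
import math

def distortion(points, f, d_src, d_host):
    """Ratio of the largest to the smallest stretch of f over all pairs.

    Assumes the points are pairwise distinct under d_src and that f does not
    collapse any pair, so every ratio is finite and positive.
    """
    ratios = [d_host(f(x), f(y)) / d_src(x, y)
              for x, y in itertools.combinations(points, 2)]
    return max(ratios) / min(ratios)

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

if __name__ == "__main__":
    # Identity map from the plane under l2 into the plane under l1.
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(distortion(square, lambda p: p, math.dist, l1))  # sqrt(2) on this set
```

The unit square shows the worst case for this pair of norms in the plane: axis-parallel pairs are preserved exactly, while diagonal pairs are stretched by $\sqrt{2}$.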
![Page 11: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/11.jpg)
Earth-Mover Distance
Definition: given two sets $A, B$ of points in a metric space, $EMD(A, B)$ = min cost bipartite matching between $A$ and $B$.
Which metric space? Can be the plane, …
Applications in image vision
Images courtesy of Kristen Grauman
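For tiny sets the definition can be evaluated directly by trying every perfect matching. This is a brute-force sketch for illustration only (factorial time); the function name `emd` and the example points are assumptions, and equal-size sets are assumed as in the definition above.

```python
import itertools
import math

def emd(A, B, dist):
    """Min-cost perfect matching between equal-size point lists A and B,
    found by enumerating all permutations of B (only viable for small sets)."""
    assert len(A) == len(B)
    return min(sum(dist(a, b) for a, b in zip(A, perm))
               for perm in itertools.permutations(B))

if __name__ == "__main__":
    A = [(0, 0), (3, 3)]
    B = [(0, 1), (3, 2)]
    # Matching each point to its nearby partner costs 1 + 1.
    print(emd(A, B, math.dist))  # 2.0
```

Practical implementations would use a polynomial-time assignment or min-cost-flow solver instead of enumeration.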
![Page 12: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/12.jpg)
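Embedding EMD into $\ell_1$
At least as hard as $\ell_1$
Theorem [Cha02, IT03]: Can embed EMD over $[\Delta]^2$ into $\ell_1$ with distortion $O(\log \Delta)$. Time to embed a set of $s$ points: $O(s \log \Delta)$.
Consequences:
  Nearest Neighbor Search: $O(c \log \Delta)$ approximation with $O(s n^{1+1/c})$ space, and $O(n^{1/c} \cdot s \log \Delta)$ query time.
  Computation: $O(\log \Delta)$ approximation in $O(s \log \Delta)$ time
  Best known: $1 + \epsilon$ approximation in $\tilde{O}(s)$ time [SA’12]
  The higher-dimensional variant is still fastest via embedding [AIK’08]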
![Page 13: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/13.jpg)
High level embedding
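Sets of size $s$ in box $[\Delta] \times [\Delta]$
Embedding of set $A$:
  take a quad-tree
  randomly shift it
  Each cell $c$ gives a coordinate: $f(A)_c$ = #points in the cell $c$
Need to prove $EMD(A, B) \approx \|f(A) - f(B)\|_1$
[Figure: two point sets on a shifted quad-tree, with per-cell counts written out as the vectors …2210… 0002…0011…0100…0000… and …1202… 0100…0011…0000…1100…]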
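The quad-tree construction can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the slides' exact construction: it weights each cell count by the cell's side length (as in the usual formulation of [IT03]), uses an integer random shift, and the names `quadtree_embed` and `l1_distance` are hypothetical.

```python
import random
from collections import Counter

def quadtree_embed(points, delta, shift):
    """Sparse l1 vector: (level, cell) -> side_length * (#points in the cell).

    points: integer pairs in {0..delta-1}^2; shift: integer pair in the same range.
    Levels use cell sides 1, 2, 4, ... until one cell contains every shifted point.
    """
    vec = Counter()
    side, level = 1, 0
    while True:
        for (x, y) in points:
            cell = ((x + shift[0]) // side, (y + shift[1]) // side)
            vec[(level, cell)] += side
        if side >= 2 * delta:
            break
        side *= 2
        level += 1
    return vec

def l1_distance(u, v):
    """l1 distance between two sparse vectors stored as Counters."""
    return sum(abs(u[k] - v[k]) for k in set(u) | set(v))

if __name__ == "__main__":
    delta = 4
    A = [(0, 0), (3, 3)]
    B = [(0, 1), (3, 2)]
    rng = random.Random(0)
    shift = (rng.randrange(delta), rng.randrange(delta))
    fA, fB = quadtree_embed(A, delta, shift), quadtree_embed(B, delta, shift)
    print(l1_distance(fA, fB))
```

With these weights, the $\ell_1$ distance equals a transportation cost on the quad-tree, so it should not fall below the EMD under the $\ell_1$ ground distance; how much it can exceed it depends on the random shift, which is where the expectation in the theorem enters.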
![Page 14: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/14.jpg)
Main Approach
Idea: decompose EMD over $[\Delta]^2$ into EMDs over smaller grids
Recursively reduce to $\Delta = O(1)$
[Figure: EMD over the large grid ≈ a sum (+) of EMDs over smaller grids.]
![Page 15: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/15.jpg)
EMD over small grid
Suppose $\Delta = 3$
$f(A)$ has nine coordinates, counting # points in each joint (grid point):
  f(A)=(2,1,1,0,0,0,1,0,0)
  f(B)=(1,1,0,0,2,0,0,0,1)
Gives O(1) distortion
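As a quick worked check of the example vectors above, the $\ell_1$ distance between the two count vectors can be computed directly. The helper name `l1` is an assumption for illustration.

```python
def l1(u, v):
    """l1 distance between two equal-length count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

f_A = (2, 1, 1, 0, 0, 0, 1, 0, 0)
f_B = (1, 1, 0, 0, 2, 0, 0, 0, 1)

if __name__ == "__main__":
    print(l1(f_A, f_B))  # 6
```

Half of this value, 3, is the number of points of $A$ that have to move to a different grid point; on a grid of constant size each such move costs between 1 and a constant, which is why the count vector gives O(1) distortion.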
![Page 16: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/16.jpg)
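Decomposition Lemma [I07]
For a randomly-shifted cut-grid $G$ of side length $k$, we have:
  (lower bound on cost) $EEMD_\Delta(A,B) \le EEMD_k(A_1,B_1) + EEMD_k(A_2,B_2) + \ldots + k \cdot EEMD_{\Delta/k}(A_G,B_G)$
  (upper bound) $EEMD_\Delta(A,B) \gtrsim \mathbb{E}[\,EEMD_k(A_1,B_1) + EEMD_k(A_2,B_2) + \ldots\,]$
  (upper bound) $EEMD_\Delta(A,B) \gtrsim \mathbb{E}[\,k \cdot EEMD_{\Delta/k}(A_G,B_G)\,]$
The distortion will follow by applying the lemma recursively to $(A_G,B_G)$.
[Figure: the grid is cut into $\Delta/k \times \Delta/k$ cells, each of side $k$.]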
![Page 17: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/17.jpg)
Part 1: lower bound
For a randomly-shifted cut-grid $G$ of side length $k$, we have:
$EEMD_\Delta(A,B) \le EEMD_k(A_1,B_1) + EEMD_k(A_2,B_2) + \ldots + k \cdot EEMD_{\Delta/k}(A_G,B_G)$
Extract a matching from the matchings on the right-hand side.
For each $a \in A$, with $a \in A_i$, it is either:
  matched in $EEMD(A_i,B_i)$ to some $b \in B_i$,
  or $a \in A_i \setminus B_i$, and it is matched in $EEMD(A_G,B_G)$ to some $b \in B_j$.
Match cost in the 2nd case:
  Move $a$ to the center of its cell: paid by $EEMD(A_i,B_i)$
  Move from cell $i$ to cell $j$: paid by $EEMD(A_G,B_G)$
[Figure: the grid is cut into $\Delta/k \times \Delta/k$ cells, each of side $k$.]
![Page 18: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/18.jpg)
Parts 2 & 3: upper bound
For a randomly-shifted cut-grid $G$ of side length $k$, we have:
$EEMD_\Delta(A,B) \gtrsim \mathbb{E}[\,EEMD_k(A_1,B_1) + EEMD_k(A_2,B_2) + \ldots\,]$
$EEMD_\Delta(A,B) \gtrsim \mathbb{E}[\,k \cdot EEMD_{\Delta/k}(A_G,B_G)\,]$
Fix a matching minimizing $EEMD_\Delta(A,B)$.
Will construct matchings for each EEMD on the RHS:
  Uncut pairs $(a,b)$ are matched in their respective $(A_i,B_i)$
  Cut pairs $(a,b)$ are matched in $(A_G,B_G)$ and remain unmatched in their mini-grids
![Page 19: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/19.jpg)
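Part 2: Cost?
$EEMD_\Delta(A,B) \gtrsim \mathbb{E}[\sum_i EEMD_k(A_i,B_i)]$
Uncut pairs $(a,b)$ are matched in their respective $(A_i,B_i)$
  Contribute a total $\le EEMD_\Delta(A,B)$
Consider a cut pair $(a,b)$ at distance $d_x + d_y$ (coordinate differences $d_x$ and $d_y$)
  Contributes $\le 2k$ to $\sum_i EEMD_k(A_i,B_i)$
  $\Pr[(a,b) \text{ cut}] \le (d_x + d_y)/k$
  Expected contribution $\le \Pr[(a,b) \text{ cut}] \cdot 2k \le 2(d_x + d_y)$
  In total, cut pairs contribute $\le 2 \cdot EEMD_\Delta(A,B)$
[Figure: a pair at horizontal distance $d_x$ crossing a grid line of the cut-grid of side $k$.]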
![Page 20: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/20.jpg)
Wrap-up of EMD Embedding
In the end, obtain that $EMD(A,B) \approx$ sum of EMDs of smaller grids, in expectation
Repeat $\log_k \Delta$ times to get down to a constant-size grid
  $O(\log \Delta)$ approximation in total!
![Page 21: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/21.jpg)
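Embeddings of various metrics into $\ell_1$

| Metric | Upper bound |
|---|---|
| Earth-mover distance ($s$-sized sets in 2D plane) | $O(\log s)$ [Cha02, IT03] |
| Earth-mover distance ($s$-sized sets in $\{0,1\}^d$) | $O(\log s \cdot \log d)$ [AIK08] |
| Edit distance over $\{0,1\}^d$ (#indels to transform x->y) | $2^{\tilde{O}(\sqrt{\log d})}$ [OR05] |
| Ulam (edit distance between permutations) | $O(\log d)$ [CK06] |
| Block edit distance | $\tilde{O}(\log d)$ [MS00, CM07] |

Examples:
  edit(1234567, 7123456) = 2
  edit(banana, ananas) = 2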
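The two example values can be reproduced with a short dynamic program. This sketch assumes "edit" counts insertions and deletions only (the "#indels" in the table), so the distance is the two lengths minus twice the longest common subsequence; the function name `indel_distance` is hypothetical.

```python
def indel_distance(x, y):
    """Number of insertions plus deletions needed to turn x into y."""
    m, n = len(x), len(y)
    # lcs[i][j] = length of the longest common subsequence of x[:i] and y[:j]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return m + n - 2 * lcs[m][n]

if __name__ == "__main__":
    print(indel_distance("1234567", "7123456"))  # 2
    print(indel_distance("banana", "ananas"))    # 2
```

In both examples one character is deleted at one end and one is inserted at the other.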
![Page 22: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/22.jpg)
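Non-embeddability into $\ell_1$

| Metric | Upper bound | Lower bound |
|---|---|---|
| Earth-mover distance ($s$-sized sets in 2D plane) | $O(\log s)$ [Cha02, IT03] | $\Omega(\sqrt{\log s})$ [NS07] |
| Earth-mover distance ($s$-sized sets in $\{0,1\}^d$) | $O(\log s \cdot \log d)$ [AIK08] | $\Omega(\log s)$ [KN05] |
| Edit distance over $\{0,1\}^d$ (#indels to transform x->y) | $2^{\tilde{O}(\sqrt{\log d})}$ [OR05] | $\Omega(\log d)$ [KN05, KR06] |
| Ulam (edit distance between permutations) | $O(\log d)$ [CK06] | $\tilde{\Omega}(\log d)$ [AK07] |
| Block edit distance | $\tilde{O}(\log d)$ [MS00, CM07] | 4/3 [Cor03] |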
![Page 23: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/23.jpg)
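Non-embeddability proofs
Via Poincaré-type inequalities…
[Enflo’69]: embedding $\{0,1\}^d$ (Hamming distance) into $\ell_2$ (any dimension) must incur distortion $\Omega(\sqrt{d})$
Proof [Khot-Naor’05]:
  Suppose $f$ is the embedding of $\{0,1\}^d$ into $\ell_2$
  Two distributions over pairs of points $x, y \in \{0,1\}^d$:
  C (close): $y = x \oplus e_i$ for random $x$ and index $i$
  F (far): $x, y$ are random
  Two steps: a Poincaré-type inequality relating the two distributions (short diagonals)
  Implies lower bound!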
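One standard way to carry out the two steps is via Fourier analysis on the hypercube. The derivation below is a hedged sketch of that textbook argument, consistent with the distributions C and F above but not necessarily the exact steps on the original slide; $\hat f(S)$ denotes the (vector-valued) Fourier coefficients of $f$ and $\rho$ the Hamming distance.

```latex
\begin{align*}
\mathbb{E}_{C}\,\|f(x)-f(y)\|_2^2 &= \frac{4}{d}\sum_{S} |S|\,\|\hat f(S)\|_2^2, \\
\mathbb{E}_{F}\,\|f(x)-f(y)\|_2^2 &= 2\sum_{S\neq\emptyset} \|\hat f(S)\|_2^2
  \;\le\; \frac{d}{2}\,\mathbb{E}_{C}\,\|f(x)-f(y)\|_2^2 .
\end{align*}
% Suppose \rho(x,y) \le \|f(x)-f(y)\|_2 \le D\,\rho(x,y). Then
\[
\frac{d^2}{4} = \bigl(\mathbb{E}_F\,\rho\bigr)^2 \le \mathbb{E}_F\,\rho^2
\le \mathbb{E}_{F}\,\|f(x)-f(y)\|_2^2
\le \frac{d}{2}\,\mathbb{E}_{C}\,\|f(x)-f(y)\|_2^2
\le \frac{d}{2}\,D^2 ,
\qquad\text{so}\qquad D \ge \sqrt{d/2}.
\]
```

The last inequality uses that pairs drawn from C are at Hamming distance exactly 1, so their images are at distance at most $D$.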
![Page 24: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/24.jpg)
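Other good host spaces?
What is “good”:
  is algorithmically tractable
  is rich (can embed into it)
Candidates: sq-$\ell_2$, etc. ???
sq-$\ell_2$ = real space with distance $\|x-y\|_2^2$

| Metric | Lower bound into $\ell_1$ | Lower bound into sq-$\ell_2$, hosts with very good LSH (via communication complexity) |
|---|---|---|
| Edit distance over $\{0,1\}^d$ | $\Omega(\log d)$ [KN05, KR06] | $\tilde{\Omega}(\log d)$ [AK’07] |
| Ulam (edit distance between permutations) | $\tilde{\Omega}(\log d)$ [AK07] | $\tilde{\Omega}(\log d)$ [AK’07] |
| Earth-mover distance ($s$-sized sets in $\{0,1\}^d$) | $\Omega(\log s)$ [KN05] | $\Omega(\log s)$ [AIK’08] |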
![Page 25: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/25.jpg)
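The $\ell_\infty$ story
[Mat96]: Can embed any metric on $n$ points into $\ell_\infty$
Theorem [I’98]: NNS for $\ell_\infty^d$ with $O(\log_{1+\rho} \log d)$ approximation, $O(d\, n^{1+\rho})$ space, $O(d \log n)$ query time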
Dimension is an issue though… Smaller dimension?
Possible for some: Hausdorff,… [FCI99] But, not possible even for [JL01]
![Page 26: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/26.jpg)
Other good host spaces?
What is “good”:
  algorithmically tractable
  rich (can embed into it)
But: combination sometimes works!
sq-$\ell_2$, etc.
![Page 27: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/27.jpg)
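Meet our new host
Iterated product space [A-Indyk-Krauthgamer’09]:
$$\text{sq-}\ell_2^{\gamma}\left(\ell_\infty^{\beta}\left(\ell_1^{\alpha}\right)\right)$$
[Figure: the distance $d_{2^2,\infty,1}$ is built in three layers: $\ell_1$ distances $d_1$ over $\alpha$ coordinates, a max over $\beta$ of those giving $d_{\infty,1}$, and a sum of squares over $\gamma$ of those.]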
![Page 28: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/28.jpg)
Why sq-$\ell_2(\ell_\infty(\ell_1))$?
Because we can…
Rich. Embedding: …embed Ulam into sq-$\ell_2(\ell_\infty(\ell_1))$ with constant distortion
  dimensions = length of the string
  (Ulam = edit distance between permutations, e.g. ED(1234567, 7123456) = 2)
Algorithmically tractable. NNS: any $t$-iterated product space has NNS on $n$ points with
  $(\lg \lg n)^{O(t)}$ approximation
  near-linear space and sublinear time
  [Indyk’02, A-Indyk-Krauthgamer’09]
Corollary: NNS for Ulam with $O(\lg \lg n)^2$ approx.
  Better than via each component separately!
![Page 29: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/29.jpg)
Sketching
![Page 30: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/30.jpg)
Computational view
Embedding view: $d_M(x, y) \approx \sqrt{\sum_{i=1}^{k} (F_i(x) - F_i(y))^2}$
Computational view: $d_M(x, y) \approx C(F(x), F(y))$, with $F(x), F(y) \in \{0,1\}^k$ and $C: \{0,1\}^k \times \{0,1\}^k \to \Re$
Arbitrary computation $C$:
  Cons: No/little structure (e.g., $(F, C)$ is not a metric)
  Pros: More expressability: may achieve better distortion (approximation), smaller “dimension”
Sketch $F$: “functional compression scheme” for estimating distances
  almost all lossy ($1 + \epsilon$ distortion or more) and randomized
![Page 31: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/31.jpg)
Why?
1) Beyond embeddings: can do more if “embed” into computational space
2) A waypoint to get embeddings: computational perspective can give actual embeddings
3) Connection to informational/computational notions: communication complexity
![Page 32: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/32.jpg)
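Beyond Embeddings: “Dimension reduction” in $\ell_1$!
Lemma [I00]: there exists a linear $F: \ell_1 \to \Re^k$, and $C$, where $k = O(\epsilon^{-2} \log 1/\delta)$, that achieves: for any $x, y \in \ell_1$, with probability $1 - \delta$:
$$C(F(x), F(y)) = (1 \pm \epsilon) \|x - y\|_1$$
Where $F(x) = (s_1 \cdot x, \ldots, s_k \cdot x)$, with each $s_i = (s_{i1}, s_{i2}, \ldots, s_{id})$ distributed from the Cauchy distribution (the 1-stable distribution), and
$$C(F(x), F(y)) = \mathrm{median}(|F_1(x) - F_1(y)|, \ldots, |F_k(x) - F_k(y)|)$$
While the Cauchy distribution does not have an expectation, it has a median!
$$pdf(s) = \frac{1}{\pi (s^2 + 1)}$$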
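The median estimator is short enough to try out directly. The following is a minimal sketch under stated assumptions: standard Cauchy variables are drawn by inverse transform ($\tan(\pi(u - 1/2))$ for uniform $u$), and the names `cauchy_matrix`, `sketch`, and `estimate_l1` are hypothetical.

```python
import math
import random
import statistics

def cauchy_matrix(k, d, rng):
    """k x d matrix of independent standard Cauchy entries (inverse transform)."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)]
            for _ in range(k)]

def sketch(S, x):
    """Linear sketch F(x) = (s_1 . x, ..., s_k . x)."""
    return [sum(s_j * x_j for s_j, x_j in zip(row, x)) for row in S]

def estimate_l1(Fx, Fy):
    """Median of |F_i(x) - F_i(y)|; each term is distributed as ||x - y||_1 * |Cauchy|,
    and the median of |Cauchy| is 1."""
    return statistics.median(abs(a - b) for a, b in zip(Fx, Fy))

if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 20, 4001
    x = [rng.randint(-5, 5) for _ in range(d)]
    y = [rng.randint(-5, 5) for _ in range(d)]
    S = cauchy_matrix(k, d, rng)
    true_l1 = sum(abs(a - b) for a, b in zip(x, y))
    print(true_l1, estimate_l1(sketch(S, x), sketch(S, y)))
```

Because $F$ is linear, $F(x) - F(y) = F(x - y)$, and 1-stability makes each coordinate a Cauchy variable scaled by $\|x - y\|_1$; taking the mean instead of the median would not converge.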
![Page 33: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/33.jpg)
Waypoint to get embeddings
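Embedding of Ulam metric into sq-$\ell_2(\ell_\infty(\ell_1))$ was obtained via “geometrization” of an algorithm/characterization:
  sublinear (local) algorithms: property testing & streaming [EKKRV98, ACCL04, GJKK07, GG07, EJ08]
[Figure: strings X and Y are combined through three layers, sum ($\ell_1$), max ($\ell_\infty$), and sum of squares (sq-$\ell_2$), to approximate edit(X,Y).]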
![Page 34: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/34.jpg)
Ulam: algorithmic characterization
Lemma: Ulam(x,y) approximately equals the number of “faulty” characters $a$ satisfying: there exists $K \ge 1$ (prefix-length) s.t. the set of $K$ characters preceding $a$ in $x$ differs much from the set of $K$ characters preceding $a$ in $y$
E.g., $a = 5$; $K = 4$:
  x = 123456789, with X[5;4] = {1,2,3,4}
  y = 123467895, with Y[5;4] = {6,7,8,9}
[Ailon-Chazelle-Commandur-Lu’04, Gopalan-Jayram-Krauthgamer-Kumar’07, A-Indyk-Krauthgamer’09]
![Page 35: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/35.jpg)
Connection to communication complexity
Enter the world of Alice and Bob…
Communication complexity model:
  Two-party protocol
  Shared randomness
  Promise (gap) version: decide whether $d(x,y) \le R$ or $d(x,y) > cR$
  c = approximation ratio
  CC = min. # bits to decide (for 90% success)
Sketching model:
  Referee decides based on sketch(x), sketch(y)
  SK = min. sketch size to decide
Fact: SK ≥ CC
[Figure: Alice holds $x$ and Bob holds $y$, with shared randomness; either they exchange CC bits, or each sends sketch($x$), sketch($y$) to a Referee.]
![Page 36: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/36.jpg)
Communication Complexity
VERY rich theory [Yao’79, KN’97,…]
Some notable examples:
  $\ell_1, \ell_2$ are sketchable with $O(1/\epsilon^2)$ bits! [AMS’96, KOR’98]
  hence also everything that embeds into them!
  $O(1/\epsilon^2)$ is tight [IW’03, W’04, BJKS’08, CR’12]
  $\ell_\infty^d$ requires $\Omega(d)$ bits for constant approximation [BJKS’02]
  Coresets: sketches of sets of points for geometric problems [AHV04…]
Connection to NNS:
  [KOR’98]: if sketch size is $s$, then NNS with $n^{O(s)}$ space and one memory lookup!
  From the perspective of NNS lower bounds, communication complexity is closer to ground truth
  Question: do non-embeddability results say something about non-sketchability?
  also Poincaré-type inequalities… [AK07, AJP’10]
Connections to streaming: see Graham Cormode’s lecture
![Page 37: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/37.jpg)
High dimensional geometry
???
![Page 38: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/38.jpg)
Closest Pair
Problem: $n$ points in $d$-dimensional Hamming space, which are random except a planted pair at distance $(1/2 - \epsilon) d$
Solution 1: build NNS and query $n$ times
  LSH-type algo would give ~$n^{2 - O(\epsilon)}$ [PRR89, IM98, D08]
Theorem [Valiant’12]: $n^{1.62} \cdot \mathrm{poly}(d, 1/\epsilon)$ time
[Figure: stack the points $p_1, p_2, \ldots, p_n$ as rows of a matrix $M$ (after aggregating groups of points, e.g. $p_1 + p_2$), then find the max entry of $M M^t$ using subcubic MM algorithms.]
What I didn’t talk about:
Too many things to mention
  Includes embedding of fixed finite metrics into simpler/more-structured spaces like $\ell_1$
Tiny sample among them:
  [LLR94]: introduced metric embeddings to TCS. E.g. showed one can use [Bou85] to solve the sparsest cut problem with $O(\log n)$ approximation
  [Bou85]: Arbitrary metric on $n$ points into $\ell_2$, with distortion $O(\log n)$
  [Rao99]: embedding planar graphs into $\ell_2$, with distortion $O(\sqrt{\log n})$
  [ARV04, ALN05]: sparsest cut problem with $\tilde{O}(\sqrt{\log n})$ approximation
  [KMS98,…]: space partition for rounding SDPs for coloring
  Lots of others…
A list of open questions in embedding theory
  Edited by Jiří Matoušek + Assaf Naor: http://kam.mff.cuni.cz/~matousek/metrop.ps
![Page 40: Algorithmic High-Dimensional Geometry 2](https://reader035.vdocuments.net/reader035/viewer/2022062517/56812eae550346895d945132/html5/thumbnails/40.jpg)
High dimensional geometry via NNS prism
High-dimensional geometry, seen through NNS:
  dimension reduction
  space partitions
  embedding
  small dimension
  sketching
  +++