Optimal lower bounds for Locality Sensitive Hashing (except when q is tiny)
Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)
Locality Sensitive Hashing [Indyk-Motwani '98]
h : objects → sketches
H : family of hash functions h s.t.
“similar” objects collide w/ high prob.
“dissimilar” objects collide w/ low prob.
Min-wise hash functions [Broder '98]
Sets A, B as indicator vectors (word 1? word 2? word 3? … word d?):
A = 0 1 1 1 0 0 1 0 0
B = 1 1 1 0 0 0 1 0 1
Jaccard similarity: |A ∩ B| / |A ∪ B|
Invented simple H s.t. Pr[h(A) = h(B)] = |A ∩ B| / |A ∪ B|
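As a rough illustration of the min-wise idea, the Python sketch below draws a random ordering of the universe and hashes a set to its first element under that ordering. The empirical collision rate should then approach the Jaccard similarity. The function names (`jaccard`, `make_minhash`, `estimate_collision`) and the 0-indexed encoding of the two bit vectors as sets are assumptions made for this example, not taken from the slides.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two non-empty sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def make_minhash(universe, rng):
    """Return one min-wise hash: the element of a set that comes first
    under a uniformly random ordering of the universe."""
    order = list(universe)
    rng.shuffle(order)
    rank = {item: i for i, item in enumerate(order)}

    def h(s):
        return min(s, key=rank.__getitem__)

    return h


def estimate_collision(a, b, universe, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[h(A) = h(B)] over random min-wise hashes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_minhash(universe, rng)
        hits += h(a) == h(b)
    return hits / trials


# The two 9-bit vectors from the slide, read as sets of 0-indexed positions
# (an encoding chosen for this example).
A = {1, 2, 3, 6}     # 0 1 1 1 0 0 1 0 0
B = {0, 1, 2, 6, 8}  # 1 1 1 0 0 0 1 0 1
UNIVERSE = range(9)

exact = jaccard(A, B)
estimate = estimate_collision(A, B, UNIVERSE)
```

Here `exact` is 3/6, and `estimate` should land close to it, since the first element of A ∪ B under a random ordering lies in A ∩ B with exactly that probability.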
Indyk-Motwani '98
Defined LSH.
Invented very simple H good for
{0,1}^d under Hamming distance.
Showed good LSH implies good
nearest-neighbor-search data structs.
Charikar '02, STOC
Proposed alternate H (“simhash”) for
Jaccard similarity.
Patented by Google.
Practice
Free code base [AI’04]
Sequence comparison in bioinformatics
Association-rule finding in data mining
Collaborative filtering
Clustering nouns by meaning in NLP
Pose estimation in vision
• • •
Theory
[Tenesawa–Tanaka ’07]
[Broder ’97]
[Indyk–Motwani ’98]
[Gionis–Indyk–Motwani ’98]
[Charikar ’02]
[Datar–Immorlica–Indyk–Mirrokni ’04]
[Motwani–Naor–Panigrahi ’06]
[Andoni–Indyk ’06]
[Neylon ’10]
[Andoni–Indyk ’08, CACM]
Given: (X, dist) distance space, r > 0 “radius”, c > 1 “approx factor”
Goal: Family H of functions X → S
(S can be any finite set)
s.t. ∀ x, y ∈ X,
dist(x, y) ≤ r ⇒ Pr_{h~H}[h(x) = h(y)] ≥ p
dist(x, y) ≥ cr ⇒ Pr_{h~H}[h(x) = h(y)] ≤ q
Write the first bound in terms of q: ≥ q^.5, ≥ q^.25, ≥ q^.1, …, in general p = q^ρ.
Theorem
[IM’98, GIM’98]
Given LSH family for (X, dist),
can solve “(r,cr)-near-neighbor search”
for n points with data structure of
size: O(n^{1+ρ})
query time: Õ(n^ρ) hash fcn evals.
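One standard way to reach bounds of this shape, sketched here under assumptions: concatenate k ≈ log_{1/q} n hash functions so that far points rarely share a bucket, and repeat over L ≈ n^ρ independent tables so that a near point collides in at least one. The slides do not spell out the construction. `lsh_params`, `BitSamplingIndex`, and the use of coordinate-sampling hashes on bit tuples are illustrative choices, and constants and log factors are ignored.

```python
import math
import random
from collections import defaultdict


def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))


def lsh_params(n, p, q):
    """Textbook parameter choice: rho = ln p / ln q, k hashes per table,
    L tables. Constants are omitted in this sketch."""
    rho = math.log(p) / math.log(q)
    k = math.ceil(math.log(n) / math.log(1 / q))
    L = math.ceil(n ** rho)
    return rho, k, L


class BitSamplingIndex:
    """L hash tables, each keyed by k randomly sampled coordinates."""

    def __init__(self, points, k, L, seed=0):
        rng = random.Random(seed)
        d = len(points[0])
        self.points = points
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, pt in enumerate(points):
            for coords, table in zip(self.coords, self.tables):
                table[self._key(pt, coords)].append(idx)

    @staticmethod
    def _key(pt, coords):
        return tuple(pt[i] for i in coords)

    def query(self, x, radius):
        """Return the index of some stored point within `radius` of x that
        shares a bucket with x, or None if no bucket contains one."""
        for coords, table in zip(self.coords, self.tables):
            for idx in table.get(self._key(x, coords), ()):
                if hamming(self.points[idx], x) <= radius:
                    return idx
        return None


rho, k, L = lsh_params(n=1000, p=0.9, q=0.5)
```

For the (r, cr) problem a caller would pass `radius = c * r` to `query`. The storage is n points times L tables, which is where the n^{1+ρ} size comes from, and a query touches L tables, matching the n^ρ hash evaluations.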
dist(x, y) ≤ r ⇒ Pr_{h~H}[h(x) = h(y)] ≥ q^ρ
dist(x, y) ≥ cr ⇒ Pr_{h~H}[h(x) = h(y)] ≤ q
Example
X = {0,1}^d, dist = Hamming
r = εd, c = 5
x = 0 1 1 1 0 0 1 0 0
y = 1 1 1 0 0 0 1 0 1
dist ≤ εd or ≥ 5εd
H = { h_1, h_2, …, h_d }, h_i(x) = x_i [IM’98]
“output a random coord.”
dist(x, y) ≤ εd ⇒ Pr_{h~H}[h(x) = h(y)] ≥ 1 − ε
dist(x, y) ≥ 5εd ⇒ Pr_{h~H}[h(x) = h(y)] ≤ 1 − 5ε
Analysis
1 − 5ε = q
1 − ε = q^ρ
(1 − 5ε)^{1/5} ≈ 1 − ε. ∴ ρ ≈ 1/5
(1 − 5ε)^{1/5} ≤ 1 − ε. ∴ ρ ≤ 1/5
In general, achieves ρ ≤ 1/c, ∀c (∀r).
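The arithmetic behind ρ ≤ 1/c for the random-coordinate family can be checked numerically. The sketch below assumes the collision probabilities 1 − ε and 1 − cε from the example; `bit_sampling_rho` and `collision_prob` are names invented here.

```python
import math


def bit_sampling_rho(eps, c):
    """rho = ln p / ln q for the random-coordinate family, where
    p = 1 - eps (near pairs) and q = 1 - c*eps (far pairs)."""
    p = 1 - eps
    q = 1 - c * eps
    return math.log(p) / math.log(q)


def collision_prob(x, y):
    """Pr over a uniformly random coordinate i that x_i == y_i,
    i.e. 1 - dist(x, y)/d."""
    return sum(xi == yi for xi, yi in zip(x, y)) / len(x)


# The two 9-bit strings shown on the slide.
x = (0, 1, 1, 1, 0, 0, 1, 0, 0)
y = (1, 1, 1, 0, 0, 0, 1, 0, 1)
agreement = collision_prob(x, y)

# rho for c = 5 at a few values of eps; each should sit just below 1/5.
rhos = [bit_sampling_rho(eps, 5) for eps in (0.1, 0.01, 0.001)]
```

The two strings differ in 3 of 9 positions, so `agreement` is 6/9. The values in `rhos` approach 1/5 from below as ε shrinks, in line with (1 − 5ε)^{1/5} ≤ 1 − ε.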
Optimal upper bound
({0,1}^d, Ham), r > 0, c > 1.
S ≝ {0,1}^d ∪ {✔}, H ≝ {h_ab : dist(a, b) ≤ r}
h_ab(x) = ✔ if x = a or x = b,
          x otherwise
dist(x, y) ≤ r ⇒ Pr_{h~H}[h(x) = h(y)] > 0 (positive; not > 0.5, > 0.1, > 0.01, > 0.0001, …)
dist(x, y) ≥ cr ⇒ Pr_{h~H}[h(x) = h(y)] = 0
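To see why this family is a "cheat", it can be enumerated exactly on a tiny cube. The sketch below assumes H is uniform over ordered pairs (a, b) with dist(a, b) ≤ r (the slides do not say whether pairs are ordered), and uses d = 4, r = 1, c = 2 purely for illustration.

```python
from fractions import Fraction
from itertools import product

CHECK = "✔"  # the extra symbol added to the range S


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def family(d, r):
    """All ordered pairs (a, b) in {0,1}^d with dist(a, b) <= r;
    each pair indexes one hash function h_ab."""
    cube = list(product((0, 1), repeat=d))
    return [(a, b) for a in cube for b in cube if hamming(a, b) <= r]


def h(ab, x):
    """h_ab(x) = CHECK if x is a or b, and x itself otherwise."""
    a, b = ab
    return CHECK if x == a or x == b else x


def collision_prob(H, x, y):
    """Exact Pr over uniform h in H that h(x) == h(y)."""
    hits = sum(h(ab, x) == h(ab, y) for ab in H)
    return Fraction(hits, len(H))


H = family(d=4, r=1)
near = collision_prob(H, (0, 0, 0, 0), (0, 0, 0, 1))  # dist 1 <= r
far = collision_prob(H, (0, 0, 0, 0), (0, 0, 1, 1))   # dist 2 >= c*r
```

Only the two functions h_xy and h_yx send a near pair x ≠ y to the same value, so `near` is 2/80 while `far` is exactly 0. Formally that gives q = 0 and ρ = 0, but the near-collision probability is tiny and shrinks exponentially with d.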
Wait, what?
Theorem [IM’98, GIM’98]
Given LSH family for (X, dist) with q ≥ n^{−o(1)} (“not tiny”),
can solve “(r,cr)-near-neighbor search”
for n points with data structure of
size: O(n^{1+ρ})
query time: Õ(n^ρ) hash fcn evals.
More results
For R^d with ℓp-distance: ρ ≤ 1/c^p
when p = 1 [IM’98], 0 < p < 1 [DIIM’04], p = 2 [AI’06]
For Jaccard similarity: ρ ≤ 1/c [Bro’98]
For {0,1}^d with Hamming distance:
ρ ≥ .462/c − o_d(1) (assuming q ≥ 2^{−o(d)}) [MNP’06]
immediately ρ ≥ .462/c^p for ℓp-distance
Our Theorem
For {0,1}^d with Hamming distance: (∃ r s.t.)
ρ ≥ 1/c − o_d(1) (assuming q ≥ 2^{−o(d)})
immediately ρ ≥ 1/c^p for ℓp-distance
Proof also yields ρ ≥ 1/c for Jaccard.
Definition: Noise stability at e^{−τ}
Fix any function h : {0,1}^d → S.
Pick x ∈ {0,1}^d at random:
x = 0 1 1 1 0 0 1 0 0, h(x) = s
Flip each bit w.p. (1 − e^{−2τ})/2 independently:
y = 0 0 1 1 0 0 1 1 0, h(y) = s’
def: K_h(τ) = Pr_{x ~τ y}[h(x) = h(y)]
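For small d the quantity K_h(τ) can be computed exactly by summing over all pairs (x, y). The sketch below follows the definition as stated (x uniform, each bit flipped independently with probability (1 − e^{−2τ})/2). `noise_stability`, `dictator`, and `majority3` are names made up for this example.

```python
import math
from itertools import product


def noise_stability(h, d, tau):
    """Exact K_h(tau) = Pr[h(x) == h(y)], with x uniform on {0,1}^d and
    y obtained by flipping each bit of x independently with probability
    delta = (1 - exp(-2*tau)) / 2. Cost is 4^d evaluations: small d only."""
    delta = (1 - math.exp(-2 * tau)) / 2
    cube = list(product((0, 1), repeat=d))
    total = 0.0
    for x in cube:
        for y in cube:
            if h(x) == h(y):
                flips = sum(a != b for a, b in zip(x, y))
                # Probability that exactly these bits were flipped.
                total += delta ** flips * (1 - delta) ** (d - flips)
    return total / len(cube)


def dictator(x):
    """h(x) = x_1, the 'output one coordinate' hash."""
    return x[0]


def majority3(x):
    """Majority of three bits."""
    return int(sum(x) >= 2)


k_dictator = noise_stability(dictator, 3, 0.3)
k_majority = noise_stability(majority3, 3, 0.3)
```

For the dictator function the result should equal (1 + e^{−2τ})/2, since h(x) = h(y) exactly when the first bit is not flipped. At τ = 0 no bit is flipped and K_h(0) = 1 for every h.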
Lemma 1: For x ~τ y, dist(x, y) = (1 − e^{−2τ}) d/2 ± o(d) w.v.h.p.,
≈ τd when τ ≪ 1.
Proof: Chernoff bound and Taylor expansion.
Lemma 2: K_h(τ) is a log-convex function of τ (for any h).
Proof: Fourier analysis of Boolean functions.
[Plot: log K_h(τ) against τ, starting from τ = 0]
Theorem: LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).
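A sketch of why Lemma 2 should hold, written out under the definition above (flip probability (1 − e^{−2τ})/2, so each character χ_T picks up a factor e^{−2τ|T|}). The indicator notation 1_s and the Fourier coefficients are standard but are not spelled out on the slides, so treat the exact form as this note's assumption.

```latex
% Assumes x uniform on \{0,1\}^d and y obtained by flipping each bit
% independently with probability (1 - e^{-2\tau})/2.
% \mathbf{1}_s(x) = 1 if h(x) = s and 0 otherwise.
\begin{align*}
K_h(\tau)
  &= \sum_{s \in S} \Pr_{x \sim_\tau y}\bigl[h(x) = s \text{ and } h(y) = s\bigr] \\
  &= \sum_{s \in S} \mathbb{E}_{x \sim_\tau y}\bigl[\mathbf{1}_s(x)\,\mathbf{1}_s(y)\bigr] \\
  &= \sum_{s \in S} \sum_{T \subseteq [d]} \widehat{\mathbf{1}_s}(T)^2 \, e^{-2\tau |T|}.
\end{align*}
```

Each term is a non-negative constant times e^{−2τ|T|}, whose logarithm is linear in τ, and a sum of log-convex functions is log-convex.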
Proof: Say H is an LSH family for {0,1}^d
with params (εd + o(d), cεd − o(d), q^ρ, q),
i.e. radius r = εd + o(d) and far distance (c − o(1)) r.
def: K_H(τ) = E_{h~H}[K_h(τ)]
(Non-neg. lin. comb. of log-convex fcns. ∴ K_H(τ) is also log-convex.)
K_H(τ) = E_{h~H}[Pr_{x ~τ y}[h(x) = h(y)]] = E_{x ~τ y}[Pr_{h~H}[h(x) = h(y)]]
w.v.h.p., dist(x, y) ≈ (1 − e^{−2τ}) d/2 ≈ τd
∴ K_H(ε) ≳ q^ρ
K_H(cε) ≲ q (in truth, q + 2^{−Θ(d)}; we assume q not tiny)
∴ ln K_H(0) = ln 1 = 0
ln K_H(ε) ≳ ln q^ρ = ρ ln q
ln K_H(cε) ≲ ln q
K_H(τ) is log-convex
[Plot: ln K_H(τ) against τ, with value 0 at τ = 0, ≳ ρ ln q at τ = ε, and ≲ ln q at τ = cε]
∴ ρ ln q ≤ (1/c) ln q
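The last step uses convexity of ln K_H at the three points 0, ε, cε. Written out, with the o(1) slack hidden in ≲ and ≳ as on the slides (the symbols need the amssymb package):

```latex
\begin{align*}
\varepsilon &= \Bigl(1 - \tfrac{1}{c}\Bigr)\cdot 0 + \tfrac{1}{c}\cdot c\varepsilon, \\
\ln K_H(\varepsilon)
  &\le \Bigl(1 - \tfrac{1}{c}\Bigr)\ln K_H(0) + \tfrac{1}{c}\,\ln K_H(c\varepsilon)
   = \tfrac{1}{c}\,\ln K_H(c\varepsilon), \\
\rho \ln q \;\lesssim\; \ln K_H(\varepsilon)
  &\le \tfrac{1}{c}\,\ln K_H(c\varepsilon) \;\lesssim\; \tfrac{1}{c}\,\ln q
  \quad\Longrightarrow\quad \rho \gtrsim \tfrac{1}{c}
  \quad (\text{dividing by } \ln q < 0).
\end{align*}
```

The middle line uses ln K_H(0) = 0. Dividing by the negative number ln q flips the inequality, which is how an upper bound on ρ ln q turns into the lower bound on ρ.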