locality sensitive hashing

Upload: yasanka-sameera-horawalavithana

Post on 09-Aug-2015

Page 1: Locality sensitive hashing

Locality Sensitive Hashing: Randomized Algorithm

Page 2: Locality sensitive hashing

Problem Statement

• Given a query point q,
  • Find the closest items to the query point with probability $1 - \delta$
• Iterative methods?
  • Large volume of data
  • Curse of dimensionality

Page 3: Locality sensitive hashing

Taxonomy – Near Neighbor Query (NN)

NN
  • Trees: K-d Tree, Range Tree, B Tree, Cover Tree
  • Grid: Voronoi Diagram
  • Hash: Approximate LSH

Page 4: Locality sensitive hashing

Approximate LSH

• Simple idea
  • If two points are close together, then after a “projection” operation these two points will remain close together

Page 5: Locality sensitive hashing

LSH Requirement

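• For any given points $p$, $q$ and distances $d_1 < d_2$:
  • if $\mathrm{dist}(p, q) \le d_1$, then $\Pr[h(p) = h(q)] \ge p_1$
  • if $\mathrm{dist}(p, q) \ge d_2$, then $\Pr[h(p) = h(q)] \le p_2$
• Hash function $h$ is then $(d_1, d_2, p_1, p_2)$-sensitive. Ideally we need
  • $p_1$ to be large
  • $p_2$ to be small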

Page 6: Locality sensitive hashing

[Figure: query point q and point P; distances d, 2d, and c·d; labels P(1), P(2), P(3), and P(c)]

Page 7: Locality sensitive hashing

Probability vs. Distance on candidate pairs

Page 8: Locality sensitive hashing

Hash Function (Random)

• Locality-preserving
• Independent
• Deterministic
• Family of hash functions per distance measure
  • Euclidean
  • Jaccard
  • Cosine Similarity
  • Hamming

Page 9: Locality sensitive hashing

LSH Family for Euclidean distance (2d)

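• When the distance $d$ between two points is small compared with the bucket width $a$,
  • Chance of colliding
  • But not certain
• But can guarantee,
  • If $d \le a/2$,
    • probability $\ge 1/2$ to have $h(p) = h(q)$
  • If $d \ge 2a$,
    • the angle $\theta$ between the segment joining the points and the random line must satisfy $60^\circ \le \theta \le 90^\circ$ to have $h(p) = h(q)$, so the probability is $\le 1/3$
• As LSH: $(a/2, 2a, 1/2, 1/3)$-sensitive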
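The bounds for the 2-D Euclidean family can be checked with a small simulation. This is only a sketch under assumptions taken from reference [1]: points are projected onto a randomly oriented line that is cut into buckets of width `a`, and the bucket grid has a random offset. The function name `collision_rate` and the trial count are hypothetical choices made for illustration.

```python
import math
import random

def collision_rate(d, a, trials=20000, seed=0):
    """Estimate how often two 2-D points at distance d share a bucket of width a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, math.pi)   # angle between the segment and the random line
        offset = rng.uniform(0.0, a)        # random alignment of the bucket grid
        # Positions of the two projected points along the line:
        # the second is shifted by the projected length d * cos(theta).
        proj_x = offset
        proj_y = offset + d * math.cos(theta)
        if math.floor(proj_x / a) == math.floor(proj_y / a):
            hits += 1
    return hits / trials

a = 1.0
near = collision_rate(d=a / 2, a=a)   # claim: at least 1/2
far = collision_rate(d=2 * a, a=a)    # claim: at most 1/3
print(f"d = a/2: {near:.3f}   d = 2a: {far:.3f}")
```

The estimates should sit on the correct side of both bounds, which are guarantees and not exact collision probabilities.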

Page 10: Locality sensitive hashing

How to define the projection?

• Scalar projection (Dot product)
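A common way to turn the scalar projection into a hash, following the form used in reference [2], is to add a random offset and quantize: h(v) = floor((x · v + b) / w). The sketch below assumes a Gaussian random direction `x`, a uniform offset `b`, and a bin width `w`; the helper name `make_hash` and the sample points are hypothetical.

```python
import math
import random

def make_hash(dim, w, rng):
    """Return h(v) = floor((x . v + b) / w) for a random direction x and offset b."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # random projection direction
    b = rng.uniform(0.0, w)                         # random shift of the bin boundaries

    def h(v):
        dot = sum(xi * vi for xi, vi in zip(x, v))  # scalar projection (dot product)
        return math.floor((dot + b) / w)            # quantize into a bin of width w

    return h

rng = random.Random(42)
h = make_hash(dim=3, w=4.0, rng=rng)
p = [1.0, 2.0, 3.0]
q = [1.1, 2.0, 2.9]       # close to p
r = [40.0, -35.0, 60.0]   # far from p
print(h(p), h(q), h(r))
```

Nearby points such as `p` and `q` usually land in the same bin, while a distant point such as `r` usually does not. A single projection gives no guarantee either way.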

Page 11: Locality sensitive hashing

How to define the projection?

• k dot products, so that it is less likely that points at different separations will fall into the same quantization bin
• Perform k independent dot products
• Achieve success,
  • if the query and the nearest neighbor are in the same bin in all k dot products
  • Success probability = $p_1^k$, where $p_1$ is the collision probability for a single dot product; it decreases as we include more dot products

Page 12: Locality sensitive hashing

Multiple-projections

• L independent projections
  • True near neighbor will be unlikely to be unlucky in all the projections
• By increasing L,
  • we can find the true nearest neighbor with arbitrarily high probability
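The combination of k dot products per projection and L independent projections can be sketched as a small index. This is a minimal illustration, not the deck's implementation: the class name `LSHIndex`, the quantized form floor((x · v + b) / w), and the parameter values are all assumptions.

```python
import math
import random
from collections import defaultdict

class LSHIndex:
    """L hash tables, each keyed by k quantized random projections."""

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # One (direction, offset) pair per dot product: k per table, L tables.
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _key(self, table_id, v):
        # The bucket key is the tuple of all k bin indices.
        return tuple(
            math.floor((sum(xi * vi for xi, vi in zip(x, v)) + b) / self.w)
            for x, b in self.projections[table_id]
        )

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)

    def query(self, q):
        # Collect every point sharing a bucket with q in any table,
        # then rank the candidates by exact distance.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))

index = LSHIndex(dim=2, k=4, L=8, w=2.0)
for point in [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]:
    index.add(point)
print(index.query((5.0, 5.0)))  # prints 1: an identical point shares every bucket
```

Raising `k` makes each bucket more selective, and raising `L` gives a true near neighbor more chances to share a bucket with the query.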

Page 13: Locality sensitive hashing

Accuracy

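• Two close points p and q,
  • Separated by $u = \lVert p - q \rVert$
  • Probability of collision,

    $p(u) = \int_0^w \frac{1}{u} f_s\!\left(\frac{t}{u}\right) \left(1 - \frac{t}{w}\right) dt$

    $f_s$ - probability density function of H; $w$ - width of a quantization bin
• As distance u increases, $p(u)$ decreases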
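To see the collision probability fall with distance, the integral of (1/u) f_s(t/u) (1 - t/w) over t in [0, w] can be evaluated numerically. The sketch assumes Gaussian random projections, so `f_s` is taken to be the density of the absolute value of a standard normal variable; that choice, the bin width `w = 4.0`, and the function names are assumptions made for illustration.

```python
import math

def half_gaussian_pdf(s):
    """Density of |Z| for a standard normal Z (assumed choice of f_s)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * s * s)

def collision_probability(u, w, steps=10000):
    """Midpoint-rule estimate of the integral of (1/u) f_s(t/u) (1 - t/w) over [0, w]."""
    dt = w / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (1.0 / u) * half_gaussian_pdf(t / u) * (1.0 - t / w) * dt
    return total

w = 4.0
for u in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"u = {u:4.1f}  p(u) = {collision_probability(u, w):.4f}")
```

The printed values decrease as `u` grows, matching the last bullet of the slide.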

Page 14: Locality sensitive hashing

Time complexity

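• For a query point q,
  • To find the near neighbor: ($T_g$ + $T_c$)
  • Calculate & hash the projections ($T_g$)
    • $O(DkL)$; D dimensions, k dot products, L projections
  • Search the bucket for collisions ($T_c$)
    • $O(DLN_c)$; D dimensions, L projections, and
    • where $N_c = \sum_{x} p^k(\lVert q - x \rVert)$ over the stored points $x$; $N_c$ - expected number of collisions for a single projection
• Analyze
  • $T_g$ increases as k & L increase
  • $T_c$ decreases as k increases, since $p^k < p$ for $p < 1$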

Page 15: Locality sensitive hashing

How many projections (L)?

• For query point p & neighbor q,
  • For single projection,
    • Success probability of collisions: $\ge p_1^k$
  • For L projections,
    • Failure probability of collisions: $\le (1 - p_1^k)^L$
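The failure bound can be turned into a rule for choosing L. Assume a single projection of k dot products succeeds with probability at least p1^k, so L independent projections all fail with probability at most (1 - p1^k)^L. Requiring that bound to be at most a target failure rate delta gives L >= log(delta) / log(1 - p1^k). The symbols `p1` and `delta`, the function name `tables_needed`, and the sample values are assumptions for illustration.

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with (1 - p1**k)**L <= delta."""
    per_projection_success = p1 ** k
    return math.ceil(math.log(delta) / math.log(1.0 - per_projection_success))

L = tables_needed(p1=0.9, k=10, delta=0.05)
print(L)  # 7
```

With these values a single projection succeeds only about 35% of the time, yet seven projections push the failure probability below 5%.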

Page 16: Locality sensitive hashing

LSH in MAXDIVREL Diversity

      #1  #2  #3  …  #k   (dot product)
1      1   0   0  …   1
2      0   1   1  …   1
w      0   0   1  …   0

      #1  #2  #3  …  #k   (dot product)
1      1   1   0  …   1
2      1   0   1  …   1
w      0   1   1  …   0

      #1  #2  #3  …  #k   (dot product)
1      1   0   1  …   0
2      0   0   1  …   0
w      0   1   0  …   0

      #1  #2  #3  …  #k   (dot product)
1      1   0   0  …   1
2      0   1   1  …   1
w      0   0   1  …   0

Page 17: Locality sensitive hashing

REFERENCES

[1] A. Rajaraman and J. Ullman, “Chapter Three of ‘Mining of Massive Datasets,’” pp. 72–130.
[2] M. Slaney and M. Casey, “Lecture Note: LSH,” 2008.
[3] N. Sundaram, A. Turmukhametova, N. Satish, T. Mostak, P. Indyk, S. Madden, and P. Dubey, “Streaming similarity search over one billion tweets using parallel locality-sensitive hashing,” Proc. VLDB Endow., vol. 6, no. 14, pp. 1930–1941, Sep. 2013.