1

Efficient Method for Maximizing Bichromatic Reverse Nearest Neighbor

Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)

M. Tamer Ozsu (University of Waterloo)

Philip S. Yu (University of Illinois at Chicago)

Ada Wai-Chee Fu (Chinese University of Hong Kong)

Lian Liu (Hong Kong University of Science and Technology)

Presented by Raymond Chi-Wing Wong

2

Outline

1. Introduction

Related work: Bichromatic Reverse Nearest Neighbor

2. Problem - MaxBRNN

3. Algorithm - MaxOverlap

4. Empirical Study

5. Conclusion

3

1. Introduction

Bichromatic Reverse Nearest Neighbor (BRNN or RNN)

Given: P and O are two sets of points in the same data space.

Problem: Given a point p ∈ P, a BRNN query finds all the points o ∈ O whose nearest neighbor (NN) in P is p.
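The BRNN query defined above can be illustrated with a brute-force sketch. This is not the method of the talk, only a direct reading of the definition; the coordinates and the function names `nearest_neighbor` and `brnn` are hypothetical.

```python
import math

def nearest_neighbor(o, P):
    """Return the index of the point in P closest to o (Euclidean distance)."""
    return min(range(len(P)), key=lambda i: math.dist(o, P[i]))

def brnn(p_index, P, O):
    """Return the indices of all o in O whose nearest neighbor in P is P[p_index]."""
    return [j for j, o in enumerate(O) if nearest_neighbor(o, P) == p_index]

# Hypothetical coordinates, chosen only for illustration
P = [(0.0, 0.0), (10.0, 0.0)]                                         # stores
O = [(1.0, 1.0), (2.0, -1.0), (8.0, 1.0), (9.0, -2.0), (11.0, 0.5)]   # customers

print(brnn(0, P, O))  # [0, 1]
print(brnn(1, P, O))  # [2, 3, 4]
```

With these coordinates the first store attracts the first two customers and the second store attracts the other three, mirroring the shape of the convenience-store example.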

4

1. Introduction

NN: Nearest neighbor

RNN: Reverse nearest neighbor

P = {p1, p2} (convenience stores)

O = {o1, o2, o3, o4, o5} (customers)

[Figure: stores p1, p2 and customers o1 to o5 in the plane, each customer linked to its nearest store]

NN in P of o1 and o2 = p1, so the RNN of p1 = {o1, o2}

NN in P of o3, o4 and o5 = p2, so the RNN of p2 = {o3, o4, o5}

5


P = {p1, p2} (convenience stores)

O = {o1, o2, o3, o4, o5} (customers)

Suppose that we want to set up a new convenience store p. Where should we set it up?

[Figure: placement 1 of p among stores p1, p2 and customers o1 to o5]

Placement 1: RNN = {o1, o2}

Influence value (the size of the RNN set) = 2

6


[Figure: placement 2 of p among stores p1, p2 and customers o1 to o5]

Placement 2: RNN = {o1, o2, o3, o4, o5}

Different placements of p may have different RNN sets

Which placement is better? Placement 2

Influence values: placement 1 = 2, placement 2 = 5
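To make the comparison between placements concrete, the following minimal sketch counts how many customers a candidate placement would attract. It assumes Euclidean distance and that a customer switches to the new store only when it is strictly closer; the coordinates and the name `influence` are hypothetical.

```python
import math

def influence(p_new, P, O):
    """Count customers in O that are strictly closer to p_new than to every store in P."""
    count = 0
    for o in O:
        d_existing = min(math.dist(o, p) for p in P)  # distance to current nearest store
        if math.dist(o, p_new) < d_existing:
            count += 1
    return count

# Hypothetical coordinates, chosen only for illustration
P = [(0.0, 0.0), (10.0, 0.0)]
O = [(3.0, 1.0), (3.0, -1.0), (7.0, 1.0), (7.0, -1.0), (6.0, 0.0)]

print(influence((1.5, 0.0), P, O))  # 2
print(influence((5.0, 0.0), P, O))  # 5
```

Here the second placement sits between the two stores and attracts every customer, so it has the larger influence value.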

7


[Figure: placement 3 of p among stores p1, p2 and customers o1 to o5]

Placement 3: RNN = {o1, o2, o3, o4, o5}

Different placements of p may have the same RNN set

Influence values: placement 1 = 2, placement 2 = 5, placement 3 = 5

8


[Figure: stores p1, p2 and customers o1 to o5, with three placements of p marked with influence values 2, 5 and 5]

Problem: We want to find a region R (or area) such that when p is placed in R, the influence value of p is maximized.

9

1. Introduction

Related Work

Arrangement: Running Time = O(|O| log |P| + |O|^2 + 2^(γ(|O|))), where γ(|O|) is a function of |O| and is Ω(|O|)

Our Proposed Algorithm MaxOverlap: Running Time = O(|O| log |P| + k^2 |O| + k |O| log |O|), where k << |O|

Significant improvement on running time

Problem: We want to find a region R (or area) such that when p is placed in R, the influence value of p is maximized.

10

2. Problem

NN: Nearest neighbor

RNN: Reverse nearest neighbor

[Figure: a new store p placed among stores p1, p2 and customers o1 to o5]

RNN of p = {o1, o2, o3, o4, o5}

11


[Figure: a region around p among stores p1, p2 and customers o1 to o5]

Consistent region: for any two possible placements in this region, their RNN sets are the same.

RNN of p = {o1, o2, o3, o4, o5}

Influence value = 5

12


[Figure: stores p1, p2 and customers o1 to o5, with a larger region drawn around two placements of p]

One placement of p has RNN = {o1, o2}.

Another placement of p has RNN = {o1, o2, o3, o4, o5}.

Non-consistent region: two placements in the region have different RNN sets.

15


[Figure: several consistent regions drawn among stores p1, p2 and customers o1 to o5]

Consistent region

Many consistent regions!

Influence value = 5

17


[Figure: a maximal consistent region among stores p1, p2 and customers o1 to o5]

Maximal consistent region R: there does not exist another consistent region R' where (1) R' covers R and (2) the RNN sets of R and R' are equal.

Influence value = 5

19

2. Problem

[Figure: a maximal consistent region among stores p1, p2 and customers o1 to o5]

Problem: We want to find a maximal consistent region R such that the influence value of R is maximized.

We call this problem Maximizing Bichromatic Reverse Nearest Neighbor (MaxBRNN).

20

2. Problem



Two challenges:

Challenge 1: It is difficult to find a maximal consistent region

Challenge 2: We need to return the maximal consistent region with the greatest influence value

21

2. Problem



Nearest location circle (NLC)

P = {p1, p2} (convenience stores)

O = {o1, o2} (customers)

[Figure: stores p1, p2 and customers o1, o2, with one circle drawn around each customer]

NN of o1 in P = p1: construct a circle centered at o1 with radius |p1, o1|

NN of o2 in P = p2: construct a circle centered at o2 with radius |p2, o2|
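The NLC construction above can be sketched as follows. This is a brute-force illustration assuming Euclidean distance; the `NLC` type, the function `build_nlcs` and the coordinates are hypothetical names and values introduced for the example.

```python
import math
from typing import NamedTuple, Tuple

class NLC(NamedTuple):
    center: Tuple[float, float]  # the customer o
    radius: float                # distance from o to its nearest store in P

def build_nlcs(P, O):
    """Construct one nearest location circle per customer in O."""
    return [NLC(o, min(math.dist(o, p) for p in P)) for o in O]

# Hypothetical coordinates, chosen only for illustration
P = [(0.0, 0.0), (10.0, 0.0)]
O = [(3.0, 4.0), (10.0, 5.0)]

nlcs = build_nlcs(P, O)
for c in nlcs:
    print(c.center, c.radius)
```

Each circle passes through the customer's current nearest store, so any new store placed strictly inside the circle would be closer to that customer.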

22

2. Problem



P = {p1, p2} (convenience stores)

O = {o1, o2} (customers)

[Figure: the NLCs of o1 and o2 partition the plane into four regions A, B, C and D]

Four maximal consistent regions, with their RNN sets and influence values:

RNN set = {o1, o2}, influence value = 2 (region A, the intersection between the two NLCs)

RNN set = {o1}, influence value = 1

RNN set = {o2}, influence value = 1

RNN set = {}, influence value = 0

Solution: Region A

27

2. Problem



[Figure: the NLCs of o1 and o2 and the four regions A, B, C and D]

Four maximal consistent regions

Solution: Region A (the intersection between two NLCs)

Lemma: The solution of MaxBRNN can be represented by an intersection of multiple nearest location circles.
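One way to see why the lemma is plausible is the short argument below. The notation is an assumption made for this note: p_o is the nearest neighbor of customer o in P, c_o is the NLC of o, and ties on the boundary are ignored.

```latex
% A new store p attracts customer o exactly when p is closer to o than o's current
% nearest store p_o, i.e. when p lies inside the NLC c_o of o:
o \in \mathrm{RNN}(p)
  \iff |p, o| < |p_o, o|
  \iff p \in c_o .
% Hence the influence value of p is the number of NLCs that cover p:
\mathrm{influence}(p) = \bigl|\{\, o \in O : p \in c_o \,\}\bigr| .
```

Under these assumptions, a placement with the maximum influence value lies in a region covered by the largest number of NLCs, which is an intersection of those circles.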

28

2. Problem



Two challenges:



We propose an algorithm called MaxOverlap

29

3. Algorithm

Make use of the principle of region-to-point transformation

Optimal Region Search Problem -> Optimal Point Search Problem

1. Search a limited number of points

2. Find the optimal point

This optimal point can be mapped to the optimal region in the Optimal Region Search Problem.

30

3. Algorithm

P = {p1, p2, p3, p4, p5} (convenience stores)

O = {o1, o2, o3, o4, o5, o6} (customers)

[Figure: stores p1 to p5 and customers o1 to o6 in the plane]

31

3. Algorithm

[Figure: customers o1 to o6 with their NLCs, including NLC c1, NLC c2 and NLC c3]

The maximal consistent region which maximizes the RNN set is the intersection of c1, c2 and c3 (the solution).

33

3. Algorithm

Algorithm MaxOverlap: Three-Step Algorithm

34

3. Algorithm

Step 1 (Finding Intersection Point)

[Figure: the NLCs of customers o1 to o6, with the intersection points q1 to q9 of their boundaries]
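Step 1 relies on computing where the boundaries of two circles cross. The following is a minimal sketch of that geometric primitive using the standard circle-circle intersection formula; the function name `circle_intersections` is hypothetical, and tangent circles are treated as non-crossing, which is a simplifying assumption.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Return the two points where the boundaries of two circles cross, or [] if none."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    # No crossing: too far apart (or tangent), or one circle inside the other
    if d >= r1 + r2 or d <= abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # distance from c1 to the chord midpoint
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))    # half-length of the common chord
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = -h * (y2 - y1) / d, h * (x2 - x1) / d  # offset perpendicular to the center line
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

pts = circle_intersections((0.0, 0.0), 5.0, (6.0, 0.0), 5.0)
print(pts)
```

For two circles of radius 5 whose centers are 6 apart, the crossing points are (3, 4) and (3, -4).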

36

3. Algorithm

Step 2 (Point Query)

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q9]

Point query for q4

Result for q4 = {c1, c3}

39

3. Algorithm

Point query for q3

Result for q3 = {c1, c2, c3}

43

3. Algorithm

Point query for q1

…

Point query for q5


45


3. Algorithm

Step 3 (Finding Maximum Size)

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q9]

The intersection of c1, c2 and c3 corresponds to the solution.



46

3. Algorithm

Theorem: The running time of algorithm MaxOverlap is O(|O| log |P| + k^2 |O| + k |O| log |O|), where k is typically much smaller than |O|.

47

3. Algorithm

Enhancement 1: We process the intersection points q in a pre-defined order.

Enhancement 2: Step 2 and Step 3 can be combined. We introduce a pruning technique such that some intersection points will not be processed.

48

4. Empirical Study

Synthetic Dataset

P: Gaussian distribution

O: Zipfian distribution

Real Dataset

Rtree Portal (http://www.rtreeportal.org/spatial.html)

CA (62,556), LB (53,145), GR (23,268), GM (36,334)

P: one of the above datasets

O: one of the above datasets

49

4. Empirical Study

Measurements: Execution Time, Storage

Our proposed algorithms

MaxOverlap-P: MaxOverlap with pruning

MaxOverlap-NP: MaxOverlap without pruning

Comparison with adapted algorithms: Arrangement, Buffer-Adapt

50

4. Empirical Study

Small dataset

51

4. Empirical Study

Large dataset

52

5. Conclusion

Problem MaxBRNN: Maximizing Bichromatic Reverse Nearest Neighbor

Algorithm MaxOverlap

Theoretical Analysis of Running Time: Significant Improvement on Running Time

Experiments

53

Q&A

54

1. Introduction

Arrangement

Idea: search regions in the space

There are an exponential number of regions

Our algorithm MaxOverlap

Idea: make use of the principle of region-to-point transformation

1. Search a limited number of points

2. Find the optimal point

This optimal point can be mapped to the optimal region in the Optimal Region Search Problem.

55


We are studying problem MaxBRNN over the L2-norm (i.e., Euclidean distance)

Buffer: studies problem MaxBRNN over the L1-norm (i.e., Manhattan distance)

Buffer-Adapt: heuristic algorithm over the L2-norm

Our algorithm MaxOverlap runs faster than Buffer-Adapt

56


Buffer-Adapt: Running Time = O(|O| log |P| + L^2 k |O| + L^2 |O| log |O|), where L > k

Our Proposed Algorithm MaxOverlap: Running Time = O(|O| log |P| + k^2 |O| + k |O| log |O|), where k << |O|

Significant improvement on running time


57

1. Introduction

Applications

Traditional applications of BRNN also apply to MaxBRNN

Service Location Planning: convenience store, supermarket, coffee shop, ATM, wireless routers, …

Profile-based Marketing

Natural disaster (e.g., earthquake in China): place supply/service centers for rescue or relief jobs

Big event (e.g., US presidential campaign): place police force for security

Military: place some temporary depots for gasoline and food

58


[Figure: stores p1, p2 and customers o1 to o5, with three placements marked with influence values 2, 5 and 5]

There are many arbitrary regions!

Consistent region

Many consistent regions!

59

3. Algorithm

If the solution is the region with influence value = 1, any one of the NLCs is the solution.

[Figure: stores p1, p2, p3 and customers o1, o2, o3 with their NLCs]

60

3. Algorithm

If the solution is the region with influence value = 1, any one of the NLCs is the solution.

If the solution is the region with influence value > 1, finding this solution is more complicated.

61

3. Algorithm

[Figure: customers o1 to o6]


62

3. Algorithm

Step 1 (Finding Intersection Point): We find a set Q of all intersection points between the boundaries of any two overlapping NLCs.

Step 2 (Point Query): For each point q in Q, we perform a point query for q to find a set S of NLCs covering q.

Step 3 (Finding Maximum Size): We choose the set S with the largest size obtained in the above step.

63

3. Algorithm

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q9; the solution region is highlighted]

Observation 1: This region contains at least one intersection point between the boundaries of two NLCs (e.g., q1, q3 and q5).


64

3. Algorithm

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q9]

Observation 1

Suppose we perform a point query for q1 to find all NLCs covering q1.

Result = {c1, c2, c3}

The point queries for the other intersection points of this region return the same set of NLCs.

Observation 2

The result of the point query for each of these intersection points can be used to denote the final solution.


75

3. Algorithm

[Figure: the NLCs c1 to c6 of customers o1 to o6, with intersection points q3 and q4; the solution is the intersection of c1, c2 and c3]

Overlap table

NLC   L(c)         |L(c)|
c1    c2, c3       2
c2    c1, c3       2
c3    c1, c2, c4   3
c4    c3           1
c5    c6           1
c6    c5           1

c3 overlaps with more NLCs. We process c3 first because it is more likely that the solution involves c3.

Point query for q4

Temp. Solution {c1, c3}
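An overlap table and the processing order it induces can be sketched as below. The sketch assumes that two NLCs overlap when their centers are closer than the sum of their radii, and that NLCs are processed in decreasing order of |L(c)|; the circle coordinates are hypothetical and chosen so that the table has the same shape as the one above (index 0 stands for c1, index 1 for c2, and so on).

```python
import math

def overlap_table(nlcs):
    """Map each NLC index to the list L(c) of NLC indices it overlaps with."""
    table = {i: [] for i in range(len(nlcs))}
    for i, ((xi, yi), ri) in enumerate(nlcs):
        for j, ((xj, yj), rj) in enumerate(nlcs):
            if i != j and math.hypot(xi - xj, yi - yj) < ri + rj:
                table[i].append(j)
    return table

def processing_order(table):
    """NLC indices sorted so that those overlapping with more NLCs come first."""
    return sorted(table, key=lambda i: len(table[i]), reverse=True)

# Hypothetical circles as (center, radius)
nlcs = [
    ((0.0, 0.0), 1.0),   # c1
    ((1.0, 0.0), 1.0),   # c2
    ((0.5, 0.8), 1.0),   # c3
    ((0.5, 2.5), 0.9),   # c4
    ((10.0, 0.0), 1.0),  # c5
    ((11.0, 0.0), 1.0),  # c6
]
table = overlap_table(nlcs)
print(table)
print(processing_order(table))
```

With these circles the third NLC (c3) has the longest overlap list and is therefore processed first.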

76

3. Algorithm

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q4, next to the overlap table]

Point query for q3

Temp. Solution {c1, c2, c3}

Point query for q1

…

Suppose we have processed c1 and c2

79

3. Algorithm

[Figure: the NLCs of customers o1 to o6 with intersection points q1 to q4, next to the overlap table]

Temp. Solution {c1, c2, c3}, so the influence value found so far is 3.

Suppose we want to process c5.

c5 overlaps with at most 1 NLC, so any intersection point on c5 has an influence value of at most 2.

Pruning: Since 2 < 3, we do NOT need to process c5.

Lemma: Let I be the influence value found so far. Consider an NLC c. If |L(c)| + 1 < I, we do not need to process c.
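The pruning lemma can be applied mechanically to the overlap table from the slides. The sketch below only illustrates the test |L(c)| + 1 < I; the function name `can_prune` is hypothetical.

```python
def can_prune(overlaps, best_influence):
    """Pruning lemma: NLC c can be skipped when |L(c)| + 1 < I."""
    return len(overlaps) + 1 < best_influence

# Overlap table from the slides: NLC -> L(c)
table = {
    "c1": ["c2", "c3"],
    "c2": ["c1", "c3"],
    "c3": ["c1", "c2", "c4"],
    "c4": ["c3"],
    "c5": ["c6"],
    "c6": ["c5"],
}
I = 3  # influence value found so far, from the temporary solution {c1, c2, c3}

pruned = [c for c, overlaps in table.items() if can_prune(overlaps, I)]
print(pruned)  # ['c4', 'c5', 'c6']
```

With I = 3, the same test that prunes c5 also prunes c4 and c6, since each of them overlaps with only one other NLC.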

80

4. Empirical Study

Nearest Location Circle (NLC) Construction: involves NN queries over P. Build an R*-tree on P.

Step 2: involves point queries over the NLCs. Build an R*-tree on all NLCs.
