
COMP4905 Honours Project

Winter 2010

Finding the Closest Pair in the Query Region

Author: Yan Gao Student#: 100750466

Email: [email protected] Supervisor: Dr. Michiel Smid

Department: School of Computer Science Date: April 15, 2010

ABSTRACT

This project demonstrates the implementation of a C program that constructs a data structure for finding the closest pair of points in a query region. To achieve this goal, it uses an algorithm for finding the closest pair of points in the plane together with the range tree data structure. The project has two parts: the first builds the data structure of candidate closest pairs, and the second searches it for a given query region. We then measure the running time of each part to evaluate the data structure's performance.

Acknowledgment

This honours project was done under the supervision of Dr. Michiel Smid. I would like to extend

special thanks to him for all of the help he has given me, and all I have learned from him while

working with him.

Thanks also to Hsin-Yi Chiang for help with the project programming and to Benjamin C. Yan for help with writing the report.


Contents

List of figures

List of tables

1. Introduction

2. Algorithms

2.1 Converting the edges to the tree

2.2 Range Search Tree

2.2.1 Build the Range Search Tree

2.2.2 1-Dimensional Range Searching

2.2.3 2-Dimensional Range Searching

2.3 Compute the graph

3. Program

3.1 Main function implementation

3.2 Important functions

4. Running time

4.1 Convert the graph

4.2 Build the range search tree

4.3 Build the data structure

4.4 Query time

5. Conclusion

6. References


List of figures

Figure 1 Finding the closest pair in the query region

Figure 2 The relationship between edges and the query region

Figure 3 Edge convert rule

Figure 4 Convert the edges to weighted points

Figure 5 Define the range on 1D

Figure 6 1D range tree

Figure 7 1D-tree range search

Figure 8 2D range tree

Figure 9 2D-tree range search

Figure 10 Grid the plane

Figure 11 The relation between cells and query points

Figure 12 Compute the min-edges

Figure 13 Main process of the program

Figure 14 Data structures used in the program

Figure 15 Convert the tree to an array

Figure 16 Function: treeRoot()

Figure 17 Relations between the indexes and the tree nodes

Figure 18 Find the left and right children of each internal node

Figure 19 Function: min_weight()

Figure 20 Running time of converting the graph

Figure 21 Running time of building y-trees

Figure 22 Running time of building tree

Figure 23 Running time of building the data structure

Figure 24 Running time of 1000 query points

Figure 25 2 query points region searching


List of tables

Table 1 Running time of converting the graph

Table 2 Running time of building tree

Table 3 Running time of building the data structure

Table 4 Running time of 1000 query points


1. Introduction

Finding the closest pair of points is a well-known problem in computational geometry: given n points in a k-dimensional space, find the two points with the shortest distance between them [1].

A range tree is a data structure for organizing points that is useful in multidimensional key-searching problems such as range searching. In one dimension it is a balanced binary search tree; in higher dimensions it nests such trees [2].

This project discusses a problem in two-dimensional space. We are given a plane with bounded x- and y-coordinate ranges and n points in it. A query point is chosen at random in the plane, and the query region is the area above and to the right of that point. The task is to find the closest pair of points in the query region. In Fig. 1, the light blue region is the query region, and the red line is the shortest edge that should be output. In this project, we build a data structure that, given a query region, can be searched to output the answer.


Figure 1 Finding the closest pair in the query region

To build the data structure, the problem is divided into two parts: first, convert the original point set into all candidate shortest edges in the plane; second, build a data structure for answering queries. For the first part, we compute all candidate shortest edges based on the "planar case" closest-pair algorithm [1], which finds the closest pair of n points in O(n log n) time instead of the O(n²) time of the brute-force algorithm. For the second part, we store the candidate edges (each the smallest distance between two points for some query region) in a 2D range tree, which can then be searched by x- and y-coordinates.

Based on these algorithms, a C program was written to implement the data structure, measure the running time of the algorithms in a real program, and compare the measured times with the predicted running times. This analysis of the data structure shows how well it supports query searching.

In this report, we introduce the algorithms chosen for this project, discuss some important functions used in the program, and then analyze the running time of building the data structure and of answering queries.


2. Algorithms

2.1 Converting the edges to the tree

To tackle the problem of finding the closest pair, the first step is to convert the edges into weighted points that can be stored in a tree. Assume we have a graph of edges in a plane, and for each edge the coordinates of its two ends are known. Given a query region, the shortest edge inside it is output. For example, let the plane S be $x \in [0, \mathrm{RANGE}]$, $y \in [0, \mathrm{RANGE}]$, where RANGE is a positive integer. For a random query point $q = (x_q, y_q)$, the query region is the area $x \in [x_q, \mathrm{RANGE}]$, $y \in [y_q, \mathrm{RANGE}]$.

Figure 2 The relationship between edges and the query region

In Fig. 2, the edges are divided into three cases according to the border of the query region: inside, outside, and crossing. Only the inside edges qualify. To decide where an edge lies relative to the query region, we must check that both of its ends are inside the region, so the positions of both ends of every edge have to be stored in the data structure. If the data structure is used for just one query region, checking two points per edge is not expensive. But when it is used for many random query regions, storing and checking two points per edge doubles both the memory and the checking time. Therefore, we use a better method to select the edges we want.

Any edge $\overline{pq}$ falls into either case (a) or case (b) of Fig. 3. Given $p = (p_x, p_y)$ and $q = (q_x, q_y)$, define the point $r = (r_x, r_y)$ with $r_x = \min(p_x, q_x)$ and $r_y = \min(p_y, q_y)$. In case (a), r is the right-angle corner of the triangle formed with the edge; in case (b), r is the lower-left end of the edge. Because the query region is an upper-right quadrant, both ends of the edge lie in the region exactly when r does. Therefore, each edge can be converted to the point r with a weight equal to the length of the edge, and Fig. 2 is transformed into Fig. 4.

Figure 3 Edge convert rule


Figure 4 Convert the edges to weighted points

We have thus converted the problem of finding the minimum edge length into the problem of finding the minimum weight among the points in the query region. For the latter problem, we can use a range search tree.
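To make the conversion concrete, here is a minimal C sketch. The names Pt, WeightedPoint, edge_to_point and edge_in_region are hypothetical (they are not the report's own types), and the weight is stored as the squared length, an assumption of ours that avoids floating point while keeping the same shortest edge. The last function shows the property the conversion relies on: one test on r replaces the two endpoint tests.

```c
#include <stdbool.h>

/* Illustrative types (hypothetical names, not the report's own structs). */
typedef struct { int x, y; } Pt;

typedef struct {
    int rx, ry;        /* converted point r = (min(p.x, q.x), min(p.y, q.y)) */
    long long weight;  /* squared edge length: same ordering as the length */
} WeightedPoint;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Convert edge pq into its weighted point r (Section 2.1). */
WeightedPoint edge_to_point(Pt p, Pt q) {
    long long dx = (long long)p.x - q.x;
    long long dy = (long long)p.y - q.y;
    WeightedPoint r = { min_int(p.x, q.x), min_int(p.y, q.y), dx * dx + dy * dy };
    return r;
}

/* Upper-right query region of query point c: x >= c.x and y >= c.y. */
bool in_region(int x, int y, Pt c) { return x >= c.x && y >= c.y; }

/* Both ends of pq are in the region exactly when r is. */
bool edge_in_region(Pt p, Pt q, Pt c) {
    WeightedPoint r = edge_to_point(p, q);
    return in_region(r.rx, r.ry, c);
}
```

For example, the edge from (3, 1) to (1, 4) becomes the point (1, 1) with weight 13 (its length is √13).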

2.2 Range Search Tree

The range search tree is used to find the minimum distance (weight) among all candidate edges in the query region. It is based on the binary search tree and is implemented in two dimensions. When a query arrives, it defines the range to be searched.

2.2.1 Build the Range Search Tree

Assume we have a set of points $p_1, p_2, \ldots, p_n$ in one dimension, where each point has a value called its weight. Given a range [u, v], we want to report the minimum weight of the points in that range.

Figure 5 Define the range on 1D


A brute-force scan answers each query in O(n) time, which is slow when many queries are asked. Instead, we use a balanced binary search tree T, which also extends to higher dimensions: the leaves of T store the points of P, and the internal nodes of T store splitting values to guide the search. We denote the splitting value stored at a node v by $x_v$. We assume that the left subtree of a node contains all the points smaller than or equal to $x_v$, and that the right subtree contains all the points strictly greater than $x_v$ [1].

Assume each element of P has a coordinate value and a weight, and that all the elements are sorted by coordinate value in increasing order. For example, if

P := {(1:20), (2:13), (3:15), (4:7), (5:32), (6:11), (7:8), (8:22)},

where each pair is (coordinate : weight), we obtain the binary search tree shown in Fig. 6:

Figure 6 1D range tree

Each leaf is an original point of P: the upper number is its coordinate value, and the lower number is its weight. For an internal node, the upper number is the splitting value, and the lower number is the minimum weight in its subtree. Fig. 6 also shows that every splitting value is the coordinate of an original point, namely the largest coordinate in the node's left subtree.
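A minimal C sketch of building such a tree, under assumptions of our own: nodes live in a fixed-size pool, each internal node's splitting value is the largest coordinate in its left subtree, and minWeight is filled in bottom-up. The names Node1D, pool and build1d are hypothetical, and this pool is not the report's array layout from Section 3.2; it only illustrates the same tree.

```c
/* One node of a 1D min-weight range tree (hypothetical layout). */
typedef struct {
    int split;        /* leaf: its coordinate; internal: largest coordinate in left subtree */
    int minWeight;    /* leaf: its own weight; internal: minimum weight in its subtree */
    int left, right;  /* child indices into pool[], -1 for a leaf */
} Node1D;

#define MAX_NODES 64                 /* enough for 2n - 1 nodes when n <= 32 */
static Node1D pool[MAX_NODES];
static int pool_used = 0;

static int min2(int a, int b) { return a < b ? a : b; }

/* Build the tree over coords[lo..hi] (sorted ascending) and return the root index. */
int build1d(const int *coords, const int *weights, int lo, int hi) {
    int v = pool_used++;             /* reserve this node before building children */
    if (lo == hi) {                  /* a leaf stores one original point */
        pool[v] = (Node1D){ coords[lo], weights[lo], -1, -1 };
        return v;
    }
    int mid = (lo + hi) / 2;         /* left subtree gets coords[lo..mid] */
    int l = build1d(coords, weights, lo, mid);
    int r = build1d(coords, weights, mid + 1, hi);
    pool[v] = (Node1D){ coords[mid], min2(pool[l].minWeight, pool[r].minWeight), l, r };
    return v;
}
```

On the example P, this sketch gives the root splitting value 4 and minimum weight 7, its left child 2 and 7, and its right child 6 and 8.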

2.2.2 1-Dimensional Range Searching

Following the algorithm of de Berg et al. [3], given the tree and a query range $[u, \infty)$, we want to report the minimum weight of the points in the range.

Figure 7 1D-tree range search

First, we search for the split node, which is the root of the smallest subtree that contains all points in $[u, \infty)$.

FINDSPLITNODE(T, u)
Input: a tree T and a query value u
Output: the node v where the search path to u and the path to the largest leaf split
1. v ← root(T)
2. if u > the largest coordinate in T
3.     return error (* no point lies in [u, ∞) *)
4. while v is not a leaf and u > x_v
5.     do v ← rightChild(v)
6. return v

If u lies within the range of the tree, there is a path from the split node down to the leaf for u. At each internal node on this path, we check the direction of the next step: whenever the path goes to the left child, the whole right subtree lies in the query range, so we collect the right child's minimum weight. At the end of the path we also collect the leaf's weight, and we report the minimum of all collected weights.

1DRANGESEARCH(T, u)
Input: a tree T and a query value u
Output: the minimum weight in the query range
1. v ← FINDSPLITNODE(T, u)
2. if v is a leaf
3.     then report weight(v)
4. else (* Follow the path to u *)
5.     while v is not a leaf
6.         do if x_v ≥ u
7.             then report minWeight(rightChild(v))
8.                 v ← leftChild(v)
9.             else v ← rightChild(v)
10.    report weight(v)
11. return the minimum of all reported weights
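A minimal C sketch of FINDSPLITNODE and 1DRANGESEARCH, again with hypothetical names (Node1D, build1d, find_split_node, range_min_1d) and a pooled tree built as described in Section 2.2.1. It uses the strict test u > x_v when walking to the split node: with u ≥ x_v, a query u = 4 on the example set P would step past the leaf with coordinate 4 (weight 7) and report 8 instead of 7. Reporting an empty range as INT_MAX, rather than as an error, is our own assumption.

```c
#include <limits.h>

/* Tree as described in Section 2.2.1 (hypothetical pooled layout). */
typedef struct {
    int split;        /* leaf: coordinate; internal: largest coordinate in left subtree */
    int minWeight;    /* minimum weight in the subtree */
    int left, right;  /* child indices, -1 for a leaf */
} Node1D;

#define MAX_NODES 64
static Node1D pool[MAX_NODES];
static int pool_used = 0;

static int min2(int a, int b) { return a < b ? a : b; }

int build1d(const int *coords, const int *weights, int lo, int hi) {
    int v = pool_used++;
    if (lo == hi) {
        pool[v] = (Node1D){ coords[lo], weights[lo], -1, -1 };
        return v;
    }
    int mid = (lo + hi) / 2;
    int l = build1d(coords, weights, lo, mid);
    int r = build1d(coords, weights, mid + 1, hi);
    pool[v] = (Node1D){ coords[mid], min2(pool[l].minWeight, pool[r].minWeight), l, r };
    return v;
}

static int is_leaf(int v) { return pool[v].left < 0; }

/* FINDSPLITNODE: go right while u > x_v; returns -1 if no point lies in [u, inf). */
int find_split_node(int root, int u) {
    int last = root;
    while (!is_leaf(last)) last = pool[last].right;   /* largest leaf */
    if (u > pool[last].split) return -1;
    int v = root;
    while (!is_leaf(v) && u > pool[v].split) v = pool[v].right;
    return v;
}

/* 1DRANGESEARCH: minimum weight over [u, inf), or INT_MAX if the range is empty. */
int range_min_1d(int root, int u) {
    int v = find_split_node(root, u);
    if (v < 0) return INT_MAX;
    int best = INT_MAX;
    while (!is_leaf(v)) {
        if (pool[v].split >= u) {            /* right subtree lies entirely in [u, inf) */
            best = min2(best, pool[pool[v].right].minWeight);
            v = pool[v].left;
        } else {
            v = pool[v].right;
        }
    }
    return min2(best, pool[v].minWeight);    /* this leaf has the smallest coordinate >= u */
}
```

On the example P, range_min_1d(root, 4) returns 7, range_min_1d(root, 5) returns 8, and range_min_1d(root, 9) returns INT_MAX.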

2.2.3 2-Dimensional Range Searching

Following de Berg et al. [3], if the points lie in the plane and we want the shortest distance in the query region, we need a two-dimensional range search tree.

Let P be the set of all points in the plane. The main tree is a range search tree T built on the x-coordinates of the points in P. For any node v in T, let P(v) be the set of points stored in the subtree rooted at v. We build a range search tree $T_{\mathrm{assoc}}(v)$ (also called the y-tree) on the y-coordinates of the points in P(v), and node v stores a link to the root of $T_{\mathrm{assoc}}(v)$.

Figure 8 2D range tree

BUILD2DRANGETREE(P)

Input: a set P of points in the plane

Output: the root of a 2D range search tree

1. build range tree T on the x-coordinates
2. for each node v in T
3.     P(v) ← the points in the subtree of v
4.     build range tree T_assoc(v) on the y-coordinates of P(v)
5. return the root of T

Once the 2D range tree is built, the algorithm 2DRangeSearch finds the minimum weight in the query region. First, we search on x: we find the split node and the search path to $q_x$ in the main tree. Starting from the split node, whenever the path goes to the left child, every point in the right subtree has an x-coordinate of at least $q_x$, so we run a 1D range search with $q_y$ on the right child's y-tree. We continue until the leaf at the end of the path is reached, and that leaf is checked on its own.

Figure 9 2D-tree range search

2DRangeSearch(T, q)
Input: a 2D tree T and a query point q
Output: the minimum weight in the query region
1. v ← FINDSPLITNODE(T, q_x)
2. if v is a leaf
3.     then if v_y ≥ q_y report weight(v)
4. else (* Follow the path to q_x *)
5.     while v is not a leaf
6.         do if x_v ≥ q_x
7.             then w ← rightChild(v)
8.                 1DRANGESEARCH(y-tree of w, q_y)
9.                 v ← leftChild(v)
10.            else v ← rightChild(v)
11.    if v_y ≥ q_y report weight(v)
12. return the minimum of all reported weights
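The following minimal C sketch of BUILD2DRANGETREE and 2DRangeSearch makes one substitution, stated plainly: instead of a y-tree, each x-tree node's associated structure is its points sorted by y with suffix minima of the weights, so the 1D query becomes a binary search plus one lookup. This keeps the O(log² n) query bound but is not the report's y-tree implementation. The names (WPoint, Node2D, build_range_tree, query2d) are hypothetical, n ≥ 1 is assumed, memory is never freed, and allocation failures are not checked. The nodes above the split node always go right and collect nothing, so FINDSPLITNODE is folded into the loop, and the final leaf is checked against both q_x and q_y.

```c
#include <limits.h>
#include <stdlib.h>

/* A converted edge: point (x, y) with weight w (e.g. squared edge length). */
typedef struct { int x, y; long long w; } WPoint;

typedef struct Node2D {
    int splitX;                  /* leaf: its x; internal: largest x in left subtree */
    struct Node2D *left, *right; /* NULL for a leaf */
    int n;                       /* number of points in this subtree */
    int *ys;                     /* their y-coordinates, sorted ascending */
    long long *sufMin;           /* sufMin[i] = min weight among ys[i..n-1] */
} Node2D;

static int cmp_x(const void *a, const void *b) {      /* by x, then y */
    const WPoint *p = a, *q = b;
    if (p->x != q->x) return p->x < q->x ? -1 : 1;
    return (p->y > q->y) - (p->y < q->y);
}

static int cmp_y(const void *a, const void *b) {
    const WPoint *p = a, *q = b;
    return (p->y > q->y) - (p->y < q->y);
}

/* Build over pts[lo..hi], already sorted by x. */
static Node2D *build2d(const WPoint *pts, int lo, int hi) {
    Node2D *v = calloc(1, sizeof *v);
    v->n = hi - lo + 1;
    WPoint *tmp = malloc(v->n * sizeof *tmp);
    for (int i = 0; i < v->n; i++) tmp[i] = pts[lo + i];
    qsort(tmp, v->n, sizeof *tmp, cmp_y);             /* associated structure on y */
    v->ys = malloc(v->n * sizeof *v->ys);
    v->sufMin = malloc(v->n * sizeof *v->sufMin);
    for (int i = v->n - 1; i >= 0; i--) {
        v->ys[i] = tmp[i].y;
        v->sufMin[i] = (i + 1 < v->n && v->sufMin[i + 1] < tmp[i].w) ? v->sufMin[i + 1] : tmp[i].w;
    }
    free(tmp);
    if (lo == hi) { v->splitX = pts[lo].x; return v; }
    int mid = (lo + hi) / 2;
    v->splitX = pts[mid].x;
    v->left = build2d(pts, lo, mid);
    v->right = build2d(pts, mid + 1, hi);
    return v;
}

Node2D *build_range_tree(WPoint *pts, int n) {
    qsort(pts, n, sizeof *pts, cmp_x);
    return build2d(pts, 0, n - 1);
}

/* 1D search in the associated structure: min weight with y >= qy. */
static long long assoc_min(const Node2D *v, int qy) {
    int lo = 0, hi = v->n;                 /* binary search: first ys[i] >= qy */
    while (lo < hi) {
        int m = lo + (hi - lo) / 2;
        if (v->ys[m] < qy) lo = m + 1; else hi = m;
    }
    return lo < v->n ? v->sufMin[lo] : LLONG_MAX;
}

/* 2DRangeSearch for the region x >= qx, y >= qy; LLONG_MAX means "out of range". */
long long query2d(const Node2D *root, int qx, int qy) {
    long long best = LLONG_MAX;
    const Node2D *v = root;
    while (v->left) {                      /* follow the search path to qx */
        if (v->splitX >= qx) {             /* right subtree has every x >= qx */
            long long c = assoc_min(v->right, qy);
            if (c < best) best = c;
            v = v->left;
        } else {
            v = v->right;
        }
    }
    if (v->splitX >= qx && v->ys[0] >= qy && v->sufMin[0] < best) best = v->sufMin[0];
    return best;
}
```

For instance, with weighted points (1,5,10), (2,2,7), (3,8,4), (4,1,9), (4,6,3), (6,3,12), the query (2, 2) returns 3 and the query (4, 7) returns LLONG_MAX.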


2.3 Compute the graph

Based on the algorithms above, we can compute the shortest edge in the query region in three steps:

1. Convert the set of edges E into a set of weighted points P (weight = distance between the two original points).
2. Build a 2D range tree T for P.
3. Search T for the minimum-weight point in the query region.

This section describes how to compute the graph of candidate edges E.

First, let P be the set of all points in the plane S, each with an x-coordinate and a y-coordinate. Drawing a horizontal line and a vertical line through every point produces an n × n grid of lines (n is the number of points). Every query point lies in the plane, so it falls into one of the O(n²) cells of this grid.

Figure 10 Grid the plane


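Consider one cell that contains the query point. If the query point lies in the shaded area of the cell (which excludes the cell's left and bottom sides), the query region always contains the same set of points: if queries $u_1$, $u_2$, $u_3$ in the shaded area obtain the point sets $S'_1$, $S'_2$, $S'_3$, then

$S'_1 = S'_2 = S'_3$.

Figure 11 The relation between cells and query points

Fig. 11 shows the shaded area of a cell. Since no point has an x-coordinate strictly between the cell's vertical sides or a y-coordinate strictly between its horizontal sides, every query point in the shaded area gets the same result as a query placed at the cell's upper-right corner. Hence there are at most n² distinct results, one for each grid corner formed by the x-coordinate of one point and the y-coordinate of a point.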

Now suppose the points are sorted by x-coordinate, and we compute the minimum distance from right to left. If we already know the minimum distance d among the points i+1, ..., n−1, then when we move left to the next point p, only the points whose x-distance from p is smaller than d can give a shorter edge. We compute their distances to p and keep the smaller value as the new d.


Figure 12 Compute the min-edges

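MINEDGE(P, d)
Input: a set of points P = {p_0, p_1, ...} sorted by x-coordinate, where p_0 is the newly added leftmost point, and the minimum edge length d among p_1, p_2, ...
Output: the minimum edge length in P
1. if |P| < 2 or p_1.x − p_0.x ≥ d
2.     return d
3. i ← 1
4. while i < |P| and p_i.x − p_0.x ≤ d
5.     d′ ← distance(p_i, p_0)
6.     if d′ < d
7.         then d ← d′
8.     i ← i + 1
9. return d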
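MINEDGE and the right-to-left sweep of Fig. 12 can be sketched in C as below, under our own assumptions: distances are compared as squared integers, the endpoints of the best edge are tracked by index, and the names (Pt, Best, min_edge, closest_pair_sweep) are hypothetical. The early break is the x-distance test from the text: once the x-distance from p_0 exceeds the current best distance, no later point in the sorted array can be closer.

```c
#include <limits.h>

typedef struct { int x, y; } Pt;      /* hypothetical point type */

typedef struct {
    long long d2;   /* squared length of the best edge so far, LLONG_MAX if none */
    int a, b;       /* indices of its two endpoints */
} Best;

static long long dist2(Pt p, Pt q) {
    long long dx = (long long)p.x - q.x, dy = (long long)p.y - q.y;
    return dx * dx + dy * dy;
}

/* MINEDGE: pts[0..n-1] sorted by x, p0 = pts[k] is the newly added leftmost
   point, and best is the closest pair of pts[k+1..n-1]. */
Best min_edge(const Pt *pts, int n, int k, Best best) {
    for (int i = k + 1; i < n; i++) {
        long long dx = (long long)pts[i].x - pts[k].x;        /* >= 0 since sorted */
        if (best.d2 != LLONG_MAX && dx * dx > best.d2) break;   /* the rest are farther in x */
        long long d2 = dist2(pts[i], pts[k]);
        if (d2 < best.d2) { best.d2 = d2; best.a = k; best.b = i; }
    }
    return best;
}

/* Sweep right to left: closest pair of all n points. */
Best closest_pair_sweep(const Pt *pts, int n) {
    Best best = { LLONG_MAX, -1, -1 };
    for (int k = n - 2; k >= 0; k--) best = min_edge(pts, n, k, best);
    return best;
}
```

Each call may still scan many points in the worst case; the x-distance test only prunes points that are too far to the right.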

Since the minimum edge of an x-sorted array can be found incrementally, we can compute the graph for query regions bounded in both x and y. First, sort the original points by y-coordinate and, moving from the end of the array toward the beginning, take each suffix P(i) of points whose y-coordinate is at least that of p_i. Then sort each P(i) by x-coordinate and, moving from right to left, take each of its suffixes L(k). Compute the minimum edge of each L(k) and add it to an array, removing duplicates as we go.

COMPUTEGRAPH(P)
Input: a set of points P
Output: the array E of all candidate edges
1. sort P by y-coordinate
2. for i = n − 2 downto 0
3.     get the subset P(i) ← {p_i, p_{i+1}, ..., p_{n−1}}
4.     L ← sort P(i) by x-coordinate
5.     for k = n − i − 1 downto 0
6.         get the subset L(k) ← {l_k, l_{k+1}, ..., l_{n−i−1}}
7.         min ← findMinEdge(L(k))
8.         add min to E
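A compact C sketch of COMPUTEGRAPH, with MINEDGE inlined as the innermost loop. Our assumptions: points have integer coordinates, distances are compared as squared lengths, duplicates are detected by a linear scan of E, allocation failures are not checked, and the names (Pt, Edge, compute_graph) are hypothetical. The suffix L(len − 1) holds a single point and has no edge, so the sweep starts at k = len − 2.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int x, y; } Pt;
typedef struct { Pt p, q; long long d2; } Edge;   /* d2 = squared length */

static long long dist2(Pt p, Pt q) {
    long long dx = (long long)p.x - q.x, dy = (long long)p.y - q.y;
    return dx * dx + dy * dy;
}

static int cmp_y(const void *a, const void *b) {  /* by y, then x */
    const Pt *p = a, *q = b;
    if (p->y != q->y) return p->y < q->y ? -1 : 1;
    return (p->x > q->x) - (p->x < q->x);
}

static int cmp_x(const void *a, const void *b) {  /* by x, then y */
    const Pt *p = a, *q = b;
    if (p->x != q->x) return p->x < q->x ? -1 : 1;
    return (p->y > q->y) - (p->y < q->y);
}

static int same_point(Pt a, Pt b) { return a.x == b.x && a.y == b.y; }

static int has_edge(const Edge *E, int m, Pt p, Pt q) {
    for (int e = 0; e < m; e++)
        if ((same_point(E[e].p, p) && same_point(E[e].q, q)) ||
            (same_point(E[e].p, q) && same_point(E[e].q, p)))
            return 1;
    return 0;
}

/* Stores the distinct candidate edges in E (capacity cap) and returns their
   number. pts is re-sorted by y. */
int compute_graph(Pt *pts, int n, Edge *E, int cap) {
    int m = 0;
    Pt *L = malloc((size_t)n * sizeof *L);
    qsort(pts, n, sizeof *pts, cmp_y);
    for (int i = n - 2; i >= 0; i--) {
        int len = n - i;                               /* P(i) = pts[i..n-1] */
        memcpy(L, pts + i, (size_t)len * sizeof *L);
        qsort(L, len, sizeof *L, cmp_x);
        long long best = LLONG_MAX;
        int a = -1, b = -1;
        for (int k = len - 2; k >= 0; k--) {           /* L(k) = L[k..len-1] */
            for (int j = k + 1; j < len; j++) {        /* MINEDGE with p0 = L[k] */
                long long dx = (long long)L[j].x - L[k].x;
                if (best != LLONG_MAX && dx * dx > best) break;
                long long d2 = dist2(L[j], L[k]);
                if (d2 < best) { best = d2; a = k; b = j; }
            }
            if (!has_edge(E, m, L[a], L[b]) && m < cap)  /* add min to E, no duplicates */
                E[m++] = (Edge){ L[a], L[b], best };
        }
    }
    free(L);
    return m;
}
```

For the five points (0,0), (1,1), (5,5), (6,7), (10,2), this sketch produces three candidate edges with squared lengths 5, 41 and 2, in that order.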


3. Program

The program is written in C and was tested on a Linux operating system.

3.1 Main function implementation

The main function implements the data structure for finding the closest pair in the query region in four steps (Fig. 13).

STEP 1: The plane is assumed to be a square whose maximum x- and y-coordinate is an integer RANGE. The user inputs the total number of points and the plane range, so the plane is a RANGE × RANGE square. The program then generates the points: for each point, the “rand()” function produces two integers, which are divided by RANGE, and the remainders become the x- and y-coordinates of the point. This ensures that all points lie in the plane. All points are stored in the array “*a_h” of type “point”.

STEP 2: Compute the graph. The points in array “a_h” are processed by the algorithm of Section 2.3 to obtain all candidate edges, which are stored in an array named “a_e” of type “edge”. In this step, each edge is also converted to a weighted point, as described in Section 2.1.

STEP 3: The 2D range search tree is built from the edges. The tree is stored as an array called “tree_a” of type “treeNode”. At this point the data structure is complete.

STEP 4: Query points are generated in the same way as the original points (STEP 1), and for each one the tree is searched for the minimum weight, which is the length of the shortest edge (the closest pair) in the query region. The answers are output. Because a query point may lie outside the range covered by the original points, the data structure may find no answer; in that case the output is “out of range”.

Figure 13 Main process of the program
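A minimal C sketch of STEP 1 (query points in STEP 4 are generated the same way), using the standard library rand(). The struct follows the report's description of “point”; the function name generate_points is hypothetical. Note that rand() % RANGE is slightly biased toward small values when RAND_MAX + 1 is not a multiple of RANGE, which should be harmless for timing experiments.

```c
#include <stdlib.h>

typedef struct { int x, y; } point;   /* the report's "point": integer x and y */

/* Fill a_h[0..n-1] with random points. Each coordinate is the remainder of
   rand() divided by range, so it lies in [0, range - 1]. The caller seeds
   the generator with srand(). */
void generate_points(point *a_h, int n, int range) {
    for (int i = 0; i < n; i++) {
        a_h[i].x = rand() % range;
        a_h[i].y = rand() % range;
    }
}
```

Calling srand() with a fixed seed before generate_points makes a run reproducible, which helps when comparing timings.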

The program uses three data types: point, edge, and treeNode.

Figure 14 Data structures used in the program

point

A point represents an original point. Its two integer variables x and y represent the x-coordinate and the y-coordinate, respectively.

edge

An edge represents the relation between two points. The variables x1, y1, x2 and y2 represent the two ends of the edge; x3 = min{x1, x2} and y3 = min{y1, y2} give the point obtained by converting the edge (Section 2.1); and the variable weight is the distance between the two ends, that is, the length of the edge. Strictly, x1, y1, x2 and y2 are not needed by the data structure; they are used to check whether the algorithm is correct.

treeNode

This data structure is used in the range search tree. The variables x and y store the location of the converted edge, and minWeight is the minimum weight in the node's subtree (for a leaf, its own weight). The variable “leaf” is used as a Boolean: 1 means the node is a leaf, and 0 means it is an internal node. “Left”, “right”, and “parent” are the indices of the node's left child, right child, and parent. A leaf has no children, and the root has no parent, so its parent is set to -1. The pointer “*y_t” points to the corresponding y-tree; it is not assigned when a “treeNode” is used as a y-tree node. The variables “root” and “size” store the y-tree's root index and size, which makes searching more convenient.
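The three types can be sketched as C structs. The field names follow the descriptions above, but the field types (int or double) and field order are our assumptions, since the report does not list them; set_converted_point is a hypothetical helper for the x3/y3 rule.

```c
/* Field names follow the text; field types and order are assumptions. */
typedef struct {
    int x, y;                  /* coordinates of an original point */
} point;

typedef struct {
    int x1, y1, x2, y2;        /* the two ends, kept only to check correctness */
    int x3, y3;                /* converted point: x3 = min{x1, x2}, y3 = min{y1, y2} */
    double weight;             /* length of the edge */
} edge;

typedef struct treeNode {
    int x, y;                  /* location of the converted edge */
    double minWeight;          /* minimum weight in this node's subtree */
    int leaf;                  /* 1 = leaf, 0 = internal node */
    int left, right, parent;   /* array indices; parent = -1 at the root */
    struct treeNode *y_t;      /* this node's y-tree (not set in y-tree nodes) */
    int root, size;            /* root index and size of the y-tree */
} treeNode;

/* Fill in the converted point of an edge from its two ends (Section 2.1). */
void set_converted_point(edge *e) {
    e->x3 = e->x1 < e->x2 ? e->x1 : e->x2;
    e->y3 = e->y1 < e->y2 ? e->y1 : e->y2;
}
```

Keeping one treeNode type for both x-tree and y-tree nodes matches the text's remark that y_t is simply left unassigned in y-tree nodes.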

3.2 Important functions

Merge sort[4]

Merge sort is used twice in this program: once to sort the original points and once to sort the edges. It is an O(n log n) sorting algorithm.

MergeSort(L, s)
Input: an array L and its size s
Output: the sorted array L
if s ≤ 1
    return L
m ← ⌊s/2⌋
left ← L[0, m − 1]
right ← L[m, s − 1]
left ← MergeSort(left, m)
right ← MergeSort(right, s − m)
L ← Merge(left, right)
return L

Merge(left, right)
Input: two sorted arrays left and right
Output: a sorted array result containing left ∪ right
result ← empty array
while size(left) > 0 and size(right) > 0
    if left(first) ≤ right(first)
        append left(first) to result
        left ← left(rest)
    else
        append right(first) to result
        right ← right(rest)
if size(left) > 0
    append left to result
else append right to result
return result

Based on this algorithm, we can sort the array by x-coordinate or y-coordinate. When

sorting by the x-coordinate, if the two elements have the same x coordinates, compare the

y-coordinates. When sorting by y-coordinate, vice versa.
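A minimal C sketch of the merge sort above, ordering points by x-coordinate and breaking ties by y-coordinate. It merges through one temporary buffer instead of creating new arrays at each level, a common implementation choice that is our assumption; merge_sort_points and less_by_x are hypothetical names. Sorting by y only requires swapping the roles of x and y in less_by_x.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int x, y; } point;

/* Order by x-coordinate, breaking ties by y-coordinate. */
static int less_by_x(point a, point b) {
    return a.x < b.x || (a.x == b.x && a.y < b.y);
}

/* Merge the sorted halves a[lo..mid-1] and a[mid..hi-1] through tmp. */
static void merge(point *a, point *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = less_by_x(a[j], a[i]) ? a[j++] : a[i++];   /* ties keep left first: stable */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
}

static void sort_rec(point *a, point *tmp, int lo, int hi) {
    if (hi - lo <= 1) return;            /* size 0 or 1: already sorted */
    int mid = lo + (hi - lo) / 2;
    sort_rec(a, tmp, lo, mid);
    sort_rec(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* MergeSort(L, s): sorts a[0..s-1] in O(s log s) time. */
void merge_sort_points(point *a, int s) {
    point *tmp = malloc((size_t)s * sizeof *tmp);
    if (!tmp) return;                    /* allocation failed: leave a unchanged */
    sort_rec(a, tmp, 0, s);
    free(tmp);
}
```

The split point mid = lo + (hi − lo)/2 always leaves two non-empty halves when the range has at least two elements, so the recursion terminates.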

Build binary search tree

This function converts the edge array to the treeNode array. Since the children of an internal node may be leaves or internal nodes, a pointer-based structure would need two kinds of node types. Therefore, we store all tree nodes in one array and refer to each node by its index.

For example, consider the range search tree in Fig. 15(a). If we move the internal nodes vertically down to the leaf level (Fig. 15(b)), we obtain the array shown in the red rectangle. With n edges, the tree has n leaves and n − 1 internal nodes, so the tree array has 2n − 1 entries: the first n − 1 edges are copied twice (once as a leaf and once as an internal node), and the last edge only once. Each of the 2n − 1 entries is assigned in O(1) time, so building the tree array takes O(n) time.

Figure 15 Convert the tree to an array
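A minimal C sketch of this layout, assuming the edges are already sorted by x-coordinate. Slot is a hypothetical, reduced stand-in for treeNode and build_tree_array is a hypothetical name; edge i becomes the leaf at index 2i, and each of the first n − 1 edges is copied again into the internal slot 2i + 1 just to its right.

```c
/* Reduced, hypothetical node record: location, weight, leaf flag. */
typedef struct {
    int x, y;
    double weight;
    int leaf;          /* 1 = leaf, 0 = internal node */
} Slot;

/* Lay out n x-sorted edges as the 2n - 1 entry tree array of Fig. 15(b).
   Each entry is written once, so this takes O(n) time. */
int build_tree_array(const Slot *edges, int n, Slot *tree) {
    for (int i = 0; i < n; i++) {
        tree[2 * i] = edges[i];          /* leaf copy of edge i */
        tree[2 * i].leaf = 1;
        if (i < n - 1) {                 /* the last edge is copied only once */
            tree[2 * i + 1] = edges[i];
            tree[2 * i + 1].leaf = 0;
        }
    }
    return n > 0 ? 2 * n - 1 : 0;
}
```

With three edges at x = 1, 2, 3, the array holds x-values 1, 1, 2, 2, 3, with leaves at the even indices.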


Find the root of the tree

This function links the internal nodes of the tree and returns the index of the root node. We divide the array into two equal parts; the middle node is the root of the tree. The function then runs recursively to find the root of each subtree.

Figure 16 Function: treeRoot()

Figure 17 Relations between the indexes and the tree nodes

For example (Fig. 17), a tree of size 15 is stored in the array. The root of the tree is the node with index $7 = (0 + 14)/2$. The internal node “2”, which has index 3, has a subtree spanning indices 0 to 6; similarly, the subtree of the internal node “6” spans indices 8 to 14. That is, the index of the root of a tree or subtree is the middle of its nodes' indices. Therefore, if we call the smallest index in a subtree “left”, the largest index “right”, and the root's index “mid”, then

$$\mathit{mid} = \frac{\mathit{left} + \mathit{right}}{2}.$$

When a tree's root is found, the tree is divided into two subtrees, and their roots are found in the same way: the root of the left subtree is the root's left child, and the root of the right subtree is the root's right child. A subtree of size 1 cannot be divided further, which means its node is a leaf of the tree (Fig. 18).

Figure 18 Find the left and right children of each internal node

Fig. 18 shows how the function works. The small blue squares are the roots of the subtrees. Each root is at the middle of its subtree's index range; once it is found, the tree is divided into two subtrees, their roots are found, and they are returned to the original root as its left and right children. When the left and right pointers point to the same node, that node is a leaf, and its index is returned to its parent.
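A minimal C sketch of the recursive root-finding step; tree_root and its array parameters are hypothetical names. One adjustment is our own: mid = (left + right)/2 lands on an internal node (an odd offset from left) at every level only when the number of leaves is a power of two, as in the 15-node example. The sketch therefore uses mid = left + 2⌈m/2⌉ − 1 for m leaves, which equals (left + right)/2 whenever m is even.

```c
/* Link the nodes stored at tree-array indices [lo, hi] (leaves at even
   offsets from lo, so hi - lo is even) and return the index of their root.
   left[], right[] and par[] receive child and parent indices; -1 means none. */
int tree_root(int lo, int hi, int parent, int *left, int *right, int *par) {
    int mid;
    if (lo == hi) {                         /* a single node is a leaf */
        mid = lo;
        left[mid] = right[mid] = -1;
    } else {
        int m = (hi - lo) / 2 + 1;          /* number of leaves in [lo, hi] */
        mid = lo + 2 * ((m + 1) / 2) - 1;   /* same as (lo + hi) / 2 when m is even */
        left[mid] = tree_root(lo, mid - 1, mid, left, right, par);
        right[mid] = tree_root(mid + 1, hi, mid, left, right, par);
    }
    par[mid] = parent;
    return mid;
}
```

For the 15-node array, tree_root(0, 14, -1, ...) returns 7 with left child 3, as in Fig. 17; for 3 leaves (indices 0 to 4) it returns 3, whose children are 1 and the leaf 4.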


Find the minimum weight of the subtree

In the range search tree, each node has a minWeight value, the minimum weight in its subtree. For a leaf, the subtree is only the leaf itself, so minWeight is its own weight. An internal node has two children, and its minWeight is the smaller of its children's minWeight values. Therefore, minWeight can be computed recursively for every node.

The function takes the tree array and the root index. If the root node is a leaf, it returns the leaf's weight; otherwise, it computes the minWeight of both children and returns the smaller one. In effect, the values propagate from the leaf level up to the root: each leaf reports its weight to its parent, each parent takes the smaller of its two children's values as its minWeight and reports it to its own parent, and so on until the root holds the minWeight of the whole tree.

Figure 19 Function: min_weight()
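A minimal recursive C sketch of this computation. It assumes the child indices have already been linked (for example by the root-finding step) and are passed as plain arrays; min_weight here is a hypothetical signature, not the report's exact function.

```c
/* Fill minW[] bottom-up for the subtree rooted at v and return its minimum.
   left/right hold child indices (-1 for a leaf); weight[] holds each
   node's own weight. Every node is visited once. */
long long min_weight(int v, const int *left, const int *right,
                     const long long *weight, long long *minW) {
    if (left[v] < 0) return minW[v] = weight[v];          /* leaf: its own weight */
    long long a = min_weight(left[v], left, right, weight, minW);
    long long b = min_weight(right[v], left, right, weight, minW);
    return minW[v] = a < b ? a : b;                        /* smaller of the two children */
}
```

For a seven-node array whose leaves 0, 2, 4, 6 have weights 20, 13, 15, 7, the call on root 3 returns 7 and sets minW[1] = 13 and minW[5] = 7.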


4. Running time

The running times were measured on a computer with an Intel Atom N450 1.66 GHz processor and 1 GB of RAM, running Ubuntu Linux with kernel version 2.6.19.

4.1 Convert the graph

Converting the graph means computing all candidate query edges from the original point set. It consists of three steps. Assuming there are n points in the plane, the running time of each step is as follows:

1. Generate points
To generate the points, each element of the point array is assigned x and y values. For each value, we generate a random integer and divide it by the plane's RANGE (the maximum coordinate in the plane); the remainder is the value. All of these operations take constant time, so each point needs O(1) time, and the total running time is O(n).

2. Merge sort the points by y-coordinate
As discussed in Section 3.2, merge sort runs in O(n log n) time.

3. Compute the graph
Based on the algorithm in Section 2.3, the pseudocode for computing the graph, annotated with the cost of each line, is:

COMPUTEGRAPH(P)
Input: a set of points P
Output: the array E of all candidate edges
1. sort P by y-coordinate                                       O(n log n)
2. for i = n − 2 downto 0                                       O(n)
3.     get the subset P(i) ← {p_i, p_{i+1}, ..., p_{n−1}}       O(n − i + 1)
4.     L ← sort P(i) by x-coordinate                            O((n − i) log(n − i))
5.     for k = n − i − 1 downto 0                               O(n − i)
6.         get the subset L(k) ← {l_k, l_{k+1}, ..., l_{n−i−1}} O(n − k − i)
7.         min ← findMinEdge(L(k))                              O(n − k − i)
8.         add min to E                                         O(n)

$$
\begin{aligned}
T_{\mathrm{ComputeGraph}}(n) &= O(n\log n) + O(n)\times\Big\{O(n-i+1) + O\big((n-i)\log(n-i)\big) \\
&\qquad + O(n-i)\times\big[O(n-k-i) + O(n-k-i) + O(n)\big]\Big\} \\
&= O(n\log n) + O(n)\times\big[O(n\log n) + O(n^2)\big] \\
&= O(n\log n) + O(n^2\log n) + O(n^3) \\
&= O(n^3)
\end{aligned}
$$

Therefore, the total running time of converting the graph is the sum of the running times of the three steps:

$$O(n) + O(n\log n) + O(n^3) = O(n^3).$$

We ran the program for several numbers of points, 10 times each, and took the average running time. The results are shown in Tab. 1.

| Number of points | Running time (s) |
|---|---|
| 100 | 0.000000 |
| 200 | 0.029000 |
| 500 | 0.157000 |
| 1000 | 0.620000 |
| 2000 | 2.708000 |
| 5000 | 18.685000 |
| 8000 | 58.112000 |

Table 1 Running time of converting the graph

Figure 20 Running time of converting the graph

4.2 Build the range search tree

As mentioned previously, building the 2D range tree has four steps. Assuming we have n edges to build the tree from, the running times are:

1. Generate the edges and sort them by x-coordinate using merge sort: O(n log n)
2. Convert the edges to tree nodes and build the tree array: O(2n) = O(n)
3. Find the tree root, and assign the minimum weight of each node in the x-tree: O(n log n)
4. Build a y-tree for each node in the x-tree


Figure 21 Running time of building y-trees

In Fig. 21, the triangle represents the x-tree of size n. The node at the first level (the root) builds a y-tree of size n, which takes O(n log n) time. The second level has 2 nodes, each with a subtree of size n/2, so building the second-level y-trees takes

$$2 \times O\!\left(\frac{n}{2}\log\frac{n}{2}\right) = O\!\left(n\log\frac{n}{2}\right) \le O(n\log n).$$

The third level has 4 nodes, each with a subtree of size n/4, so building the third-level y-trees takes

$$4 \times O\!\left(\frac{n}{4}\log\frac{n}{4}\right) = O\!\left(n\log\frac{n}{4}\right) \le O(n\log n).$$

In general, the i-th level has $2^{i-1}$ nodes, each with a subtree of size $n/2^{i-1}$, and the total time to build that level's y-trees is (number of nodes) × (time to build one node's y-tree):

$$2^{i-1} \times O\!\left(\frac{n}{2^{i-1}}\log\frac{n}{2^{i-1}}\right) = O\!\left(n\log\frac{n}{2^{i-1}}\right) \le O(n\log n).$$

Therefore, each level takes O(n log n) time, and since there are log n levels, the total time to build all the y-trees is

$$O(n\log n) \times \log n = O(n\log^2 n).$$

| Number of edges | Running time (s) |
|---|---|
| 1000 | 0.015000 |
| 5000 | 0.083000 |
| 10000 | 0.166000 |
| 50000 | 0.953000 |
| 100000 | 2.073000 |
| 500000 | 14.562000 |
| 1000000 | 37.652000 |

Table 2 Running time of building tree

Figure 22 Running time of building tree

4.3 Build the data structure

The total time to build the data structure is measured from the original points until the tree is completely built. Section 4.2 shows that building the tree can take a long time when the number of edges is large. In practice, however, the number of candidate edges computed from the original points is small: it usually stays below 100 even for the largest number of points we tested. Therefore, the time to build the tree is negligible, and the time to build the data structure is close to the time to compute the graph. For each number of points, we ran the code 10 times and averaged the number of edges and the running time.

| Number of points | Number of edges (average) | Running time (s) |
|---|---|---|
| 100 | 23.4 | 0.004 |
| 200 | 29.1 | 0.025 |
| 500 | 35.4 | 0.151 |
| 1000 | 46.5 | 0.629 |
| 2000 | 51.5 | 2.663 |
| 5000 | 61.7 | 19.340 |
| 8000 | 63.0 | 57.948 |

Table 3 Running time of building the data structure

Figure 23 Running time of building the data structure

From Tab. 3, as the number of points increases, the number of edges grows only slowly, and the running time is close to the values in Tab. 1.


4.4 Query time

Based on the algorithms of Section 2.2, the cost of each line of 1DRANGESEARCH is:

1DRANGESEARCH(T, u)
Input: a tree T and a query value u
Output: the minimum weight in the query range
1. v ← FINDSPLITNODE(T, u)                           O(log n)
2. if v is a leaf                                     O(1)
3.     then report weight(v)                          O(1)
4. else (* Follow the path to u *)
5.     while v is not a leaf                          O(log n) iterations
6.         do if x_v ≥ u
7.             then report minWeight(rightChild(v))   O(1)
8.                 v ← leftChild(v)                   O(1)
9.             else v ← rightChild(v)                 O(1)
10.    report weight(v)                               O(1)
11. return the minimum of all reported weights        O(1)

Therefore, the running time of 1DRangeSearch is O(log n).

2DRangeSearch(T, q)
Input: a 2D tree T and a query point q
Output: the minimum weight in the query region
1. v ← FINDSPLITNODE(T, q_x)                          O(log n)
2. if v is a leaf                                     O(1)
3.     then if v_y ≥ q_y report weight(v)             O(1)
4. else (* Follow the path to q_x *)
5.     while v is not a leaf                          O(log n) iterations
6.         do if x_v ≥ q_x
7.             then w ← rightChild(v)                 O(1)
8.                 1DRANGESEARCH(y-tree of w, q_y)    O(log n)
9.                 v ← leftChild(v)                   O(1)
10.            else v ← rightChild(v)                 O(1)
11.    if v_y ≥ q_y report weight(v)                  O(1)
12. return the minimum of all reported weights        O(1)

Therefore, the running time of 2DRangeSearch is O(log² n): the loop runs O(log n) times, and each iteration may perform an O(log n) search in a y-tree.

We built 2D range search trees with 1000, 5000, 10000, 50000, 100000, 500000, and 1000000 edges. For each tree, we generated 1000 query points and measured the total time to answer them. Each number of edges was tested 10 times, and the average time is reported in Tab. 4.

| Number of edges | Running time of 1000 query points (s) |
|---|---|
| 1000 | 0.009 |
| 5000 | 0.013 |
| 10000 | 0.014 |
| 50000 | 0.027 |
| 100000 | 0.029 |
| 500000 | 0.292 |
| 1000000 | 1.022 |

Table 4 Running time of 1000 query points

Figure 24 Running time of 1000 query points


5. Conclusion

This project implements the algorithms to find the closest pair in the query region.

In this project, only a lower bound was used when searching the binary search trees, but the algorithms can also be used to search a range with both lower and upper bounds. In that case, the query region can be given by two query points that are the two ends of a diagonal of the rectangular query region (Fig. 25).

Figure 25 2 query points region searching

The running-time analysis shows that building the data structure takes a long time, but answering a query with it is very fast. The measured running-time curves are broadly close to the theoretical predictions; the differences are probably caused by memory and hardware access, thread management, and other machine operations. In this project, the points are generated randomly and each search is tested on a different point set, so the process as a whole appears inefficient. In practice, however, the data structure could be applied to a fixed, given graph (such as a database or a map), where it is built only once. Therefore, although building the data structure takes a long time for a large number of points, each search remains fast.


6. References

[1] Closest pair of points problem [online]. http://en.wikipedia.org/wiki/Closest_pair_of_points_problem

[2] Range tree [online]. http://en.wikipedia.org/wiki/Range_tree

[3] M. de Berg, O. Cheong, M. van Kreveld, M. Overmars. “Orthogonal Range Searching: Querying a Database”, Computational Geometry: Algorithms and Applications, 3rd edition. Springer-Verlag Berlin Heidelberg, 2008, pp. 96-108.

[4] Merge sort [online]. http://en.wikipedia.org/wiki/Merge_sort
