Post on 18-Dec-2015


CSIS 7101: Spatial Data (Part 2)

Efficient Processing of Spatial Joins Using R-trees

Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, Hugh Wang

Efficient Processing of Spatial Join Using R-trees

What is Spatial Data?

Consists of points, lines, rectangles, polygons, surfaces…

Two types of queries in a database system (DBS): single-scan and multiple-scan queries

How can spatial objects in a GIS be retrieved efficiently? With a Spatial Access Method (SAM), e.g. the R*-tree

What is a Spatial Access Method?

Designed to support single-scan queries, e.g. the window query: “Find all objects which intersect a given window”

Attempts to store objects which are close together in the data space on a common page, which reduces the number of disk accesses

How is a window query processed by a SAM?

1) Filter step: find all objects whose minimum bounding rectangles intersect the query rectangle

2) Refinement step: check whether the objects fulfill the query condition
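The two steps can be sketched as follows. This is only an illustration under stated assumptions: each object is a hypothetical set of points (so it intersects the window exactly when one of its points lies inside), and the filter is a linear scan rather than a real SAM lookup. The names `mbr`, `rects_intersect` and `window_query` are made up for this sketch.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a point-set object."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(objects: Dict[str, List[Point]], window: Rect):
    # Filter step: keep objects whose MBR intersects the query window.
    candidates = [name for name, pts in objects.items()
                  if rects_intersect(mbr(pts), window)]
    # Refinement step: check the exact geometry of each candidate.
    # Assumption: an object is a point set, so the exact test is
    # "some point of the object lies inside the window".
    hits = [name for name in candidates
            if any(window[0] <= x <= window[2] and window[1] <= y <= window[3]
                   for x, y in objects[name])]
    return candidates, hits


objects = {
    "a": [(1, 1), (2, 2)],      # really inside the window
    "b": [(2.5, 5), (5, 2.5)],  # MBR overlaps the window, object does not
    "c": [(10, 10), (11, 11)],  # far away, rejected by the filter
}
window = (0, 0, 3, 3)
print(window_query(objects, window))
```

Object "b" shows why the refinement step is needed: its bounding rectangle passes the filter although the object itself does not intersect the window.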

What is a Spatial Join?

It combines two sets of spatial objects according to some spatial property

It is an important type of multiple-scan query in spatial DBS

Example of Spatial Join

Two relations: forests, cities

(Assume an attribute in each relation represents the borders of the forests and cities)

An example query would be: “Find all forests which are in a city”

Problems when performing Spatial Join

It is too expensive in terms of CPU time and I/O time

Traditional index structures are not efficient for spatial joins

How to make it more efficient? Use the R*-tree

Why use the R*-tree for Spatial Join?

To optimize CPU time and I/O time

Fewer comparisons than a simple nested loop

Other algorithms cannot be efficiently applied to spatial join

R*-tree Approach for Spatial Join

Suppose there are two R*-trees R, S

Idea:

Use the property that directory rectangles form the minimum bounding box of the data rectangles in the corresponding subtrees.

If the rectangles of two directory entries ER and ES have a common intersection, then there may be a pair (rectR, rectS) of intersecting data rectangles in the corresponding subtrees. If they do not intersect, no such pair can exist and both subtrees can be skipped.
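A minimal sketch of this idea is a synchronized depth-first traversal of both trees. The `Entry` and `Node` classes and the helpers `leaf` and `directory` are hypothetical stand-ins for R*-tree pages, and the sketch assumes both trees have the same height; it reports pairs of intersecting data rectangles only (the filter step of the join).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # subtree, set for directory entries
    oid: Optional[str] = None       # object id, set for data entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)
    is_leaf: bool = True


def mbr(node: Node) -> Rect:
    """Minimum bounding box of all entry rectangles in a node."""
    rects = [e.rect for e in node.entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def spatial_join(r: Node, s: Node) -> List[Tuple[str, str]]:
    """Join two subtrees of equal height (an assumption of this sketch)."""
    pairs = []
    for er in r.entries:
        for es in s.entries:
            if not intersects(er.rect, es.rect):
                continue  # disjoint rectangles: nothing below can intersect
            if r.is_leaf and s.is_leaf:
                pairs.append((er.oid, es.oid))
            else:
                pairs.extend(spatial_join(er.child, es.child))
    return pairs


def leaf(*items) -> Node:
    return Node([Entry(rect, oid=oid) for oid, rect in items])


def directory(*children: Node) -> Node:
    return Node([Entry(mbr(c), child=c) for c in children], is_leaf=False)


tree_r = directory(leaf(("r1", (0, 0, 2, 2)), ("r2", (3, 3, 4, 4))),
                   leaf(("r3", (10, 10, 12, 12))))
tree_s = directory(leaf(("s1", (1, 1, 3.5, 3.5))),
                   leaf(("s2", (11, 11, 13, 13)), ("s3", (20, 20, 21, 21))))
print(spatial_join(tree_r, tree_s))
```

In this toy example only two of the four possible leaf-page combinations are visited, because the other two pairs of directory rectangles are disjoint.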

Minimum Bounding Box

Is there any way to be more efficient?

There are two areas we need to take into account in order to be more efficient:

CPU-time tuning

I/O-time tuning

CPU-Time Tuning

Two ways to improve CPU time:

Restricting the search space

Spatial sorting and plane sweep

Restricting the search space

Idea:

Scan through each of the two nodes and mark all entries which are required for performing the join (i.e. which intersect the intersection rectangle of the two nodes).

Then, each marked entry of one node is tested against all marked entries of the other node.
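The marking step can be sketched like this. Entries are hypothetical `(name, rect)` tuples, and `restricted_pairs` is an illustrative name; the function also returns how many pairwise tests were needed so the saving is visible.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)
Entry = Tuple[str, Rect]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    xl, yl = max(a[0], b[0]), max(a[1], b[1])
    xu, yu = min(a[2], b[2]), min(a[3], b[3])
    return (xl, yl, xu, yu) if xl <= xu and yl <= yu else None


def node_mbr(entries: List[Entry]) -> Rect:
    return (min(r[0] for _, r in entries), min(r[1] for _, r in entries),
            max(r[2] for _, r in entries), max(r[3] for _, r in entries))


def restricted_pairs(r_entries: List[Entry], s_entries: List[Entry]):
    """Return (intersecting pairs, number of pairwise tests performed)."""
    common = intersection(node_mbr(r_entries), node_mbr(s_entries))
    if common is None:
        return [], 0
    # One linear scan per node marks the entries that touch the common area.
    marked_r = [e for e in r_entries if intersects(e[1], common)]
    marked_s = [e for e in s_entries if intersects(e[1], common)]
    # Only marked entries are tested against each other.
    pairs = [(rn, sn) for rn, rr in marked_r for sn, sr in marked_s
             if intersects(rr, sr)]
    return pairs, len(marked_r) * len(marked_s)


r_entries = [("r1", (0, 0, 3, 3)), ("r2", (4, 4, 7, 7)),
             ("r3", (7, 1, 9, 3)), ("r4", (8.5, 6, 10, 9))]
s_entries = [("s1", (8, 1, 9.5, 2)), ("s2", (12, 3, 15, 6)),
             ("s3", (16, 7, 18, 10))]
print(restricted_pairs(r_entries, s_entries))
```

Here the two linear scans (4 + 3 checks) leave 2 * 1 = 2 pairwise tests instead of 4 * 3 = 12.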

Restricting the search space (cont’d)

[Figure: two nodes R and S with seven entries each, before and after restricting the search space]

Original: 7 of R * 7 of S = 49 joins

Now: 3 of R * 2 of S = 6 joins

Plus scanning: 7 of R + 7 of S = 14 times

Spatial sorting and plane sweep

Idea:

Sort the entries in a node of the R*-tree according to the spatial location of the corresponding rectangles.

Then move the sweep line perpendicular to one of the axes from left to right to compute the intersections.
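A sketch of such a sorted intersection test is given below, assuming the sweep runs along the x-axis and entries are hypothetical `(name, (xl, yl, xu, yu))` tuples. The helper name `_sweep` and the example rectangles are made up for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)
Entry = Tuple[str, Rect]


def _sweep(t: Entry, seq: List[Entry], start: int, out: list, t_is_s: bool):
    """Test t against the entries of seq whose xl lies within t's x-extent."""
    t_name, (_, tyl, txu, tyu) = t
    k = start
    while k < len(seq) and seq[k][1][0] <= txu:   # still overlapping in x
        name, (_, yl, _, yu) = seq[k]
        if tyl <= yu and yl <= tyu:               # overlapping in y as well
            out.append((name, t_name) if t_is_s else (t_name, name))
        k += 1


def sorted_intersection_test(rs: List[Entry], ss: List[Entry]):
    rs = sorted(rs, key=lambda e: e[1][0])  # sort both sequences by xl
    ss = sorted(ss, key=lambda e: e[1][0])
    out = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        # The sweep line stops at the rectangle with the smaller xl.
        if rs[i][1][0] <= ss[j][1][0]:
            _sweep(rs[i], ss, j, out, t_is_s=False)
            i += 1
        else:
            _sweep(ss[j], rs, i, out, t_is_s=True)
            j += 1
    return out


r_entries = [("r1", (0, 0, 2, 2)), ("r2", (3, 0, 5, 2)), ("r3", (6, 0, 8, 2))]
s_entries = [("s1", (1, 1, 4, 3)), ("s2", (4.5, 1, 5.5, 3)), ("s3", (5, 0, 7, 1))]
print(sorted_intersection_test(r_entries, s_entries))
```

Each rectangle is compared only with rectangles of the other sequence that start before it ends in x, so most non-intersecting pairs are never tested.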

Example of Sorted Intersection Test

t = r1 : r1 <--> s1

t = s1 : s1 <--> r2

t = r2 : r2 <--> s2, r2 <--> s3

t = s2 : -

t = r3 : r3 <--> s3

[Figure: sweep line at r1, with r1.xu and s1.xl marked; s1.xl < r1.xu]

I/O-Time Tuning

Goal: achieve good I/O performance with a buffer size as small as possible

The R*-trees might occupy only a small portion of the LRU buffer

Compute a read schedule of the pages to minimize the number of disk accesses

Local optimization policy based on spatial locality

Idea of read schedule: if a frequently used page always resides in the buffer, the number of disk accesses can be reduced substantially
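To see why the order of page requests matters, the sketch below counts disk accesses for a sequence of page requests under a simple LRU buffer. The function name `count_disk_accesses`, the page names and the two example orders are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Iterable


def count_disk_accesses(requests: Iterable[str], buffer_pages: int) -> int:
    """Count page faults of an LRU buffer holding at most buffer_pages pages."""
    buffer = OrderedDict()  # least recently used page comes first
    accesses = 0
    for page in requests:
        if page in buffer:
            buffer.move_to_end(page)  # hit: mark as most recently used
            continue
        accesses += 1                 # miss: the page is read from disk
        if buffer_pages == 0:
            continue
        if len(buffer) >= buffer_pages:
            buffer.popitem(last=False)  # evict the least recently used page
        buffer[page] = True
    return accesses


# The same five page pairs, processed in two different orders.
local_order = ["r1", "s1", "r2", "s1", "r2", "s2", "r2", "s3", "r3", "s3"]
scattered = ["r1", "s1", "r2", "s2", "r3", "s3", "r2", "s1", "r2", "s3"]

print(count_disk_accesses(local_order, 2))  # 6
print(count_disk_accesses(scattered, 2))    # 9
print(count_disk_accesses(local_order, 0))  # 10
```

With a two-page buffer the spatially local order reads each of the six pages exactly once, while the scattered order reads several pages repeatedly.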

Three such techniques:

Local plane sweep

Local plane sweep with pinning

Local z-order

Local Plane-Sweep Order

Idea:

Based on spatial ordering, the plane-sweep algorithm creates a sequence of pairs of intersecting rectangles.

This sequence can be used to determine the read schedule of the spatial join.
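One way to turn such a sequence of pairs into a read schedule is sketched below. It assumes that the schedule lists each page once, at the point where it is first needed, and that the R page of a pair is requested before the S page; both are assumptions of this sketch, and `read_schedule` is an illustrative name.

```python
from typing import List, Tuple


def read_schedule(pairs: List[Tuple[str, str]]) -> List[str]:
    """List pages in the order in which the pair sequence first needs them."""
    seen = set()
    schedule = []
    for r_page, s_page in pairs:
        for page in (r_page, s_page):
            if page not in seen:
                seen.add(page)
                schedule.append(page)
    return schedule


# Hypothetical output of a plane sweep over the entries of two nodes.
sweep_pairs = [("r1", "s1"), ("r2", "s1"), ("r2", "s2"),
               ("r2", "s3"), ("r3", "s3")]
print(read_schedule(sweep_pairs))
```

Because consecutive pairs produced by the sweep tend to share a page, pages that are needed again soon are likely to still be in the buffer.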

Local Plane-Sweep Order (cont’d)

[Figure: rectangles r1 to r4 and s1, s2, with labels 1 to 6, and the read schedule derived from them]

Local Plane-Sweep Order w/ Pinning

Idea:

1. Determine a pair (Er, Es) of entries with respect to the local plane-sweep order. Compute the degree of the rectangles of both entries: Deg(E.rect) = number of intersections between E.rect and the rectangles which belong to entries of the other tree that are not yet processed

2. Pin the page in the buffer whose corresponding rectangle has maximal degree

3. Perform the spatial join on the pinned page with all not-yet-processed pages of the other tree whose rectangles intersect it
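The three steps can be sketched as a reordering of the pair sequence. In this sketch the degree of a page counts the remaining pairs it takes part in, not counting the current pair, and ties are broken in favour of the R page; both choices, and the name `order_with_pinning`, are assumptions made for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (page of tree R, page of tree S)


def order_with_pinning(pairs: List[Pair]) -> List[Pair]:
    """Reorder plane-sweep pairs so a pinned page is finished before moving on."""
    remaining = list(pairs)
    ordered = []
    while remaining:
        er, es = remaining.pop(0)  # step 1: next pair in plane-sweep order
        ordered.append((er, es))
        deg_r = sum(1 for r, _ in remaining if r == er)
        deg_s = sum(1 for _, s in remaining if s == es)
        if deg_r == 0 and deg_s == 0:
            continue               # nothing left to join with either page
        # Step 2: pin the page with maximal degree.
        if deg_r >= deg_s:
            with_pinned = [p for p in remaining if p[0] == er]
        else:
            with_pinned = [p for p in remaining if p[1] == es]
        # Step 3: join the pinned page with all its remaining partners.
        ordered.extend(with_pinned)
        remaining = [p for p in remaining if p not in with_pinned]
    return ordered


sweep_pairs = [("r1", "s1"), ("r2", "s2"), ("r3", "s1"),
               ("r4", "s2"), ("r5", "s1")]
print(order_with_pinning(sweep_pairs))
```

In the example, s1 has degree 2 when the first pair is processed, so it is pinned and all of its joins are finished before s2 is touched; each S page then needs to be read only once.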

Local Plane-Sweep Order w/ Pinning (cont’d)

[Figure: rectangles r1 to r4 and s1, s2, with the current pair of entries Er and Es marked]

Er.rect = r1, Es.rect = s2

Deg(r1) = 0, Deg(s2) = 2

Local Z-Order

Idea:

1. Compute the intersections between each rectangle of one node and all rectangles of the other node

2. Sort the resulting intersection rectangles according to the spatial location of their centers

3. For the sorting, decompose the underlying space into cells of equal size and provide an ordering on this set of cells (z-ordering)
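These steps can be sketched with a bit-interleaving (Morton) code for the cells. The grid resolution `bits`, the convention that x occupies the lower bit of each bit pair, and the names `z_value` and `z_order_pairs` are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)
Entry = Tuple[str, Rect]


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    xl, yl = max(a[0], b[0]), max(a[1], b[1])
    xu, yu = min(a[2], b[2]), min(a[3], b[3])
    return (xl, yl, xu, yu) if xl <= xu and yl <= yu else None


def z_value(cx: int, cy: int, bits: int) -> int:
    """Interleave the bits of the cell coordinates (x in the lower bit)."""
    z = 0
    for b in range(bits):
        z |= ((cx >> b) & 1) << (2 * b)
        z |= ((cy >> b) & 1) << (2 * b + 1)
    return z


def z_order_pairs(rs: List[Entry], ss: List[Entry], space: Rect, bits: int = 4):
    """Pairs of intersecting rectangles, sorted by z-value of the intersection center."""
    cells = 1 << bits  # the space is split into cells x cells equal-sized cells
    sxl, syl, sxu, syu = space
    keyed = []
    for rn, rr in rs:
        for sn, sr in ss:
            inter = intersection(rr, sr)           # step 1
            if inter is None:
                continue
            mx = (inter[0] + inter[2]) / 2         # center of the intersection
            my = (inter[1] + inter[3]) / 2
            cx = min(cells - 1, int((mx - sxl) / (sxu - sxl) * cells))
            cy = min(cells - 1, int((my - syl) / (syu - syl) * cells))
            keyed.append((z_value(cx, cy, bits), rn, sn))  # step 3
    keyed.sort()                                   # step 2
    return [(rn, sn) for _, rn, sn in keyed]


r_entries = [("r4", (5, 5, 7, 7)), ("r1", (0, 0, 3, 3)),
             ("r3", (0, 5, 3, 7)), ("r2", (5, 0, 7, 3))]
s_entries = [("s2", (6, 1, 7, 2)), ("s4", (6, 6, 8, 8)),
             ("s1", (1, 1, 2, 2)), ("s3", (1, 6, 2, 8))]
print(z_order_pairs(r_entries, s_entries, space=(0, 0, 8, 8), bits=1))
```

With `bits=1` the space is split into four cells, and the pairs come out in the order lower-left, lower-right, upper-left, upper-right.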

Local Z-Order (cont’d)

[Figure: rectangles r1 to r4 and s1, s2 over a space decomposed into cells I to IV]

Read schedule: <s1, r2, r1, s2, r4, r3>

Number of Disk Accesses

[Chart: number of disk accesses (0 to 7000) against size of LRU buffer (0, 8, 32, 128, 512 KByte) for LPS order, LPS order w/ pinning and Z-order. Labelled values: 5384, 5290, 2373, 2392]

Number of Disk Accesses (cont’d)

[Chart: number of disk accesses (0 to 6000) against size of LRU buffer (0, 8, 32, 128, 512 KByte) for the original join and LPS order w/ pinning]

Q & A

That’s it for the presentation. Any questions?

Reference

1. Brinkhoff T., Kriegel H.P., Seeger B. (1993). Efficient Processing of Spatial Joins Using R-trees. Institute of Computer Science, University of Munich. Washington, DC, USA: ACM-SIGMOD.