outline - cs.cmu.edu

39
1 15-721: Database Management Systems R-trees C. Faloutsos 15-721 Copyright: C. Faloutsos (2001) 2 Outline • R-trees – main idea; file structure – algorithms: insertion/split – deletion – search: range, nn, spatial joins – performance analysis – variations (packed; hilbert;...) 15-721 Copyright: C. Faloutsos (2001) 3 Problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (range, nn, etc)

Upload: others

Post on 19-Apr-2022

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Outline - cs.cmu.edu

1

15-721:Database Management Systems

R-trees

C. Faloutsos

15-721 Copyright: C. Faloutsos (2001) 2

Outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 3

Problem

• Given a collection of geometric objects(points, lines, polygons, ...)

• organize them on disk, to answer spatialqueries (range, nn, etc)

Page 2: Outline - cs.cmu.edu

2

15-721 Copyright: C. Faloutsos (2001) 4

(Who cares?)

15-721 Copyright: C. Faloutsos (2001) 5

(Who cares?)

• Multi-attribute retrieval• Geographic Information systems (GIS)• Multimedia indexing• (clustering, data mining, visualization)

15-721 Copyright: C. Faloutsos (2001) 6

R-trees

• How to group nearby points/shapestogether?

• Idea: try to extend/merge B-trees and k-dtrees

Page 3: Outline - cs.cmu.edu

3

15-721 Copyright: C. Faloutsos (2001) 7

(first attempt: k-d-B-trees)

• [Robinson, 81]: if f is the fanout, split point-set in f parts; and so on, recursively

15-721 Copyright: C. Faloutsos (2001) 8

(first attempt: k-d-B-trees)

• But: insertions/deletions are tricky (splitsmay propagate downwards and upwards)

• no guarantee on space utilization

15-721 Copyright: C. Faloutsos (2001) 9

R-trees

• [Guttman 84] Main idea: allow parents tooverlap!– => guaranteed 50% utilization

– => easier insertion/split algorithms.

– (only deal with Minimum BoundingRectangles - MBRs)

Page 4: Outline - cs.cmu.edu

4

15-721 Copyright: C. Faloutsos (2001) 10

R-trees

• eg., w/ fanout 4: group nearby rectangles toparent MBRs; each group -> disk page

A

B

C

DE

FG

H

I

J

15-721 Copyright: C. Faloutsos (2001) 11

R-trees

• eg., w/ fanout 4:

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

F GD E

H I JA B C

15-721 Copyright: C. Faloutsos (2001) 12

R-trees

• eg., w/ fanout 4:

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

Page 5: Outline - cs.cmu.edu

5

15-721 Copyright: C. Faloutsos (2001) 13

R-trees - format of nodes

• {(MBR; obj-ptr)} for leaf nodes

P1 P2 P3 P4

A B C

x-low; x-highy-low; y-high

...

objptr ...

15-721 Copyright: C. Faloutsos (2001) 14

R-trees - format of nodes

• {(MBR; node-ptr)} for non-leaf nodes

P1 P2 P3 P4

A B C

x-low; x-highy-low; y-high

...

nodeptr ...

15-721 Copyright: C. Faloutsos (2001) 15

R-trees - range search?

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

Page 6: Outline - cs.cmu.edu

6

15-721 Copyright: C. Faloutsos (2001) 16

R-trees - range search?

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

15-721 Copyright: C. Faloutsos (2001) 17

R-trees - range search

Observations:

• every parent node completely covers its‘children’

• a child MBR may be covered by more thanone parent - it is stored under ONLY ONEof them. (ie., no need for dup. elim.)

15-721 Copyright: C. Faloutsos (2001) 18

R-trees - range search

Observations - cont’d

• a point query may follow multiple branches.

• everything works for any dimensionality

Page 7: Outline - cs.cmu.edu

7

15-721 Copyright: C. Faloutsos (2001) 19

Outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 20

R-trees - insertion

• eg., rectangle ‘X’

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CX

15-721 Copyright: C. Faloutsos (2001) 21

R-trees - insertion

• eg., rectangle ‘X’

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CX

X

Page 8: Outline - cs.cmu.edu

8

15-721 Copyright: C. Faloutsos (2001) 22

R-trees - insertion

• eg., rectangle ‘Y’

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CY

15-721 Copyright: C. Faloutsos (2001) 23

R-trees - insertion

• eg., rectangle ‘Y’: extend suitable parent.

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CY

Y

15-721 Copyright: C. Faloutsos (2001) 24

R-trees - insertion

• eg., rectangle ‘Y’: extend suitable parent.

• Q: how to measure ‘suitability’?

Page 9: Outline - cs.cmu.edu

9

15-721 Copyright: C. Faloutsos (2001) 25

R-trees - insertion

• eg., rectangle ‘Y’: extend suitable parent.

• Q: how to measure ‘suitability’?

• A: by increase in area (volume) (moredetails: later, under ‘performance analysis’)

• Q: what if there is no room? how to split?

15-721 Copyright: C. Faloutsos (2001) 26

R-trees - insertion

• eg., rectangle ‘W’

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

W

K

K

15-721 Copyright: C. Faloutsos (2001) 27

R-trees - insertion

• eg., rectangle ‘W’ - focus on ‘P1’ - how tosplit?

A

B

C

P1

W

K

Page 10: Outline - cs.cmu.edu

10

15-721 Copyright: C. Faloutsos (2001) 28

R-trees - insertion

• eg., rectangle ‘W’ - focus on ‘P1’ - how tosplit?

A

B

C

P1

W

K • (A1: plane sweep,

until 50% of rectangles)

• A2: ‘linear’ split

• A3: quadratic split

• A4: exponential split

15-721 Copyright: C. Faloutsos (2001) 29

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

seed1

seed2

R

15-721 Copyright: C. Faloutsos (2001) 30

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

• Q: how to measure ‘closeness’?

Page 11: Outline - cs.cmu.edu

11

15-721 Copyright: C. Faloutsos (2001) 31

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

• Q: how to measure ‘closeness’?

• A: by increase of area (volume)

15-721 Copyright: C. Faloutsos (2001) 32

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

seed1

seed2

R

15-721 Copyright: C. Faloutsos (2001) 33

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

seed1

seed2

R

Page 12: Outline - cs.cmu.edu

12

15-721 Copyright: C. Faloutsos (2001) 34

R-trees - insertion & split

• pick two rectangles as ‘seeds’;

• assign each rectangle ‘R’ to the ‘closest’‘seed’

• smart idea: pre-sort rectangles according todelta of closeness (ie., schedule easiestchoices first!)

15-721 Copyright: C. Faloutsos (2001) 35

R-trees - insertion - pseudocode

• decide which parent to put new rectangleinto (‘closest’ parent)

• if overflow, split to two, using (say,) thequadratic split algorithm– propagate the split upwards, if necessary

• update the MBRs of the affected parents.

15-721 Copyright: C. Faloutsos (2001) 36

R-trees - insertion - observations

• many more split algorithms exist (next!)

Page 13: Outline - cs.cmu.edu

13

15-721 Copyright: C. Faloutsos (2001) 37

Indexing - more detailed outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 38

R-trees - deletion

• delete rectangle

• if underflow– ??

15-721 Copyright: C. Faloutsos (2001) 39

R-trees - deletion

• delete rectangle

• if underflow– temporarily delete all siblings (!);

– delete the parent node and

– re-insert them

Page 14: Outline - cs.cmu.edu

14

15-721 Copyright: C. Faloutsos (2001) 40

R-trees - deletion

• variations: later (eg. Hilbert R-trees w/ 2-to-1 merge)

15-721 Copyright: C. Faloutsos (2001) 41

Indexing - more detailed outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 42

R-trees - range search

pseudocode:

check the root

for each branch,

if its MBR intersects the query rectangle

apply range-search (or print out, if this

is a leaf)

Page 15: Outline - cs.cmu.edu

15

15-721 Copyright: C. Faloutsos (2001) 43

R-trees - nn search

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

15-721 Copyright: C. Faloutsos (2001) 44

R-trees - nn search

• Q: How? (find near neighbor; refine...)

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

15-721 Copyright: C. Faloutsos (2001) 45

R-trees - nn search

• A1: depth-first search; then, range query

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

Page 16: Outline - cs.cmu.edu

16

15-721 Copyright: C. Faloutsos (2001) 46

R-trees - nn search

• A1: depth-first search; then, range query

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

15-721 Copyright: C. Faloutsos (2001) 47

R-trees - nn search

• A1: depth-first search; then, range query

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

15-721 Copyright: C. Faloutsos (2001) 48

R-trees - nn search

• A2: [Roussopoulos+, sigmod95]:– priority queue, with promising MBRs, and their

best and worst-case distance

• main idea:

Page 17: Outline - cs.cmu.edu

17

15-721 Copyright: C. Faloutsos (2001) 49

R-trees - nn search

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4q

consider only P2 and P4, for illustration

15-721 Copyright: C. Faloutsos (2001) 50

R-trees - nn search

DE

H

J

P2P4q

worst of P2

best of P4=> P4 is useless

for 1-nn

15-721 Copyright: C. Faloutsos (2001) 51

R-trees - nn search

DE

P2q

worst of P2

• what is really the worst of, say, P2?

Page 18: Outline - cs.cmu.edu

18

15-721 Copyright: C. Faloutsos (2001) 52

R-trees - nn search

P2q

• what is really the worst of, say, P2?

• A: the smallest of the two red segments!

15-721 Copyright: C. Faloutsos (2001) 53

R-trees - nn search

• variations: [Hjaltason & Samet] incrementalnn:– build a priority queue

– scan enough of the tree, to make sure you havethe k nn

– to find the (k+1)-th, check the queue, and scansome more of the tree

• ‘optimal’ (but, may need too much memory)

15-721 Copyright: C. Faloutsos (2001) 54

Indexing - more detailed outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

Page 19: Outline - cs.cmu.edu

19

15-721 Copyright: C. Faloutsos (2001) 55

R-trees - spatial joins

Spatial joins: find (quickly) allcounties intersecting lakes

15-721 Copyright: C. Faloutsos (2001) 56

R-trees - spatial joins

Assume that they are both organized in R-trees:

15-721 Copyright: C. Faloutsos (2001) 57

R-trees - spatial joins

for each parent P1 of tree T1for each parent P2 of tree T2

if their MBRs intersect,process them recursively (ie., check their

children)

Page 20: Outline - cs.cmu.edu

20

15-721 Copyright: C. Faloutsos (2001) 58

R-trees - spatial joins

Improvements - variations:- [Seeger+, sigmod 92]: do some pre-filtering; do

plane-sweeping to avoid N1 * N2 tests forintersection

- [Lo & Ravishankar, sigmod 94]: ‘seeded’ R-trees(FYI, many more papers on spatial joins, without R-

trees: [Koudas+ Sevcik], e.t.c.)

15-721 Copyright: C. Faloutsos (2001) 59

Indexing - more detailed outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 60

R-trees - performance analysis

• How many disk (=node) accesses we’llneed for– range– nn– spatial joins

• why does it matter?

Page 21: Outline - cs.cmu.edu

21

15-721 Copyright: C. Faloutsos (2001) 61

R-trees - performance analysis

• How many disk (=node) accesses we’llneed for– range– nn– spatial joins

• why does it matter?• A: because we can design split etc

algorithms accordingly; also, do query-optimization

15-721 Copyright: C. Faloutsos (2001) 62

R-trees - performance analysis

• A: because we can design split etcalgorithms accordingly; also, do query-optimization

• motivating question: on, e.g., split, shouldwe try to minimize the area (volume)? theperimeter? the overlap? or a weightedcombination? why?

15-721 Copyright: C. Faloutsos (2001) 63

R-trees - performance analysis

• How many disk accesses for range queries?– query distribution wrt location?– “ “ wrt size?

Page 22: Outline - cs.cmu.edu

22

15-721 Copyright: C. Faloutsos (2001) 64

R-trees - performance analysis

• How many disk accesses for range queries?– query distribution wrt location? uniform; (biased)– “ “ wrt size? uniform

15-721 Copyright: C. Faloutsos (2001) 65

R-trees - performance analysis

• easier case: we know the positions of parentMBRs, eg:

15-721 Copyright: C. Faloutsos (2001) 66

R-trees - performance analysis

• How many times will P1 be retrieved (unif.queries)?

P1

x1

x2

Page 23: Outline - cs.cmu.edu

23

15-721 Copyright: C. Faloutsos (2001) 67

R-trees - performance analysis

• How many times will P1 be retrieved (unif.POINT queries)?

P1

x1

x2

0 10

1

15-721 Copyright: C. Faloutsos (2001) 68

R-trees - performance analysis

• How many times will P1 be retrieved (unif.POINT queries)? A: x1*x2

P1

x1

x2

0 10

1

15-721 Copyright: C. Faloutsos (2001) 69

R-trees - performance analysis

• How many times will P1 be retrieved (unif.queries of size q1xq2)?

P1

x1

x2

0 10

1

q1

q2

Page 24: Outline - cs.cmu.edu

24

15-721 Copyright: C. Faloutsos (2001) 70

R-trees - performance analysis

• How many times will P1 be retrieved (unif.queries of size q1xq2)? A: (x1+q1)*(x2+q2)

P1

x1

x2

0 10

1

q1

q2

15-721 Copyright: C. Faloutsos (2001) 71

R-trees - performance analysis

• Thus, given a tree with N nodes (i=1, ... N) weexpect#DiskAccesses(q1,q2) =

sum ( xi,1 + q1) * (xi,2 + q2)= sum ( xi,1 * xi,2 ) +

q2 * sum ( xi,1 ) +q1* sum ( xi,2 )q1* q2 * N

15-721 Copyright: C. Faloutsos (2001) 72

R-trees - performance analysis

• Thus, given a tree with N nodes (i=1, ... N) weexpect#DiskAccesses(q1,q2) =

sum ( xi,1 + q1) * (xi,2 + q2)= sum ( xi,1 * xi,2 ) +

q2 * sum ( xi,1 ) +q1* sum ( xi,2 )q1* q2 * N

‘volume’

surface area

count

Page 25: Outline - cs.cmu.edu

25

15-721 Copyright: C. Faloutsos (2001) 73

R-trees - performance analysis

Observations:• for point queries: only volume matters• for horizontal-line queries: (q2=0): vertical

length matters• for large queries (q1, q2 >> 0): the count N

matters

15-721 Copyright: C. Faloutsos (2001) 74

R-trees - performance analysis

Observations (cont’ed)• overlap: does not seem to matter• formula: easily extendible to n dimensions• (for even more details: [Pagel +, PODS93],

[Kamel+, CIKM93])

15-721 Copyright: C. Faloutsos (2001) 75

R-trees - performance analysis

Conclusions:• splits should try to minimize area and

perimeter• ie., we want few, small, square-like parent

MBRs• rule of thumb: shoot for queries with q1=q2 =

0.1 (or =0.5 or so).

Page 26: Outline - cs.cmu.edu

26

15-721 Copyright: C. Faloutsos (2001) 76

R-trees - performance analysis

• How many disk (=node) accesses we’llneed for– range– nn– spatial joins

15-721 Copyright: C. Faloutsos (2001) 77

R-trees - performance analysis

Range queries - how many disk accesses, if wejust now that we have

- N points in n-d space?A: ?

15-721 Copyright: C. Faloutsos (2001) 78

R-trees - performance analysis

Range queries - how many disk accesses, if wejust now that we have

- N points in n-d space?A: can not tell! need to know distribution

Page 27: Outline - cs.cmu.edu

27

15-721 Copyright: C. Faloutsos (2001) 79

R-trees - performance analysisWhat are obvious and/or realistic distributions?

15-721 Copyright: C. Faloutsos (2001) 80

R-trees - performance analysisWhat are obvious and/or realistic distributions?A: uniformA: Gaussian / mixture of GaussiansA: self-similar / fractal. Fractal dimension ~

intrinsic dimension

15-721 Copyright: C. Faloutsos (2001) 81

R-trees - performance analysisFormulas for range queries and k-nn queries: use

fractal dimension [Kamel+, PODS94], [Korn+ICDE2000] [Kriegel+, PODS97]

Formulas for spatial joins of regions: openresearch question

Page 28: Outline - cs.cmu.edu

28

15-721 Copyright: C. Faloutsos (2001) 82

Indexing - more detailed outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)

15-721 Copyright: C. Faloutsos (2001) 83

R-trees - variationsGuttman’s R-trees sparked much follow-up

work• can we do better splits?• what about static datasets (no ins/del/upd)?• what about other bounding shapes?

15-721 Copyright: C. Faloutsos (2001) 84

R-trees - variationsGuttman’s R-trees sparked much follow-up work• can we do better splits?

– i.e, defer splits?

Page 29: Outline - cs.cmu.edu

29

15-721 Copyright: C. Faloutsos (2001) 85

R-trees - variations

A: R*-trees [Kriegel+, SIGMOD90]• defer splits, by forced-reinsert, i.e.: instead

of splitting, temporarily delete some entries,shrink overflowing MBR, and re-insertthose entries

• Which ones to re-insert?• How many?

15-721 Copyright: C. Faloutsos (2001) 86

R-trees - variations

A: R*-trees [Kriegel+, SIGMOD90]• defer splits, by forced-reinsert, i.e.: instead

of splitting, temporarily delete some entries,shrink overflowing MBR, and re-insertthose entries

• Which ones to re-insert?• How many? A: 30%

15-721 Copyright: C. Faloutsos (2001) 87

R-trees - variations

Q: Other ways to defer splits?

Page 30: Outline - cs.cmu.edu

30

15-721 Copyright: C. Faloutsos (2001) 88

R-trees - variations

Q: Other ways to defer splits?A: Push a few keys to the closest sibling node

(closest = ??)

15-721 Copyright: C. Faloutsos (2001) 89

R-trees - variations

R*-trees: Also try to minimize area ANDperimeter, in their split.

Performance: higher space utilization; fasterthan plain R-trees. One of the mostsuccessful R-tree variants.

15-721 Copyright: C. Faloutsos (2001) 90

R-trees - variationsGuttman’s R-trees sparked much follow-up

work• can we do better splits?• what about static datasets (no ins/del/upd)?

– Hilbert R-trees

• what about other bounding shapes?

Page 31: Outline - cs.cmu.edu

31

15-721 Copyright: C. Faloutsos (2001) 91

R-trees - variations• what about static datasets (no ins/del/upd)?• Q: Best way to pack points?

15-721 Copyright: C. Faloutsos (2001) 92

R-trees - variations• what about static datasets (no ins/del/upd)?• Q: Best way to pack points?• A1: plane-sweep

great for queries on ‘x’;terrible for ‘y’

15-721 Copyright: C. Faloutsos (2001) 93

R-trees - variations• what about static datasets (no ins/del/upd)?• Q: Best way to pack points?• A1: plane-sweep

great for queries on ‘x’;bad for ‘y’

Page 32: Outline - cs.cmu.edu

32

15-721 Copyright: C. Faloutsos (2001) 94

R-trees - variations• what about static datasets (no ins/del/upd)?• Q: Best way to pack points?• A1: plane-sweep

great for queries on ‘x’;terrible for ‘y’

• Q: how to improve?

15-721 Copyright: C. Faloutsos (2001) 95

R-trees - variations• A: plane-sweep on HILBERT curve!

15-721 Copyright: C. Faloutsos (2001) 96

R-trees - variations• A: plane-sweep on HILBERT curve!• In fact, it can be made dynamic (how?), as

well as to handle regions (how?)

Page 33: Outline - cs.cmu.edu

33

15-721 Copyright: C. Faloutsos (2001) 97

R-trees - variations• Dynamic (‘Hilbert R-

tree):– each point has an ‘h’-

value (hilbert value)– insertions: like a B-tree

on the h-value– but also store MBR, for

searches

15-721 Copyright: C. Faloutsos (2001) 98

R-trees - variations• Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

15-721 Copyright: C. Faloutsos (2001) 99

R-trees - variations• Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

~B-tree

Page 34: Outline - cs.cmu.edu

34

15-721 Copyright: C. Faloutsos (2001) 100

R-trees - variations• Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

~ R-tree

15-721 Copyright: C. Faloutsos (2001) 101

R-trees - variationsGuttman’s R-trees sparked much follow-up

work• can we do better splits?• what about static datasets (no ins/del/upd)?

– Hilbert R-trees - main idea– handling regions– performance/discusion

• what about other bounding shapes?

15-721 Copyright: C. Faloutsos (2001) 102

R-trees - variations• What if we have regions, instead of points?• I.e., how to impose a linear ordering (‘h-

value’) on rectangles?

Page 35: Outline - cs.cmu.edu

35

15-721 Copyright: C. Faloutsos (2001) 103

R-trees - variations• What if we have regions, instead of points?• I.e., how to impose a linear ordering (‘h-

value’) on rectangles?• A1: h-value of center• A2: h-value of 4-d point

(center, x-radius,y-radius)

• A3: ...

15-721 Copyright: C. Faloutsos (2001) 104

R-trees - variations• What if we have regions, instead of points?• I.e., how to impose a linear ordering (‘h-

value’) on rectangles?• A1: h-value of center• A2: h-value of 4-d point

(center, x-radius,y-radius)

• A3: ...

15-721 Copyright: C. Faloutsos (2001) 105

R-trees - variations• with h-values, we can have deferred splits, 2-

to-3 splits (3-to-4, etc)• experimentally: faster than R*-trees(reference: [Kamel Faloutsos vldb 94])

Page 36: Outline - cs.cmu.edu

36

15-721 Copyright: C. Faloutsos (2001) 106

R-trees - variationsGuttman’s R-trees sparked much follow-up

work• can we do better splits?• what about static datasets (no ins/del/upd)?• what about other bounding shapes?

15-721 Copyright: C. Faloutsos (2001) 107

R-trees - variations• what about other bounding shapes? (and why?)• A1: arbitrary-orientation lines (cell-tree,

[Guenther]• A2: P-trees (polygon trees) (MB polygon: 0,

90, 45, 135 degree lines)

15-721 Copyright: C. Faloutsos (2001) 108

R-trees - variations• A3: L-shapes; holes (hB-tree)• A4: TV-trees [Lin+, VLDB-Journal 1994]• A5: SR-trees [Katayama+, SIGMOD97] (used

in Informedia)

Page 37: Outline - cs.cmu.edu

37

15-721 Copyright: C. Faloutsos (2001) 109

GiST: unifying the variants• ``Generalized Search Tree’’• API:

– consistent(n,q) //returns NO or MAYBE– union(r1, ... rn) // finds, e.g., MBR– penalty(p, n) //cost to put p in n– pickSplit(r1, ... rn) //split set of objects

15-721 Copyright: C. Faloutsos (2001) 110

GiST

• source code at http://gist.cs.berkeley.edu, with– R-trees– R*-trees– etc

15-721 Copyright: C. Faloutsos (2001) 111

Outline

• R-trees– main idea; file structure– algorithms: insertion/split– deletion– search: range, nn, spatial joins– performance analysis– variations (packed; hilbert;...)– Conclusions

Page 38: Outline - cs.cmu.edu

38

15-721 Copyright: C. Faloutsos (2001) 112

R-trees - conclusions• Popular method; like multi-d B-trees• guaranteed utilization• good search times (for low-dim. at least)• Informix ships DataBlade with R-trees

15-721 Copyright: C. Faloutsos (2001) 113

References

• Guttman, A. (June 1984). R-Trees: A Dynamic IndexStructure for Spatial Searching. Proc. ACM SIGMOD,Boston, Mass.

• Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer:Generalized Search Trees for Database Systems. VLDB1995: 562-573

• Jagadish, H. V. (May 23-25, 1990). Linear Clustering ofObjects with Multiple Attributes. ACM SIGMOD Conf.,Atlantic City, NJ.

15-721 Copyright: C. Faloutsos (2001) 114

References, cont’d

• Lin, K.-I., H. V. Jagadish, et al. (Oct. 1994). “The TV-tree- An Index Structure for High-dimensional Data.” VLDBJournal 3: 517-542.

• Pagel, B., H. Six, et al. (May 1993). Towards an Analysisof Range Query Performance. Proc. of ACM SIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems (PODS), Washington, D.C.

• Robinson, J. T. (1981). The k-D-B-Tree: A SearchStructure for Large Multidimensional Dynamic Indexes.Proc. ACM SIGMOD.

Page 39: Outline - cs.cmu.edu

39

15-721 Copyright: C. Faloutsos (2001) 115

References, cont’d

• Roussopoulos, N., S. Kelley, et al. (May 1995). NearestNeighbor Queries. Proc. of ACM-SIGMOD, San Jose,CA.