yanlei diao umass amherst feb 28,...

18
1 Tree-Structured Indexes Yanlei Diao UMass Amherst Feb 28, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke

Upload: others

Post on 16-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

1

Tree-Structured Indexes

Yanlei DiaoUMass Amherst

Feb 28, 2007

Slides Courtesy of R. Ramakrishnan and J. Gehrke

Page 2: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

2

B+ Tree: Most Widely Used IndexHeight-balanced given arbitrary inserts/deletes.

Fanout: # child pointers of a non-leaf node F = avg. fanout

Height: N = # leaf pagesH = Log F N(Root: level 0, …, Leaf: level H)

Index Entries

Data Entries("Sequence set")

(Direct search)

Page 3: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

3

B+ Tree: Most Widely Used IndexMinimum 50% occupancy (except for the root).

Order of the tree (n): max # of keys in a node.• Can be computed using the node size, key size, pointer size.

Each non-root node contains [ ⎡n/2⎤, n ] entries, i.e., at least half full. Root node can have [1, n] entries.

Index Entries

Data Entries("Sequence set")

(Direct search)

Page 4: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

4

B+ Trees in PracticeTypical order is 200. Typical fill-factor (occupancy)is 67%.

Average fanout = 133Level 0=1 page; Level 1=133 pages; Level 2=1332 pages...

Typical capacities:Height 3: 1333 = 2,352,637 recordsHeight 4: 1334 = 312,900,700 records

Page 5: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

5

Searches in a B+ Tree

Search begins at root, and key comparisons direct it to a leaf.Search for 5*, 15*, all data entries >= 24* ...

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

<13≥13 <17

≥17 <24 ≥24 <30≥30

Page 6: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

6

Inserting a Data Entry into a B+ TreeFind correct leaf L via a top-down search. Put data entry onto L.

If L has enough space, done!Else, must split L (into L and a new node L2)

• Redistribute entries evenly, copy up middle key k, insert (k, pointer to L2) into parent of L.

Splitting can happen recursively to non-leaf nodes• Redistribute entries evenly, but push up middle key.

(Contrast with leaf splits.)

Splits “grow” the tree! First wider, then one level taller when the root splits.

Page 7: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

7

Previous Example

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

Inserting 8*

Page 8: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

8

Inserting 8* into Example B+ Tree

Minimum occupancy is guaranteed in node splits.

2* 3* 5* 7* 8*

5Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

copied up and

Copy up: key value of an inserted entry must appear in a leaf node!

Page 9: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

9

Inserting 8* into Example B+ Tree

Note difference between copy-up and push-up. Reasons?

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node.(Note that 17 is pushed up and only

this with a leaf split.)

Push up: Any key value can appear at most once in non-leaf nodes of the tree!

2* 3* 5* 7* 8*

5

13 17 24 30

Page 10: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

10

Example B+ Tree After Inserting 8*

Root was split, leading to increase in height!

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 11: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

11

Deleting a Data Entry from a B+ Tree

Start at root, find leaf L where entry belongs.Remove the entry.

If L is at least half-full, done! If L has only ⎡n/2⎤ - 1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• If re-distribution fails, merge L and sibling. Must delete index entry (pointing to L or sibling) from parent of L.

Merge could propagate to root, decreasing height.

Page 12: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

12

Current B+ Tree

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Delete 19*

Delete 20*

Page 13: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

13

Example Tree After Deleting 19* and 20* ...

Deleting 19* is easy.Deleting 20* is done with re-distribution. Notice how middle key is copied up.

2* 3*

Root

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Page 14: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

14

New B+ Tree ...

2* 3*

Root

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Delete 24*

×

××

××

Page 15: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

15

... And Then Deleting 24*

Must merge nodes.Toss index entry (right) Pull down of index entry (below).

30

22* 27* 29* 33* 34* 38* 39*

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

Root30135 17

Page 16: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

16

Example of Non-leaf Re-distribution

Tree is shown below during deletion of 24*. (What could be a possible initial tree?)In contrast to previous example, can re-distribute entry from left child of root to right child.

Root

135 17 20

22

30

14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*

Page 17: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

17

After Re-distribution

Intuitively, entries are re-distributed by `pushingthrough’ the splitting entry in the parent node.It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.

14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2* 3*

Root

135

17

3020 22

Page 18: Yanlei Diao UMass Amherst Feb 28, 2007avid.cs.umass.edu/courses/445/s2008/lectures/445-Lec9...Minimum occupancy is guaranteed in node splits. 2* 3* 5* 7* 8* 5 Entry to be inserted

18

Summary

Tree-structured indexes are ideal for range-searches, also good for equality searches.B+ tree is a dynamic structure.

Inserts/deletes leave tree height-balanced; log F N cost.High fanout (F) means depth rarely more than 3 or 4.Almost always better than maintaining a sorted file.Typically, 67% occupancy on average.Usually preferable to ISAM, modulo lockingconsiderations; adjusts to growth gracefully.If data entries are data records, splits can change rids!