B+ tree & B tree Extracted from Garcia Molina adapted by Leu to follow Elmasri’s Definition

Upload: oscar-butler

Post on 11-Jan-2016

B+tree example, n = 4

[Figure: example B+ tree with n = 4, drawn with the root at the top and the leaves at the bottom. Keys appearing in the figure: 3, 5, 11, 30, 35, 100, 101, 110, 120, 130, 150, 156, 179, 180, 200.]
Sample non-leaf

Keys: 57, 81, 95

Pointer 1: to keys k ≤ 57

Pointer 2: to keys 57 < k ≤ 81

Pointer 3: to keys 81 < k ≤ 95

Pointer 4: to keys k > 95
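The choice of pointer in a non-leaf node can be sketched in a few lines of Python. This is only an illustration: the function name `child_index` is made up here, and it assumes the convention in which a key equal to a separator is found in the subtree to its left (k ≤ 57 follows the first pointer).

```python
from bisect import bisect_left

def child_index(keys, k):
    """Return the 0-based index of the pointer to follow for search key k.

    Assumption: keys are sorted and a key equal to a separator
    belongs to the subtree on its left.
    """
    return bisect_left(keys, k)

node_keys = [57, 81, 95]
for k in (40, 57, 58, 81, 95, 96):
    print(k, "-> pointer", child_index(node_keys, k) + 1)
```

With three keys there are four pointers, so the index always falls in the range 0 to 3.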

Sample leaf node

Keys: 57, 81, 95 (the node is reached from a non-leaf node)

Pointer 1: to record with key 57

Pointer 2: to record with key 81

Pointer 3: to record with key 95

Last pointer: to next leaf in sequence

In textbook's notation, n = 3

Leaf: 30, 35

Non-leaf: 30

Size of nodes: p pointers, p - 1 keys (fixed)

Note that here "way" or "order" refers to the maximum number of subtrees.

Some definitions instead define "way" as the maximum number of keys.

Don't want nodes to be too empty

• Use at least

Non-leaf: $\lceil p/2 \rceil - 1$ keys (so $\lceil p/2 \rceil$ tree pointers)

Leaf: $\lfloor p/2 \rfloor$ keys & data pointers

Full node vs. minimum node, p = 4

Non-leaf, full node: 120, 150, 180

Non-leaf, min. node: 30

Leaf, full node: 3, 5, 11

Leaf, min. node: 30, 35

Note: a pointer counts even if it is null.

B+tree rules (tree of order n)

(1) All leaves are at the same lowest level (balanced tree).

(2) Pointers in leaves point to records, except for the "sequence pointer".

(3) Number of pointers/keys for B+tree (traditional definition)

                      Max ptrs   Max keys   Min ptrs                Min keys
Non-leaf (non-root)   $n$        $n-1$      $\lceil n/2 \rceil$     $\lceil n/2 \rceil - 1$
Leaf (non-root)       $n-1$      $n-1$      $\lfloor n/2 \rfloor$   $\lfloor n/2 \rfloor$
Root                  $n$        $n-1$      1                       1

(3)' Number of pointers/keys for B+tree (Elmasri's new definition)

                      Max ptrs     Max keys     Min ptrs (data)              Min keys
Non-leaf (non-root)   $p$          $p-1$        $\lceil p/2 \rceil$          $\lceil p/2 \rceil - 1$
Leaf (non-root)       $p_{leaf}$   $p_{leaf}$   $\lceil p_{leaf}/2 \rceil$   $\lceil p_{leaf}/2 \rceil$
Root                  $p$          $p-1$        1                            1

p: order of the internal node; pleaf: order of the leaf node
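The bounds under Elmasri's definition can be tabulated for any p and pleaf with a small helper. This is a sketch under the assumption that the minimums are ceilings, $\lceil p/2 \rceil$ and $\lceil p_{leaf}/2 \rceil$; the function and dictionary key names are invented for illustration.

```python
def occupancy_bounds(p, pleaf):
    """Pointer/key limits for non-root B+-tree nodes of order p / pleaf."""
    half_p = (p + 1) // 2          # integer form of ceil(p / 2)
    half_leaf = (pleaf + 1) // 2   # integer form of ceil(pleaf / 2)
    return {
        "non-leaf": {"max_ptrs": p, "max_keys": p - 1,
                     "min_ptrs": half_p, "min_keys": half_p - 1},
        "leaf": {"max_keys": pleaf, "min_keys": half_leaf},
    }

bounds = occupancy_bounds(p=4, pleaf=3)
for kind, limits in bounds.items():
    print(kind, limits)
```

For p = 4 and pleaf = 3 this gives non-leaf nodes between 2 and 4 tree pointers and leaves between 2 and 3 keys.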


Tree structure


Insert into B+tree

(a) simple case: space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(e) Consider only maximum number of keys

When a node is too full

• Node too full (for m-way):

$K_1, K_2, \ldots, K_{\lceil m/2 \rceil - 1}, K_{\lceil m/2 \rceil}, K_{\lceil m/2 \rceil + 1}, \ldots, K_m$

• Split into two nodes:

Left node (replaces the original node): $K_1, K_2, \ldots, K_{\lceil m/2 \rceil - 1}, K_{\lceil m/2 \rceil}$

Right node (right child of the new key): $K_{\lceil m/2 \rceil + 1}, \ldots, K_m$

$K_{\lceil m/2 \rceil}$ is replicated into the parent node.

(a) Insert key = 7, p = 4

[Figure: before the insertion, leaves (3, 5, 11) and (30, 31); after inserting 7, leaves (3, 5) and (7, 11), with key 5 placed in the parent node.]

B+ tree with pleaf

• The splitting point is important.

• For a leaf node, the splitting point is $j = \lceil (p_{leaf} + 1)/2 \rceil$.

• For a non-leaf node, the splitting point is $\lceil p/2 \rceil$.

• Refer to pages 178-180 of Elmasri's book.
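The two splitting points can be illustrated with a short Python sketch. It assumes the splitting points are ceilings, $j = \lceil (p_{leaf}+1)/2 \rceil$ for a leaf and $\lceil p/2 \rceil$ for a non-leaf node, that a leaf keeps its first j keys and copies the last of them to the parent, and that a non-leaf node moves its j-th key up. The function names are hypothetical.

```python
def split_leaf(keys, pleaf):
    """Split an overfull leaf holding pleaf + 1 sorted keys.

    Returns (left, right, separator); the separator is copied up,
    so it also stays in the left leaf.
    """
    assert len(keys) == pleaf + 1
    j = (pleaf + 2) // 2           # integer form of ceil((pleaf + 1) / 2)
    return keys[:j], keys[j:], keys[j - 1]

def split_internal(keys, p):
    """Split an overfull non-leaf node holding p sorted keys.

    Returns (left, moved_up, right); the middle key moves to the
    parent and stays in neither half.
    """
    assert len(keys) == p
    j = (p + 1) // 2               # integer form of ceil(p / 2)
    return keys[:j - 1], keys[j - 1], keys[j:]

print(split_leaf([3, 5, 7, 11], pleaf=3))
print(split_internal([10, 20, 30, 40], p=4))
```

The first call corresponds to inserting 7 into the full leaf (3, 5, 11) with pleaf = 3.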


Example, p =3 and Pleaf = 2


Deletion from B+tree

(a) Simple case (no example)

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling: delete 50, n = 4

[Figure: deleting 50 leaves its leaf too empty, so the leaf is coalesced with its sibling. Keys shown: 10, 30, 40, 50.]

When to coalesce

• When the sibling has just enough keys: if the sibling has $\lceil p_{leaf}/2 \rceil$ keys, then the combined node has

$\lceil p_{leaf}/2 \rceil + \lceil p_{leaf}/2 \rceil - 1 = 2 \lceil p_{leaf}/2 \rceil - 1 \le p_{leaf} + 1 - 1 = p_{leaf}$

keys, which is not too big!
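The bound can be checked numerically for small leaf orders. The snippet below is only a sanity check, assuming the minimum leaf occupancy is $\lceil p_{leaf}/2 \rceil$; `merged_size` is a name chosen for this illustration.

```python
def merged_size(pleaf):
    """Keys after merging an underfull leaf with a minimally full sibling."""
    half = (pleaf + 1) // 2        # integer form of ceil(pleaf / 2)
    return half + (half - 1)       # sibling keys + underfull leaf keys

for pleaf in range(2, 9):
    size = merged_size(pleaf)
    print(f"pleaf={pleaf}: merged node has {size} keys, fits: {size <= pleaf}")
```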

(c) Redistribute keys: delete 50, n = 4

[Figure: after 50 is deleted, keys are redistributed between sibling leaves and the key in the parent node is updated. Keys shown: 10, 20, 30, 35, 40, 50.]

(d) Non-leaf coalesce: delete 37, n = 4

[Figure: deleting 37 causes non-leaf nodes to coalesce, producing a new root. Keys shown: 1, 3, 10, 14, 20, 22, 25, 26, 30, 37, 40, 45.]


Another example

B+tree deletions in practice

• Often, coalescing is not implemented

• Too hard and not worth it!

Example

A PARTS file with Part# as key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose the search field values are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final tree looks like.

Solution

Answer: A B+-tree of order p = 4 implies that each internal node in the tree (except possibly the root) should have at least 2 tree pointers (1 key) and at most 4 pointers. For pleaf = 3, leaf nodes must have at least 2 keys and at most 3 keys. The figure on page 50 shows how the tree progresses as the keys are inserted. We will only show a new tree when insertion causes a split of one of the leaf nodes, and then show how the split propagates up the tree. Hence, step 1 below shows the tree after insertion of the first 3 keys 23, 65, and 37, and before inserting 60, which causes overflow and splitting. The trees given below show how the keys are inserted in order. Below, we give the keys inserted for each tree:

1: 23, 65, 37; 2: 60; 3: 46; 4: 92; 5: 48, 71; 6: 56; 7: 59, 18; 8: 21; 9: 10; 10: 74; 11: 78; 12: 15; 13: 16; 14: 20; 15: 24; 16: 28, 39; 17: 43, 47; 18: 50, 69; 19: 75; 20: 8, 49, 33, 38
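The insertions can be replayed with a small Python sketch. It is a hedged illustration, not the textbook's algorithm verbatim: the names `Node`, `BPlusTree`, and `leaves` are invented here, keys are assumed distinct, leaves split at $\lceil (p_{leaf}+1)/2 \rceil$ and non-leaf nodes at $\lceil p/2 \rceil$, keys less than or equal to a separator go to the left subtree, and next-leaf pointers are omitted.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # used by non-leaf nodes only

class BPlusTree:
    def __init__(self, p, pleaf):
        self.p, self.pleaf = p, pleaf
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                          # the root itself was split
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert key below node; return (separator, new_right_node) on split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.pleaf:
                return None
            j = (self.pleaf + 2) // 2      # ceil((pleaf + 1) / 2)
            right = Node(leaf=True)
            right.keys = node.keys[j:]
            node.keys = node.keys[:j]
            return node.keys[-1], right    # separator is copied up
        i = bisect_left(node.keys, key)    # keys <= separator go left
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.p:
            return None
        j = (self.p + 1) // 2              # ceil(p / 2)
        new = Node(leaf=False)
        up = node.keys[j - 1]              # this key moves up
        new.keys, new.children = node.keys[j:], node.children[j:]
        node.keys, node.children = node.keys[:j - 1], node.children[:j]
        return up, new

def leaves(node, depth=0):
    """Yield (depth, keys) for every leaf, left to right."""
    if node.leaf:
        yield depth, node.keys
    else:
        for child in node.children:
            yield from leaves(child, depth + 1)

PART_NUMBERS = [23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78,
                15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38]
tree = BPlusTree(p=4, pleaf=3)
for part in PART_NUMBERS:
    tree.insert(part)
for depth, keys in leaves(tree.root):
    print(depth, keys)
```

Printing the leaves after each insertion, rather than only at the end, shows the same sequence of splits that the numbered steps describe.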


result

Deletion

Suppose the following search field values are deleted in the given order from the B+-tree of Exercise 5.11; show how the tree will shrink and show the final tree.

The deleted values are: 65, 75, 43, 18, 20, 92, 59, 37.

Solution

An important note about a delete algorithm for a B+-tree is that deleting a key value from a leaf node will result in a reorganization of the tree if: (i) the leaf node is less than half full; in this case, we will combine it with the next leaf node (other algorithms combine it with either the next or the previous leaf node, or both); (ii) the key value deleted is the rightmost (last) value in the leaf node, in which case its value will appear in an internal node; in this case, the key value to the left of the deleted key in the leaf node replaces the deleted key value in the internal node.

Delete 65, 75

Delete 43

Delete 18

Delete 20, 92, 59

Delete 37

Variation on B+tree: B-tree (no +)

• Idea:

– Avoid duplicate keys

– Have record pointers in non-leaf nodes

[Figure: B-tree non-leaf node laid out as K1 P1 K2 P2 K3 P3.]

Record pointers: P1 to record with K1; P2 to record with K2; P3 to record with K3

Tree pointers: to keys x < K1; to keys K1 < x < K2; to keys K2 < x < K3; to keys x > K3

B-tree example, p = 3 (max. subtrees)

Root: (65, 125)

Level 1: (25, 45) (85, 105) (145, 165)

Leaves: (10, 20) (30, 40) (50, 60) (70, 80) (90, 100) (110, 120) (130, 140) (150, 160) (170, 180)

• Sequence pointers are not useful now! (But keep the space for simplicity.)

Note on inserts

• Say we insert a record with key = 25 into the leaf (10, 20, 30), with p = 4.

• Afterwards: the leaf is split into (10) and (25, 30), and key 20 moves up to the parent.

So, for B-trees:

• Each node has at most p tree pointers (so at most p - 1 keys).

• Each node, except the root, has at least $\lceil p/2 \rceil$ tree pointers (so at least $\lceil p/2 \rceil - 1$ keys).

• The root node has at least two tree pointers, unless it is the only node in the tree.

• All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes, except that all of their tree pointers Pi are null.

Insertion Criterion

• Search the tree and insert at the failure node, that is, the node where the search ends.

• Insert at the right place; if the node becomes too full, that is, has p keys in it, then split.

• To split, take the key at position $\lceil p/2 \rceil$ as the splitting point: take $K_{\lceil p/2 \rceil}$ out and insert it into the parent.

• Splitting may propagate to the root.
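The insertion criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `BNode`, `insert`, `build`, and `levels` are invented, keys are distinct, a node is split as soon as it holds p keys, the splitting point is taken as $\lceil p/2 \rceil$, and record pointers are left out.

```python
from bisect import bisect_left

class BNode:
    def __init__(self):
        self.keys = []
        self.children = []   # empty list for a leaf

def _insert(node, key, p):
    """Insert key below node; return (middle_key, right_node) if node split."""
    i = bisect_left(node.keys, key)
    if node.children:                      # non-leaf: descend first
        split = _insert(node.children[i], key, p)
        if split is None:
            return None
        mid, right = split
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    else:                                  # leaf: the failure node
        node.keys.insert(i, key)
    if len(node.keys) < p:                 # still at most p - 1 keys
        return None
    j = (p + 1) // 2                       # ceil(p / 2), 1-based position
    right = BNode()
    mid = node.keys[j - 1]                 # taken out and sent to the parent
    right.keys, right.children = node.keys[j:], node.children[j:]
    node.keys, node.children = node.keys[:j - 1], node.children[:j]
    return mid, right

def insert(root, key, p):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, p)
    if split is None:
        return root
    mid, right = split
    new_root = BNode()
    new_root.keys, new_root.children = [mid], [root, right]
    return new_root

def build(values, p):
    root = BNode()
    for v in values:
        root = insert(root, v, p)
    return root

def levels(root):
    """Key lists of all nodes, one list per tree level, top to bottom."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out

for level in levels(build([8, 5, 1, 7, 3, 12, 9, 6], p=3)):
    print(level)
```

Calling `build` with p = 5 and the 21-key sequence 23, 65, 37, ..., 39 gives a way to check a hand-drawn solution level by level.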

Example

• Build a B-tree of order p = 3. The values are inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.


result

More example

• Try p = 5 with the following key sequence: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39

• Note: large p implies easy solution

Solution (may be wrong!)

Root: (28)

Level 1: (16, 21) (46, 65)

Leaves: (10, 15) (18, 20) (23, 24) (37, 39) (48, 56, 59, 60) (71, 74, 78, 92)

Deletion in B-tree

• Deletion may leave a node less than half full, that is, with $\lceil p/2 \rceil - 2$ keys.

• Must redistribute keys (or borrow keys).

• If keys cannot be redistributed, perform coalescing.

• Coalescing two nodes is OK! The number of keys in the merged node is equal to

$(\lceil p/2 \rceil - 2) + (\lceil p/2 \rceil - 1) + 1 = 2 \lceil p/2 \rceil - 2 \le p - 1$

Key redistribution

[Figure: key redistribution between sibling nodes f and g through the parent key x, involving a borrowed key y.]

Coalesce

[Figure: parent node with keys $\ldots, K_{i-1}, K_i, K_{i+1}, \ldots$; child f has $\lceil p/2 \rceil - 2$ keys and child g has $\lceil p/2 \rceil - 1$ keys.]

Delete sequence 15, 56, 10, 74; show the resulting tree

Root: (28)

Level 1: (16, 21) (46, 65)

Leaves: (10, 15) (18, 20) (23, 24) (37, 39) (48, 56, 59, 60) (71, 74, 78, 92)

After deleting 15 (the leaf is coalesced with its sibling and key 16):

Root: (28)

Level 1: (21) (46, 65)

Leaves: (10, 16, 18, 20) (23, 24) (37, 39) (48, 56, 59, 60) (71, 74, 78, 92)

The non-leaf node (21) is now too empty, so the non-leaf nodes are coalesced:

Root: (21, 28, 46, 65)

Leaves: (10, 16, 18, 20) (23, 24) (37, 39) (48, 56, 59, 60) (71, 74, 78, 92)

Then delete 56, 10, 74.

When the key is in an internal node

• Key transformation: replace the key with a proper key from the leaf nodes, then delete that key from the leaf node.

Delete key 46

Root: (28)

Level 1: (16, 21) (46, 65)

Leaves: (10, 15) (18, 20) (23, 24) (37, 39) (48, 56, 59, 60) (71, 74, 78, 92)

Key 46 in the non-leaf node is replaced by 48 from the leaf; 48 is then deleted from the leaf.

Try more! (Try deleting key 16.)

Root: (28)

Level 1: (16, 21) (48, 65)

Leaves: (10, 15) (18, 20) (23, 24) (37, 39) (56, 59, 60) (71, 74, 78, 92)

Tradeoffs:

• B-trees have faster lookup than B+trees

• In a B-tree, non-leaf & leaf nodes have different sizes

• B+trees preferred!

But note:

• If blocks are fixed size (due to disk and buffering restrictions), then lookup for the B+tree is actually better!


Example:

- Pointers 4 bytes

- Keys 4 bytes

- Blocks 100 bytes (just an example)

- Look at full 2 level tree

B-tree:

Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes

Each of 9 sons: 12 record pointers (+ 12 keys) = 12x(4+4) + 4 = 100 bytes

2-level B-tree, max # records = 12x9 + 8 = 116

B+tree:

Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes

Each of 13 sons: 12 record pointers (+ 12 keys) = 12x(4+4) + 4 = 100 bytes

2-level B+tree, max # records = 13x12 = 156

So...

B+tree: 13 leaves, 156 records

B-tree: 8 records in the root + 9 leaves with 108 records; total = 116

• Conclusion: for fixed block size, the B+ tree is better because it is bushier.
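The arithmetic of this 100-byte example can be reproduced in Python. The sketch assumes the same layout as the slides: a B-tree non-leaf node holds p son pointers, p - 1 keys, and p - 1 record pointers; a B+tree non-leaf node holds p son pointers and p - 1 keys; and a leaf in either tree holds (key, record pointer) pairs plus one 4-byte sequence pointer. Variable names are chosen for illustration.

```python
BLOCK, KEY, PTR = 100, 4, 4   # bytes, as in the example

# B-tree non-leaf: PTR*p + (KEY + PTR)*(p - 1) <= BLOCK
p_btree = (BLOCK + KEY + PTR) // (KEY + 2 * PTR)
# B+tree non-leaf: PTR*p + KEY*(p - 1) <= BLOCK
p_bplus = (BLOCK + KEY) // (KEY + PTR)
# Leaf (both trees): m*(KEY + PTR) + PTR <= BLOCK
leaf_entries = (BLOCK - PTR) // (KEY + PTR)

btree_records = p_btree * leaf_entries + (p_btree - 1)   # leaves + root
bplus_records = p_bplus * leaf_entries                   # leaves only

print("B-tree :", p_btree, "sons,", btree_records, "records")
print("B+tree :", p_bplus, "sons,", bplus_records, "records")
```

Changing `BLOCK`, `KEY`, or `PTR` shows how the gap between the two trees depends on the ratio of key size to pointer size.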

A more realistic example

EXAMPLE 6: To calculate the order p of a B+-tree, suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes, as in Example 4. An internal node of the B+-tree can have up to p tree pointers and p - 1 search field values; these must fit into a single block. Hence, we have:

$(p \cdot P) + ((p - 1) \cdot V) \le B$

$(p \cdot 6) + ((p - 1) \cdot 9) \le 512$

$15p \le 521$

We can choose p to be the largest value satisfying the above inequality, which gives p = 34. This is larger than the value of 23 for the B-tree, resulting in a larger fan-out and more entries in each internal node of a B+-tree than in the corresponding B-tree. The leaf nodes of the B+-tree will have the same number of values and pointers, except that the pointers are data pointers and a next pointer. Hence, the order pleaf for the leaf nodes can be calculated as follows:

$(p_{leaf} \cdot (P_r + V)) + P \le B$

$(p_{leaf} \cdot (7 + 9)) + 6 \le 512$

$16\,p_{leaf} \le 506$

It follows that each leaf node can hold up to pleaf = 31 key value/data pointer combinations, assuming that the data pointers are record pointers.
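The two inequalities of Example 6 can be solved with integer division. The snippet is a small sketch using the example's symbols as variable names (`Pr` stands for the record pointer size); nothing beyond the numbers given in the example is assumed.

```python
B, V, Pr, P = 512, 9, 7, 6   # block, key, record pointer, block pointer (bytes)

# Internal node: p*P + (p - 1)*V <= B   =>   p <= (B + V) / (P + V)
p = (B + V) // (P + V)

# Leaf node: pleaf*(Pr + V) + P <= B    =>   pleaf <= (B - P) / (Pr + V)
pleaf = (B - P) // (Pr + V)

print("p =", p, " pleaf =", pleaf)
```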

EXAMPLE 7: Suppose that we construct a B+-tree on the field of Example 6. To calculate the approximate number of entries of the B+-tree, we assume that each node is 69 percent full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have the following average number of entries at each level:

Root:         1 node          22 entries        23 pointers
Level 1:      23 nodes        506 entries       529 pointers
Level 2:      529 nodes       11,638 entries    12,167 pointers
Leaf level:   12,167 nodes    255,507 record pointers
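The level-by-level figures of Example 7 follow from repeated multiplication by the average fan-out. The loop below is a sketch that assumes the averages are obtained by rounding 0.69 times the order, as the example's "approximately" suggests; the variable names are illustrative.

```python
FILL = 0.69
fanout = round(34 * FILL)        # average pointers per internal node
leaf_ptrs = round(31 * FILL)     # average record pointers per leaf

nodes = 1
rows = []
for level in ("Root", "Level 1", "Level 2"):
    entries = nodes * (fanout - 1)
    pointers = nodes * fanout
    rows.append((level, nodes, entries, pointers))
    nodes = pointers             # each pointer leads to one node below

leaf_record_pointers = nodes * leaf_ptrs
for row in rows:
    print(*row)
print("Leaf level", nodes, leaf_record_pointers)
```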

EXAMPLE 4: Suppose the search field is V = 9 bytes long, the disk block size is B = 512 bytes, a record (data) pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. Each B-tree node can have at most p tree pointers, p - 1 data pointers, and p - 1 search key field values. These must fit into a single disk block if each B-tree node is to correspond to a disk block. Hence, we must have:

$(p \cdot P) + ((p - 1) \cdot (P_r + V)) \le B$

$(p \cdot 6) + ((p - 1) \cdot (7 + 9)) \le 512$

$22p \le 528$

We can choose p to be a large value that satisfies the above inequality, which gives p = 23 (p = 24 is not chosen because of the reasons given next).
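A quick calculation shows why p = 24 is mentioned at all: it is the largest value the inequality admits, and it fills the block exactly. The sketch below only evaluates the inequality; the choice of 23 is the example's own, for the reasons it refers to.

```python
B, V, Pr, P = 512, 9, 7, 6   # bytes, as in Example 4

# p*P + (p - 1)*(Pr + V) <= B   =>   p <= (B + Pr + V) / (P + Pr + V)
p_max = (B + Pr + V) // (P + Pr + V)

def node_bytes(p):
    """Bytes used by a B-tree node of order p under the example's layout."""
    return p * P + (p - 1) * (Pr + V)

print("largest p allowed by the inequality:", p_max)
print("bytes used with p = 24:", node_bytes(24))
print("bytes used with p = 23:", node_bytes(23))
```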

EXAMPLE 5: Suppose that the search field of Example 4 is a nonordering key field, and we construct a B-tree on this field. Assume that each node of the B-tree is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approximately 16 pointers and, hence, 15 search key field values. The average fan-out fo = 16. We can start at the root and see how many values and pointers can exist, on the average, at each subsequent level:

Root:      1 node        15 entries       16 pointers
Level 1:   16 nodes      240 entries      256 pointers
Level 2:   256 nodes     3840 entries     4096 pointers
Level 3:   4096 nodes    61,440 entries

At each level, we calculated the number of entries by multiplying the total number of pointers at the previous level by 15, the average number of entries in each node. Hence, for the given block size, pointer size, and search key field size, a two-level B-tree holds 3840 + 240 + 15 = 4095 entries on the average; a three-level B-tree holds 65,535 entries on the average.
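Because a B-tree stores entries at every level, the totals in Example 5 are running sums. The sketch below assumes the average fan-out is the rounded value of 23 * 0.69; the names `fo` and `running_totals` are illustrative.

```python
fo = round(23 * 0.69)            # average fan-out (pointers per node)
entries_per_node = fo - 1

nodes, total = 1, 0
running_totals = []
for level in range(4):           # root, level 1, level 2, level 3
    total += nodes * entries_per_node
    running_totals.append(total)
    nodes *= fo                  # every pointer leads to one node below

print(running_totals)
```

The third and fourth values are the totals the example quotes for its two-level and three-level B-trees.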

Comparison

• A three-level B+ tree holds up to 255,507 record pointers, on average. Compare to the 65,535 entries for the corresponding B-tree in Example 5.

Outline/summary

• Conventional indexes

– Sparse vs. dense

– Primary vs. secondary

• B trees

– B+trees vs. B-trees

– B+trees vs. indexed sequential

• Hashing schemes --> Next