b-trees - carnegie mellon school of computer scienceckingsf/bioinfo-lectures/btrees.pdf · b-trees...

44
B-Trees CMSC 420: Lecture 9

Upload: vokien

Post on 06-Mar-2018

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

B-TreesCMSC 420: Lecture 9

Page 2: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

Another way to achieve “balance”

• Height of a perfect binary tree of n nodes is O(log n).

• Idea: Force the tree to be perfect.

- Problem: can’t have an arbitrary # of nodes.

- Perfect binary trees only have 2h-1 nodes

• So: relax the condition that the search tree be binary.

• As we’ll see, this lets you have any number of nodes while keeping the leaves all at the same depth.

Global balance instead of the local balance of AVL trees.

Page 3: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Trees

• All leaves are at the same level.

• Each internal node has either 2 or 3 children.

• If it has:

- 2 children => it has 1 key

- 3 children => it has 2 keys

67 75

keys < 67

keys 68...74

keys > 75

24

keys < 24

keys > 24

2-node: 3-node:

Page 4: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Find (multiway searching)

50

11 45 49 55 60

62

7312 19

17 42

25

Find 19

Standard BST-type walk down the tree. At each node have to examine each key stored there.

Page 5: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Find (multiway searching)

50

11 45 49 55 60

62

7312 19

17 42

25

Find 19

Standard BST-type walk down the tree. At each node have to examine each key stored there.

Page 6: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Find (multiway searching)

50

11 45 49 55 60

62

7312 19

17 42

25Find 19

Standard BST-type walk down the tree. At each node have to examine each key stored there.

Page 7: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Find (multiway searching)

50

11 45 49 55 60

62

7312 19

17 42

25Find 19

Find 62

Standard BST-type walk down the tree. At each node have to examine each key stored there.

Page 8: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Find (multiway searching)

50

11 45 49 55 60

62

7312 19

17 42

25Find 19

Find 62

Standard BST-type walk down the tree. At each node have to examine each key stored there.

Page 9: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 80

94

9817 25

22 68

Page 10: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76

102

80

94

9817 25

22 68

Page 11: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76

102

80

94

9817 25

22 68

Page 12: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 10280

94

9817 25

22 68

Page 13: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76

13

10280

94

9817 25

22 68

Page 14: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76

13

10280

94

9817 25

22 68

Page 15: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 13 102

Overflow!

80

94

9817 25

22 68

Page 16: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 13 102

Overflow!

Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent

80

94

9817 25

22 68

Page 17: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 13 102

Overflow!

Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent

80

94

9817 25

22 68

Page 18: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 13 102

Overflow!

Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent

80

94

9817 2522

68

Page 19: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion

73

11 69 70 76 13 102

Overflow!

Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent

80

94

98

17

2522

68

Page 20: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 25

17 68

27 73

Page 21: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 25

17 6827

73

Page 22: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 25

17 68

27

73

Page 23: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 25

17 68

27

Overflow!If both siblings are filled, you have to split the node.

73

Page 24: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

98

17 68

2722 27

25

Overflow!If both siblings are filled, you have to split the node.

73

Page 25: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 27

17 6825Overflow!

73

May have to recursively split nodes, working back to the root.

Page 26: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 27

17 68

25

Overflow!

73

May have to recursively split nodes, working back to the root.

Page 27: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – When key rotation fails

11 69 70 76 13 10280

94

9822 27

25

Overflow!17 68

73

May have to recursively split nodes, working back to the root.

Page 28: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Another Splitting Example

11 69 70 76 80

94

9822 27

17 68

7325

10213

85

Page 29: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Another Splitting Example

11 69 70 76 80

94

9822 27

17 68

7325

10213

85

Page 30: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Another Splitting Example

11 69 70 76 80

94

9822 27

17 68

7325

10213 85

Page 31: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Another Splitting Example

11 69 70

94

9822 27

17 68

7325

10213 8576 85

80

Page 32: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

98 10213 76

9480

85

87

Page 33: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

98 10213 76

9480

85 87

Page 34: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

98 10213 76

9480

85 87

105

Page 35: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

98 10213 76

9480

85 87 105

Page 36: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

13 76

9480

85 87 10598 105

102

Page 37: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

13 76 85 87 10598 105

80 102

94

Page 38: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

7325

13 76 85 87 98 105

80 102

94

Page 39: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

2,3 Tree Insertion – Splitting at the root

11 69 7022 27

17 68

13 76 85 87 98 105

80 102

73

25 94

Page 40: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

(b+1)/2 -1 keys(b+1)/2 -1 keys

From 2,3-Trees to a,b-trees

• An a,b-tree is a generalization of a 2,3-tree, where each node (except the root) has between a and b children.

• Root can have between 2 and b children.

• We require that:

- a ≥ 2 (can’t allow internal nodes to have 1 child)

- b ≥ 2a - 1 (need enough children to make split work)

b keys1 key

Overflow!(b+1)/2 children ≥ a = a

b+1 children

(b+1)/2 children ≥ a = a

Page 41: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

a,b Insertions & Deletions

A B C D E D EA B

C

C D

B

A A B C D

split

merge

C D E

B

A D EA B

Ckey rotation

Page 42: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

Deletion Details

• Try to borrow a key from a sibling if you have one that has an extra (≥ 1 more than the minimum)

• Otherwise, you have a sibling with exactly the minimum number a-1 of keys.

• Since you are underflowing, you must have one less than the minimum number of keys = a-2.

• Therefore, merging with your sibling produces a node with a-1 + a-2 = 2a-3 keys.

• This is one less than the maximum (2a-2 keys), so we have room to bring down the key that split us from our sibling.

Page 43: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

B-trees

• A B-tree of order b is an a,b-tree with b = 2a-1

- In other words, we choose the largest allowed a.

• Want to have large b if bringing a node into memory is slow (say reading a disc block), but scanning the node once in memory is fast.

• b is usually chosen to match characteristics of the device.

• Ex. B-tree of order 1023 has a = 512.

- If this B-tree stores n = 10 million records, its height no more than O(loga n) ≈ 2.58. So only around 3 blocks need to be read from disk.

Each node (page) is at least 50% full.

Page 44: B-Trees - Carnegie Mellon School of Computer Scienceckingsf/bioinfo-lectures/btrees.pdf · B-trees • A B-tree of order b is an a,b-tree with b = 2a-1-In other words, we choose the

What if b is very large?

• Need to be able to find which subtree to traverse.

• Could linearly search through keys – technically constant time if b is a constant, but may be time consuming.

• Solution: Store a balanced tree (AVL or splay) at each node so that you can search for keys efficiently.