b-trees - carnegie mellon school of computer scienceckingsf/bioinfo-lectures/btrees.pdf · b-trees...
TRANSCRIPT
B-TreesCMSC 420: Lecture 9
Another way to achieve “balance”
• Height of a perfect binary tree of n nodes is O(log n).
• Idea: Force the tree to be perfect.
- Problem: can’t have an arbitrary # of nodes.
- Perfect binary trees only have 2h-1 nodes
• So: relax the condition that the search tree be binary.
• As we’ll see, this lets you have any number of nodes while keeping the leaves all at the same depth.
Global balance instead of the local balance of AVL trees.
2,3 Trees
• All leaves are at the same level.
• Each internal node has either 2 or 3 children.
• If it has:
- 2 children => it has 1 key
- 3 children => it has 2 keys
67 75
keys < 67
keys 68...74
keys > 75
24
keys < 24
keys > 24
2-node: 3-node:
2,3 Tree Find (multiway searching)
50
11 45 49 55 60
62
7312 19
17 42
25
Find 19
Standard BST-type walk down the tree. At each node have to examine each key stored there.
2,3 Tree Find (multiway searching)
50
11 45 49 55 60
62
7312 19
17 42
25
Find 19
Standard BST-type walk down the tree. At each node have to examine each key stored there.
2,3 Tree Find (multiway searching)
50
11 45 49 55 60
62
7312 19
17 42
25Find 19
Standard BST-type walk down the tree. At each node have to examine each key stored there.
2,3 Tree Find (multiway searching)
50
11 45 49 55 60
62
7312 19
17 42
25Find 19
Find 62
Standard BST-type walk down the tree. At each node have to examine each key stored there.
2,3 Tree Find (multiway searching)
50
11 45 49 55 60
62
7312 19
17 42
25Find 19
Find 62
Standard BST-type walk down the tree. At each node have to examine each key stored there.
2,3 Tree Insertion
73
11 69 70 76 80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76
102
80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76
102
80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76 10280
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76
13
10280
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76
13
10280
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76 13 102
Overflow!
80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76 13 102
Overflow!
Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent
80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76 13 102
Overflow!
Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent
80
94
9817 25
22 68
2,3 Tree Insertion
73
11 69 70 76 13 102
Overflow!
Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent
80
94
9817 2522
68
2,3 Tree Insertion
73
11 69 70 76 13 102
Overflow!
Try Key Rotation: Look for left or right sibling with some space, move a parent key into it, and a child key into the parent
80
94
98
17
2522
68
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 25
17 68
27 73
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 25
17 6827
73
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 25
17 68
27
73
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 25
17 68
27
Overflow!If both siblings are filled, you have to split the node.
73
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
98
17 68
2722 27
25
Overflow!If both siblings are filled, you have to split the node.
73
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 27
17 6825Overflow!
73
May have to recursively split nodes, working back to the root.
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 27
17 68
25
Overflow!
73
May have to recursively split nodes, working back to the root.
2,3 Tree Insertion – When key rotation fails
11 69 70 76 13 10280
94
9822 27
25
Overflow!17 68
73
May have to recursively split nodes, working back to the root.
2,3 Tree Insertion – Another Splitting Example
11 69 70 76 80
94
9822 27
17 68
7325
10213
85
2,3 Tree Insertion – Another Splitting Example
11 69 70 76 80
94
9822 27
17 68
7325
10213
85
2,3 Tree Insertion – Another Splitting Example
11 69 70 76 80
94
9822 27
17 68
7325
10213 85
2,3 Tree Insertion – Another Splitting Example
11 69 70
94
9822 27
17 68
7325
10213 8576 85
80
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
98 10213 76
9480
85
87
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
98 10213 76
9480
85 87
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
98 10213 76
9480
85 87
105
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
98 10213 76
9480
85 87 105
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
13 76
9480
85 87 10598 105
102
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
13 76 85 87 10598 105
80 102
94
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
7325
13 76 85 87 98 105
80 102
94
2,3 Tree Insertion – Splitting at the root
11 69 7022 27
17 68
13 76 85 87 98 105
80 102
73
25 94
(b+1)/2 -1 keys(b+1)/2 -1 keys
From 2,3-Trees to a,b-trees
• An a,b-tree is a generalization of a 2,3-tree, where each node (except the root) has between a and b children.
• Root can have between 2 and b children.
• We require that:
- a ≥ 2 (can’t allow internal nodes to have 1 child)
- b ≥ 2a - 1 (need enough children to make split work)
b keys1 key
Overflow!(b+1)/2 children ≥ a = a
b+1 children
(b+1)/2 children ≥ a = a
a,b Insertions & Deletions
A B C D E D EA B
C
C D
B
A A B C D
split
merge
C D E
B
A D EA B
Ckey rotation
Deletion Details
• Try to borrow a key from a sibling if you have one that has an extra (≥ 1 more than the minimum)
• Otherwise, you have a sibling with exactly the minimum number a-1 of keys.
• Since you are underflowing, you must have one less than the minimum number of keys = a-2.
• Therefore, merging with your sibling produces a node with a-1 + a-2 = 2a-3 keys.
• This is one less than the maximum (2a-2 keys), so we have room to bring down the key that split us from our sibling.
B-trees
• A B-tree of order b is an a,b-tree with b = 2a-1
- In other words, we choose the largest allowed a.
• Want to have large b if bringing a node into memory is slow (say reading a disc block), but scanning the node once in memory is fast.
• b is usually chosen to match characteristics of the device.
• Ex. B-tree of order 1023 has a = 512.
- If this B-tree stores n = 10 million records, its height no more than O(loga n) ≈ 2.58. So only around 3 blocks need to be read from disk.
Each node (page) is at least 50% full.
What if b is very large?
• Need to be able to find which subtree to traverse.
• Could linearly search through keys – technically constant time if b is a constant, but may be time consuming.
• Solution: Store a balanced tree (AVL or splay) at each node so that you can search for keys efficiently.