CS 4432 lecture #10 - B+ tree indexing
CS4432: Database Systems II, Lecture #10
Professor Elke A. Rundensteiner
Hierarchy of index structures
[Figure: a data file in blocks 50 30 | 70 20 | 40 80 | 100 10 | 60 90; a first-level index (dense, if the file is non-sequential) with keys 10 20 30 40 | 50 60 70 ...; and a higher-level index (always sparse) with keys 10 50 90 ...]
Conventional indexes: pros/cons?
Advantages:
- Simple
- Index is a sequential file, good for scans
- Search is efficient for static data
Disadvantages:
- Inserts are expensive, and/or
- We lose sequentiality and balance
- Search time then becomes unpredictable
Example Sequential Index
[Figure: sorted blocks 10 20 30 | 40 50 60 | 70 80 90 followed by continuous free space; later inserts 39 31 35 36 | 32 38 34 | 33 sit in an overflow area (not sequential).]
Problems … Problems … Problems …
• Without reorganization, we get unpredictable performance.
• Reorganizing too much or too often brings too much overhead.
• The DBA does not know when to reorganize.
• The DBA does not know how full to load the pages of a new index.
So Let’s Try Another Index . . .
• Give up “sequentiality” of the index
• Predictable performance under updates
• Always keep the “tree” balanced
• Automate restructuring under updates
B+Tree Example (n=3)
Root: 100
Non-leaf nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
B+ Trees in Practice
• Typical order: d = 100 (non-root nodes hold between d and 2d = 200 entries). Typical fill factor: 67%.
  – Average fanout = 133
• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold the top levels in the buffer pool:
  – Level 1 = 1 page = 8 Kbytes
  – Level 2 = 133 pages ≈ 1 Mbyte
  – Level 3 = 17,689 pages ≈ 138 Mbytes
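The capacity figures are powers of the average fanout. A quick check in Python, assuming (as the slide does) a fanout of 133 and 8 KB pages; the helper names are ours, chosen for illustration:

```python
# Capacity arithmetic for a B+ tree, assuming an average fanout of 133
# and 8 KB pages (the slide's figures).
FANOUT = 133
PAGE_KB = 8


def records_indexed(height: int, fanout: int = FANOUT) -> int:
    """Records reachable from a tree of the given height: fanout ** height."""
    return fanout ** height


def pages_at_level(level: int, fanout: int = FANOUT) -> int:
    """Pages on a level, counting the root as level 1."""
    return fanout ** (level - 1)


for h in (3, 4):
    print(f"height {h}: {records_indexed(h):,} records")
for lvl in (1, 2, 3):
    pages = pages_at_level(lvl)
    print(f"level {lvl}: {pages:,} page(s) = {pages * PAGE_KB:,} KB")
```

Level 2 comes to 1,064 KB (about 1 MB) and level 3 to 141,512 KB (about 138 MB), which is why the top two or three levels can often stay in memory.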
Sample non-leaf node (keys 57, 81, 95):
pointer 1 → keys k < 57
pointer 2 → keys 57 ≤ k < 81
pointer 3 → keys 81 ≤ k < 95
pointer 4 → keys k ≥ 95
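Choosing a child pointer in a non-leaf node amounts to counting how many keys are ≤ the search key. A minimal sketch using Python's standard `bisect` module; `child_index` is a hypothetical helper, and pointers are numbered from 0 in the code:

```python
from bisect import bisect_right


def child_index(keys: list, k: int) -> int:
    """Index (0-based) of the child pointer to follow for search key k.

    Pointer i covers keys[i-1] <= k < keys[i]. bisect_right returns how many
    separator keys are <= k, which is exactly that index.
    """
    return bisect_right(keys, k)


node_keys = [57, 81, 95]  # the sample non-leaf node
for k in (10, 57, 80, 81, 95, 200):
    print(f"key {k}: follow pointer {child_index(node_keys, k) + 1}")
```

Using `bisect_right` rather than `bisect_left` is what sends a key equal to a separator (57, 81, 95) to the right-hand pointer, matching the ≤ on the left end of each range.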
Sample leaf node (keys 57, 81, 95), reached from a non-leaf node:
pointer 1 → record with key 57
pointer 2 → record with key 81
pointer 3 → record with key 95
last pointer → next leaf in sequence
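In textbook’s notation (n=3)
Leaf: [figure: a leaf with keys 30 35, each key with a pointer to its record]
Non-leaf: [figure: a non-leaf node with key 30, pointing to keys < 30 and keys ≥ 30]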
Size of nodes: n+1 pointers, n keys (fixed)
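With n fixed per node, a common way to choose it is to make one node fill one disk block. A sketch, assuming each node occupies exactly one block and ignoring any page header; the byte sizes in the usage line are hypothetical:

```python
def max_keys_per_node(block_bytes: int, key_bytes: int, ptr_bytes: int) -> int:
    """Largest n such that n keys and n+1 pointers fit in one block.

    n * key_bytes + (n + 1) * ptr_bytes <= block_bytes
    =>  n <= (block_bytes - ptr_bytes) / (key_bytes + ptr_bytes)
    """
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


# Hypothetical sizes: 4096-byte blocks, 4-byte integer keys, 8-byte pointers.
n = max_keys_per_node(4096, 4, 8)
print(n, n * 4 + (n + 1) * 8)   # 340 4088
```

One more key would need 4 + 8 = 12 more bytes and overflow the block (4100 > 4096).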
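Full node vs. minimum node (n=3)
[Figure: a full non-leaf node (120 150 180) beside a minimum non-leaf node (30), and a full leaf (3 5 11) beside a minimum leaf (30 35); an annotation notes that a pointer “counts even if null”.]
Minimum occupancy:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data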
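The occupancy rule can be turned into a small validity check. This sketch reads the minimum as ⌈(n+1)/2⌉ child pointers for non-leaf nodes and ⌊(n+1)/2⌋ data pointers for leaves (the textbook's rounding), applies only to non-root nodes, and uses hypothetical helper names:

```python
def min_nonleaf_pointers(n: int) -> int:
    """Fewest child pointers in a non-root, non-leaf node: ceil((n+1)/2)."""
    return (n + 2) // 2          # integer form of ceil((n + 1) / 2)


def min_leaf_data_pointers(n: int) -> int:
    """Fewest record pointers in a non-root leaf: floor((n+1)/2)."""
    return (n + 1) // 2


def is_legal_leaf(keys: list, n: int) -> bool:
    # each key in a leaf carries one pointer to its record
    return min_leaf_data_pointers(n) <= len(keys) <= n


def is_legal_nonleaf(keys: list, n: int) -> bool:
    # a non-leaf node with k keys has k + 1 child pointers
    return min_nonleaf_pointers(n) <= len(keys) + 1 <= n + 1


n = 3
print(min_nonleaf_pointers(n), min_leaf_data_pointers(n))     # 2 2
print(is_legal_nonleaf([30], n), is_legal_leaf([30, 35], n))  # True True
print(is_legal_leaf([30], n))                                 # False
```

For n=3 both minimums are 2, which is exactly the minimum non-leaf (30) and minimum leaf (30 35) in the figure.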
B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the “sequence pointer”.
B+Tree Example: Searches (n=3)
Root: 100
Non-leaf nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
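A search on this tree reads one node per level, following the child chosen by the separator keys, and then looks for the key in the leaf. A sketch over an in-memory copy of the example tree; the tuple layout (`"I"` for non-leaf, `"L"` for leaf) and the helper names are assumptions made for illustration, whereas a real B+ tree would fetch each node as a disk page:

```python
from bisect import bisect_right


# Hypothetical in-memory layout: a non-leaf node is ("I", keys, children),
# a leaf is ("L", keys).
def leaf(*keys):
    return ("L", list(keys))


def interior(keys, children):
    return ("I", list(keys), list(children))


# The n=3 example tree from the slide.
ROOT = interior([100], [
    interior([30], [leaf(3, 5, 11), leaf(30, 35)]),
    interior([120, 150, 180], [leaf(100, 101, 110), leaf(120, 130),
                               leaf(150, 156, 179), leaf(180, 200)]),
])


def search(node, k):
    """Return (found, path), where path lists the keys of every node read."""
    path = []
    while node[0] == "I":
        _, keys, children = node
        path.append(keys)
        node = children[bisect_right(keys, k)]  # pointer i covers keys[i-1] <= k < keys[i]
    path.append(node[1])
    return k in node[1], path


print(search(ROOT, 101))  # (True, [[100], [120, 150, 180], [100, 101, 110]])
print(search(ROOT, 40))   # (False, [[100], [30], [30, 35]])
```

The path length equals the tree height, which is the log_n(N) node reads that the cost model later in the lecture counts.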
Insert into B+tree
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root
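(a) Insert key = 32 (n=3)
[Figure: root 100 over non-leaf node 30, with leaves 3 5 11 and 30 31; leaf 30 31 has a free slot, so 32 is simply added to it.]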
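(b) Insert key = 7 (n=3)
[Figure: leaf 3 5 11 is full, so inserting 7 splits it into 3 5 and 7 11; key 7 is copied up into the parent, which becomes 7 30.]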
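(c) Insert key = 160 (n=3)
[Figure: leaf 150 156 179 overflows and splits into 150 156 and 160 179; 160 goes into the parent 120 150 180, which is also full, so the parent splits into 120 150 and 180 and 160 is pushed up into the root (now 100 160).]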
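(d) New root, insert 45 (n=3)
[Figure: root 10 20 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40. Inserting 45 splits leaf 30 32 40 into 30 32 and 40 45; key 40 goes to the full root, which splits into 10 20 and 40 while key 30 is pushed up into a new root.]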
Recap: Insert Data into B+ Tree
• Find the correct leaf L.
• Put the data entry onto L.
  – If L has enough space, done!
  – Else, split L (into L and a new node L2):
    • Redistribute entries evenly, copy up the middle key.
    • Insert an index entry pointing to L2 into the parent of L.
• This can happen recursively.
  – To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
• Splits “grow” the tree; a root split increases height.
  – Tree growth: the tree gets wider, or one level taller at the top.
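The recap can be sketched as a compact in-memory B+ tree. This is an illustration, not a production structure: it assumes unique keys, uses n = maximum keys per node as in the slides, and the class and method names are hypothetical. Leaf splits copy the middle key up (it also stays in the new right leaf); non-leaf splits push it up (it leaves that level); a root split adds a level. On the slides' examples it produces the same splits as (b), (c), and (d).

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (non-leaf nodes only)
        self.next = None    # sequence pointer to the next leaf (leaves only)


class BPlusTree:
    """Hypothetical in-memory B+ tree; n = max keys per node, keys assumed unique."""

    def __init__(self, n=3):
        self.n = n
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:               # root split: tree gets one level taller
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right sibling) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.n:
                return None                 # case (a): space available in leaf
            mid = (len(node.keys) + 1) // 2  # left keeps ceil((n+1)/2) keys
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right     # copy up: key also stays in the leaf
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, new_child)
        if len(node.keys) <= self.n:
            return None
        mid = len(node.keys) // 2
        right = Node(leaf=False)
        up = node.keys[mid]                 # push up: key leaves this level
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right

    def contains(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[bisect_right(node.keys, key)]
        return key in node.keys

    def leaf_keys(self):
        """All keys in order, following the leaves' sequence pointers."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out


tree = BPlusTree(n=3)
for k in range(1, 11):
    tree.insert(k)
print(tree.root.keys, tree.leaf_keys())
```

After inserting 1 through 10 with n=3, the last insert overflows the root (3 5 7 9), which splits into 3 5 and 9 and pushes 7 into a new root.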
Deletion from B+tree
(a) Simple case
(b) Leaf node: coalesce with neighbor (sibling)
(c) Leaf node: redistribute keys
(d) Cases (b) or (c) at a non-leaf node
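(a) Delete key = 11 (n=3)
[Figure: root 100 over non-leaf node 30, with leaves 3 5 11 and 30 31; removing 11 leaves 3 5, which still meets the leaf minimum, so nothing else changes.]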
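(b) Coalesce with sibling: delete 50 (n=4)
[Figure: parent keys 10 40 100; leaves 10 20 30 and 40 50. After deleting 50, leaf 40 is under-full; it is merged with its sibling into 10 20 30 40, and key 40 is removed from the parent.]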
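(c) Redistribute keys: delete 50 (n=4)
[Figure: parent keys 10 40 100; leaves 10 20 30 35 and 40 50. After deleting 50, leaf 40 borrows 35 from its sibling, giving leaves 10 20 30 and 35 40, and the parent's separator 40 becomes 35.]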
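(d) Coalesce and non-leaf coalesce: delete 37 (n=4)
[Figure: root 25 over non-leaf nodes 10 20 and 30 40, with leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45. Deleting 37 leaves leaf 30 under-full, so it is coalesced with a sibling leaf; the non-leaf node above it then becomes under-full and is coalesced with its sibling, pulling down key 25, and the merged node becomes the new root.]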
Delete Data from B+ Tree
• Start at the root, find the leaf L where the entry belongs.
• Remove the entry.
  – If L is at least half-full, done!
  – If L has only d-1 entries (d = minimum occupancy),
    • Try to redistribute, borrowing from a sibling (adjacent node with the same parent as L).
    • If redistribution fails, merge L and the sibling.
• If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
• Merges could propagate to the root, decreasing height.
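The leaf-level decision can be sketched as a hypothetical helper that looks only at two adjacent leaves with the same parent and reports what the parent must do. Non-leaf nodes follow the same pattern, except that the parent's separator key is pulled down into the merged node. Note the two orderings in this lecture: examples (b) and (c) coalesce whenever both leaves fit in one node and borrow otherwise (`prefer_merge=True`), while the recap tries redistribution first (`prefer_merge=False`). The minimum of ⌊(n+1)/2⌋ keys per leaf follows the minimum-occupancy rule.

```python
def fix_leaf_underflow(left: list, right: list, n: int, prefer_merge: bool = True):
    """Repair an under-full leaf given it and its adjacent sibling (sorted keys).

    Returns ("merge", merged_keys, None, None): parent drops the separator, or
            ("redistribute", new_left, new_right, new_separator).
    """
    min_keys = (n + 1) // 2                  # floor((n+1)/2) data pointers
    total = left + right
    fits = len(total) <= n                   # both leaves fit in one node
    can_lend = len(total) >= 2 * min_keys    # both can end up at least minimal
    if fits and (prefer_merge or not can_lend):
        return "merge", total, None, None
    # Borrow just enough keys to bring the under-full side up to the minimum.
    if len(left) < min_keys:
        k = min_keys - len(left)
        left, right = left + right[:k], right[k:]
    else:
        k = max(0, min_keys - len(right))
        cut = len(left) - k                  # avoid left[:-0], which would be empty
        left, right = left[:cut], left[cut:] + right
    return "redistribute", left, right, right[0]


print(fix_leaf_underflow([10, 20, 30], [40], n=4))      # example (b): merge
print(fix_leaf_underflow([10, 20, 30, 35], [40], n=4))  # example (c): new separator 35
```

If the combined keys do not fit in one node, there are at least n+1 of them, which is always enough for both leaves to reach the minimum, so redistribution cannot fail in that branch.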
Discussion of B-trees (vs. static indexed sequential files)
• Concurrency control is harder in B-trees
• A B-tree consumes more space
• A B-tree automatically decides:
  – when to reorganize
  – how full to load pages of the new index
Comparison: B-tree vs. indexed sequential file
Indexed sequential file:
• Less space, so lookup is faster
• Inserts managed by an overflow area
• Requires temporary restructuring
• Unpredictable performance
B-tree:
• Consumes more space, so lookup is slower
• Each insert/delete potentially restructures
• Built-in restructuring
• Predictable performance
• Speaking of buffering… Is LRU a good policy for B+tree buffers?
  Of course not!
  Should try to keep the root in memory at all times (and perhaps some nodes from the second level).
  Should keep the “path” when going down to the leaves (just in case of restructuring).
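One way to act on this advice is an LRU buffer that never evicts pinned pages, so that, for example, a long leaf scan cannot push the root out. This is our illustration, not a policy specified on the slide: the class and method names are hypothetical, page ids are plain integers, and `load` stands in for a disk read. The same pin/unpin calls could hold the root-to-leaf path in memory during an insert.

```python
from collections import OrderedDict


class PinningLRUBuffer:
    """LRU page buffer that skips pinned pages when choosing a victim."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load               # stand-in for reading a page from disk
        self.pages = OrderedDict()     # page_id -> page, least recently used first
        self.pinned = set()
        self.misses = 0

    def pin(self, page_id):
        self.get(page_id)
        self.pinned.add(page_id)

    def unpin(self, page_id):
        self.pinned.discard(page_id)

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)    # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self._evict()
        page = self.load(page_id)
        self.pages[page_id] = page
        return page

    def _evict(self):
        for victim in self.pages:              # oldest first
            if victim not in self.pinned:
                del self.pages[victim]         # safe: we stop iterating right away
                return
        raise RuntimeError("all buffered pages are pinned")


buf = PinningLRUBuffer(capacity=3, load=lambda pid: f"page {pid}")
buf.pin(0)                     # page 0 plays the role of the root
for leaf_id in range(1, 10):   # a long scan along the leaves' sequence pointers
    buf.get(leaf_id)
print(0 in buf.pages, list(buf.pages))  # True [0, 8, 9]
```

Under plain LRU the same scan would evict page 0 after two more leaves, so the next root-to-leaf search would pay an extra disk read.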
Interesting problem:
For a B+tree, how large should n be?
…
n is the number of keys per node
Assumptions: n children per node and N records in the database.
(1) Time to read a B-tree node from disk is $t_{seek} + t_{read} \cdot n$ msec.
(2) Once in main memory, binary search locates the key in $a + b \log_2 n$ msec.
(3) A search reads $\log_n N$ tree nodes.
(4) $t_{search} = \left(t_{seek} + t_{read} \cdot n + a + b \log_2 n\right) \cdot \log_n N$
Can get: f(n) = time to find a record.
[Figure: plot of f(n) against n, with its minimum at n_opt.]
Find n_opt by solving f'(n) = 0.
What happens to n_opt as:
• the disk gets faster? the CPU gets faster? …
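The search-time model (item 4) can be explored numerically. In this model the b term drops out of the optimization, because $b \log_2 n \cdot \log_n N = b \log_2 N$ does not depend on n; setting f'(n) = 0 then gives $n(\ln n - 1) = (t_{seek} + a)/t_{read}$. The sketch below finds n_opt by brute force over integers instead of solving that equation; all parameter values in the usage lines are hypothetical.

```python
import math


def f(n, N, t_seek, t_read, a, b):
    """Search time (msec) from item (4): per-node cost times log_n(N) nodes."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N) / math.log(n)


def n_opt(N, t_seek, t_read, a, b, n_max=10_000):
    """Integer n in [2, n_max] that minimizes f (brute-force search)."""
    return min(range(2, n_max + 1), key=lambda n: f(n, N, t_seek, t_read, a, b))


# Hypothetical parameters, in msec: 10 ms seek, 0.01 ms to transfer one key's
# worth of node, negligible a, small b.
base = dict(N=10**9, t_seek=10.0, t_read=0.01, a=0.0, b=0.001)
print(n_opt(**base))                        # close to the root of n(ln n - 1) = 1000
print(n_opt(**{**base, "t_seek": 5.0}))     # halving the seek time shrinks n_opt
```

Because only the ratio (t_seek + a)/t_read matters, speeding up seeks relative to transfer pushes n_opt down, while a faster CPU only matters through the small constant a.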
Bulk Loading of B+ Tree
• For a large collection of records, create a B+ tree.
• Method 1: Repeatedly insert records: slow.
• Method 2: Bulk loading: more efficient.
Bulk Loading of B+ Tree
• Initialization:
  – Sort all data entries.
  – Insert a pointer to the first (leaf) page in a new (root) page.
[Figure: a new root page pointing to the first of the sorted pages of data entries 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*; the pages are not yet in the B+ tree.]
Bulk Loading (Contd.)
• Index entries for leaf pages are always entered into the right-most index page.
• When this fills up, it splits. (A split may go up the right-most path to the root.)
[Figure, two snapshots of the build over the sorted data-entry pages 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*: first, a root 10 20 over index pages 6 | 12 | 23 35; later, a root 20 over index pages 10 and 35, which sit over index pages 6 | 12 | 23 | 38. In both, the data-entry pages to the right of the last index entry are not yet in the B+ tree.]
Summary of Bulk Loading
• Method 1: multiple inserts.
  – Slow.
  – Does not give sequential storage of leaves.
• Method 2: bulk loading.
  – Has advantages for concurrency control.
  – Fewer I/Os during build.
  – Leaves will be stored sequentially (and linked).
  – Can control the “fill factor” on pages.
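A bulk load can be sketched in a few lines. This sketch builds the index bottom-up, one level at a time, which is a swapped-in variation of the right-most-path insertion the slides animate, so the resulting tree shape can differ from the figure. It sorts the entries, packs leaves to a chosen fill factor, then repeatedly groups the level below under new index pages whose keys are the low keys of all but the first child. Function names and the list-of-levels output format are hypothetical; entries are plain integers standing in for the k* data entries.

```python
import math


def chunk_evenly(items, target):
    """Split items into ceil(len/target) consecutive groups of near-equal size."""
    count = max(1, math.ceil(len(items) / target))
    size, extra = divmod(len(items), count)
    groups, start = [], 0
    for g in range(count):
        end = start + size + (1 if g < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def bulk_load(entries, n, fill=1.0):
    """Return the tree as a list of levels, root level first; a node is its key list.

    n is the max keys per node; fill is the target fill factor for every page.
    """
    keys = sorted(entries)                                   # step 1: sort
    if not keys:
        return [[[]]]
    leaves = chunk_evenly(keys, max(1, int(n * fill)))       # leaves hold <= n keys
    levels = [leaves]
    lows = [leaf[0] for leaf in leaves]                      # low key under each node
    fanout = max(2, int((n + 1) * fill))                     # <= n+1 pointers per page
    while len(lows) > 1:
        groups = chunk_evenly(lows, fanout)
        levels.insert(0, [g[1:] for g in groups])            # separators = low keys of children 2..k
        lows = [g[0] for g in groups]
    return levels


entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
for depth, level in enumerate(bulk_load(entries, n=2)):
    print(depth, level)
# 0 [[12, 35]]
# 1 [[6, 10], [20, 23], [38, 44]]
# 2 [[3, 4], [6, 9], [10, 11], [12, 13], [20, 22], [23, 31], [35, 36], [38, 41], [44]]
```

Spreading each level evenly (rather than filling every page and leaving a tiny last one) keeps the final page of each level from falling far below half-full; lowering `fill` leaves room in every page for later inserts.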
Summary
• B+ tree idea: a self-balancing index structure that supports search, insert, and delete in O(log_n N) node accesses.
• The B+ tree is versatile: it handles both equality and range searches.
• B+ trees and their variants are a common index structure in industrial DBMSs.