amiri – advanced databases '05

More on indexing: B+ trees


Outline

● Motivation: Search example
– Cost of searching with and without indices
● B+ trees
– Definition and structure
● B+ tree operations
– Inserting
– Deleting


Dense ordered index on accounts

[Figure: dense ordered index on the accounts table. The index file holds (search key, pointer) entries in branch order, such as Aberdeen P0, Glasgow Pi, London Pj and Manchester Pm. Each pointer leads to a record in the data file (accounts table, with columns account-num, branch and balance), which is stored in disk pages.]


Index access – performance

● Assume
– $N_R$: number of records in the table
– $F_R$: data blocking factor, # of data records that fit in a block
– $F_i$: index blocking factor, # of index entries that fit in a block
● Cost of searching for a data record (in disk I/Os)
– Assume a dense primary index (#records = #index entries)
– Binary search on index, then fetch data block
● Cost = $1 + \log_2(N_R / F_i)$
● Assuming index is stored as a sequential file of blocks
– Note that this makes updating the index expensive


Index access – Example

– $N_R$: number of records in the table
● $2^{20}$ (over 1 million records)
– $F_R$: data blocking factor, # of data records that fit in a block
● 10 (400-byte records, and 4KB pages)
– $F_i$: index blocking factor, # of index entries that fit in a block
● $256 = 2^8$ (16-byte search-key,pointer pairs in 4KB pages)
– Cost of search (without an index)
● $N_R / F_R = 2^{20} / 10 \approx 104858$ da (disk accesses, ~ 18 minutes)
– Cost of search (using binary search of ordered index)
● $1 + \log_2(N_R / F_i) = 1 + \log_2(2^{20}/2^8) = 13$ da (130 msec)

(Calculations assume 10 msec per disk access)
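The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using the slide's cost model. The function names `scan_cost` and `binary_search_cost` are illustrative and not part of the original slides.

```python
import math

MS_PER_ACCESS = 10  # the slide's assumption: 10 msec per disk access


def scan_cost(n_records: int, data_blocking_factor: int) -> int:
    """Disk accesses needed to read every data block when there is no index."""
    return math.ceil(n_records / data_blocking_factor)


def binary_search_cost(n_records: int, index_blocking_factor: int) -> int:
    """Binary search over the index blocks, plus one access for the data block."""
    index_blocks = math.ceil(n_records / index_blocking_factor)
    return 1 + math.ceil(math.log2(index_blocks))


n_r, f_r, f_i = 2**20, 10, 2**8
full_scan = scan_cost(n_r, f_r)
indexed = binary_search_cost(n_r, f_i)
print(f"no index: {full_scan} da, about {full_scan * MS_PER_ACCESS / 60_000:.1f} minutes")
print(f"binary search on index: {indexed} da, {indexed * MS_PER_ACCESS} msec")
```

With the slide's numbers, `full_scan` is 104858 and `indexed` is 13.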


Tree-structured indices

● Each tree node is stored in a single disk block
● Each node packs in a large number of key,pointer pairs
– Number of children of a node is called fan-out (m)
– Trees reduce search cost significantly
● Best case search cost: $1 + \log_m(N_R / F_i)$


Tree Index access – performance

● Example revisited
– $N_R$: number of records in the table
● $2^{20}$ (over 1 million records)
– $F_R$: record blocking factor, # of data records that fit in a block
● 10 (400-byte records, and 4KB pages)
– $F_i$: index blocking factor, # of index entries that fit in a block
● $256 = 2^8$ (16-byte key,pointer pairs in 4KB pages)
– Assume all nodes in the tree are half-full (fan-out $m = F_i / 2 = 128$)
● $1 + \log_m(N_R / F_i)$
● $1 + \log_{128}(2^{20}/2^8) = 1 + \log_{128}(2^{12}) = 1 + 12/7 \approx 2.7$, rounded up to 3 da (30 msec)
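The same cost model can be evaluated for any fan-out. This is a minimal sketch. The helper names are illustrative, and the levels are counted with integer arithmetic so that the logarithm is rounded up without floating-point surprises.

```python
import math


def tree_levels(n_blocks: int, fan_out: int) -> int:
    """Smallest number of levels L such that fan_out**L >= n_blocks."""
    levels, reach = 0, 1
    while reach < n_blocks:
        reach *= fan_out
        levels += 1
    return levels


def tree_search_cost(n_records: int, index_blocking_factor: int, fan_out: int) -> int:
    """The slide's model: 1 + log_m(N_R / F_i), with the logarithm rounded up."""
    index_blocks = math.ceil(n_records / index_blocking_factor)
    return 1 + tree_levels(index_blocks, fan_out)


half_full = tree_search_cost(2**20, 2**8, 128)  # fan-out m = F_i / 2
binary = tree_search_cost(2**20, 2**8, 2)       # fan-out 2 is the binary-search case
print(half_full, binary)
```

With a fan-out of 2 the formula gives the 13 disk accesses of the binary-search example. With a fan-out of 128 it gives 3.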


Not all trees are good trees

● Trees should be short
– Nodes should be relatively full
– Tree should be balanced


B trees

● General form of multi-level index
● Generalise binary search trees
● Balanced tree
– All leaves are at same depth
● Efficient insert and delete
– At the expense of some space overhead


B+ tree

● Popular variant of the B tree
– B tree
● Data pointers may be stored in internal nodes
● Every value of the search field appears once, at some level in the tree
– B+ tree
● Data pointers stored only in leaf nodes
● Internal nodes contain only keys and tree pointers
– Can pack more pointers in internal nodes
– Improved search time due to fewer levels in the tree
– No wasted space due to “null” tree pointers in leaf nodes


B+ tree

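● Internal node of order p=4, leaf nodes of order 3
– i.e. Internal/leaf nodes must have between 2 and 4 pointers
● Leaf nodes are chained using sequence pointers

[Figure: example B+ tree. Root [100] (k<100 to the left, k>=100 to the right); internal nodes [30] and [120 150 180]; chained leaves [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200], each key with a data pointer.]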


B+ tree – internal nodes

● Each internal node of a B+ tree is of the form
– $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$
● Where $q \le p$: the B+ tree is said to be of order p
● Order of internal node need not equal order of leaf node
– $P_i$ is a tree pointer, points to another node in the B+ tree
● Within each internal node, $K_1 < K_2 < \ldots < K_{q-1}$
– For all key field values X in the subtree pointed to by $P_i$
● $K_{i-1} \le X < K_i$ for $1 < i < q$
● $X < K_i$ for $i = 1$
● $K_{i-1} \le X$ for $i = q$


B+ tree – internal nodes (continued)

● Each internal node has at most p pointers
● Each internal node has at least p/2 pointers
– Except for the root
– Root node has at least two pointers unless it is a leaf node
● Internal node with q pointers has q-1 search field values


B+ trees – internal node

[Figure: internal node with keys 120 150 180. Its four pointers lead, left to right, to keys K < 120, 120 ≤ K < 150, 150 ≤ K < 180, and K ≥ 180.]
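The pointer to follow inside an internal node can be found with a binary search over its keys. This is a minimal sketch that assumes 0-based pointer numbering, so pointer i here corresponds to P(i+1) in the slides. The function name `child_index` is illustrative.

```python
from bisect import bisect_right


def child_index(keys: list, x: int) -> int:
    """Return the 0-based position of the pointer whose subtree may contain x.

    Pointer i covers keys[i-1] <= x < keys[i]; the first pointer covers
    x < keys[0] and the last one covers x >= keys[-1].
    """
    return bisect_right(keys, x)


keys = [120, 150, 180]  # the node from the figure
for x in (119, 120, 149, 150, 180, 500):
    print(x, "-> pointer", child_index(keys, x))
```

`bisect_right` counts how many keys are less than or equal to x, which is exactly the number of separators the search has to step over.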


B+ tree – Leaf nodes

● Each leaf node is of the form
– $\langle (K_1, Pr_1), (K_2, Pr_2), \ldots, (K_{q-1}, Pr_{q-1}), P_{next} \rangle$
– $q \le p$
– $P_{next}$ (sequence pointer) points to next leaf node of B+ tree
– $Pr_i$ is a data pointer, points to record whose key value is $K_i$
● Or points to an “indirect” block of pointers to data records
– If search key is a nonkey field
● Within each leaf node, $K_1 < K_2 < \ldots < K_{q-1}$
● Each leaf node has at least p/2 values
● All leaf nodes are at the same level


B+ tree – leaf node

[Figure: leaf node with keys 3 5 11. Each key has a data pointer (to record with key 3, to record with key 5, to record with key 11), and the node ends with a sequence pointer to the next leaf node.]


B+ tree – search

● At each level, find smallest key $K_i$ larger than search-key
– Follow the associated pointer ($P_i$)
– If no such key found, follow last pointer in node
...until leaf node

[Figure: the example B+ tree. Root [100] (k<100, k>=100); internal nodes [30] (k<30, k>=30) and [120 150 180]; leaves [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200], each key with a data pointer.]
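The search procedure can be traced on the example tree with a small in-memory model. This is a sketch under simplifying assumptions: nodes are Python objects rather than disk blocks, data pointers are omitted, and the class and function names are illustrative. The range scan uses the sequence pointers between leaves.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # sequence pointer to the next leaf


@dataclass
class Internal:
    keys: List[int]
    children: list  # len(children) == len(keys) + 1


def find_leaf(node, k):
    """Descend from the root to the leaf where k would appear."""
    while isinstance(node, Internal):
        # Smallest key larger than k selects the pointer; else the last pointer.
        node = node.children[bisect_right(node.keys, k)]
    return node


def contains(root, k) -> bool:
    return k in find_leaf(root, k).keys


def range_scan(root, lo, hi):
    """Collect keys in [lo, hi] by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# The tree from the figure.
leaves = [Leaf(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([100], [Internal([30], leaves[0:2]),
                        Internal([120, 150, 180], leaves[2:6])])

print(contains(root, 156), contains(root, 42))
print(range_scan(root, 35, 120))
```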


B+ tree – built on a nonkey field

● B+ tree can be built on a search-key that is not unique
– Many records may have the same search-key value
● Leaf nodes usually contain record pointers
● If a search value matches more than one record
– Leaf node entry stores a pointer to an “indirect” block
● Block contains pointers to all records with that search-key


B+ tree operations – insert

● Insert and delete are efficient but a bit complicated
– Because nodes may overflow or underflow
● Ignoring node overflow and underflow
– Insert data record with search-key k
● Find leaf node where k would appear
● If search-key k found
– Add data record to file, create indirect block if there isn't one
– Add record pointer to indirect block
● If search-key k not found
– Add data record to file
– Insert (k, data pointer) in leaf node
● Such that all search keys in leaf node remain in order
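The leaf-level part of these steps can be sketched as follows, ignoring overflow. The assumptions here are not from the slides: the record has already been added to the data file and is identified by a `record_id`, an indirect block is modelled as a Python list, and the `Leaf` class and `leaf_insert` function are hypothetical names.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self):
        self.keys = []  # search-key values, kept in order
        self.ptrs = []  # ptrs[i] is a record id, or a list (an "indirect" block)


def leaf_insert(leaf: Leaf, k, record_id) -> None:
    """Insert (k, record_id) into a leaf, assuming the leaf has space."""
    i = bisect_left(leaf.keys, k)
    if i < len(leaf.keys) and leaf.keys[i] == k:
        # Search-key found: create the indirect block if there isn't one,
        # then add the record pointer to it.
        if not isinstance(leaf.ptrs[i], list):
            leaf.ptrs[i] = [leaf.ptrs[i]]
        leaf.ptrs[i].append(record_id)
    else:
        # Search-key not found: insert so that the keys remain in order.
        leaf.keys.insert(i, k)
        leaf.ptrs.insert(i, record_id)


leaf = Leaf()
for key, rid in [(35, "r2"), (30, "r1"), (42, "r3"), (35, "r4")]:
    leaf_insert(leaf, key, rid)
print(leaf.keys, leaf.ptrs)
```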


B+ tree operations – delete

● Ignoring node overflow and underflow
– Delete data record with search-key k
● Find leaf node with search-key k
● Find data record pointer, delete data record from file
● Remove (k, record pointer) entry from leaf node if there is no indirect block associated with that entry
– Or if indirect block becomes empty as a result of the deletion
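The leaf-level part of deletion can be sketched in the same spirit, ignoring underflow. The assumptions here are not from the slides: a leaf is modelled as two parallel lists, an indirect block is a Python list of record ids, and removing the record from the data file is left to the caller. The name `leaf_delete` is hypothetical.

```python
from bisect import bisect_left


def leaf_delete(keys: list, ptrs: list, k, record_id) -> bool:
    """Remove record_id under search-key k from a leaf; return True if removed."""
    i = bisect_left(keys, k)
    if i == len(keys) or keys[i] != k:
        return False  # search-key k is not in this leaf
    if isinstance(ptrs[i], list):  # entry points to an indirect block
        if record_id not in ptrs[i]:
            return False
        ptrs[i].remove(record_id)
        if ptrs[i]:
            return True  # other records still share this key: keep the entry
    elif ptrs[i] != record_id:
        return False
    # No indirect block, or the indirect block became empty: drop the entry.
    del keys[i]
    del ptrs[i]
    return True


keys = [30, 35, 42]
ptrs = ["r1", ["r2", "r4"], "r3"]
first = leaf_delete(keys, ptrs, 35, "r2")   # entry for 35 stays, block shrinks
second = leaf_delete(keys, ptrs, 35, "r4")  # block is now empty, entry removed
missing = leaf_delete(keys, ptrs, 99, "r9")
print(keys, ptrs)
```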


B+ insert

● Four cases

– 1. Simple case: There is space in leaf node

– 2. Leaf node overflow

– 3. Internal node overflow

– 4. New root


B+ tree insert – case 1

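Insert 42, tree of order = 3

[Figure: root [100] (k<100 to the left, k>=100 to the right); internal nodes [30] and [120]; leaves [3 5 11] [30 35] [100 101 110] [120 130], each key with a data pointer; 42 is the key being inserted.]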


B+ tree insert – case 2

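Insert 9, tree of order = 3

[Figure: root [100] (k<100 to the left, k>=100 to the right); internal nodes [30] and [120]; leaves [3 5 11] [30 35 42] [100 101 110] [120 130], each key with a data pointer; 9 is the key being inserted.]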


B+ tree insert – case 2 (p2)

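Insert 9

[Figure: root [100]; internal nodes [9 30] and [120]; the overflowing leaf has been split into [3 5] and [9 11], followed by the leaf starting with 30 35.]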
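The leaf split in case 2 can be sketched as follows. The assumptions here are not stated in the slides: a leaf is modelled as a sorted list of keys, the split point is the middle of the overflowing list, and the first key of the new right leaf is copied up as the separator. The name `insert_with_split` is hypothetical.

```python
from bisect import insort


def insert_with_split(leaf_keys: list, k, max_keys: int):
    """Insert k into a leaf; split the leaf if it overflows.

    Returns (left, right, separator). right and separator are None
    when the key fits without a split (case 1).
    """
    keys = list(leaf_keys)
    insort(keys, k)
    if len(keys) <= max_keys:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The separator is copied up to the parent; it also stays in the right leaf.
    return left, right, right[0]


case1 = insert_with_split([30, 35], 42, max_keys=3)
case2 = insert_with_split([3, 5, 11], 9, max_keys=3)
print(case1)
print(case2)
```

For the slide's example, inserting 9 into [3 5 11] yields the leaves [3 5] and [9 11], with 9 as the new key in the parent.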


B+ tree insert – case 3

Insert 165

[Figure: root [100]; internal nodes [9 30] and [120 150]; leaves shown under [120 150]: [100 101 110] [120 130] [150 156 179].]


B+ tree insert – case 3 (p2)

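Insert 165

[Figure: root [100]; internal nodes [9 30] and [120 150]; the overflowing leaf has been split, giving leaves [100 101 110] [120 130] [150 156] [165 179].]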


B+ tree insert – case 3 (p3)

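Insert 165

[Figure: root [100]; the overflowing internal node has been split into [120] and [165], alongside [9 30]; leaves [100 101 110] [120 130] [150 156] [165 179].]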


B+ tree insert – case 3 (p4)

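Insert 165

[Figure: key 150 is moved up into the root, which becomes [100 150]; internal nodes [9 30], [120] and [165]; leaves [100 101 110] [120 130] [150 156] [165 179].]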
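The internal-node split in case 3 can be sketched as follows. Unlike a leaf split, the middle key is moved up to the parent rather than copied. The assumptions here are not from the slides: a node is modelled as a (keys, children) pair, children are plain labels, and `insert_into_internal` is a hypothetical name.

```python
from bisect import bisect_right


def insert_into_internal(keys, children, sep, new_child, max_pointers):
    """Add separator sep and its right-hand child to an internal node.

    Returns (left, right, pushed_up), where left and right are
    (keys, children) pairs. right and pushed_up are None if no split occurs.
    """
    i = bisect_right(keys, sep)
    keys = keys[:i] + [sep] + keys[i:]
    children = children[:i + 1] + [new_child] + children[i + 1:]
    if len(children) <= max_pointers:
        return (keys, children), None, None
    mid = len(keys) // 2
    pushed_up = keys[mid]  # moves up; it stays in neither half
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, pushed_up


# Node [120 150] after its last leaf split into [150 156] and [165 179].
left, right, pushed_up = insert_into_internal(
    [120, 150], ["leaf 100..", "leaf 120..", "leaf 150.."],
    sep=165, new_child="leaf 165..", max_pointers=3)
print(left, right, pushed_up)
```

This reproduces the slide's result: the node splits into [120] and [165], and 150 goes to the parent.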


B+ tree insert – case 4 (new root)

Insert 170

[Figure: root node [120 150]; leaves [100 101 110] [120 130] [150 156 179].]


B+ tree insert – case 4 (p2)

Insert 170

[Figure: root node [120 150]; the overflowing leaf has been split, giving leaves [100 101 110] [120 130] [150 156] [170 179].]


B+ tree insert – case 4 (p3)

Insert 170

[Figure: the overflowing root has been split into nodes [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179].]


B+ tree insert – case 4 (p4)

Insert 170

[Figure: new root node [150] above internal nodes [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179].]
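Putting the four cases together, a recursive insert might look like the sketch below. It is not the slides' algorithm verbatim. It assumes a single `ORDER` for both leaf capacity and internal fan-out, unique keys, and no data or sequence pointers. `Node`, `_insert` and `insert` are hypothetical names.

```python
from bisect import bisect_right, insort

ORDER = 3  # assumed: max keys per leaf and max pointers per internal node


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def _insert(node, k):
    """Insert k below node; return (separator, new_right_node) if node split."""
    if node.children is None:  # cases 1 and 2: leaf
        insort(node.keys, k)
        if len(node.keys) <= ORDER:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right  # separator is copied up
    i = bisect_right(node.keys, k)
    split = _insert(node.children[i], k)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.children) <= ORDER:
        return None
    mid = len(node.keys) // 2  # case 3: internal node overflow
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right  # middle key moves up


def insert(root, k):
    """Insert k and return the (possibly new) root."""
    split = _insert(root, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])  # case 4: the tree grows by one level


# The case 4 example: root [120 150] over three leaves, insert 170.
root = Node([120, 150], [Node([100, 101, 110]), Node([120, 130]),
                         Node([150, 156, 179])])
root = insert(root, 170)
print(root.keys, [c.keys for c in root.children])
```

The tree only grows in height at the root, which is why all leaves stay at the same depth.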


B+ tree delete

Delete 110

[Figure: root [150]; internal nodes [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179].]


B+ tree delete

Delete 130

[Figure: root [150]; internal nodes [120] and [170]; leaves as drawn on the slide: [100 101 110] [120 130] [150 156] [170 179].]


B+ tree operations – delete

● Simple case
– Deletion does not cause underflow at leaf
● Underflow case – Key redistribution
– Redistribute keys
● Borrow one key from adjacent node
● Redistribute evenly between adjacent nodes
– Update parent nodes


B+ tree operations – delete

● Underflow case – Coalescing nodes
– Coalesce with sibling node
– Update pointers at parent node
– Parent node may underflow
● Recursively apply deletion procedure up the tree
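The two underflow cases at leaf level can be sketched as one helper that first tries redistribution and falls back to coalescing. The assumptions here are not from the slides: leaves are plain sorted lists under a single parent, `parent_keys[j]` separates `leaves[j]` from `leaves[j + 1]`, only one key is borrowed, and the recursive repair of the parent is left out. The name `fix_leaf_underflow` is hypothetical.

```python
def fix_leaf_underflow(parent_keys: list, leaves: list, i: int, min_keys: int) -> str:
    """Repair leaves[i], which has fewer than min_keys keys after a deletion."""
    assert len(leaves) > 1, "sketch assumes the leaf has at least one sibling"
    leaf = leaves[i]
    # Redistribution: borrow one key from an adjacent leaf that can spare it.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]  # update separator in parent
        return "borrowed from right"
    if i > 0 and len(leaves[i - 1]) > min_keys:
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaf[0]  # update separator in parent
        return "borrowed from left"
    # Coalescing: merge with a sibling and drop one separator from the parent.
    j = i if i + 1 < len(leaves) else i - 1  # merge leaves[j] and leaves[j + 1]
    leaves[j].extend(leaves[j + 1])
    del leaves[j + 1]
    del parent_keys[j]  # the parent may now underflow in turn
    return "coalesced"


# Delete 130 from [120 130] under parent [120]: the leaf [120] underflows.
parent_keys, leaves = [120], [[100, 101, 110], [120]]
outcome = fix_leaf_underflow(parent_keys, leaves, 1, min_keys=2)
print(outcome, parent_keys, leaves)
```

In this run the left sibling can spare a key, so 110 moves over and the separator in the parent becomes 110.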


Index summary

● B+ tree, a fast and efficient multi-level index
– Dynamic balanced data structure
– Efficient insert and delete
● At the expense of some space overhead
– B+ tree supports equality and range searching
● Dynamic hashing schemes
– Dynamic schemes that grow/shrink with data file
– Support equality search
– But no support for range queries
● B+ trees widely implemented in commercial DBMSs


Indices in SQL

● Index optimisation not a trivial task
– Which field(s) to create indices on?
– In principle, the DBMS should automatically figure this out
● Based on cost of index maintenance
● And the benefit to query workload
– Not quite automatic today
● Although some DBMSs provide tools to assist
– E.g. “index wizard”


Indices in SQL

● Index creation via SQL DDL
– create index <index-name> on <table-name> (<attribute list>)
– create index c-index on account (cust-city)
● If we want to declare that search-key is a candidate key
– create unique index <index-name> on <table-name> (<attribute list>)
– Index creation fails if table contains duplicate values for the search-key
– Once index is created, insertion of duplicate values for that field is rejected
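The behaviour of plain and unique indices can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the DBMS, the sample rows are made up, and identifiers use underscores (`c_index`, `cust_city`) because hyphenated names would need quoting in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_num INTEGER, cust_city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Aberdeen", 100.0), (2, "Bolton", 200.0), (3, "Aberdeen", 300.0)],  # made-up rows
)

# An ordinary index accepts duplicate search-key values.
conn.execute("CREATE INDEX c_index ON account (cust_city)")

# A unique index cannot be created while the column holds duplicates.
try:
    conn.execute("CREATE UNIQUE INDEX u_city ON account (cust_city)")
    unique_city_created = True
except sqlite3.IntegrityError:
    unique_city_created = False

# account_num has no duplicates, so this unique index is created...
conn.execute("CREATE UNIQUE INDEX u_acct ON account (account_num)")

# ...and from then on duplicate values for that field are rejected.
try:
    conn.execute("INSERT INTO account VALUES (1, 'Croydon', 10.0)")
    duplicate_inserted = True
except sqlite3.IntegrityError:
    duplicate_inserted = False

print(unique_city_created, duplicate_inserted)
```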