chapter 10. indexed sequential file access and prefix b...

37
Chapter 10. Indexed Sequential File Access and Prefix B+ trees Kim Joung-Joon Database Lab. [email protected]

Upload: others

Post on 24-Jun-2020

60 views

Category:

Documents


11 download

TRANSCRIPT

Page 1: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

Chapter 10. Indexed Sequential File Access and

Prefix B+ trees

Kim Joung-Joon

Database Lab.

[email protected]

Page 2: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

Chapter Outline 10.1 Indexed Sequential Access 10.2 Maintaining a Sequence Set 10.3 Adding a Simple Index to the Sequence Set 10.4 The Content of the Index: Separators Instead of Keys 10.5 The Simple Prefix B+ Tree 10.6 Simple Prefix B+ tree Maintenance 10.7 Index Set Block Size 10.8 Internal Structure of Index Set Blocks: A Variable-Order

B-tree 10.9 Loading a Simple Prefix B+ tree 10.10 B+ Trees 10.11 B-Trees, B+ Trees, and Simple Prefix B+ Trees in

Perspective

File Structures (10) Konkuk University (DB Lab.) 2

Page 3: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.1 Indexed Sequential Access Single organization

Indexed : seen as a set of records that is indexed by key

B-tree : for excellent indexed access

Sequential : returning records in order by key (physically

contiguous records -- no seeking)

Usage of B-tree for a cosequential processing

retrieve all the records in order by key

B-tree file of N records,

following the N pointers from the index requires N

random seeks => inefficient process Indexed sequential access method for both interactive random access and cosequential batch

processing

File Structures (10) Konkuk University (DB Lab.) 3

Page 4: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.2 Maintaining a Sequence Set Indexed part of indexed sequential access

keep a set of records in physical order by key as record

are added and deleted Sequence set a set of records in physical order by key

not a tree

separation of records into block

File Structures (10) Konkuk University (DB Lab.) 4

Page 5: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.2.1 The Use of Blocks (1/2) Sorting and resorting the entire sequent set as records are added and deleted very expensive process

Block

the basic unit of input and output

the size of buffers : can hold an entire block

Blocks for collecting the records

a way to localize the changes

restrict the effects of an insertion or deletion to just a part

of the sequence set

File Structures (10) Konkuk University (DB Lab.) 5

Page 6: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.2.1 The Use of Blocks (1/2) Blocks of sequence set in order insertion of new records into a block

overflow can be handled by a block-splitting process

divide (split) the records between two blocks and

rearrange the links

analogous to, but not the same that of a B-tree

just divide two blocks between two blocks and

rearrange the links

no promotion of key

deletion of records from a block

underflow : less than half full

a neighboring node : also half full => merge

neighboring nodes : more than half full => redistribution

analogous to, but not the same that of a B-tree

File Structures (10) Konkuk University (DB Lab.) 6

Page 7: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.2.1 The Use of Blocks (2/2)

File Structures (10) Konkuk University (DB Lab.) 7

Page 8: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.2.2 Choice of Block Size Block

the maximum guaranteed extent of physical sequentiality

make each block equal to the size of a cluster (i.e., the

minimum number of sectors allocated at a time)

File Structures (10) Konkuk University (DB Lab.) 8

Page 9: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.3 Adding a Simple Index to the Sequence Set (1/2) Sequence set

based on the idea of grouping the records into blocks and

then maintaining the blocks

view each block as containing a range of records

File Structures (10) Konkuk University (DB Lab.) 9

Page 10: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.3 Adding a Simple Index to the Sequence Set (2/2) Index of fixed-length records

contain the key for the last record in each block

B+-tree

The use of a B-tree index for sequence set of blocks

File Structures (10) Konkuk University (DB Lab.) 10

Page 11: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.4 The Content of the Index: Separators Instead of Keys (1/3) Separators

do not need to have actual keys in the index set

can save space by placing the shortest separator in the

index structure

File Structures (10) Konkuk University (DB Lab.) 11

Page 12: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.4 The Content of the Index: Separators Instead of Keys (2/3) Potential separators

capable of distinguishing between two blocks Relation of search key and separator Decision

Key < separator Go left

Key = separator Go right

Key > separator Go right

File Structures (10) Konkuk University (DB Lab.) 12

Page 13: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.4 The Content of the Index: Separators Instead of Keys (3/3) Separator

can be treated as variable-length entities within index

structure

save space by placing the shortest separator in the index

structure

File Structures (10) Konkuk University (DB Lab.) 13

Page 14: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.5 The Simple Prefix B+ Tree Prefix B+ tree

B-tree index (≡ index set) plus the sequence set

Simple prefix indicates that the index set contains shortest

separators, or prefixes of the keys

File Structures (10) Konkuk University (DB Lab.) 14

Page 15: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

15

10.6 Simple Prefix B+ Tree Maintenance 10.6.1 Changes Localized to Single Blocks in the Sequence Set Deletions on the sequence set

delete the records for EMBRY and FOLKS

limited to changes within blocks since # of sequence set blocks is unchanged, the index set can remain unchanged File Structures (10) Konkuk University (DB Lab.)

Page 16: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (1/6) Addition of records to the sequence set

can change the number of blocks in the sequence set

=> need additional separators in the index set

Deletion of records from the sequence set

if we have fewer blocks, we need fewer separators

Changes to the index set

handled by B-tree insertion and deletion

File Structures (10) Konkuk University (DB Lab.) 16

Page 17: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (2/6) Example of a B-tree of order three

the maximum number of separators is two

File Structures (10) Konkuk University (DB Lab.) 17

Page 18: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (3/6) Example of a B-tree of order three (Cont’d)

insertion into block1 in the sequence set

split block 1 => block 1 and block 7

change to the sequence set

addition of a block in the sequence set

need a new separator, with a value of AY

cause a split and promotion of BO to the root

File Structures (10) Konkuk University (DB Lab.) 18

Page 19: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (4/6) Example of a B-tree of order three (Cont’d)

File Structures (10) Konkuk University (DB Lab.) 19

Page 20: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (5/6) Example of a B-tree of order three (Cont’d)

deletion of a record from block 2 of the sequence set

cause an underflow condition in the sequence set

need a concatenation of blocks 2 and 3 (assumption)

remove block 3

remove the separator CAM that distinguish between

blocks 2 and 3

cause an underflow in the index set node => merging

need a concatenation of index nodes

bring BO back down from the root

File Structures (10) Konkuk University (DB Lab.) 20

Page 21: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.6.2 Changes Involving Multiple Blocks in the Sequence Set (6/6) Block split in the sequence set results in a node

split in the index set Block merge(concatenation) in the sequence set results in a node merge(concatenation) in the index

set Insertions and deletions in the index set are handled as B-tree operations Record insertion and deletion always take place in

the sequence set

File Structures (10) Konkuk University (DB Lab.) 21

Page 22: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.7 Index Set Block Size Common block size for the index and sequence sets

the block size that is best for the sequence set is usually

best for the index set

a common block size makes it easier to implement a

buffering scheme to create a virtual simple prefix B+ tree

use of one file for both kinds of blocks is simpler if the

block sizes are the same

File Structures (10) Konkuk University (DB Lab.) 22

Page 23: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree (1/5) Large, fixed-size block structure for a fixed # of separators need not use a shortest separators

Motivation of shortest separators

possibility of packing more of them into a node

not for the index set that uses a fixed-order B-tree(i.e., a

fixed number of separators per node)

Large, fixed-size block for the index set for a

variable # of variable-length separators a variable-order B-tree

how to search through variable-length separators

a block can hold a large number of separators

once we read a block into memory, binary search rather

than sequential search is needed

File Structures (10) Konkuk University (DB Lab.) 23

Page 24: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree (2/5) Variable-length separators and corresponding index

File Structures (10) Konkuk University (DB Lab.) 24

Page 25: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree (3/5) Index set block needs to store references(called a relative block number(RBN)) to its children

Many ways to combine the list of separators, index

to separators, and list of RBNs into a single index set block

Structure of an index set block(Fig. 10.12)

separator count

to find the middle element in the index to the separators

for binary search

total length of separators

the list of merged separators varies in length from block

to block

Separators and relative block numbers(Fig. 10.13)

File Structures (10) Konkuk University (DB Lab.) 25

Page 26: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree (4/5) Key : Beck

File Structures (10) Konkuk University (DB Lab.) 26

Page 27: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree (5/5) Node within the B+-tree index set is of variable

order

each index set block contains a variable number of

separators

# of separators in a block is directly limited by block size

(not an order m of B-tree)

decision about when to split, merge, redistribute becomes

complicated

File Structures (10) Konkuk University (DB Lab.) 27

Page 28: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.9 Loading a Simple Prefix B+ tree (1/4) Split or redistribution of blocks in the sequence set and in the index set in order to maintain B+ tree splitting and redistribution are relatively expensive

involve searching down through the tree for each insertion

then reorganization the tree

when we are loading the B+ tree, begin by sorting

the records that are to be loaded place the records into sequence set blocks, one by one, starting a new block

make the transition between two sequence set blocks

determine the shortest separator for the blocks

collect separators into an index set block

File Structures (10) Konkuk University (DB Lab.) 28

Page 29: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.9 Loading a Simple Prefix B+ tree (2/4) Four sequence set blocks and one index set block

File Structures (10) Konkuk University (DB Lab.) 29

Page 30: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.9 Loading a Simple Prefix B+ tree (3/4) Building of two index set levels

File Structures (10) Konkuk University (DB Lab.) 30

Page 31: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.9 Loading a Simple Prefix B+ tree (4/4) Growth of index set built up from the sequence set

File Structures (10) Konkuk University (DB Lab.) 31

Page 32: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.10 B+ Trees (1/2) B+ tree

does not involve the use of prefixes as separators

separators in the index set are copies of the actual keys

Fig. 10.17 : initial loading steps for a B+ tree

Difference between a B+ tree and a simple prefix B+

tree

distinguish between the role of the separators in the index

set and keys in the sequence set

difficult to make distinction when the separators are exact

copies of the keys

File Structures (10) Konkuk University (DB Lab.) 32

Page 33: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.10 B+ Trees (2/2) B+ tree without the use of shortest separators

File Structures (10) Konkuk University (DB Lab.) 33

Page 34: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.11 B-Trees, B+ Trees, and Simple Prefix B+ Tree in Perspective (1/4) Shared characteristics

paged index structures : broad and shallow

height-balanced

growing from bottom-up

possible to obtain greater storage efficiency through two-

three block splitting, concatenation, redistribution

can be implemented as virtual tree structures

can be adapted for variable-length records

File Structures (10) Konkuk University (DB Lab.) 34

Page 35: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.11 B-Trees, B+ Trees, and Simple Prefix B+ Tree in Perspective (2/4) B-Trees information can be found at any level of the B-trees B-trees can take up less space than does a B+ trees because there is no need for additional storage to hold

separators sequential access obtained through an in-order traversal of the tree not workable because of all the seeking required to

retrieve the actual record information if the B-trees

merely contains pointers to records

File Structures (10) Konkuk University (DB Lab.) 35

Page 36: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.11 B-Trees, B+ Trees, and Simple Prefix B+ Tree in Perspective (3/4) B+ Trees separation of the index set and the sequence set separators : copies of the keys advantages of the B+ trees over B-trees sequence set can be processed in a truly linear,

sequential way, providing efficient access to records in

order by key B+ tree approach can often result in a shallower tree

than would B-tree approach

File Structures (10) Konkuk University (DB Lab.) 36

Page 37: Chapter 10. Indexed Sequential File Access and Prefix B treeselearning.kocw.net/contents4/document/lec/2012/... · 10.5 The Simple Prefix B + Tree 10.6 Simple Prefix B + tree Maintenance

10.11 B-Trees, B+ Trees, and Simple Prefix B+ Tree in Perspective (4/4) Simple Prefix B+ Trees

separation of the index set and the sequence set (same as the B+ trees) sequence set can be processed in a truly linear, sequential

way (the same as the B+ trees) separators : smaller than the actual keys allow us to build a shallower tree than B+ trees separator compression and consequent increase in

branching factor -> use an index set block structure that support variable-

length fields

File Structures (10) Konkuk University (DB Lab.) 37