Physical DBs B+ Tree

Upload: satya-prakash

Post on 06-Apr-2018


TRANSCRIPT

  • 8/3/2019 Physical DBs B+ Tree

    1/35

    DBMS, A. Yazıcı

    Physical Level of Databases: B+-Trees

    Adnan YAZICI
    Computer Engineering Department

    METU

    (Fall 2005)


    B+-Tree Index Files

    • Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created. Periodic reorganization of the entire file is required.

    • Advantage of B+-tree index files: the tree automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance.

    • Disadvantage of B+-trees: extra insertion and deletion overhead, and space overhead.

    • The advantages of B+-trees outweigh the disadvantages, and they are used extensively.


    B+-Tree Index Files (Cont.)

    • All paths from the root to a leaf are of the same length.

    • Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.

    • Special cases:
        - If the root is not a leaf, it has at least 2 children.
        - If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n - 1) values.
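The occupancy limits above can be tabulated for a given n. The following is a minimal sketch; the function name `occupancy_bounds` and the dictionary keys are illustrative choices, not part of the slides.

```python
import math

def occupancy_bounds(n):
    """Return the limits stated above for a B+-tree whose nodes hold at most n pointers."""
    return {
        "inner_children": (math.ceil(n / 2), n),   # node that is neither root nor leaf
        "root_children": (2, n),                   # root that is not a leaf
        "root_leaf_values": (0, n - 1),            # root that is also a leaf
    }

bounds = occupancy_bounds(5)
print(bounds)
```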


    B+-Trees as Dynamic Multi-level Indexes

    • A B+-tree is a variation of search trees that allows efficient insertion and deletion of search values.

    • In a B+-tree, each node corresponds to a disk block.

    • Each node is kept between half full and completely full.

    • An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes. Splitting may propagate to other tree levels.

    • A deletion is efficient if the node does not become less than half full. If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.


    B+-Tree Node Structure

    • Typical node: P1 | K1 | P2 | ... | Pn-1 | Kn-1 | Pn
        - Ki are the search-key values.
        - Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).

    • The search-keys in a node are ordered: K1 < K2 < K3 < ... < Kn-1


    Leaf Nodes in B+-Trees

    • For i = 1, 2, ..., n-1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. The bucket structure is needed only if the search-key does not form a primary key.

    • If Li and Lj are leaf nodes and i < j, Li's search-key values are less than Lj's search-key values.

    • Pn points to the next leaf node in search-key order.


    Non-Leaf Nodes in B+-Trees

    • Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:
        - All the search-keys in the subtree to which P1 points are less than K1.
        - For 2 ≤ i ≤ m - 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki-1 and less than Ki.
        - All the search-keys in the subtree to which Pm points are greater than or equal to Km-1.


    Example of a B+-tree


    Example of a B+-tree

    • All nodes other than the root must have between 2 and 4 key values (⌈n/2⌉ - 1 and n - 1, with n = 5).

    • Root must have at least 2 children.


    Observations about B+-trees

    • Since the inter-node connections are done by pointers, logically close blocks need not be physically close.

    • The non-leaf levels of the B+-tree form a hierarchy of sparse indices.

    • The B+-tree contains a relatively small number of levels (logarithmic in the size of the main file), thus searches can be conducted efficiently.

    • Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall see).


    Queries on B+-Trees

    • Find all records with a search-key value of k.
        1. Start with the root node.
            a. Examine the node for the smallest search-key value > k.
            b. If such a value exists, assume it is Ki. Then follow Pi to the child node.
            c. Otherwise k ≥ Km-1, where there are m pointers in the node. Then follow Pm to the child node.
        2. If the node reached by following the pointer above is not a leaf node, repeat step 1 on that node.
        3. Else we have reached a leaf node.
            a. If for some i, key Ki = k, follow pointer Pi to the desired record or bucket.
            b. Else no record with search-key value k exists.
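The lookup steps above can be sketched in Python. This is an illustrative sketch, not the course's implementation: the `Node` class, the `find` and `leaf` names, the "record-…" strings, and the small hand-built tree are all assumptions made for the example. `bisect_right` returns the position of the smallest key greater than k, or the number of keys when no such key exists, which selects the last pointer.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # sorted search-key values
        self.children = children    # non-leaf: one more child than keys; None for a leaf
        self.records = records      # leaf: one record (or bucket) per key

def find(root, k):
    """Return the record stored under key k, or None if k is absent."""
    node = root
    while node.children is not None:
        # Position of the smallest key > k; len(keys) if there is none (last pointer).
        i = bisect_right(node.keys, k)
        node = node.children[i]
    for key, record in zip(node.keys, node.records):
        if key == k:
            return record
    return None

def leaf(*keys):
    # Hypothetical records, labelled by their key.
    return Node(list(keys), records=[f"record-{key}" for key in keys])

root = Node([7, 17, 25], children=[leaf(1, 4), leaf(7, 10), leaf(17, 21), leaf(25, 31)])
print(find(root, 21))
print(find(root, 8))
```

A key equal to a separator (for example 7) is routed to the right of that separator, which matches the "greater than or equal to" rule for non-leaf nodes.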


    Queries on B+-Trees (Cont.)

    • In processing a query, a path is traversed in the tree from the root to some leaf node.

    • If there are K search-key values in the file, the path is no longer than ⌈log_⌈n/2⌉(K)⌉.

    • A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry).

    • With 1 million search-key values and n = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.

    • Contrast this with a balanced binary tree with 1 million search-key values: around 20 nodes are accessed in a lookup. This difference is significant since every node access may need a disk I/O, costing around 20 milliseconds!
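The arithmetic above can be checked with a few lines of Python. This is a sketch; the helper name `levels_needed` is an assumption, and it uses integer multiplication rather than floating-point logarithms to avoid rounding surprises.

```python
import math

def levels_needed(num_keys, fanout):
    """Smallest h with fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys))."""
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels

n = 100
bplus_levels = levels_needed(1_000_000, math.ceil(n / 2))   # minimum fanout is 50
binary_levels = levels_needed(1_000_000, 2)                 # balanced binary tree

# Worst case at roughly 20 ms per node access, as quoted above.
print(bplus_levels, bplus_levels * 20, "ms")
print(binary_levels, binary_levels * 20, "ms")
```

Under the 20 ms per access figure, this gives about 80 ms for the B+-tree lookup against about 400 ms for the binary tree.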


    B+ Tree


    B+ Tree

    • B+ Tree Properties
    • B+ Tree Searching
    • B+ Tree Insertion
    • B+ Tree Deletion


    B+ Tree

    • Balanced tree
        - Same height for all paths from root to leaf.
        - Given a search-key K, nearly the same access time for different K values.

    • A B+ tree is constructed by parameter n
        - Each node (except the root) has ⌈n/2⌉ to n pointers.
        - Each node (except the root) has ⌈n/2⌉ - 1 to n - 1 search-key values.

    • Node layout, case for n = 3: P1 | K1 | P2 | K2 | P3

    • Node layout, general case for n: P1 | K1 | P2 | ... | Pn-1 | Kn-1 | Pn


    • Search keys are sorted in order: K1 < K2 < ... < Kn-1


    B+ Tree: Insertion

    • Overflow: when the number of search-key values exceeds n - 1.

    • Leaf node: split into two nodes
        - The 1st node contains ⌈(n-1)/2⌉ values.
        - The 2nd node contains the remaining values.
        - Copy the smallest search-key value of the 2nd node to the parent node.

    • Example: inserting 8 into the full leaf (7 9 13 15) gives the leaves (7 8) and (9 13 15), and 9 is copied to the parent.
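The leaf split rule can be sketched as a small function. The name `insert_into_leaf` and the return shape are illustrative, and n = 5 is assumed for the example because a leaf holding four values is full only when n - 1 = 4.

```python
import math
from bisect import insort

def insert_into_leaf(keys, k, n):
    """Insert k into a sorted leaf; return (left, right, copied_up).

    right and copied_up are None when the leaf does not overflow.
    """
    keys = list(keys)
    insort(keys, k)
    if len(keys) <= n - 1:                # still fits: no split
        return keys, None, None
    cut = math.ceil((n - 1) / 2)          # size of the 1st node
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]          # smallest key of the 2nd node is copied up

print(insert_into_leaf([7, 9, 13, 15], 8, n=5))
```

The copied key stays in the right leaf as well, since every search-key value must remain reachable at the leaf level.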


    B+ Tree: Insertion

    • Overflow: when the number of search-key values exceeds n - 1.

    • Non-leaf node: split into two nodes
        - The 1st node contains ⌈n/2⌉ - 1 values.
        - Move the smallest of the remaining values, together with its pointer, to the parent.
        - The 2nd node contains the remaining values.

    • Example: inserting 8 into the full non-leaf node (7 9 13 15) gives the nodes (7 8) and (13 15), and 9 is moved to the parent.
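A non-leaf split differs from a leaf split in that the middle key is moved up rather than copied. The sketch below assumes n = 5 for the example, and uses placeholder strings "c0" to "c5" for the child pointers; `split_internal` is an illustrative name.

```python
import math

def split_internal(keys, children, n):
    """Split an overfull non-leaf node; return (left, moved_up, right).

    left and right are (keys, children) pairs; moved_up goes to the parent.
    """
    assert len(keys) > n - 1 and len(children) == len(keys) + 1
    cut = math.ceil(n / 2) - 1                       # number of keys kept in the 1st node
    left = (keys[:cut], children[:cut + 1])
    moved_up = keys[cut]                             # appears in neither half afterwards
    right = (keys[cut + 1:], children[cut + 1:])
    return left, moved_up, right

keys = [7, 8, 9, 13, 15]                             # (7 9 13 15) after inserting 8
children = ["c0", "c1", "c2", "c3", "c4", "c5"]      # placeholder child pointers
print(split_internal(keys, children, n=5))
```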


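    B+ Tree: Insertion

    • Example 1: Construct a B+ tree for (1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42) with n = 4.

    • After inserting 1, 4, 7: a single leaf (1 4 7).

    • After inserting 10: root (7); leaves (1 4), (7 10).

    • After inserting 17, 21: root (7 17); leaves (1 4), (7 10), (17 21).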


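    B+ Tree: Insertion

    • 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42

    • After inserting 31, 25: root (7 17 25); leaves (1 4), (7 10), (17 21), (25 31).

    • After inserting 19, 20: root (17); non-leaf nodes (7), (20 25); leaves (1 4), (7 10), (17 19), (20 21), (25 31).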


    B+ Tree: Insertion

    • 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42

    • After inserting 28, 42: root (17); non-leaf nodes (7), (20 25 31); leaves (1 4), (7 10), (17 19), (20 21), (25 28), (31 42).
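Example 1 can be replayed end to end with a compact recursive insertion. This is a simplified sketch under stated assumptions: the class and function names are illustrative, nodes hold keys only (no records), and the leaf-to-leaf "next" pointers are not maintained. It applies the two split rules from the preceding slides (leaf: first node keeps ⌈(n-1)/2⌉ values and the separator is copied up; non-leaf: first node keeps ⌈n/2⌉ - 1 values and the separator is moved up).

```python
import math
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children          # None for a leaf

def _insert(node, k, n):
    """Insert k below node; return (separator, new_right_node) if node split, else None."""
    if node.children is None:             # leaf
        insort(node.keys, k)
        if len(node.keys) <= n - 1:
            return None
        cut = math.ceil((n - 1) / 2)
        right = Node(node.keys[cut:])
        node.keys = node.keys[:cut]
        return right.keys[0], right       # separator is copied up
    i = bisect_right(node.keys, k)
    split = _insert(node.children[i], k, n)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= n - 1:
        return None
    cut = math.ceil(n / 2) - 1
    moved_up = node.keys[cut]
    right = Node(node.keys[cut + 1:], node.children[cut + 1:])
    node.keys, node.children = node.keys[:cut], node.children[:cut + 1]
    return moved_up, right                # separator is moved up

def insert(root, k, n):
    """Insert k and return the (possibly new) root."""
    split = _insert(root, k, n)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])     # the tree grows by one level

def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([node.keys for node in level])
        level = [c for node in level if node.children for c in node.children]
    return out

root = Node([])
for key in (1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42):
    root = insert(root, key, n=4)
tree_levels = levels(root)
for level in tree_levels:
    print(level)
```

The three printed levels correspond to the final tree of the example: root (17), non-leaf nodes (7) and (20 25 31), and six leaves.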


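    B+ Tree: Insertion

    • Example 2: n = 3, insert 4 into the following B+ tree.

    • Before: root (9 10) with children (7 8), Subtree C, Subtree D; node (7 8) has children leaf (2 5), Leaf A, Leaf B.

    • After: root (9) with children (7) and (10); node (7) has children (4) and (8); node (4) has leaves (2) and (4 5); node (8) has Leaf A and Leaf B; node (10) has Subtree C and Subtree D.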


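    B+ Tree: Deletion

    • Underflow: when the number of search-key values < ⌈n/2⌉ - 1.

    • Leaf node
        - Redistribute to sibling: right node not less than left node. Replace the between-value in the parent by the smallest value of the right node.
        - Merge (the nodes contain too few entries): move all values and pointers to the left node. Remove the between-value in the parent.

    • Example, redistribution: delete 10 from leaf (9 10) with right sibling (13 14 16) under parent (13 18). Result: leaves (9 13), (14 16); parent (14 18).

    • Example, merge: delete 10 from leaf (9 10) with right sibling (13 14) under parent (13 18 22). Result: leaf (9 13 14); parent (18 22).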
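The two leaf-level repairs can be sketched for a pair of adjacent leaves. Several things here are assumptions rather than facts from the slide: the function name, the restriction to deleting from the left leaf of the pair, and `min_keys = 2` as the underflow threshold (the slide does not state n for its diagrams).

```python
def delete_from_left_leaf(left, right, k, min_keys):
    """Delete k from the left leaf of a sibling pair.

    Returns (left, right, separator). After a merge, right and separator are
    None, meaning the between-value is removed from the parent.
    """
    left = [x for x in left if x != k]
    right = list(right)
    if len(left) >= min_keys:             # no underflow
        return left, right, right[0]
    if len(right) > min_keys:             # sibling can spare a value: redistribute
        left.append(right.pop(0))
        return left, right, right[0]      # new between-value: smallest of the right node
    return left + right, None, None       # merge everything into the left node

print(delete_from_left_leaf([9, 10], [13, 14, 16], 10, min_keys=2))
print(delete_from_left_leaf([9, 10], [13, 14], 10, min_keys=2))
```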


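    B+ Tree: Deletion

    • Non-leaf node
        - Redistribute to sibling: through the parent. Right node not less than left node.
        - Merge (the nodes contain too few entries): bring down the parent value, move all values and pointers to the left node, then delete the right node and its pointer in the parent.

    • Example, redistribution: delete 10 from node (9 10) with right sibling (14 15 16) under parent (13 18). Result: nodes (9 13), (15 16); parent (14 18).

    • Example, merge: delete 10 from node (9 10) with right sibling (14 16) under parent (13 18 22). Result: node (9 13 14 16); parent (18 22).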
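For non-leaf nodes the between-value in the parent takes part in the repair. The sketch below tracks keys only (child pointers would travel with them) and, as an assumption, uses `min_keys = 2` as the threshold; `repair_internal_underflow` is an illustrative name.

```python
def repair_internal_underflow(left_keys, right_keys, parent_sep, min_keys):
    """Repair an underfull left non-leaf node using its right sibling.

    Returns (left, right, new_parent_sep); right and new_parent_sep are None
    after a merge, meaning the separator leaves the parent.
    """
    if len(right_keys) > min_keys:
        # Redistribute through the parent: the separator comes down to the left
        # node and the sibling's smallest key goes up in its place.
        return left_keys + [parent_sep], right_keys[1:], right_keys[0]
    # Merge: bring the separator down between the two key lists.
    return left_keys + [parent_sep] + right_keys, None, None

print(repair_internal_underflow([9], [14, 15, 16], 13, min_keys=2))
print(repair_internal_underflow([9], [14, 16], 13, min_keys=2))
```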


    B+ Tree: Deletion

    • Example 3: n = 3, delete 3.

    [Figure: the tree before and after deleting 3; the nodes shown hold the keys 1, 3, 5, 8, 20 and Subtree A.]


    B+ Tree: Deletion

    • Example 4: Delete 28, 31, 21, 25, 19.

    [Figure: the tree at successive stages of the deletions.]


    B+-Tree File Organization

    • The index file degradation problem is solved by using B+-tree indices. The data file degradation problem is solved by using B+-tree file organization.

    • The leaf nodes in a B+-tree file organization store records, instead of pointers.

    • Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node.

    • Leaf nodes are still required to be half full.

    • Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.


    B+-Tree File Organization (Cont.)

    • Good space utilization is important since records use more space than pointers.

    • To improve space utilization, involve more sibling nodes in redistribution during splits and merges.
        - Involving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at least ⌊2n/3⌋ entries.


    Indexing Strings

    • Variable-length strings as keys
        - Variable fanout.
        - Use space utilization as the criterion for splitting, not the number of pointers.

    • Prefix compression
        - Key values at internal nodes can be prefixes of the full key.
        - Keep enough characters to distinguish entries in the subtrees separated by the key value.
        - E.g. "Silas" and "Silberschatz" can be separated by "Silb".
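One way to pick such a prefix is to take the shortest prefix of the right subtree's smallest key that still compares greater than the left subtree's largest key. The sketch below is one possible rule, not necessarily the one a given system uses; `shortest_separator` is an illustrative name.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix p of right_min such that left_max < p <= right_min."""
    assert left_max < right_min
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    return right_min   # not reached when left_max < right_min; kept as a safe default

print(shortest_separator("Silas", "Silberschatz"))
```

Keys less than the returned prefix go left and keys greater than or equal to it go right, so the separator preserves the ordering rule for non-leaf nodes while storing fewer characters.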


    B-Tree Index Files

    • Nonleaf node: pointers Bi are the bucket or file record pointers.


    B-Tree Index File Example

    B-tree (above) and B+-tree (below) on same data


    B-Tree Index Files (Cont.)

    • Advantages of B-tree indices:
        - May use fewer tree nodes than a corresponding B+-tree.
        - Sometimes possible to find the search-key value before reaching a leaf node.

    • Disadvantages of B-tree indices:
        - Only a small fraction of all search-key values are found early.
        - Non-leaf nodes are larger, so fan-out is reduced. Thus, B-trees typically have greater depth than the corresponding B+-tree.
        - Insertion and deletion are more complicated than in B+-trees.
        - Implementation is harder than for B+-trees.

    • Typically, the advantages of B-trees do not outweigh the disadvantages.


    Index Definition in SQL

    • Create an index:
        create index <index-name> on <relation-name> (<attribute-list>)
      E.g.: create index b_index on branch(branch_name)

    • Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.
        - Not really required if the SQL unique integrity constraint is supported.

    • To drop an index:
        drop index <index-name>
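The statements above can be tried against an in-memory SQLite database from Python's standard `sqlite3` module. This is a sketch under assumptions: the `branch` table layout, the sample rows, and the index names `b_index` and `b_unique` are made up for the example, and SQLite is only one of many systems that accept this syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation for the example.
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT)")

# Plain index, then a unique index that enforces the candidate-key condition.
conn.execute("CREATE INDEX b_index ON branch(branch_name)")
conn.execute("CREATE UNIQUE INDEX b_unique ON branch(branch_name)")

conn.execute("INSERT INTO branch VALUES ('Downtown', 'Ankara')")
try:
    conn.execute("INSERT INTO branch VALUES ('Downtown', 'Izmir')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True             # the unique index refused the duplicate key

conn.execute("DROP INDEX b_index")
# Column 1 of each PRAGMA index_list row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(branch)")]
print(duplicate_rejected, index_names)
```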