indexing and b+-trees by kenneth cheung cs 157b tr 07:30-08:45 professor lee
DESCRIPTION
Factors of Indices: 1. Access type 2. Access time 3. Insertion time 4. Deletion time 5. Space overhead
TRANSCRIPT
Indexing and B+-Trees
By Kenneth Cheung
CS 157B TR 07:30-08:45
Professor Lee
Introduction to Indexing
Goal: to make it easier to look up data
This is done by keeping a sorted, condensed version of the data
Searching and insertion will be easier
Factors of Indices
1. Access type
2. Access time
3. Insertion time
4. Deletion time
5. Space overhead
Clustering Index
An index whose search key also defines the sequential order of the file
Index-sequential files
Files ordered sequentially on a search key
Index Record
(aka index entry) Holds the search-key value and pointers to the records with that value
Pointer
Identifies a disk block and an offset within that disk block
Dense Index
An index record appears for every search-key value. Records are stored in the same search-key order
Faster access time, but higher space overhead
Sparse Index
An index record appears for only some search-key values. To find a record, the system finds the largest indexed search-key value that is less than or equal to the given search-key value, starts at the record that index entry points to, and scans forward through the file until it finds the record
Lower space overhead, but higher access time
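The two lookup styles can be compared with a minimal Python sketch. The block layout, the names BLOCKS, SPARSE_KEYS and DENSE, and the one-index-entry-per-block choice are assumptions made for illustration; they are not from the slides.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (search_key, payload) records, sorted by key.
BLOCKS = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index (assumed: one entry per block, holding the block's first key).
SPARSE_KEYS = [block[0][0] for block in BLOCKS]

# Dense index: one entry per search-key value -> (block number, offset).
DENSE = {
    key: (b, o)
    for b, block in enumerate(BLOCKS)
    for o, (key, _) in enumerate(block)
}


def sparse_lookup(key):
    """Find the largest indexed key <= key, then scan forward in that block."""
    pos = bisect_right(SPARSE_KEYS, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    for k, payload in BLOCKS[pos]:
        if k == key:
            return payload
        if k > key:
            break  # passed the place where the key would be
    return None


def dense_lookup(key):
    """Jump straight to the record; no scan is needed."""
    location = DENSE.get(key)
    if location is None:
        return None
    block_no, offset = location
    return BLOCKS[block_no][offset][1]
```

The sparse index here has 3 entries against 9 for the dense one, but a sparse lookup may have to scan part of a block after the index probe.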
Larger Databases
Make a sparse index on a clustering index, using 2 levels of indices
Searching a multilevel index needs fewer block reads than a binary search over one large index
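A two-level lookup might look like the following Python sketch. The inner index is assumed to be split into blocks of (first key of data block, data block number) pairs, and the outer index is a sparse index over those inner blocks; INNER_BLOCKS, OUTER_KEYS and find_data_block are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical inner index, stored in blocks of
# (first_key_in_data_block, data_block_number) entries.
INNER_BLOCKS = [
    [(10, 0), (40, 1)],
    [(70, 2), (100, 3)],
]

# Outer index: sparse index over the inner index, one entry per inner block.
OUTER_KEYS = [block[0][0] for block in INNER_BLOCKS]


def find_data_block(key):
    """Return the number of the data block that could hold key, or None."""
    outer_pos = bisect_right(OUTER_KEYS, key) - 1
    if outer_pos < 0:
        return None  # key precedes everything in the file
    inner = INNER_BLOCKS[outer_pos]            # one inner-index block read
    inner_keys = [k for k, _ in inner]
    inner_pos = bisect_right(inner_keys, key) - 1
    return inner[inner_pos][1]                 # then one data block read
```

Under these assumptions, each level costs one block read, so the lookup touches one outer block, one inner block, and one data block.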
Index Update (Insertion)
A. Look up the search key
B. If the index record stores pointers to all records with the same search-key value, then add a pointer to the new record to the index record
C. Otherwise, the index record stores a pointer only to the first record with that search-key value, and the new record is placed after the other records with the same value
Index Update (Insertion to Sparse Indices)
For sparse indices, if the system makes a new block, then it must add the first search-key value in the new block to the index
If the new record has the least search-key value in its block, the index record pointing to the block is updated
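The sparse-index insertion rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's algorithm verbatim: records are reduced to their keys, the index keeps one entry per block, a full block is split in half, and BLOCK_CAPACITY and sparse_insert are hypothetical names.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 3  # hypothetical number of records per block


def sparse_insert(blocks, index_keys, key):
    """Insert key into the file and keep the sparse index in step.

    blocks: list of sorted key lists; index_keys[i] is the first key of blocks[i].
    """
    if not blocks:
        blocks.append([key])
        index_keys.append(key)
        return
    # Block whose index entry is the largest one <= key (block 0 if none).
    pos = max(bisect_right(index_keys, key) - 1, 0)
    block = blocks[pos]
    insort(block, key)
    if len(block) > BLOCK_CAPACITY:
        # A new block is made: its first key goes into the index.
        mid = len(block) // 2
        new_block = block[mid:]
        del block[mid:]
        blocks.insert(pos + 1, new_block)
        index_keys.insert(pos + 1, new_block[0])
    # No change unless the new key became the least key in its block.
    index_keys[pos] = block[0]
```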
Deletion
A. Look up the record
B. If the index is dense and the deleted record was the only one with its search-key value, then delete the key from the index
C. If the index record stores pointers to all records with that value, then the pointer to the deleted record is removed
Deletion (cont’d)
D. If the index record stores only the pointer to the first record and the first record is deleted, then the pointer moves to the following record
E. If the index is sparse and the index does not contain the search-key value, then the index remains the same.
Deletion (cont’d)
F. If the deleted record was the only one with its search-key value, then the system replaces the corresponding index record with an index record for the next search-key value. If the next search-key value already has an index entry, then the entry is deleted instead of being replaced
Deletion (cont’d)
G. If the index record for the search-key value points to the record being deleted, the pointer is updated to point to the next record with the same search-key value.
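Cases E, F and G for a sparse index can be traced with a small Python sketch. The representation is an assumption made for illustration: the file is a sorted list of (key, record id) pairs, the sparse index is a dict from some of the keys to the record id of the first record with that key, and delete_record is a hypothetical name.

```python
def delete_record(file, index, rid):
    """Delete record rid from the sorted file and update the sparse index.

    Assumes rid is present in the file.
    """
    pos = next(i for i, (_, r) in enumerate(file) if r == rid)
    key = file[pos][0]
    del file[pos]

    if key not in index:
        return  # case E: the value is not indexed, nothing to change

    same_key = [r for k, r in file if k == key]
    if same_key:
        # Case G: other records share the key; repoint only if needed.
        if index[key] == rid:
            index[key] = same_key[0]
        return

    # Case F: the deleted record was the only one with this key.
    del index[key]
    following = next(((k, r) for k, r in file if k > key), None)
    if following is not None and following[0] not in index:
        # Replace the entry with one for the next search-key value.
        index[following[0]] = following[1]
    # Otherwise the next value already has an entry, so the old entry
    # is simply deleted.
```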
Secondary Indices
A. Secondary indices are dense and point to all records
B. Index entries are stored in search-key order, but the records they point to are not; the search key need not be a candidate key
C. If a file with multiple indices is updated, then every index must be updated as well
B+-Trees
An alternative to Binary Search Trees
Conditions of a B+-Tree
A. Search-key values are K1, K2, ..., Kn-1
B. Pointers are P1, P2, ..., Pn
C. Search-key values are kept in sorted order
Conditions (cont’d)
D. In a leaf, pointer Pi points to a file record with search-key value Ki, or to a bucket of pointers to such records
E. A node can hold more than 2 pointers (a binary tree node holds at most 2)
F. Some search-key values are stored redundantly
Buckets
Buckets are used only if the search key does not form a candidate key and the file is not stored in search-key order
Leaves
A. Each leaf holds up to n-1 values
B. Pointer Pn chains the leaf nodes together in search-key order
C. Non-leaf nodes form a sparse multilevel index on the leaf nodes
Leaves (cont’d)
D. Non-leaf nodes may hold from ceil(n/2) up to n pointers
E. The number of pointers in a node is called the fanout of the node
F. The root may hold fewer than ceil(n/2) pointers, but it must hold at least 2 unless it is the only node in the tree
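The occupancy limits can be worked out for a given n with a short Python sketch. The function names are hypothetical, and the lower bound of ceil((n-1)/2) values per leaf comes from the cited textbook rather than from the slides.

```python
from math import ceil


def pointer_bounds(n, kind):
    """(min, max) pointers for a non-leaf node; n is the maximum per node."""
    if kind == "root":
        return (2, n)            # assumes the root is not the only node
    if kind == "internal":
        return (ceil(n / 2), n)
    raise ValueError("kind must be 'root' or 'internal'")


def leaf_value_bounds(n):
    """(min, max) search-key values in a leaf (textbook lower bound)."""
    return (ceil((n - 1) / 2), n - 1)
```

For n = 5, an internal node holds 3 to 5 pointers and a leaf holds 2 to 4 values.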
Queries for finding V
A. To find search-key value V, start at the root
B. In the current node, look for the smallest search-key value Ki greater than V
C. If such a Ki is found, follow pointer Pi to another node; if there is none, follow the last pointer in the node
Queries (cont’d)
D. The process repeats going down the tree until it reaches a leaf, where it looks for a search-key value Ki that equals V; pointer Pi then leads to the record or bucket
E. If there is no Ki that equals V at the leaf, then no such record exists
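The query steps can be sketched in Python as below. The Node class, its field names, and the find function are assumptions for illustration; the sketch keeps whole records in the leaves instead of disk pointers and ignores buckets.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # non-leaf: child nodes
    records: list = field(default_factory=list)   # leaf: one record per key

    @property
    def is_leaf(self):
        return not self.children


def find(root, v):
    """Return the record with search-key value v, or None."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key greater than v; if there is none,
        # this equals len(keys), which selects the last pointer.
        i = bisect_right(node.keys, v)
        node = node.children[i]
    for k, record in zip(node.keys, node.records):
        if k == v:
            return record
    return None  # no key in the leaf equals v
```

A small usage example with three leaves under one root:

```python
leaves = [
    Node(keys=[10, 20], records=["r10", "r20"]),
    Node(keys=[30, 40], records=["r30", "r40"]),
    Node(keys=[50, 60], records=["r50", "r60"]),
]
root = Node(keys=[30, 50], children=leaves)
print(find(root, 40))  # the record stored under key 40
```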
B+-tree Insertion
A. First look up the search-key value
B. If the search-key value exists in the leaf node, then add the record to the file and a pointer to the bucket if necessary
C. If the search-key value does not exist, then insert the new record into the file, insert the value into the leaf, and make a new bucket and pointer if necessary
Insertion (cont’d)
D. If the search-key value does not exist and there is no room in the leaf node, then split the node.
E. Divide the search-key values between the two leaves, so that each has a new greatest and least search-key value
F. After a split, insert the new node into the parent, and repeat the process of splitting whenever a parent gets too full
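A leaf split can be sketched in Python as follows. The split point (the first ceil(n/2) values stay in the old leaf) follows the cited textbook; the function name and the return shape are hypothetical, and only keys are tracked.

```python
from bisect import insort
from math import ceil


def insert_into_leaf(keys, key, n):
    """Insert key into a leaf that may hold at most n-1 values.

    Returns (left, right, separator). right and separator are None when
    no split was needed; otherwise separator is the least key of the new
    right leaf, which the parent must receive along with a pointer to it.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= n - 1:
        return keys, None, None
    cut = ceil(n / 2)
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]
```

With n = 4, inserting 25 into a full leaf [10, 20, 30] yields leaves [10, 20] and [25, 30], and 25 is passed up to the parent. If the parent is then over capacity, the same splitting step is repeated one level up.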
B+-Tree Deletion
A. Look up the record and remove it from the file
B. If no bucket is associated with its search-key value, remove the search-key value from the leaf
C. If a bucket is used and it becomes empty, remove the search-key value from the leaf
Deletion (cont’d)
D. If there are too few pointers in a node, transfer the pointers to a sibling node, then delete the node
E. If transferring pointers would give the sibling too many pointers, redistribute the pointers between the two nodes instead. The parent of the two nodes then needs its pointers changed
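The choice between merging and redistributing can be sketched for two adjacent leaves in Python. This is a simplified illustration: only keys are tracked, the even split on redistribution is one reasonable policy and not the only one, and fix_underfull_leaf is a hypothetical name.

```python
def fix_underfull_leaf(left, right, n):
    """Repair two adjacent sibling leaves after one became underfull.

    Returns (left, right, separator). If the leaves were merged, right
    and separator are None and the parent must drop its pointer to the
    right leaf. Otherwise separator is the new least key of the right
    leaf, which the parent must record.
    """
    combined = left + right
    if len(combined) <= n - 1:
        return combined, None, None      # everything fits: merge
    cut = (len(combined) + 1) // 2       # too many for one node: share out
    return combined[:cut], combined[cut:], combined[cut]
```

With n = 4 (at most 3 values per leaf), leaves [10, 20] and [30] merge into [10, 20, 30], while leaves [10, 20, 30] and [40] are redistributed into [10, 20] and [30, 40] with 30 as the new separator in the parent.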
B+-Tree File Organization
A. Leaf nodes store records instead of pointers to records
B. Insertion and deletion happen the same way as in a B+-tree index
C. When inserting, the system adds the record to the block if there is enough space; otherwise it splits the block
D. Any split will propagate upward if necessary
Bibliography
Silberschatz, Abraham, Henry F. Korth, and S. Sudarshan. Database System Concepts. 5th Ed. Boston: McGraw Hill, 2002. Ch 12