Indexing and B+-Trees, by Kenneth Cheung. CS 157B, TR 07:30-08:45, Professor Lee


Upload: sabrina-jacobs

Post on 18-Jan-2018


Page 1: Indexing and B+-Trees By Kenneth Cheung CS 157B TR 07:30-08:45 Professor Lee

Indexing and B+-Trees

By Kenneth Cheung
CS 157B TR 07:30-08:45

Professor Lee

Page 2

Introduction to Indexing

Goal: to make it easier to look up data.

This is done by keeping a sorted, compact version of the data.

Searching and insertion will be easier.

Page 3

Factors of Indices

1. Access type
2. Access time
3. Insertion time
4. Deletion time
5. Space overhead

Page 4

Clustering Index

An index whose search key also defines the sequential order of the file.

Page 5

Index-sequential files

Files ordered sequentially on a search key.

Page 6

Index Record

Also known as an index entry. It holds a search-key value and pointers to the records with that value.

Page 7

Pointer

Identifies a disk block and an offset within that block, which together locate the record.

Page 8

Dense Index

An index record appears for every search-key value in the file. Records with the same search-key value are stored together, in search-key order.

Faster access time, but higher space overhead.

Page 9

Sparse Index

An index record appears for only some of the search-key values. To find a record, the system finds the index entry with the largest search-key value that is less than or equal to the given search-key value. It then starts at the record that entry points to and moves forward through the file until it finds the desired record.

Lower space overhead, but higher access time.
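
The two lookup strategies can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the file is a sorted in-memory list, list positions stand in for disk pointers, and the names `records`, `dense_lookup`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: (search_key, payload) records stored in key order.
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f")]

# Dense index: one entry per search-key value, mapping it to a record position.
dense_index = {key: pos for pos, (key, _) in enumerate(records)}

# Sparse index: entries for only some keys (here, every third record).
sparse_pos = list(range(0, len(records), 3))
sparse_keys = [records[pos][0] for pos in sparse_pos]


def dense_lookup(key):
    """Follow the index entry for key directly to its record."""
    pos = dense_index.get(key)
    return None if pos is None else records[pos]


def sparse_lookup(key):
    """Find the largest indexed key <= key, then scan the file forward."""
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every indexed key
    pos = sparse_pos[i]
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return records[pos]
        pos += 1
    return None
```

The sparse index holds a third as many entries here, but each lookup may have to scan several records after the index probe. That is the space versus access-time trade-off the slides describe.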

Page 10

Larger Databases

When the index itself is large, build a sparse outer index on top of the clustering index, giving 2 levels of indices.

Searching a multilevel index takes fewer block reads than a binary search over a single large index.
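
A rough way to see the difference is to count block reads. The sketch below assumes one block read per probe, a worst-case binary search over the index blocks, and one more sparse level added until a level fits in a single block. The function names and the block counts in the usage note are illustrative, not taken from the slides.

```python
import math


def binary_search_reads(index_blocks):
    """Worst-case block reads for a binary search over index_blocks blocks.

    Equals floor(log2(b)) + 1 for b >= 1, computed with integer arithmetic.
    """
    return index_blocks.bit_length()


def multilevel_reads(index_blocks, entries_per_block):
    """Block reads when sparse outer levels are stacked on the index.

    Each outer level holds one entry per block of the level below, and
    levels are added until the top level fits in one block. A lookup reads
    one block per level.
    """
    levels = 1
    while index_blocks > 1:
        index_blocks = math.ceil(index_blocks / entries_per_block)
        levels += 1
    return levels
```

For example, with a hypothetical index of 10,000 blocks and 100 entries per block, `binary_search_reads(10000)` gives 14 while `multilevel_reads(10000, 100)` gives 3.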

Page 11

Index Update (Insertion)

A. Look up the search-key value of the record being inserted.

B. If the index record stores pointers to all records with the same search-key value, then add a pointer to the new record to the index record.

C. Otherwise, the index record stores a pointer only to the first record with that search-key value, and the new record is placed after the other records with the same value.
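
Case B can be sketched as follows. The sketch assumes a dense index whose index record keeps a list of pointers for each search-key value, with integers standing in for block and offset pairs. The function `index_insert` and its arguments are hypothetical. Creating a new index record when the value is not yet present is an added assumption that the slide does not spell out.

```python
from bisect import insort


def index_insert(keys, pointers, key, record_ptr):
    """Insert record_ptr under key in a dense index.

    keys     -- sorted list of search-key values in the index
    pointers -- dict mapping each search-key value to its list of pointers
    """
    if key not in pointers:
        # Assumption: a value not yet in the index gets a new index record,
        # placed so that the index stays in search-key order.
        insort(keys, key)
        pointers[key] = []
    # Case B: the index record holds all pointers, so append the new one.
    pointers[key].append(record_ptr)


# Usage: build a small index from three insertions.
keys, pointers = [], {}
index_insert(keys, pointers, 20, 1)
index_insert(keys, pointers, 10, 2)
index_insert(keys, pointers, 20, 3)
```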

Page 12

Index Update (Insertion to Sparse Indices)

For sparse indices, if the system creates a new block, then it must add the first search-key value appearing in the new block to the index.

Otherwise, if the new record has the least search-key value in its block, the index record pointing to that block is updated. If not, the index is left unchanged.
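
A minimal sketch of the sparse-index rule, assuming one index entry per block that records the least search-key value stored in that block. The dictionary layout and the name `sparse_index_after_insert` are hypothetical.

```python
def sparse_index_after_insert(index, block_id, key, new_block):
    """Return the sparse index after a record with key lands in block_id.

    index     -- dict mapping block id to the least search-key value in it
    new_block -- True if the insertion caused block_id to be created
    """
    updated = dict(index)  # leave the caller's index untouched
    if new_block:
        # A new block contributes its first search-key value to the index.
        updated[block_id] = key
    elif key < updated[block_id]:
        # The new record is now the least in its block: update the entry.
        updated[block_id] = key
    # Otherwise the index does not change.
    return updated
```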

Page 13

Deletion

A. Look up the record.

B. If the index is dense and the deleted record was the only one with its search-key value, then delete that value from the index.

C. If the index record stores pointers to all records with the value, then the pointer to the deleted record is removed.

Page 14

Deletion (cont’d)

D. If the index record stores a pointer only to the first record and the first record is deleted, then the pointer moves to the following record.

E. If the index is sparse and does not contain the search-key value of the deleted record, then the index remains the same.

Page 15

Deletion (cont’d)

F. If the deleted record was the only record with its search-key value, then the system replaces the corresponding index record with an index record for the next search-key value. If the next search-key value already has an index entry, then the entry is deleted instead of being replaced.

Page 16

Deletion (cont’d)

G. Otherwise, if the index record for the search-key value points to the record being deleted, the pointer is moved to the next record with the same search-key value.
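
Cases B and C for a dense index can be sketched as below. The sketch assumes the index record keeps a list of pointers for each search-key value, with integers standing in for real record pointers. The name `index_delete` is hypothetical, and the sparse cases E to G are not covered.

```python
def index_delete(keys, pointers, key, record_ptr):
    """Remove record_ptr from a dense index that stores all pointers per key.

    keys     -- sorted list of search-key values in the index
    pointers -- dict mapping each search-key value to its list of pointers
    """
    ptrs = pointers[key]
    ptrs.remove(record_ptr)      # case C: drop the pointer to the deleted record
    if not ptrs:
        # Case B: that was the only record with this search-key value,
        # so the value itself leaves the index.
        del pointers[key]
        keys.remove(key)


# Usage: delete both records stored under key 20.
keys = [10, 20]
pointers = {10: [2], 20: [1, 3]}
index_delete(keys, pointers, 20, 1)   # key 20 stays, one pointer left
index_delete(keys, pointers, 20, 3)   # key 20 disappears from the index
```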

Page 17

Secondary Indices

A. Secondary indices are dense and point to all records.

B. Index entries are stored sequentially in search-key order, and the search key need not be a candidate key.

C. If a file with multiple indices is updated, then every index must be updated also.

Page 18

B+-Trees

An alternative to binary search trees.

Page 19

Conditions of a B+-Tree

A. Search-key values are K1, K2, ..., Kn-1

B. Pointers are P1, P2, ..., Pn

C. Search-key values are kept in sorted order

Page 20

Conditions (cont’d)

D. In a leaf, pointer Pi points to a file record with search-key value Ki, or to a bucket of pointers to such records

E. Each node can have more than 2 pointers (a binary tree node has at most 2)

F. Stores redundant search-key values

Page 21

Buckets

Buckets are used only if the search key does not form a candidate key and the file is not stored in search-key order.

Page 22

Leaves

A. Each leaf holds up to n-1 values

B. Pointer Pn chains the leaf nodes together in search-key order

C. Non-leaf nodes form a sparse multilevel index on the leaf nodes

Page 23

Leaves (cont’d)

D. Non-leaf nodes hold between ceil(n/2) and n pointers

E. The number of pointers in a node is called the fanout of the node

F. The root may hold fewer than ceil(n/2) pointers, but it must hold at least 2 unless it is the only node in the tree
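
The occupancy rules are easy to tabulate for a given n. In the sketch below the function name `occupancy_bounds` is hypothetical. The lower bound of ceil((n-1)/2) values for a leaf is not stated on the slides. It is the usual textbook minimum and is included as an assumption.

```python
import math


def occupancy_bounds(n):
    """Return (min, max) occupancy per node kind for a B+-tree of order n."""
    return {
        # Leaves: up to n-1 values; the minimum is an assumed textbook bound.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # Non-leaf nodes other than the root: ceil(n/2) to n pointers.
        "nonleaf_pointers": (math.ceil(n / 2), n),
        # Root, when it is not the only node: at least 2 pointers.
        "root_pointers": (2, n),
    }
```

With n = 4, for instance, a leaf holds 2 or 3 values and a non-leaf node holds between 2 and 4 pointers.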

Page 24

Queries for finding V

A. To find search-key value V, start at the root

B. In the current node, look for the smallest search-key value greater than V

C. If such a value Ki is found, follow pointer Pi to another node. If there is none, follow the last pointer in the node

Page 25

Queries (cont’d)

D. The process repeats going down the tree until it reaches a leaf. In the leaf, look for a search-key value Ki that equals V. Pointer Pi then leads to the record or bucket.

E. If there is no K that equals V at the leaf, then no such record exists
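
The query procedure can be sketched as follows. This is a simplified in-memory illustration: the `Node` class, the `find` function, and the example tree are hypothetical. Leaves hold the records directly rather than disk pointers, and buckets are left out.

```python
from bisect import bisect_right


class Node:
    """A B+-tree node; children is None for a leaf."""

    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted search-key values
        self.children = children  # non-leaf: len(keys) + 1 child nodes
        self.records = records    # leaf: one record per key


def find(root, v):
    """Return the record with search-key value v, or None if absent."""
    node = root
    while node.children is not None:
        # Position of the smallest key greater than v. If no key is
        # greater, this equals len(keys) and selects the last pointer.
        i = bisect_right(node.keys, v)
        node = node.children[i]
    # At the leaf, look for a key equal to v.
    for key, record in zip(node.keys, node.records):
        if key == v:
            return record
    return None


# Usage: a two-level tree with three leaves.
leaves = [
    Node([10, 20], records=["r10", "r20"]),
    Node([30, 40], records=["r30", "r40"]),
    Node([50, 60], records=["r50", "r60"]),
]
root = Node([30, 50], children=leaves)
```

Using `bisect_right` keeps the per-node step a binary search. A key equal to a separator is sent to the right-hand child, which is where the leaf holding that key lives.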

Page 26

B+-tree Insertion

A. First look up the leaf node where the search-key value would appear

B. If the search-key value already exists in the leaf node, then add the new record to the file and, if necessary, add a pointer to it in the bucket

C. If the search-key value does not exist, then insert it into the leaf node in sorted position, insert the new record into the file, and make a new bucket and pointer if necessary

Page 27

Insertion (cont’d)

D. If the search-key value is not present and there is no room in the leaf node, then split the node.

E. After the split, each of the two leaves has a new least and greatest search-key value

F. After a split, insert the least search-key value of the new node, with a pointer to the new node, into the parent. Repeat the splitting process up the tree whenever a node gets too full
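
The leaf-level part of steps C to F can be sketched as below. The sketch assumes a leaf is a sorted Python list of keys and that a split keeps the first ceil(n/2) keys in the original leaf. That is a common textbook convention, not something the slides state. The name `insert_into_leaf` is hypothetical, and record and bucket handling are left out.

```python
import math
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a leaf of a B+-tree of order n.

    Returns (left_keys, right_keys, key_for_parent). right_keys and
    key_for_parent are None when the leaf did not need to split.
    """
    keys = list(leaf_keys)
    if key in keys:
        return keys, None, None   # value already present: leaf unchanged
    insort(keys, key)             # keep the leaf in search-key order
    if len(keys) <= n - 1:
        return keys, None, None   # still fits: no split needed
    # Overfull: keep the first ceil(n/2) keys, move the rest to a new leaf.
    split = math.ceil(n / 2)
    left, right = keys[:split], keys[split:]
    # The least key of the new leaf goes up to the parent.
    return left, right, right[0]
```

For example, with n = 4, inserting 25 into a full leaf `[10, 20, 30]` yields the leaves `[10, 20]` and `[25, 30]`, and 25 is passed up to the parent.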

Page 28

B+-Tree Deletion

A. Look up the record and remove it from the file

B. If no bucket was associated with its search-key value, remove the search-key value from the leaf node

C. If the bucket becomes empty as a result of the deletion, remove the search-key value

Page 29

Deletion (cont’d)

D. If there are too few pointers in a node, transfer its pointers to a sibling node, then delete the node

E. If transferring the pointers would give the sibling too many pointers, redistribute the pointers between the two nodes instead. The parent of the two nodes then needs its search-key values and pointers updated
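
The choice between merging and redistributing can be sketched for two adjacent leaves. The sketch treats each leaf as a sorted list of keys. The function name `rebalance_leaves` is hypothetical. Splitting the keys evenly on redistribution is a simplification, since an implementation could instead move a single entry.

```python
import math


def rebalance_leaves(left, right, n):
    """Fix an underfull leaf using its adjacent sibling, for order n.

    left and right are the sorted key lists of two adjacent leaves.
    Returns (new_left, new_right, separator). new_right and separator
    are None when the two leaves were merged into one.
    """
    combined = left + right
    if len(combined) <= n - 1:
        # Everything fits in one leaf: merge, and the parent loses a pointer.
        return combined, None, None
    # Too many keys for one leaf: share them out between the two leaves.
    half = math.ceil(len(combined) / 2)
    new_left, new_right = combined[:half], combined[half:]
    # The parent's separator becomes the least key of the right leaf.
    return new_left, new_right, new_right[0]
```

With n = 4, the leaves `[10]` and `[20, 30]` merge into `[10, 20, 30]`, while `[10]` and `[20, 30, 40]` are redistributed into `[10, 20]` and `[30, 40]` with 30 as the new separator in the parent.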

Page 30

B+-Tree File Organization

A. Leaf nodes store records instead of pointers to records

B. Insertion and deletion happen the same way as in a B+-tree index

C. When inserting, the system adds the record to the block if there is enough space. Otherwise it splits the block

D. Any split will propagate upward if necessary

Page 31

Bibliography

Silberschatz, Abraham, Henry F. Korth, and S. Sudarshan. Database System Concepts, 5th ed. Boston: McGraw Hill, 2002. Ch. 12.