ch-2 adbms-tree structured indexing
Advanced Topics in DBMS
Ch-2: Tree Structured Indexing
By
Syed Khutubuddin Ahmed
Assistant Professor
Dept. of MCA
REVA Institute of Technology & Management
Syed Khutubuddin, Assistant Prof,
REVA ITM
REMEMBER
Two types of Index Data Structures:
1) Hash based Indexing
2) Tree Based Indexing
Index Data Structure
• The data entries are arranged in sorted order by search key value,
• and a hierarchical search data structure is maintained that directs searches to the data entries.
What is Tree-Based Indexing:
Tree Structured index
• Index storage techniques use three alternatives for data entries k*:
– Data record with key value k
– <k, rid of data record with search key value k>
– <k, list of rids of data records with search key k>
• REMEMBER
• Tree-structured indexing techniques support both range searches and equality searches.
Tree Structured indexing
• Two techniques are available in tree-structured indexing:
1. ISAM (Indexed Sequential Access Method)
2. B+ trees
• Both support efficient range searches.
Tree Structured indexing
ISAM
– It is a static index structure that is effective when the file is not frequently updated.
– It is not suitable for a file that grows and shrinks a lot.
B+ trees
– A dynamic structure that adjusts gracefully to changes in the file.
– The most widely used index structure, because it adjusts well to changes.
– Supports both equality search and range search.
Take an example of a range search
• "Find all students with gpa > 3.0"
– What is the solution?
– If the data is in a sorted file, do a binary search to find the first such student, then scan to find the others.
– Remember: the cost of binary search can be quite high when the file is large, because every probe reads a page of the original data file.
• Simple idea: create an "index" file.
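The sorted-file approach above can be sketched in a few lines of Python. This is only an illustration: the in-memory list stands in for the sorted data file, and the student names and gpa values are made up for the example.

```python
def first_above(records, threshold):
    """Binary search: index of the first record whose gpa is > threshold.

    `records` must be sorted by gpa. Each loop iteration is one probe,
    which on disk would mean reading one page of the data file.
    """
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[mid][0] > threshold:
            hi = mid          # answer is at mid or to its left
        else:
            lo = mid + 1      # answer is strictly to the right
    return lo


# Hypothetical sorted file of (gpa, name) records.
students = [(2.1, "Asha"), (2.8, "Ravi"), (3.0, "Meena"),
            (3.2, "Kiran"), (3.7, "Farah")]

start = first_above(students, 3.0)
matches = students[start:]    # sequential scan from the first match onward
print(matches)                # [(3.2, 'Kiran'), (3.7, 'Farah')]
```

An index file shortens the binary search because it holds only one small entry per data page instead of the full records.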
Motivation for tree indexes
• The index file may still be quite large, but we can apply the idea repeatedly!
• Leaf pages contain data entries.
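Index page format: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm (each <Ki, Pi> pair is an index entry)
[Figure: ISAM tree. Non-leaf pages at the top; leaf pages at the bottom, made up of primary pages with overflow pages chained to them.]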
ISAM
Comments on ISAM
• File creation:
– Leaf (data) pages are allocated sequentially, sorted by search key;
– then index pages are allocated;
– then space for overflow pages.
• Index entries: <search key value, page id>
ISAM
• The data entries of the ISAM index are in the leaf pages of the tree
• and in additional overflow pages chained to some leaf pages.
• The ISAM structure is static, except for the overflow pages, which are expected to be few.
ISAM (Indexed Sequential Access Method)
• Each tree node is a disk page.
• When a file is created, all leaf pages are allocated sequentially and sorted on the search key value.
• The non-leaf pages are then allocated.
• If several inserts go to a leaf page that has no space left, additional pages are needed because the index structure is static; these pages are called overflow pages.
• The basic operations of insertion, deletion, and search are all quite straightforward.
• For an equality selection search, start at the root node and follow key comparisons down to a leaf page.
• For a range query, the starting point in the data (or leaf) level is determined similarly, and data pages are then retrieved sequentially.
• For inserts and deletes, find the appropriate page as for a search, then insert or delete the data entry, adding overflow pages if necessary.
• In the example, assume that each leaf page can contain two entries.
• Let us insert the value 23: this is done by adding an overflow page and putting 23* in it.
• Chains of overflow pages can easily develop.
• For instance, inserting 48*, 41*, and 42* leads to an overflow chain of two pages.
• The deletion of an entry k* is handled by simply removing the entry.
• If this entry is on an overflow page and the overflow page becomes empty, the page can be removed.
• If the entry is on a primary page and deletion makes the primary page empty, the simplest approach is to simply leave the empty primary page as it is; it serves as a placeholder for future insertions.
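The ISAM insert and delete rules above can be tried out with a minimal Python sketch. It makes several assumptions that are not in the slides: the index is flattened to a single level of separator keys (a real ISAM index is a multi-level tree), pages are plain Python objects rather than disk pages, and the initial leaf contents are illustrative values chosen because the original figure is not reproduced here. All class and method names are hypothetical.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # each page holds two entries, as assumed on the slide


class Page:
    """A primary leaf page or an overflow page."""
    def __init__(self, entries=()):
        self.entries = list(entries)
        self.overflow = None  # next page in the overflow chain, if any


class ISAM:
    def __init__(self, sorted_keys):
        # Leaf pages are allocated sequentially over the sorted keys.
        self.leaves = [Page(sorted_keys[i:i + PAGE_CAPACITY])
                       for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        # Static index: first key of every leaf except the first one.
        # It is never modified after file creation.
        self.separators = [p.entries[0] for p in self.leaves[1:]]

    def _leaf_for(self, key):
        return self.leaves[bisect_right(self.separators, key)]

    def search(self, key):
        page = self._leaf_for(key)
        while page is not None:          # primary page, then its chain
            if key in page.entries:
                return True
            page = page.overflow
        return False

    def insert(self, key):
        page = self._leaf_for(key)
        while len(page.entries) >= PAGE_CAPACITY:
            if page.overflow is None:
                page.overflow = Page()   # add an overflow page
            page = page.overflow
        page.entries.append(key)

    def delete(self, key):
        prev, page = None, self._leaf_for(key)
        while page is not None:
            if key in page.entries:
                page.entries.remove(key)
                # An empty overflow page is removed from the chain;
                # an empty primary page is left as a placeholder.
                if prev is not None and not page.entries:
                    prev.overflow = page.overflow
                return True
            prev, page = page, page.overflow
        return False

    def chain_length(self, key):
        """Number of overflow pages behind the leaf that `key` maps to."""
        count, page = 0, self._leaf_for(key).overflow
        while page is not None:
            count += 1
            page = page.overflow
        return count


isam = ISAM([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
isam.insert(23)                 # leaf [20, 27] is full: one overflow page
for k in (48, 41, 42):
    isam.insert(k)              # leaf [40, 46] is full
print(isam.chain_length(42))    # 2
```

Note that `separators` never changes after construction, which is exactly what makes the structure static.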
ISAM
• Once the ISAM file is created, inserts and deletes affect only the contents of leaf pages.
• They do not affect the non-leaf pages, which are fixed.
• In a B+ tree, by contrast, the non-leaf pages are not fixed. Fixed index pages are an advantage of ISAM over B+ trees: since index pages are never modified, they need not be locked. The disadvantage is that the structure is static.
Overflow pages, Locking Considerations
• A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as the file grows, leading to poor performance.
• This problem motivated the development of more flexible, dynamic structures that adjust gracefully to inserts and deletes.
B+ trees: Dynamic Index Structure
• Problems with ISAM:
– Long overflow chains lead to poor performance.
Characteristics:
• Insertion:
• Deletion:
• Searching: just needs a traversal from the root to a leaf.
• Height of the tree: rarely more than 3 to 4.
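The height claim can be checked with a little arithmetic. The sketch below assumes a fan-out of 133 children per index page; that number is an illustrative assumption, not a figure from these slides, and the function name is hypothetical.

```python
def index_levels_needed(num_leaf_pages: int, fanout: int) -> int:
    """Index levels above the leaves so that one root page reaches every leaf."""
    levels, reachable = 0, 1
    while reachable < num_leaf_pages:
        reachable *= fanout   # each extra level multiplies reach by the fan-out
        levels += 1
    return levels


FANOUT = 133  # assumed average number of children per index page

print(index_levels_needed(1_000_000, FANOUT))    # 3
print(index_levels_needed(300_000_000, FANOUT))  # 4
```

Under this assumption, four index levels already cover about 312 million leaf pages (133^4 = 312,900,721), which is why taller trees are rare.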
B+ trees
• In both operations (insertion and deletion), the tree remains balanced.
• Format of a tree node:
• A node with m index entries contains m+1 pointers:
P0 | K1 | P1 | K2 | P2 | ... | Km | Pm (each <Ki, Pi> pair is an index entry)
• Search operation on B+ trees:
– Notation: *ptr denotes the value pointed to by the pointer ptr, and &(value) denotes the address of value.
• For search:
– Assume there are no duplicate entries, i.e., no two data entries have the same key.
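A B+ tree search under the no-duplicates assumption can be sketched in Python as follows. Object references play the role of the pointers P0..Pm, the `next` link between leaves is an assumption added to show how a range search continues along the leaf level, and the example tree uses illustrative key values because the slide figures are not reproduced here. All names are hypothetical.

```python
from bisect import bisect_right


class Node:
    """Index node: m keys K1..Km and m+1 children P0..Pm. Leaf: keys only."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None marks a leaf
        self.next = None           # right sibling (used by leaves only)


def find_leaf(root, key):
    node = root
    while node.children is not None:
        # Follow Pi, where Ki <= key < K(i+1); P0 if key < K1.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Equality search: one root-to-leaf traversal."""
    return key in find_leaf(root, key).keys


def range_search(root, low, high):
    """Find the leaf for `low`, then scan leaves left to right."""
    result, leaf = [], find_leaf(root, low)
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Illustrative tree: one root over five leaves.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([13, 17, 24, 30], leaves)

print(search(root, 5))              # True
print(search(root, 15))             # False
print(range_search(root, 15, 27))   # [16, 19, 20, 22, 24, 27]
```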
B+ trees Search Operation
B+ trees- Search
B+ trees – INSERT OPERATION (insert 8)
• Insert 8
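Since the slide figure is not reproduced here, the following Python sketch shows one common way an insert with node splitting can work. It is a sketch under stated assumptions, not necessarily the exact procedure on the slides: each node holds at most 2*ORDER keys with ORDER = 2, a full leaf splits and copies its middle key up, a full index node splits and pushes its middle key up, sibling links between leaves are omitted, and the starting tree uses illustrative values. All names are hypothetical.

```python
from bisect import bisect_right, insort

ORDER = 2  # assumed: every node holds at most 2 * ORDER keys


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def _insert(node, key):
    """Insert below `node`. Returns (separator, new_right_node) on a split."""
    if node.children is None:                      # leaf
        insort(node.keys, key)
        if len(node.keys) <= 2 * ORDER:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right                # copy up: key stays in leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= 2 * ORDER:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right                               # push up: key leaves node


def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])        # tree grows at the top


# Illustrative starting tree: a full root over five leaves.
root = Node([13, 17, 24, 30],
            [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
             Node([24, 27, 29]), Node([33, 34, 38, 39])])

root = insert(root, 8)
print(root.keys)                                   # [17]
print([c.keys for c in root.children])             # [[5, 13], [24, 30]]
print([leaf.keys for leaf in root.children[0].children])
# [[2, 3], [5, 7, 8], [14, 16]]
```

Because a new root is created only when the old root splits, every leaf stays at the same depth, which is how the tree remains balanced. The redistribution alternative avoids the split by moving an entry to a sibling leaf with free space; it is not shown in this sketch.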
Insert 8 using REDISTRIBUTION CONCEPT
DELETE: Deleting entry 19 & 20 from Fig-1
[Figures: FIG-1, FIG-2]
DELETE Entry 24
B+ tree in Practice
Key Compression:
• If search key values are very long (for instance,
the name Devarakonda Venkataramana
Sathyanarayana Seshasayee Yellamanchali
Murthy), not many index entries will fit on a
page, so the fan-out is low and the height of the tree is large.
• Search key values in index entries are used only
to direct traffic to the appropriate leaf.
• Solution:
it is sufficient to store the abbreviated forms 'Da'
and 'De' for the search key values 'David Smith' and
'Devarakonda ...'.
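One way to compute such an abbreviation is to take the shortest prefix of the right-hand key that still sorts after the left-hand key. The Python sketch below is an illustration with a hypothetical function name; it assumes that `left` is the largest key in the left subtree and `right` is the smallest key in the right subtree, so that the shortened key still directs every search correctly.

```python
def shortest_separator(left: str, right: str) -> str:
    """Shortest prefix of `right` that sorts strictly after `left`.

    Assumes left < right. Any key <= left stays to the left of the
    returned separator, and any key >= right stays to the right.
    """
    for length in range(1, len(right) + 1):
        prefix = right[:length]
        if prefix > left:
            return prefix
    return right


print(shortest_separator("David Smith", "Devarakonda Venkataramana"))  # De
print(shortest_separator("Daniel Lee", "David Smith"))                 # Dav
```

Comparing only against the immediate neighbour is not enough in general: the separator has to exceed every key in the left subtree, which is why the largest left-hand key is used.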
B+ tree in Practice
Bulk Loading of B+ Trees:
Two ways of loading data into a B+ tree:
• First: a tree already exists, so just insert the keys
in the appropriate places.
• Second: no tree exists yet, so create it from the
beginning.
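The second case, building a tree from scratch, can be sketched in Python as follows. This is a simplified level-by-level packing chosen for brevity and is not necessarily the algorithm shown on the example slides: the entries are sorted, packed into leaves, and index levels are built on top until a single root remains. The `capacity` parameter, the sample keys, and all names are assumptions made for the example; the last node on a level may end up underfull.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def smallest_key(node):
    while node.children is not None:
        node = node.children[0]
    return node.keys[0]


def bulk_load(entries, capacity):
    """Build a tree bottom-up.

    capacity = max keys per leaf and max children per index node.
    """
    keys = sorted(entries)                      # step 1: sort the data entries
    if not keys:
        return Node([])
    level = [Node(keys[i:i + capacity])         # step 2: pack the leaf pages
             for i in range(0, len(keys), capacity)]
    while len(level) > 1:                       # step 3: add index levels
        parents = []
        for i in range(0, len(level), capacity):
            group = level[i:i + capacity]
            separators = [smallest_key(child) for child in group[1:]]
            parents.append(Node(separators, group))
        level = parents
    return level[0]


def contains(root, key):
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


data = [44, 3, 12, 9, 35, 6, 4, 10, 11, 13, 20, 22, 23, 31, 36, 38, 41]
root = bulk_load(data, capacity=3)
print(root.keys)                                # [22]
print(all(contains(root, k) for k in data))     # True
```

Sorting once and filling pages left to right avoids the repeated root-to-leaf traversals and splits that inserting the keys one at a time would cause.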
B+ tree in Practice
• Example:
B+ tree in Practice
• Example:
B+ tree in Practice
• In Sybase ASE, depending on the concurrency
control scheme being used for the index, the
deleted row is removed (with merging if the page
occupancy goes below threshold) or simply
marked as deleted; a garbage collection scheme
is used to recover space.
• In Oracle 8, deletions are handled by marking the
row as deleted. To reclaim the space occupied by
deleted records, we can rebuild the index online.
• DB2 and SQL Server remove deleted records
and merge pages when occupancy goes below
threshold.
END OF UNIT-2