ch-2 adbms-tree structured indexing


Upload: dileep-dv

Post on 27-Dec-2015


Advanced Topics in DBMS

Ch-2: Tree Structured Indexing

By

Syed Khutubuddin Ahmed

Assistant Professor

Dept. of MCA

REVA Institute of Technology & Management

Syed Khutubuddin, Assistant Prof,

REVA ITM

Index Data Structure

REMEMBER

Two types of index data structures:

1) Hash-based indexing

2) Tree-based indexing

What is Tree-Based Indexing?

• The data entries are arranged in sorted order by search key value,

• and a hierarchical search data structure is maintained that directs searches to the data entries.


Tree Structured index


Tree Structured indexing

• Index storage techniques use three alternatives for data entries k*:

– Data record with key value k

– <k, rid of data record with search key value k>

– <k, list of rids of data records with search key k>

• REMEMBER: Tree-structured indexing techniques support both range searches and equality searches.
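The three alternatives for a data entry k* can be pictured as three small record types. The sketch below is only an illustration: the class names, the (page id, slot number) layout of a rid, and the helper `to_alternative_3` are assumptions made for this example, not something defined on the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed rid layout for illustration: (page id, slot number).
Rid = Tuple[int, int]

@dataclass
class Alt1Entry:
    """Alternative 1: the data entry is the data record itself."""
    key: int
    record: Dict[str, object]

@dataclass
class Alt2Entry:
    """Alternative 2: <k, rid of a data record with search key k>."""
    key: int
    rid: Rid

@dataclass
class Alt3Entry:
    """Alternative 3: <k, list of rids of data records with search key k>."""
    key: int
    rids: List[Rid]

def to_alternative_3(entries: List[Alt2Entry]) -> List[Alt3Entry]:
    """Group Alternative 2 entries that share a key into Alternative 3 entries."""
    grouped: Dict[int, List[Rid]] = {}
    for entry in entries:
        grouped.setdefault(entry.key, []).append(entry.rid)
    return [Alt3Entry(key, rids) for key, rids in sorted(grouped.items())]

alt2 = [Alt2Entry(30, (1, 0)), Alt2Entry(20, (1, 1)), Alt2Entry(30, (2, 0))]
print(to_alternative_3(alt2))
```

With Alternative 3, a key that occurs in several records is stored once, followed by all of its rids.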

Tree Structured indexing

• Two techniques are available in tree-structured indexing:

1. ISAM (Indexed Sequential Access Method)

2. B+ trees

Both support effective range searches.

ISAM

– A static index structure that is effective when the file is not frequently updated.

– Not suitable for a file that grows and shrinks a lot.

B+ Trees

– A dynamic structure that adjusts gracefully to changes in the file.

– The most widely used index structure, because it adjusts well to changes.

– Supports both equality search and range search.

Motivation for tree indexes

Take an example of a range search:

• "Find all students with gpa > 3.0"

– What is the solution?

– If the data is in a sorted file, do a binary search to find the first such student, then scan to find the others.

– Remember: the cost of binary search can be quite high when the file is large, because we are searching the original data file.

• Simple idea: create an 'index' file.
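The "binary search, then scan" solution for the gpa query can be sketched in a few lines. This is a minimal in-memory illustration, not a disk-based implementation; the student records and the function name `range_search_gt` are made up for the example.

```python
from bisect import bisect_right

# Hypothetical sorted "file": (gpa, name) records ordered by gpa.
students = [
    (2.1, "Asha"),
    (2.8, "Ravi"),
    (3.0, "Meena"),
    (3.2, "Kiran"),
    (3.7, "Divya"),
    (3.9, "Arjun"),
]

def range_search_gt(records, low):
    """Return all records whose key is strictly greater than low."""
    keys = [key for key, _ in records]
    # Binary search: position of the first key greater than low.
    start = bisect_right(keys, low)
    # Sequential scan from that position to the end of the file.
    return records[start:]

result = range_search_gt(students, 3.0)
print([name for _, name in result])  # ['Kiran', 'Divya', 'Arjun']
```

On disk, every probe of the binary search is a page read on the full data file, which is the cost the slide warns about; an index file makes the searched structure much smaller.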


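ISAM

• The index file may still be quite large, but we can apply the idea repeatedly!

• Leaf pages contain data entries.

Format of an index page: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm (each pair <Ki, Pi> is an index entry)

[Figure: ISAM structure, with non-leaf pages above the leaf pages; the leaf level consists of primary pages plus overflow pages.]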

Comments on ISAM

• File creation: leaf (data) pages are allocated sequentially, sorted by search key; then index pages are allocated; then space for overflow pages.

• Index entries: <search key value, page id>

ISAM

[Figure: ISAM structure (non-leaf pages; leaf level with primary and overflow pages).]

• The data entries of the ISAM index are in the leaf pages of the tree,

• and in additional overflow pages chained to some leaf pages.

• The ISAM structure is static (except for the overflow pages, which are expected to be few).

ISAM (Indexed Sequential Access Method)

Each tree node is a disk page.

When a file is created, all leaf pages are allocated sequentially and sorted on the search key value.

The non-leaf pages are then allocated.

If there are several inserts into the file and the target leaf page has no free space, additional pages are needed because the index structure is static; these pages are called overflow pages.

• The basic operations of insertion, deletion, and search are all quite straightforward.

• An equality selection search starts at the root node and uses key comparisons at each non-leaf page to choose the subtree to follow, until it reaches a leaf page.

• For a range query, the starting point in the data (or leaf) level is determined similarly, and data pages are then retrieved sequentially.

• For inserts and deletes, first find the appropriate leaf page as in a search, then insert or delete the data entry, adding overflow pages if necessary.

• In the example, assume that each leaf page can contain two entries.

• Let us insert the value 23: the leaf page it belongs to is already full, so an overflow page is added and 23* is placed in it.

• Chains of overflow pages can easily develop.

• For instance, inserting 48*, 41*, and 42* leads to an overflow chain of two pages.

ISAM

• The deletion of an entry k* is handled by simply removing the entry.

• If this entry is on an overflow page and the overflow page becomes empty, the page can be removed.

• If the entry is on a primary page and the deletion makes the primary page empty, the simplest approach is to leave the empty primary page as it is; it serves as a placeholder for future insertions.
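The insert and delete rules above can be tried out with a small in-memory sketch. Several things here are assumptions for illustration only: the example figure is not reproduced in this transcript, so the twelve starting keys are assumed; the non-leaf levels are flattened into a single list of separator keys; and the names `Page`, `ISAM`, and `overflow_chain_length` are hypothetical.

```python
from bisect import bisect_right

LEAF_CAPACITY = 2  # each page holds two entries, as assumed in the example

class Page:
    def __init__(self, entries=None):
        self.entries = list(entries or [])
        self.overflow = None  # next page in the overflow chain, if any

class ISAM:
    def __init__(self, sorted_keys):
        # Primary leaf pages are allocated once, in sorted order.
        self.leaves = [Page(sorted_keys[i:i + LEAF_CAPACITY])
                       for i in range(0, len(sorted_keys), LEAF_CAPACITY)]
        # Static index (flattened to one level): first key of each later leaf.
        self.separators = [leaf.entries[0] for leaf in self.leaves[1:]]

    def _primary(self, key):
        return self.leaves[bisect_right(self.separators, key)]

    def _chain(self, key):
        page = self._primary(key)
        while page is not None:
            yield page
            page = page.overflow

    def search(self, key):
        return any(key in page.entries for page in self._chain(key))

    def insert(self, key):
        for page in self._chain(key):
            if len(page.entries) < LEAF_CAPACITY:
                page.entries.append(key)
                return
            last = page
        last.overflow = Page([key])  # chain a new overflow page

    def delete(self, key):
        prev = None
        for page in self._chain(key):
            if key in page.entries:
                page.entries.remove(key)
                # An empty overflow page is unlinked; an empty primary
                # page is kept as a placeholder.
                if prev is not None and not page.entries:
                    prev.overflow = page.overflow
                return True
            prev = page
        return False

    def overflow_chain_length(self, key):
        return sum(1 for _ in self._chain(key)) - 1

index = ISAM([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
index.insert(23)
for k in (48, 41, 42):
    index.insert(k)
print(index.overflow_chain_length(23))  # 1
print(index.overflow_chain_length(48))  # 2
```

Note that `separators` is never modified after construction, which mirrors the static nature of the ISAM index: only leaf and overflow pages change.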


Overflow pages, Locking Considerations

• Once the ISAM file is created, inserts and deletes affect only the contents of leaf pages.

• They do not affect the non-leaf pages, which are fixed.

• In a B+ tree, by contrast, the non-leaf pages are not fixed and can change on inserts and deletes. Because ISAM index pages never change, they need not be locked during concurrent access; this is an advantage of ISAM over B+ trees. At the same time, ISAM has the disadvantage of being static.

B+ trees: Dynamic Index Structure

• A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as the file grows, leading to poor performance.

• This problem motivated the development of more flexible, dynamic structures that adjust gracefully to inserts and deletes.

B+ trees

• Problem with ISAM: long overflow chains lead to poor performance.

Characteristics:

– Insertion and deletion: the tree remains balanced in both operations.

– Searching: requires only a traversal from the root to the appropriate leaf.

– Height of the tree: rarely more than 3 to 4.

• Format of a tree node: a node with m index entries contains m+1 pointers.

P0 | K1 | P1 | K2 | P2 | ... | Km | Pm (each pair <Ki, Pi> is an index entry)
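The claim that the height is rarely more than 3 to 4 follows from the large fan-out of each node: a tree of height h with fan-out F can reach F^h leaf pages. The sketch below works this out; the fan-out of 133 is an assumed typical value chosen for illustration, not a figure from the slides, and `min_height` is a hypothetical helper.

```python
def min_height(leaf_pages, fanout):
    """Smallest h such that fanout**h >= leaf_pages (integer arithmetic)."""
    height, reachable = 0, 1
    while reachable < leaf_pages:
        reachable *= fanout
        height += 1
    return height

FANOUT = 133  # assumed average fan-out, not a figure from the slides
for pages in (10_000, 1_000_000, 100_000_000):
    print(pages, min_height(pages, FANOUT))
```

Under this assumption, a million leaf pages need only three index levels and a hundred million need four, since 133^3 = 2,352,637 and 133^4 = 312,900,721.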


B+ trees: Search Operation

• Notation: *ptr denotes the value pointed to by the pointer ptr, and &(value) denotes the address of value.

• For search, assume there are no duplicate entries, that is, no two data entries have the same key.
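The search operation amounts to following one pointer per level until a leaf is reached. The sketch below assumes no duplicate keys, as stated for search. The tree figure is not reproduced in this transcript, so the keys in the example tree are assumed, and the names `Node`, `tree_search`, and `contains` are hypothetical.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # K1..Km, sorted
        self.children = children  # P0..Pm for a non-leaf; None for a leaf

    @property
    def is_leaf(self):
        return self.children is None

def tree_search(node, key):
    """Return the leaf node that would contain key."""
    while not node.is_leaf:
        # Follow Pi where Ki <= key < K(i+1); follow P0 when key < K1.
        node = node.children[bisect_right(node.keys, key)]
    return node

def contains(root, key):
    return key in tree_search(root, key).keys

# Assumed example tree: one root with four keys and five leaves.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
root = Node([13, 17, 24, 30], leaves)

print(contains(root, 5), contains(root, 15))  # True False
print(tree_search(root, 24).keys)             # [24, 27, 29]
```

A node with m keys has m+1 children, so `bisect_right` always returns a valid child position between 0 and m.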


B+ trees: Insert Operation (insert 8)

• Insert 8

Insert 8 using redistribution
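The two ways of inserting 8 can be compared at the leaf level. This is a sketch under stated assumptions: the figures are not reproduced in this transcript, so the leaf contents ([2, 3, 5, 7] with right sibling [14, 16]) and the capacity of four entries per leaf are assumed, and both function names are hypothetical. Only the leaf level is modeled; updating the parent is left out.

```python
CAPACITY = 4  # assumed: a leaf holds at most four entries

def insert_with_split(leaf, key):
    """Insert key; split an overfull leaf.

    Returns (left, right, key copied up), with right and the copied-up
    key set to None when no split was needed.
    """
    entries = sorted(leaf + [key])
    if len(entries) <= CAPACITY:
        return entries, None, None
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]  # right[0] is copied up into the parent

def insert_with_redistribution(leaf, right_sibling, key):
    """Insert key; move the largest entries to a non-full right sibling.

    Returns (leaf, right_sibling, separator key for the parent).
    """
    entries = sorted(leaf + [key])
    sibling = list(right_sibling)
    while len(entries) > CAPACITY and len(sibling) < CAPACITY:
        sibling.insert(0, entries.pop())
    if len(entries) > CAPACITY:
        raise ValueError("sibling is full; a split is required")
    return entries, sibling, sibling[0]

print(insert_with_split([2, 3, 5, 7], 8))
print(insert_with_redistribution([2, 3, 5, 7], [14, 16], 8))
```

Splitting creates a new leaf and a new parent entry, while redistribution only changes the separator key in the parent, so it avoids growing the tree when a sibling has room.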

Delete: deleting entries 19 and 20 from Fig-1

FIG-1

FIG-2

Delete entry 24
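The delete slides can be followed with a leaf-level sketch of the two ways to repair an underfull leaf: borrowing from a sibling (redistribution) or merging with it. The figures are not reproduced in this transcript, so the leaf contents ([19, 20, 22] with right sibling [24, 27, 29]) and the minimum occupancy of two entries are assumptions, and `delete_entry` is a hypothetical helper that ignores the non-leaf levels.

```python
MIN_ENTRIES = 2  # assumed minimum occupancy of a leaf

def delete_entry(leaf, right_sibling, key):
    """Delete key from leaf and repair underflow using the right sibling.

    Returns (action, leaf, right_sibling, separator). separator is the new
    parent key between the two leaves, or None when it is unchanged
    (action "none") or no longer needed (action "merge").
    """
    leaf = [k for k in leaf if k != key]
    sibling = list(right_sibling)
    if len(leaf) >= MIN_ENTRIES:
        return "none", leaf, sibling, None
    if len(sibling) > MIN_ENTRIES:
        # Redistribute: borrow the smallest entry of the sibling.
        leaf.append(sibling.pop(0))
        return "redistribute", leaf, sibling, sibling[0]
    # Merge: the sibling cannot spare an entry.
    return "merge", leaf + sibling, [], None

leaf, sibling = [19, 20, 22], [24, 27, 29]
_, leaf, sibling, _ = delete_entry(leaf, sibling, 19)
step2 = delete_entry(leaf, sibling, 20)
print(step2)
step3 = delete_entry(step2[1], step2[2], 24)
print(step3)
```

Under these assumptions, deleting 19 needs no repair, deleting 20 triggers a redistribution, and deleting 24 then forces a merge because the sibling is at minimum occupancy.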

B+ tree in Practice

Key Compression:

• If search key values are very long (for instance, the name Devarakonda Venkataramana Sathyanarayana Seshasayee Yellamanchali Murthy), not many index entries will fit on a page, and the height of the tree is large.

• Search key values in index entries are used only to direct traffic to the appropriate leaf.

• Solution: it is sufficient to store the abbreviated forms 'Da' and 'De' for the search key values 'David Smith' and 'Devarakonda ...'
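One way to picture key compression is to compute the shortest prefix that still separates two neighbouring subtrees. The sketch below is an illustration, not the slide's exact procedure: `shortest_separator` is a hypothetical helper, and it assumes we know the largest key in the left subtree and the smallest key in the right subtree.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix p of right_min such that left_max < p <= right_min."""
    if not left_max < right_min:
        raise ValueError("keys must be in strictly increasing order")
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    return right_min

print(shortest_separator("David Smith", "Devarakonda Venkataramana"))  # De
print(shortest_separator("Daniel Lee", "David Smith"))                 # Dav
```

The abbreviation must be checked against the actual keys in the neighbouring subtree, not only against the adjacent index entry; this is why the second call yields 'Dav' rather than 'Da' when a key such as 'Daniel Lee' sits to the left.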


B+ tree in Practice

Bulk Loading of B+ Trees:

Two ways of loading data into a B+ tree:

• First: the tree already exists, so just insert each key in its appropriate place.

• Second: the tree does not exist yet, so build it from the beginning.
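The second case, building the tree from the beginning, can be sketched as a bottom-up construction: sort the keys, pack them into leaves, then build each index level from the level below. This is a simplified level-by-level build chosen for illustration, not necessarily the procedure shown in the slide figures; `bulk_load`, the leaf capacity of 4, and the fan-out of 3 are assumptions, and the sketch does not rebalance an under-full last node.

```python
def bulk_load(keys, leaf_capacity=4, fanout=3):
    """Build index levels bottom-up from the given keys.

    Returns a list of levels, leaves first and the root level last. Each
    non-leaf node is (separator_keys, child_positions_in_level_below).
    """
    keys = sorted(keys)
    leaves = [keys[i:i + leaf_capacity]
              for i in range(0, len(keys), leaf_capacity)]
    levels = [leaves]
    # Smallest key reachable under each node of the current level.
    low_keys = [leaf[0] for leaf in leaves]
    while len(levels[-1]) > 1:
        parents, parent_lows = [], []
        for start in range(0, len(low_keys), fanout):
            group = low_keys[start:start + fanout]
            children = list(range(start, start + len(group)))
            parents.append((group[1:], children))  # m keys, m+1 pointers
            parent_lows.append(group[0])
        levels.append(parents)
        low_keys = parent_lows
    return levels

levels = bulk_load(range(1, 25))
print([len(level) for level in levels])  # [6, 2, 1]
print(levels[-1][0])                     # ([13], [0, 1])
```

Because the keys are sorted once and every page is written once, this avoids the repeated root-to-leaf searches and splits that inserting the keys one at a time would cause.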

B+ tree in Practice

• Example:

B+ tree in Practice

• In Sybase ASE, depending on the concurrency control scheme being used for the index, the deleted row is removed (with merging if the page occupancy goes below threshold) or simply marked as deleted; a garbage collection scheme is used to recover space.

• In Oracle 8, deletions are handled by marking the row as deleted. To reclaim the space occupied by deleted records, we can rebuild the index online.

• DB2 and SQL Server remove deleted records and merge pages when occupancy goes below threshold.


END OF UNIT-2