Data Management and File Structure

Post on 10-May-2022


Page 1: Data Management and File Structure

Data Management and File Structure

Topics

Insertion in a B-Tree

Deletion from a B-Tree

B+Trees

Multiple indexing

Hashing

Insertion in a B-Tree

Search from the root to find the leaf node where the new record belongs.

If the leaf node is full, split it into two nodes.

Add the smallest key in the new leaf to the parent internal node.

Update the tree if necessary: if the internal node overflows in turn, it is split as well.
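
The leaf-level steps above can be sketched as follows. This is a minimal illustration, not the slides' own code: it assumes a leaf holds at most 2N keys, and that on overflow the first N keys stay while the remaining N+1 move to the new leaf (the slides do not state how the 2N+1 keys are divided). The name insert_into_leaf is hypothetical.

```python
from bisect import insort


def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that holds at most 2*n keys (assumed capacity).

    Returns (left, right, separator). right and separator are None
    when the leaf did not need to split.
    """
    keys = list(leaf)
    insort(keys, key)              # keep the leaf sorted
    if len(keys) <= 2 * n:         # the key fits: no split
        return keys, None, None
    # Overflow (2n + 1 keys). Assumed policy: the first n keys stay,
    # the remaining n + 1 keys move to the new leaf.
    left, right = keys[:n], keys[n:]
    # The smallest key of the new leaf is added to the parent node.
    return left, right, right[0]
```

With N=2, inserting 4 into the full leaf [1, 3, 5, 7] leaves [1, 3] in the old leaf, puts [4, 5, 7] in the new leaf, and sends key 4 up to the parent.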

Sample Data File

B-Tree of the Sample Data File (N=2)

Insert <8, B>

Insert <15, X> and <16, T>

Insert <15, X> and <16, T>

Insert <4, Q>

Insert <6, J>

Insertion when the root node is full

If the root node is full and a new key is to be added to it:

Split the root node into two nodes.

Put the last N keys in the new node.

Create another new node and put the middle key in it.

Make this node the new root. (The old root and the node split from it become its children.)
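
A minimal sketch of the root split, assuming the root already holds its 2N keys plus the newly added one (2N+1 sorted keys in total). Child pointers are left out for brevity, and the dictionary layout and the name split_root are illustrative assumptions, not part of the slides.

```python
def split_root(keys, n):
    """Split an overflowing root that holds 2*n + 1 sorted keys."""
    if len(keys) != 2 * n + 1:
        raise ValueError("expected exactly 2*n + 1 keys")
    old_root = keys[:n]         # first n keys stay in the old root
    middle = keys[n]            # the middle key moves up
    new_node = keys[n + 1:]     # last n keys go to the new node
    # The new root holds only the middle key; the old root and the
    # node split from it become its two children.
    return {"keys": [middle], "children": [old_root, new_node]}
```

For N=2 and keys [10, 20, 30, 40, 50], the new root holds 30, with children [10, 20] and [40, 50].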

Split the Root

Deletion from a B-Tree

• When two leaf nodes are merged, a key is removed from the internal node.

• If, after removing a key, the internal node has fewer than N keys, it is merged with its neighboring internal node (except the root).

• When only one leaf node is left in the tree, the root is removed.
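
The first rule can be sketched as below. This is an illustrative assumption about the node layout: parent_keys[i] is taken to be the key that separates leaves[i] from leaves[i + 1], and the names merge_leaves and underflows are hypothetical.

```python
def merge_leaves(parent_keys, leaves, i):
    """Merge leaves[i + 1] into leaves[i] and drop the key between them."""
    merged = leaves[i] + leaves[i + 1]
    new_leaves = leaves[:i] + [merged] + leaves[i + 2:]
    # One key disappears from the internal (parent) node.
    new_parent_keys = parent_keys[:i] + parent_keys[i + 1:]
    return new_parent_keys, new_leaves


def underflows(keys, n, is_root=False):
    """An internal node with fewer than n keys must be merged, except the root."""
    return not is_root and len(keys) < n
```

After a merge, underflows(new_parent_keys, N) tells whether the internal node itself has to be merged with a neighbor.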

Example: Delete 1

Example: Delete 1

Example: Delete 1

B+Trees

B-Trees are used to find the location of a record in a data file.

The index and data files are two separate files.

A B+Tree combines the data and index files in a single tree.

Leaf nodes are used to store the data records.

Sample Data

Sample B+Tree (N=2)

Exhaustive Reading in Index Files

Exhaustive reading from a B-Tree requires starting from the root each time.

In a B+Tree, leaf nodes are connected by pointers.

Exhaustive reading of a B+Tree is as fast as exhaustive reading of a sorted file without an overflow area.
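
The linked leaves can be sketched as a simple linked list. The Leaf class and the link and scan helpers are hypothetical names for illustration; a real B+Tree keeps these leaves in file blocks rather than in memory.

```python
class Leaf:
    """A B+Tree leaf: sorted (key, data) records plus a pointer to the next leaf."""

    def __init__(self, records):
        self.records = records
        self.next = None


def link(leaves):
    """Connect consecutive leaves with next pointers; return the first leaf."""
    for current, following in zip(leaves, leaves[1:]):
        current.next = following
    return leaves[0]


def scan(first_leaf):
    """Read every record in key order without going back to the root."""
    leaf = first_leaf
    while leaf is not None:
        yield from leaf.records
        leaf = leaf.next
```

Each leaf is visited exactly once, which is why the cost is comparable to reading a sorted file sequentially.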

Exhaustive Reading

Multiple Indexing

If a data file is searched using two or more attributes,

multiple indexes should be created for it.

Multiple indexes can be created using:

Linear index

B-trees

B+trees

Multiple Indexes using Linear Indexing

The data file is a pile file.

Records are always appended at the end of the data file.

For each search attribute, a linear index is created.

If the index files are large, they cannot all be loaded into memory together.

Sample Data

Student ID  Student Name  Department
132         K             CENG
141         B             CENG
155         C             ECE
176         D             CENG
162         A             ECE
134         E             IE
145         S             IE
112         W             CENG
114         T             CENG
125         H             ECE
133         U             ECE
147         P             CENG
118         M             IE
129         F             CENG
119         R             IE

Index on Student ID

Location  Key
7         112
8         114
12        118
14        119
9         125
13        129
0         132
10        133
5         134
1         141
6         145
11        147
2         155
4         162
3         176

Index on Student Name

Location  Key
4         A
1         B
2         C
3         D
5         E
13        F
9         H
0         K
12        M
11        P
14        R
6         S
8         T
10        U
7         W
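
As a rough sketch, the two linear indexes above can be rebuilt from the sample data and searched with binary search. Record locations are taken to be 0-based positions in the pile file, which matches the Location columns; the function names build_index and lookup are illustrative.

```python
from bisect import bisect_left

# Sample pile file: position in the list = record location.
data_file = [
    (132, "K", "CENG"), (141, "B", "CENG"), (155, "C", "ECE"),
    (176, "D", "CENG"), (162, "A", "ECE"), (134, "E", "IE"),
    (145, "S", "IE"), (112, "W", "CENG"), (114, "T", "CENG"),
    (125, "H", "ECE"), (133, "U", "ECE"), (147, "P", "CENG"),
    (118, "M", "IE"), (129, "F", "CENG"), (119, "R", "IE"),
]


def build_index(records, field):
    """Linear index: (key, location) pairs sorted by key."""
    return sorted((rec[field], loc) for loc, rec in enumerate(records))


def lookup(index, records, key):
    """Binary-search the index, then read the record at its location."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[index[i][1]]
    return None


id_index = build_index(data_file, 0)     # index on Student ID
name_index = build_index(data_file, 1)   # index on Student Name
```

Looking up ID 145 finds location 6 in id_index and then reads that record from the data file; looking up name "F" goes through name_index to location 13.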

Multiple Indexes using B-Trees

The data is in a pile file.

The record locations are at the leaf nodes of the index files.

For each search attribute, a B-tree is created.

B-trees can be large. Only the first two levels of each B-tree are loaded into memory; the rest are read from the files.

Multiple Indexes using B+Trees

A B+tree is created for the first (most important) search attribute.

The records are in the leaf nodes of the B+tree.

For the second, third, and any further search attributes, B-trees are created.

These B-trees hold the locations of the records in the B+tree.

Hashing

Motivation: The number of file accesses in an indexed file is equal to the tree height (3 or 4, for example).

Hashing methods provide quick access to the records (1 or 2 file accesses).

Definitions

Hash function: A function that returns the location of a record

given its key value.

Example: f(25)=1, f(1)=3

Hash Function

Hash functions do not use any list or index.

Therefore, evaluating a hash function requires no file access.

After finding the record location using a hash function, we go to the file and read the record.

How Do Hash Functions Work?

A hash function takes a key value and finds the record location by doing some arithmetic on it.

Generally, a hash function takes the remainder of the key value divided by a constant N, multiplies it by a constant a, and adds another constant b.

E.g. Hash(key) = (key MOD N) * a + b
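
The formula translates directly into code. In this sketch the defaults a=1 and b=0 are assumptions chosen so that the function reduces to plain key MOD N; the name hash_key is illustrative.

```python
def hash_key(key, n, a=1, b=0):
    """Hash(key) = (key MOD N) * a + b."""
    return (key % n) * a + b
```

For example, hash_key(25, 10, a=3, b=1) computes (25 MOD 10) * 3 + 1 = 16.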

Definition

Hash table: The data file that holds the records is called a hash table.

Hash tables are created using the location values returned by the hash function.

Creating Hash Table

Compute the location of the record using hash function.

Put the record at the position returned from the hash

function.

Example Hash Table

Use Key MOD 10 to create the hash table.

[Figure: Data File and Hash Table]

Collision Problem

The hash function may generate the same value for different keys.

Example: Keys 12 and 32 give the same result (2) with the hash function key MOD 10.

This is called the collision problem.
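
A small sketch that makes collisions visible: it groups keys by their hash value (key MOD N) and reports every value shared by more than one key. The function name find_collisions is hypothetical.

```python
from collections import defaultdict


def find_collisions(keys, n):
    """Return {hash value: [keys]} for every hash value hit by two or more keys."""
    slots = defaultdict(list)
    for key in keys:
        slots[key % n].append(key)
    return {slot: ks for slot, ks in slots.items() if len(ks) > 1}
```

With the keys 12, 32, 25 and 7 and N=10, only location 2 is reported, holding 12 and 32.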

Solutions for the Collision Problem

1. Bucketing: Use a bucket that holds up to n records at each hash table entry.

2. Chaining: Records with the same hash value are chained in a linked list, using an overflow area or dynamic links.

Bucketing

Each entry in the hash table is a bucket.

Therefore each entry can hold several records.

It is difficult to decide about the bucket size.

Large buckets are wasteful.

Chaining

If the number of records in collision is larger than the bucket size, bucketing fails.

This problem arises because the bucket size is fixed (static).

Dynamic buckets, which grow with the number of records in collision, are possible.

Dynamic buckets are created using linked lists.

Linked lists in chaining are created using:

Dynamic memory allocation

An overflow area

Dynamic Memory Allocation for Chaining

[Figure: hash table with entries 0, 1, 2, 3, 4, ..., D-1]
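
Chaining with dynamically allocated nodes might look like the sketch below. It assumes key MOD D as the hash function and adds new nodes at the head of each chain; both choices, and the class names, are illustrative rather than taken from the slides.

```python
class Node:
    """One dynamically allocated chain element."""

    def __init__(self, key, record, next_node=None):
        self.key = key
        self.record = record
        self.next = next_node


class ChainedHashTable:
    def __init__(self, d):
        self.d = d
        self.heads = [None] * d          # entries 0 .. D-1, each a chain head

    def insert(self, key, record):
        slot = key % self.d
        # Allocate a new node and link it in front of the existing chain.
        self.heads[slot] = Node(key, record, self.heads[slot])

    def find(self, key):
        node = self.heads[key % self.d]
        while node is not None:          # walk the chain for this entry
            if node.key == key:
                return node.record
            node = node.next
        return None
```

Keys 12 and 32 both land in entry 2 when D=10, and both remain retrievable because the chain grows as needed.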

Chaining using Overflow Area

Analyzing Bucketing

Bucket size (n) affects the amount of file I/O

If the bucket is larger than a block, reading a bucket requires

more than one file access

Large bucket size means more I/O to find a record.

Analyzing Chaining

The chain length has a direct effect on I/O speed.

Chains should be kept as short as possible.

The performance of the hashing method depends on choosing a good hash function.

Combining Bucketing and Chaining

Bucketing can be used with chaining for better performance.

If a bucket is the same size as a block, file I/O operations will be more efficient (the unit of an I/O operation is a block).

The buckets are connected using linked lists when collisions occur.

Sample Data

Student ID  Student Name  Department
132         A             CENG
141         B             CENG
155         C             ECE
176         D             CENG
162         A             ECE
134         E             IE
145         H             IE
112         B             CENG
114         T             CENG
125         H             ECE
133         U             ECE
147         P             CENG
118         M             IE
129         F             CENG
119         R             IE

Bucket Size and Hash Function

For this example we used

Student ID as key value

Key MOD 10 as hash function

Bucket size = 2

Hash Table

Entry  Bucket (size 2)          Chained bucket
0
1      141 B CENG
2      132 A CENG, 162 A ECE    112 B CENG
3      133 U ECE
4      134 E IE, 114 T CENG
5      155 C ECE, 145 H IE      125 H ECE
6      176 D CENG
7      147 P CENG
8      118 M IE
9      129 F CENG, 119 R IE
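
The example can be reproduced with a short simulation, using Student ID as the key, Key MOD 10 as the hash function, and a bucket size of 2. Each table entry is modeled as a Python list of buckets standing in for the linked list of buckets, which is a simplification; the names build_table and find are illustrative.

```python
BUCKET_SIZE = 2
TABLE_SIZE = 10

records = [
    (132, "A", "CENG"), (141, "B", "CENG"), (155, "C", "ECE"),
    (176, "D", "CENG"), (162, "A", "ECE"), (134, "E", "IE"),
    (145, "H", "IE"), (112, "B", "CENG"), (114, "T", "CENG"),
    (125, "H", "ECE"), (133, "U", "ECE"), (147, "P", "CENG"),
    (118, "M", "IE"), (129, "F", "CENG"), (119, "R", "IE"),
]


def build_table(recs):
    """table[entry] is a chain of buckets; each bucket holds up to BUCKET_SIZE records."""
    table = [[] for _ in range(TABLE_SIZE)]
    for rec in recs:
        chain = table[rec[0] % TABLE_SIZE]
        if not chain or len(chain[-1]) == BUCKET_SIZE:
            chain.append([])             # last bucket is full: chain a new one
        chain[-1].append(rec)
    return table


def find(table, key):
    """Search the chain of buckets for key; one bucket corresponds to one block read."""
    for bucket in table[key % TABLE_SIZE]:
        for rec in bucket:
            if rec[0] == key:
                return rec
    return None


table = build_table(records)
```

Entries 2 and 5 each receive three records, so each ends up with a second, chained bucket (holding 112 and 125 respectively), while every other record is found in the first bucket of its entry.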

Questions?

3/2/2020 Roya Choupani