Storage and File Structure
1. Classification of physical storage media
2. Storage access
3. File organization
4. Indexing
5. B+ - trees
6. Static hashing
1. Classification of Physical Storage Media
Criteria:
speed with which data can be accessed
cost per unit of data
reliability
data loss on power failure or system crash
physical failure of the storage device
Volatile storage: loses contents when power is switched off
Non-volatile storage: contents persist even when power is switched off.
Includes secondary and tertiary storage, as well as battery-backed up main memory
Cache: the fastest and most costly form of storage
volatile
managed by the hardware/operating system
Main memory: sometimes referred to as core memory
volatile - contents of main memory are usually lost if a power failure or system crash occurs
general-purpose machine instructions operate on data resident in main memory
fast access, but generally too small to store the entire database
Flash memory: non-volatile memory (data survives power failure)
reads are roughly as fast as main memory
can support only a limited number of write/erase cycles
Magnetic-disk storage
primary medium for the long-term storage of data
typically stores entire database
data must be moved from disk to main memory for access and written back for storage
direct access - possible to read data on disk in any order
usually survives power failures and system crashes
disk failure can destroy data, but is much less frequent than system crashes
Optical storage: non-volatile
the most popular: CD-ROM
write-once, read-many (WORM) optical disks:used for archival storage
Tape storage:
non-volatile
used primarily for
backup (to recover from disk failure)
archive data
sequential access - much slower than direct-access devices such as disks
very high capacity (5 GB is common)
tapes can be removed from the drive --> storage costs much cheaper than disk
Storage hierarchy
cache
Main memory
Flash memory
Magnetic disk
Optical disk
Magnetic tape
Primary storage: fastest media but volatile
CACHE, MAIN MEMORY
Secondary storage: moderately fast access time, non-volatile
also called on-line storage
FLASH MEMORY, MAGNETIC DISKS
Tertiary storage: slow access time, non-volatile
also called off-line storage
MAGNETIC TAPES, OPTICAL STORAGE
Magnetic disks:
Read-write head: device positioned close to the platter surface
reads or writes magnetically encoded information
Surface of platter is divided into circular tracks
Each track is divided into sectors
A sector is the smallest unit of data that can be read or written
Cylinder j consists of the j-th track of all the platters
Head-disk assemblies - multiple disk platters on a single spindle, with multiple heads (one per platter) mounted on a common arm
To read/write a sector:
disk arm swings to position head on the right track
platter spins continually; data is read/written when sector comes under head
[Figure: moving-head disk mechanism - spindle, platters, tracks, sectors, cylinders, arm assembly, and read-write heads]
Disk subsystem
[Figure: disks attached through a disk controller to the system bus]
Disk controller:
Interfaces between the computer system and the disk drive hardware
Accepts high-level commands to read or write a sector
Initiates actions such as moving the disk arm to the right track and actually reading or writing the data
Performance measures of disks
Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
seek time: time it takes to reposition the arm over the correct track. Average seek time is 1/3 of the worst-case seek time.
rotational latency: time it takes for the sector to be accessed to appear under the head. Average latency is 1/2 of the worst case latency.
Data-transfer rate: the rate at which data can be retrieved from or stored to the disk
Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure
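Putting the measures above together, a rough per-block access-time estimate can be sketched as follows; the drive parameters used in the usage line are illustrative, not taken from the slides:

```python
def expected_access_time(avg_seek_ms, rpm, transfer_mb_s, block_kb):
    """Estimate time to read one block: average seek + average rotational
    latency (half a revolution) + transfer time for the block."""
    rotation_ms = 60_000 / rpm              # one full revolution, in ms
    avg_latency_ms = rotation_ms / 2        # 1/2 of the worst-case latency
    transfer_ms = (block_kb / 1024) / transfer_mb_s * 1000
    return avg_seek_ms + avg_latency_ms + transfer_ms

# hypothetical drive: 4 ms average seek, 7200 rpm, 100 MB/s, 4 KB blocks
t = expected_access_time(4.0, 7200, 100.0, 4)   # about 8.2 ms, dominated by seek + latency
```

Note how little of the total is transfer time: for small blocks, seek and rotational latency dominate, which is what the optimizations on the next slide attack.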
Optimisation of Disk-Block Access
Block: a contiguous sequence of sectors from a single track
data is transferred between disk and main memory in blocks
block sizes range from 512 bytes to several KB
Disk-arm-scheduling algorithms: order accesses to tracks so that disk-arm movement is minimized (e.g. the elevator algorithm)
File organization: optimise block access time by organizing the blocks to correspond to how data will be accessed. Store related information on the same or nearby cylinders
Non-volatile write buffers: speed up disk writes by writing blocks to a non-volatile RAM buffer immediately. The controller then writes to disk whenever the disk has no other requests
Log disk: a disk devoted to writing a sequential log of block updates; this eliminates seek time. Used like non-volatile RAM
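The elevator algorithm mentioned above can be sketched in a few lines; a single upward sweep is assumed, after which the arm reverses and serves the remaining requests on the way back:

```python
def elevator_order(head, tracks):
    """Elevator (SCAN) scheduling sketch: serve all requests at or above the
    current head position while sweeping up, then reverse and sweep down."""
    up = sorted(t for t in tracks if t >= head)
    down = sorted((t for t in tracks if t < head), reverse=True)
    return up + down

# head at track 50: visits 60, 90 on the way up, then 40, 10 coming back down
order = elevator_order(50, [10, 60, 40, 90])   # -> [60, 90, 40, 10]
```

Compared with first-come-first-served, each track is passed at most twice per sweep pair, so total arm movement is bounded.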
RAID = Redundant Arrays of Inexpensive Disks
a disk organization technique that takes advantage of large numbers of inexpensive, mass-market disks
originally a cost-effective alternative to large, expensive disks
today RAIDs are used for their higher reliability and bandwidth, rather than for economic reasons, hence the I is interpreted as independent instead of inexpensive
The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail.
For instance, a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years) will have a system MTTF of 1000 hours (approx. 41 days)
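The 100-disk arithmetic above follows from treating failures as independent: some disk among n fails roughly n times as often as one disk does, so the system MTTF divides by n. A one-line sketch:

```python
def system_mttf_hours(disk_mttf_hours, n_disks):
    """With n independent disks, the expected time until the FIRST failure
    is approximately the single-disk MTTF divided by n."""
    return disk_mttf_hours / n_disks

# 100 disks, each with a 100,000-hour MTTF -> 1000 hours (about 41 days)
```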
Improvement of Reliability via Redundancy
Redundancy: store extra information that can be used to rebuild information lost in a disk failure.
EX: Mirroring (Shadowing): duplicate every disk.
Logical disk consists of two physical disks
every write is carried out on both disks
if one disk in a pair fails, data is still available on the other
Improvement in Performance via Parallelism
Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time
Improve transfer rate by striping data across multiple disks:
1. Bit-level striping: split the bits of each byte across multiple disks
in an array of 8 disks, write bit j of each byte on disk j
each access can read data at 8 times the rate of a single disk
but seek/access time worse than for a single disk
2. Block-level striping: with n disks, block j of a file goes to disk (j mod n) + 1
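The block-to-disk mapping for block-level striping above is plain modular arithmetic, with disks numbered 1..n as in the slide:

```python
def block_to_disk(j, n_disks):
    """Block-level striping: block j of a file is stored on disk (j mod n) + 1."""
    return (j % n_disks) + 1

# with 4 disks, consecutive blocks 0,1,2,3,4,... land on disks 1,2,3,4,1,...
```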
RAID levels
Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
Different RAID organizations (RAID levels) have differing cost, performance and reliability characteristics
Level 0: striping at the level of blocks
non-redundant
used in high-performance applications where data loss is not critical
Level 1: mirrored disks
offers best write performance
popular for applications such as storing log files in a database system
Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping
Level 3: Bit-Interleaved Parity: a single parity bit can be used for error correction, not just detection
When writing data, parity bit must also be computed and written
faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O
subsumes Level 2 (provides all its benefits, at lower cost)
Level 4: Block-Interleaved Parity
uses block-level striping
keeps a parity block on a separate disk for corresponding blocks from N other disks
provides higher I/O rates for independent block reads than Level 3 (a block read goes to a single disk, so blocks stored on different disks can be read in parallel)
provides high transfer rates for reads of multiple blocks
however, parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk
Level 5: Block-Interleaved Distributed Parity
partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk
e.g. with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks
higher I/O rates than level 4. (block writes occur in parallel if the blocks and their parity blocks are on different disks).
Subsumes Level 4
Level 6: P + Q redundancy Scheme
similar to Level 5 but store extra redundant information to guard against multiple disk failures
better reliability than Level 5 at a higher cost
not used as widely
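The Level-5 parity placement described above, and the XOR reconstruction that parity makes possible, can be sketched as follows; the two-byte block contents are arbitrary illustrative values:

```python
from functools import reduce

def parity_disk(n_set, n_disks=5):
    """RAID 5 with 5 disks: parity for the n-th set of blocks sits on disk (n mod 5) + 1."""
    return (n_set % n_disks) + 1

def recover_block(surviving_blocks):
    """XOR of all surviving data and parity blocks reconstructs the lost block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*surviving_blocks))

# parity is the XOR of the data blocks; after losing any one block, XOR the rest to get it back
d1, d2, d3 = bytes([1, 2]), bytes([3, 4]), bytes([5, 6])
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))
assert recover_block([d1, d3, parity]) == d2   # d2 reconstructed after "losing" it
```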
Optical disks
Compact disk- read only memory (CD-ROM)
disks can be loaded into or removed from a drive
high storage capacity (500 MB)
high seek times and latency
lower data-transfer rates than magnetic disks
Digital Video Disk (DVD)- new optical format
hold 4.7 to 17 GB
WORM disk (write-once read-many)
can be written using the same drive from which they are read
high capacity and long lifetime
used for archival storage
WORM jukebox
Magnetic Tapes
hold large volumes of data (5 GB usual)
currently the cheapest storage medium
very slow access time in comparison to magnetic and optical disks
limited to sequential access
used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another
tape jukeboxes used for very large capacity (terabyte, 10^12 bytes, to petabyte, 10^15 bytes, storage)
2. Storage Access
Block: a database file is partitioned into fixed-length storage units called blocks
Unit of both storage allocation and data transfer
The DBMS seeks to minimize the number of block transfers between disk and memory by keeping as many blocks as possible in main memory
Buffer: the portion of main memory available to store copies of disk blocks
Buffer Manager: the subsystem responsible for allocating buffer space in main memory
Buffer Manager
Programs call on the BM when they need a block from disk
if it is already present in the buffer, the requesting program is given the address of the block in main memory
if the block is not in the buffer, the BM allocates space in the buffer for the block, replacing (throwing out) some other block, if required, to make space for the new block
the block that is thrown out is written back to disk only if it was modified since the most recent time it was written to/fetched from the disk
once space is allocated in the buffer, the BM reads in the block from the disk to the buffer, and passes the address of the block in main memory to the requester
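The request flow described above can be sketched as a minimal buffer manager with LRU replacement (replacement policies are discussed on the next slide); the pool size, block contents, and the read_block callback are hypothetical stand-ins for the real disk interface:

```python
from collections import OrderedDict

class BufferManager:
    """Sketch of a buffer manager following the steps above."""
    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block        # hypothetical disk-read callback
        self.pool = OrderedDict()           # block_id -> (data, dirty flag)

    def pin(self, block_id):
        if block_id in self.pool:           # already buffered: return it directly
            self.pool.move_to_end(block_id)
            return self.pool[block_id][0]
        if len(self.pool) >= self.capacity: # no free frame: throw out the LRU block
            victim_id, (data, dirty) = self.pool.popitem(last=False)
            if dirty:
                pass                        # write back only if modified (elided here)
        data = self.read_block(block_id)    # read the block into the freed frame
        self.pool[block_id] = (data, False)
        return data
```

Usage: pinning a block that is not resident evicts the least recently used one once the pool is full.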
Buffer -Replacement Policies
Most operating systems replace the block least recently used (LRU)
LRU - use past pattern of block references as a predictor of future references
Queries have well-defined access patterns (such as sequential scans), and a DBMS can use the information in a user’s query to predict future references
LRU can be a bad strategy for certain access patterns involving repeated scans of data
Mixed strategy with hints on replacement strategy provided by the query optimizer is preferable
Pinned block: a memory block that is not allowed to be written back to disk
Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block has been processed
Most Recently Used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the MRU block.
Buffer manager (BM) can use statistical information regarding the probability that a request will reference a particular relation.
EX: the data dictionary is frequently accessed
Heuristic: keep data-dictionary blocks in main memory buffer
3. File Organization
The database is stored as a collection of files.
Each file is a sequence of records.
A record is a sequence of fields.
To delete record j - alternatives:
1. Move records j+1, .., n to j, …, n-1
2. Move record n to j
3. Link all free records on a free list
The simplest approach: record size is fixed (n) --> store record j starting from byte n*(j-1)
record access is simple but records may cross blocks
each file has records of one particular type only
different files are used for different relations
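The byte-offset formula and deletion alternative 2 above can be sketched directly, using 1-based record numbering as in the text:

```python
def record_offset(record_size, j):
    """Fixed-length records: record j starts at byte record_size * (j - 1)."""
    return record_size * (j - 1)

def delete_move_last(records, j):
    """Deletion alternative 2: overwrite slot j with record n, then drop the last slot."""
    records[j - 1] = records[-1]
    return records[:-1]

assert record_offset(40, 3) == 80                                # third 40-byte record
assert delete_move_last(["a", "b", "c", "d"], 2) == ["a", "d", "c"]
```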
Free list
Store the address of the first record whose contents are deleted in the file header
Use this first record to store the address of the second available record, and so on
Can think of these stored addresses as pointers since they "point" to the location of a record
[Figure: file of account records with a free list - the header points to the first deleted record, each deleted record stores a pointer to the next available record, and the last free record holds a nil pointer; in-use records keep their normal contents]
More space efficient representation: reuse space for normal attributes of free records to store pointers
(no pointers stored in in-use records)
Dangling pointers occur if we move or delete a record to which another record contains a pointer
That pointer no longer points to the desired record
Avoid moving or deleting records that are pointed to by other records.
Such records are pinned.
Variable-Length Records
Arise in database systems in several ways:
storage of multiple record types in a file
record types that allow variable lengths for one or more fields
record types that allow repeating fields (used in some older data models)
Byte string representation
attach an end-of-record control character to the end of each record
difficulty with deletion
difficulty with growth
Header contains:
number of record entries
end of free space in the block
location and size of each record
Page structure:
[Figure: slotted page - header (#entries, end of free space, per-record entries), free space in the middle, records packed at the end of the block]
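A minimal sketch of this slotted-page layout; the field names and the default page size are illustrative choices, not from the slides:

```python
class SlottedPage:
    """Header tracks the entry count, the end of free space, and the
    (offset, size) of each record; records grow from the end of the block."""
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []                 # (offset, length) entry per record
        self.free_end = size            # end of free space in the block

    def insert(self, record: bytes):
        self.free_end -= len(record)    # records are packed at the block's end
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1      # slot number, stable even if records move

    def get(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])
```

External pointers store the slot number rather than a byte offset, so records can be compacted within the page by updating only the header entries - exactly the indirection the next slide calls for.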
Records can be moved around within a page to keep them contiguous with no empty space between them;
entry in the header must then be updated
Pointers should not point directly to record - instead they should point to the entry for the record in header
Fixed-length representation:
reserved space: can use fixed-length records of a known maximum length
unused space in shorter records filled with a null or end-of-record symbol
pointers: a variable-length record is represented by a list of fixed-length records, chained together via pointers
Disadvantage to pointer structure:
space is wasted in all records, except the first in a chain
Solution is to allow two kinds of block in a file:
1. Anchor block: contains the first record of a chain
2. Overflow block: contains records other than those that are the first records of chains
Organization of records in Files
Heap: a record can be placed anywhere in the file where there is space
Sequential: store records in sequential order, based on the value of the search key of each record
Hashing: a hash function is computed on some attribute of each record
the result specifies in which block of the file the record should be placed
Clustering: records of several different relations can be stored in the same file
related records are stored on the same block
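The hashing organization above fits in one function; md5 here is only an illustrative stand-in for whatever hash function the system actually uses:

```python
import hashlib

def hash_block(key: str, n_blocks: int) -> int:
    """Hashing organization: hash a search-key attribute to pick the
    file block where the record is placed (md5 is an arbitrary choice)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % n_blocks
```

The same key always maps to the same block, which is what makes equality lookups a single block access.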
Sequential File Organization
Suitable for applications that require sequential processing of the entire file
The records in the file are ordered by a search-key
INSERTION: must locate the position in the file where the record is to be inserted:
if there is free space, insert there
if no free space, insert the record in an overflow block
in either case, pointer chain must be updated
DELETION: use pointer chains
Need to reorganize the file from time to time to restore sequential order
Brighton A217 750 | Downtown A101 540 | Downtown A110 400 | Linberg A215 200 | Perryton A123 100 | Perryton A130 500 | Redwood A213 700 | Roundhill A334 112
(Sequential file of account records, ordered by branch name)
Clustering File Organization
Simple file structure: stores each relation in a separate file
Alternative: store several relations in one file using a clustering file organization
Good for queries involving all relations
Bad for queries involving a single relation
Results in variable size records
Data Dictionary Storage (System Catalog)
It stores metadata (data about data):
information about relations
names of relations, names and types of attributes, physical file organization structure, statistical data (e.g. number of tuples in each relation)
integrity constraints
view definitions
user and accounting information
information about indices
Catalog structure: can use either
specialized data structures designed for efficient access, OR
a set of relations, with existing system features used to ensure efficient access (preferred)
EXAMPLE:
System-catalog-schema = (relation-name, number-of-attributes)
Attribute-schema = (attribute-name, relation-name, domain-type, position, length)
User-schema = (user-name, encrypted-password, group)
Index-schema = (index-name, relation-name, index-type, index-attributes)
View-schema = (view-name, definition)
4. Indexing
Indexing mechanisms used to speed up access to desired data
ex: author catalog in a library
Search key: (set of) attribute(s) used to look up records in a file
Index file: consists of records (index entries) of the form search-key | pointer
Index files are typically much smaller than the original file
Two basic kinds of indices:
ordered indices: search keys are stored in sorted order
hash indices: search keys are distributed uniformly across “buckets” using a “hash function”
Index Evaluation metrics
Indexing techniques are evaluated on the basis of:
• access types supported efficiently, e.g.
records with a specified value in an attribute
or records with an attribute value falling in a
specified range of values
• access time
• insertion time
• deletion time
• space overhead
Ordered Indices
Index entries are stored sorted on the search-key value
Primary index: in a sequentially ordered file:
the index whose search key specifies the sequential order of the file
also called clustering index
the search key of a primary index is usually but not necessarily the primary key
Secondary index: an index whose search key specifies an order different from the sequential order of the file.
also called non-clustering index
Index-sequential file: an ordered sequential file with a primary index
Dense Index Files:
index record appears for every search-key value in the file
Sparse Index Files:
• index records appear for some search-key values -->
• less space & maintenance overhead for insertions & deletions
• generally slower than dense index for locating records
• to locate a record with search-key value K we
1. Find the index record with the largest search-key value ≤ K
2. Search file sequentially starting at the record to which the index record points
• good tradeoff: sparse index with an index entry for every block in file (the least search-key value in the block)
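The two-step sparse-index lookup above can be sketched with a binary search over sorted (search-key, block) pairs; step 2, the sequential scan within the block, is elided:

```python
import bisect

def sparse_lookup(index, k):
    """Step 1 of the lookup above: find the index entry with the largest
    search-key value <= k and return the block it points to; the file is
    then scanned sequentially from that block. `index` is a sorted list
    of (search_key, block_number) pairs."""
    pos = bisect.bisect_right(index, (k, float("inf"))) - 1
    if pos < 0:
        return None                     # k sorts before every indexed key
    return index[pos][1]

index = [("Brighton", 0), ("Linberg", 1), ("Redwood", 2)]
assert sparse_lookup(index, "Downtown") == 0    # scan forward from block 0
```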
Example of dense index: an index record appears for every search-key value in the file
Index: Brighton, Downtown, Linberg, Perryton, Redwood, Roundhill
File: Brighton A217 750 | Downtown A101 540 | Downtown A110 400 | Linberg A215 200 | Perryton A123 100 | Perryton A130 500 | Redwood A213 700 | Roundhill A334 112
Example of sparse index: index records appear for only some search-key values in the file
Index: Brighton, Linberg, Redwood
File: Brighton A217 750 | Downtown A101 540 | Downtown A110 400 | Linberg A215 200 | Perryton A123 100 | Perryton A130 500 | Redwood A213 700 | Roundhill A334 112
Multilevel Index
•If primary index does not fit in memory: access becomes expensive
•to reduce number of disk accesses to index records:
treat primary index (inner index) kept on disk as a sequential file
construct a sparse index (outer index) on it
•if even the outer index is too large to fit in main memory:
another level of index can be created, and so on
•indices at all levels must be updated on insertion or deletion from the file
[Figure: two-level index - an outer (sparse) index whose blocks point into the inner index blocks, which in turn point to data blocks 0 through n-1]
Index Update: Deletion
If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index too.
Single-level index deletion:
1. Dense indices:
deletion of the search-key is similar to file record deletion
2. Sparse indices:
if an entry for the search key exists in the index:
it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order)
if the next search-key value already has an index entry, the entry is deleted instead of being replaced
Index Update: Insertion
Single-level index insertion:
1. Perform a lookup using the search-key value appearing in the record to be inserted
2a. Dense indices: if the search-key value does not appear in the index, insert it
2b. Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.
Multi-level index insertion:
(as well as deletion) algorithms are simple extensions of the single-level algorithms
Secondary Indices
Problem: find all the records whose values in a certain field satisfy some condition
1. if the field = search-key of the primary index: no problem
2. if the field <> search-key of the primary index: --> secondary index
Examples:
in the account database stored sequentially by account number, we may want to find all accounts in a particular branch
as above, but we want to find all accounts with a specified balance or range of balances
Secondary index: an index record for each search-key value
index record points to a bucket that contains pointers to all the actual records with that particular search-key value
Secondary index on the balance field of account
Index (balance values, sorted): 350, 400, 500, 600, 700, 750, 900 - each entry points to a bucket of pointers to all file records with that balance
File: Brighton A217 750 | Downtown A101 500 | Downtown A110 600 | Linberg A215 700 | Perryton A123 400 | Perryton A130 900 | Perryton A130 700 | Redwood A213 700 | Roundhill A334 300
Primary vs. Secondary indices
•Secondary indices have to be dense
•Indices offer substantial benefits when searching for records
•When a file is modified, every index on the file must be updated.
•Updating indices imposes overhead on database modification.
•Sequential scan using primary index is efficient.
•Sequential scan using secondary index is expensive: each record access may fetch a new block from disk
5. B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
Disadvantage of indexed-sequential files:
performance degrades as the file grows, since many overflow blocks get created. Periodic reorganization of the entire file is required.
Advantage of B+-tree index files:
automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance.
Disadvantage of B+-tree index files:
extra insertion and deletion overhead, space overhead
A B+-tree is a rooted tree satisfying the following properties:
1. All paths from the root to the leaves are of the same length
2. Each node that is not a root or a leaf has between ⌈n/2⌉ and n children
3. A leaf node has between ⌈(n-1)/2⌉ and n-1 values
4. Special cases:
4a. If the root is not a leaf, it has at least 2 children
4b. If the root is a leaf (it is the single node in the tree), it can have between 0 and n-1 values
Node structure: P1 | K1 | P2 | K2 | ... | Pn-1 | Kn-1 | Pn
Pi : pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes)
Ki : the search-key values, in ascending order
Leaf Nodes in B+-trees
For j = 1, 2, ..., n-1, pointer Pj either points to a file record with search-key value Kj, or to a bucket of pointers to file records, each record having search-key value Kj. The bucket structure is needed only if the search key does not form a primary key.
If Li and Lj are leaf nodes and i < j, then Li's search-key values are less than Lj's search-key values
Pn points to the next leaf node in search-key order
![Page 56: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/56.jpg)
56
Leaf level for the account file (branch-name, account-number, balance):
Brighton A217 750
Downtown A101 500
Downtown A110 600
Linberg A215 700
Perryton A123 400
Perryton A130 900
Perryton A130 700
Redwood A213 700
Roundhill A334 300
[Figure: a leaf node holding Brighton and Downtown, with a pointer to the next leaf]
![Page 57: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/57.jpg)
57
Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes.
For a non-leaf node with m pointers:
1. All the search-keys in the subtree to which P1 points are less than K1
2. For 2 ≤ j ≤ m-1, all the search-keys in the subtree to which Pj points have values greater than or equal to Kj-1 and less than Kj
3. All the search-keys in the subtree to which Pm points are greater than or equal to Km-1
P1 | K1 | P2 | K2 | ... | Pm-1 | Km-1 | Pm
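To make the node layout and search rules concrete, here is a minimal lookup sketch in Python. The `Node` class, field names, and the branch/account sample values are illustrative assumptions, not the slides' code:

```python
from bisect import bisect_right

class Node:
    """One B+-tree node (illustrative sketch).
    Non-leaf: keys K1..Km-1 and children P1..Pm.
    Leaf: keys with associated record pointers, plus a next-leaf link."""
    def __init__(self, keys, children=None, values=None, next_leaf=None):
        self.keys = keys            # search-key values, ascending
        self.children = children    # non-leaf only: child pointers
        self.values = values        # leaf only: record pointers
        self.next_leaf = next_leaf  # leaf only: pointer to next leaf

def lookup(root, key):
    """Descend: subtree Pj holds keys in [Kj-1, Kj); then scan the leaf."""
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None

# Tiny tree over branch names from the slides (n = 3):
leaf2 = Node(keys=["Perryridge", "Redwood"], values=["A123", "A213"])
leaf1 = Node(keys=["Brighton", "Downtown"], values=["A217", "A101"],
             next_leaf=leaf2)
root = Node(keys=["Perryridge"], children=[leaf1, leaf2])
```

`bisect_right` picks the child whose subtree may contain the key: values less than K1 go to P1, values greater than or equal to Kj-1 go right of it.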
![Page 58: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/58.jpg)
58
Example of a B+-Tree (n = 3)
[Figure: B+-tree for the account file with n = 3 — root node (Perryridge), second-level node (Linberg, Redwood), and leaf nodes holding Brighton, Downtown, Linberg, Perryridge, Redwood, Roundhill]
![Page 59: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/59.jpg)
59
Example of a B+-Tree (n = 5)
[Figure: B+-tree for the account file with n = 5 — root node (Perryridge), leaf nodes (Brighton, Downtown, Linberg) and (Perryridge, Redwood, Roundhill)]
Leaf nodes must have between 2 and 4 values ( ⌈(n-1)/2⌉ and n-1 )
Non-leaf nodes other than the root must have between 3 and 5 children
The root must have at least 2 children
![Page 60: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/60.jpg)
60
Observations about B+ - Trees
Since the internode connections are made by pointers, there is no assumption that “logically” close blocks are “physically” close in the tree.
The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
The B+-tree contains a relatively small number of levels, so searches can be conducted efficiently.
Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time.
![Page 61: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/61.jpg)
61
6. Static Hashing
A bucket is a unit of storage containing one or more records (typically a disk block).
In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record.
The worst hash function maps all search-key values to the same bucket; access time is then proportional to the number of search-key values in the file.
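A minimal sketch of this organization in Python; the bucket count, hash function, and record format are illustrative assumptions:

```python
# Static hash file organization: a fixed set of buckets, each holding
# (search-key, record) pairs that are scanned sequentially on lookup.
NBUCKETS = 8

def h(search_key):
    """Hash function from search-key values K to bucket addresses B."""
    return sum(ord(c) for c in str(search_key)) % NBUCKETS

buckets = [[] for _ in range(NBUCKETS)]

def insert(search_key, record):
    buckets[h(search_key)].append((search_key, record))

def find(search_key):
    # Different keys may map to the same bucket, so the whole
    # bucket must be searched sequentially to locate a record.
    return [rec for key, rec in buckets[h(search_key)] if key == search_key]

insert("Downtown", "A-101")
insert("Brighton", "A-217")
```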
![Page 62: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/62.jpg)
62
An ideal hash function is uniform, i.e. each bucket is assigned the same number of search-key values from the set of all possible values.
An ideal hash function is also random: each bucket has roughly the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
Typical hash functions perform computation on the internal binary representation of the search-key.
Examples: mod p (p prime); folding; adding.
Bucket overflow can occur because of insufficient buckets or skew in the distribution of records (multiple records have the same search-key value, or the hash function is non-uniform). It is handled by using overflow buckets, usually chained in a linked list.
![Page 63: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/63.jpg)
63
Hash indices
Hashing can be used not only for file organization but also for index-structure creation.
A hash index organizes the search keys with their associated record pointers into a hash file structure.
Hash indices are always secondary indices:
if the file itself is organized using hashing, a separate primary index on it using the same search-key is unnecessary.
However, the term hash index is used to refer to both secondary index structures and hash-organized files.
![Page 64: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/64.jpg)
64
Hashing Functions
Several kinds of uniform hashing functions are in use.
1. Direct hashing: the key is the address, without any algorithmic manipulation.
The data structure must therefore contain an element for every possible key.
While the situations where direct hashing can be used are limited, it is very powerful when applicable because it guarantees that there are no collisions.
Limitation: large key values require an equally large table.
![Page 65: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/65.jpg)
65
2. Mid-Square (middle of square): square the key and take the middle digits as the address.
9452: 9452 * 9452 = 89340304 → 3403
As a variation on the mid-square method, we can select a portion of the key, such as the middle three digits, and square that rather than the whole key. This allows the method to be used when the key is too large to square.
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
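A sketch reproducing these examples; the zero-padding and exact middle-digit selection are assumptions chosen to match the slides' results:

```python
def mid_square(key, addr_digits):
    """Mid-square hashing: square the key (zero-padded to twice the
    key's width) and take the middle addr_digits digits as the address."""
    width = 2 * len(str(key))
    sq = str(key * key).zfill(width)       # e.g. 121*121 -> "014641"
    start = (width - addr_digits + 1) // 2
    return int(sq[start:start + addr_digits])
```

For example, `mid_square(9452, 4)` gives 3403 and `mid_square(379, 3)` gives 364, matching the slides.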
![Page 66: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/66.jpg)
66
3. Modulo-Division
Also known as Division-remainder.
Address = Key MOD Table_size
While this algorithm works with any table size, a table size that is a prime number produces fewer collisions than other sizes.
Table size = 11
keys:      4 7 12 33 64 75 89
addresses: 4 7  1  0  9  9  1   (collisions at addresses 9 and 1)
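The same computation in Python, reproducing the slide's addresses:

```python
def mod_div(key, table_size):
    """Division-remainder hashing: Address = Key MOD Table_size."""
    return key % table_size

keys = [4, 7, 12, 33, 64, 75, 89]
addresses = [mod_div(k, 11) for k in keys]   # table size 11, a prime
# addresses == [4, 7, 1, 0, 9, 9, 1] — keys 64 and 75 collide at 9,
# keys 12 and 89 collide at 1
```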
![Page 67: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/67.jpg)
67
4. Folding
There are two folding methods that are used,
fold shift and fold boundary.
In fold shift, the key value is divided into parts whose size matches the size of the required address; the left and right parts are then shifted and added to the middle part.
In fold boundary, the left and right numbers are folded (digit-reversed) on a fixed boundary between them and the center number, then added.
a. Fold Shift Key: 123456789
123
456
789
---
1368 (the leading 1 is discarded, giving 368)
b. Fold Boundary Key: 123456789
321 (digit reversed)
456
987 (digit reversed)
---
1764 (the leading 1 is discarded, giving 764)
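Both folding variants as a sketch; the three-digit part width is taken from the worked example:

```python
def fold_shift(key, width=3):
    """Fold shift: split the key into width-digit parts and add them;
    any carry beyond width digits is discarded."""
    s = str(key)
    parts = [int(s[i:i + width]) for i in range(0, len(s), width)]
    return sum(parts) % 10 ** width

def fold_boundary(key, width=3):
    """Fold boundary: like fold shift, but the outer (boundary) parts
    are digit-reversed before adding."""
    s = str(key)
    parts = [s[i:i + width] for i in range(0, len(s), width)]
    parts[0], parts[-1] = parts[0][::-1], parts[-1][::-1]
    return sum(int(p) for p in parts) % 10 ** width
```

`fold_shift(123456789)` gives 368 and `fold_boundary(123456789)` gives 764, as in the worked examples above.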
![Page 68: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/68.jpg)
68
5. Digit-Extraction
Using digit extraction, selected digits are extracted from the
key and used as the address.
For example, using a six-digit employee number to hash to a three-digit address (000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.
379452 → 394
121267 → 112
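Digit extraction as a one-liner sketch, with the slide's first/third/fourth positions as the default:

```python
def digit_extract(key, positions=(0, 2, 3)):
    """Digit extraction: concatenate the digits at the selected
    positions (0-based, from the left) of the key to form the address."""
    s = str(key)
    return int("".join(s[p] for p in positions))
```

`digit_extract(379452)` gives 394 and `digit_extract(121267)` gives 112.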
6. Non-Numeric Keys
![Page 69: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/69.jpg)
69
Collision Resolution
• With the exception of the direct method, none of the methods used for hashing are one-to-one mappings. This means that when we hash a new key to an address, we may create a collision.
• There are several methods for handling collisions, each of them independent of the hashing algorithm.
• Before we discuss the collision resolution methods, we need to cover a few basic concepts:
Load Factor: the load factor alpha of a hash table of size M with N occupied entries is defined by
alpha = N / M
![Page 70: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/70.jpg)
70
Clustering
Some hashing algorithms tend to cause data to group within the list. This tendency of data to build up unevenly across a hashed table is known as clustering.
1. Primary Clustering: occurs when data become clustered around a home address.
2. Secondary Clustering: occurs when data become grouped along a collision path throughout the list.
![Page 71: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/71.jpg)
71
Open Addressing
The first collision resolution method, open addressing, resolves collisions in the home area. When a collision occurs, the home-area addresses are searched for an open or unoccupied element where the new data can be placed.
Examples of open addressing methods:
1. Linear Probe: i = H(key) is the home address. If it is available we store the record; otherwise we probe (i + k) mod M for k = 1, 2, 3, ... until a free slot is found.
Linear probing gives rise to a phenomenon called primary clustering.
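A linear-probe insertion sketch; the table size, hash function, and keys are illustrative assumptions:

```python
def linear_probe_insert(table, key, value, h):
    """Open addressing with a linear probe: start at the home address
    h(key) and step forward by 1 (mod M) until a free slot is found."""
    M = len(table)
    home = h(key) % M
    for k in range(M):
        slot = (home + k) % M
        if table[slot] is None:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table full")

table = [None] * 7
h = lambda key: key % 7
# 3, 10, 17 all have home address 3, so they cluster at 3, 4, 5:
slots = [linear_probe_insert(table, k, "rec", h) for k in (3, 10, 17)]
```

The run of consecutive occupied slots left behind is exactly the primary clustering mentioned above.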
![Page 72: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/72.jpg)
72
2. Quadratic Probe: if there is a collision at hash address h, this method probes the table at locations h+1, h+4, h+9, ..., that is, at locations h + i^2 (mod table size) for i = 1, 2, ....
• Quadratic probing substantially reduces clustering, but it is not obvious that it will probe all locations in the table, and in fact it does not.
• For some values of the table size the function will probe relatively few positions in the table.
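A quadratic-probe sketch under the same illustrative assumptions as the linear-probe example:

```python
def quadratic_probe_insert(table, key, value, h):
    """Quadratic probing: on a collision at home address h, try
    h + 1, h + 4, h + 9, ..., i.e. (h + i*i) mod M for i = 1, 2, ..."""
    M = len(table)
    home = h(key) % M
    for i in range(M):   # may miss free slots: not all positions are probed
        slot = (home + i * i) % M
        if table[slot] is None:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("no free probe position found")

table = [None] * 7
h = lambda key: key % 7
# 3, 10, 17 share home address 3, but land at 3, 3+1=4, (3+4)%7=0:
slots = [quadratic_probe_insert(table, k, "rec", h) for k in (3, 10, 17)]
```

Unlike linear probing, colliding keys scatter instead of forming one contiguous run.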
![Page 73: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/73.jpg)
73
3. Double Hashing
• Double hashing uses nonlinear probing by computing different probe increments for different keys.
• It uses two functions. The first computes the home address; if that slot is available (or the record is found) we stop there. Otherwise we apply the second hashing function to compute the step value.
i = H1(key)  — the home address
step = H2(key) = Max(1, Key DIV M) MOD M
i = (i + step) MOD M  — repeated until we find a free place or find the record
Double hashing avoids both primary and secondary clustering.
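A sketch using the slides' step function; the `max(1, ...)` guard is reordered here (an assumption) so the step can never be 0, which the formula as written would allow when (Key DIV M) is a multiple of M:

```python
def double_hash_insert(table, key, value):
    """Double hashing: home address from H1; on collision, advance by a
    key-dependent step from H2 (guarded so the step is never 0)."""
    M = len(table)
    i = key % M                        # H1: home address
    step = max(1, (key // M) % M)      # H2: key-dependent increment
    for _ in range(M):
        if table[i] is None:
            table[i] = (key, value)
            return i
        i = (i + step) % M
    raise RuntimeError("table full")

table = [None] * 7
# 3, 10, 24 share home address 3 but get steps 1, 1, 3 respectively:
slots = [double_hash_insert(table, k, "rec") for k in (3, 10, 24)]
```

Because colliding keys follow different probe paths, neither primary nor secondary clustering builds up (with a prime M so every step cycles through all slots).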
![Page 74: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/74.jpg)
74
Chaining
One way of resolving collisions is to maintain M linked lists, one for each possible address in the hash table.
A key K hashes to an address i = h(K) in the table.
At address i, we find the head of a list containing all records having keys that have hashed to i.
This list is then searched for a record containing key K.
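A chaining sketch, with Python lists standing in for the linked lists and key mod M as an assumed hash function:

```python
# Chaining: M lists, one per possible address in the hash table.
M = 11
chains = [[] for _ in range(M)]

def chain_insert(key, record):
    chains[key % M].append((key, record))

def chain_find(key):
    """Search the list at address h(key) = key mod M for this key."""
    for k, rec in chains[key % M]:
        if k == key:
            return rec
    return None

chain_insert(4, "rec-a")
chain_insert(15, "rec-b")   # 15 mod 11 = 4: collides, shares the chain
```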
![Page 75: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/75.jpg)
75
Buckets
• Suppose we divide a table into M groups of records, with each group containing exactly b records.
• Each group of b records is called a bucket.
• The hash function h(K) computes a bucket number from the key K, and the record containing K is stored in the bucket whose bucket number is h(K).
• If a particular bucket overflows, an overflow policy is invoked: a chaining technique can be used to link to an "overflow" bucket. This link can be planted at the end of the overflowed bucket. It is convenient to keep overflow buckets on the same cylinder, or we may have a separate cylinder for overflows.
![Page 76: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/76.jpg)
76
Performance Formulas
Suppose hash table T of size M has exactly N occupied entries, so that its load factor alpha = N/M. Define two quantities, Cn and C'n:
Cn is the average number of probe addresses examined during a successful search;
C'n is the average number of probe addresses examined during an unsuccessful search.
Efficiency of Linear Probing
Successful search:   Cn  = (1 + 1/(1-alpha)) / 2
Unsuccessful search: C'n = (1 + (1/(1-alpha))^2) / 2
![Page 77: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/77.jpg)
77
Double Hashing
Successful search:   Cn  = ln(1/(1-alpha)) / alpha
Unsuccessful search: C'n = 1/(1-alpha)
Separate Chaining
Successful search:   Cn  = 1 + alpha/2
Unsuccessful search: C'n = alpha
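The three formula pairs, transcribed directly into Python so they can be compared at a given load factor:

```python
from math import log

def linear_probe_costs(a):
    """(Cn, C'n) for linear probing at load factor a."""
    return (1 + 1 / (1 - a)) / 2, (1 + (1 / (1 - a)) ** 2) / 2

def double_hash_costs(a):
    """(Cn, C'n) for double hashing at load factor a."""
    return log(1 / (1 - a)) / a, 1 / (1 - a)

def chaining_costs(a):
    """(Cn, C'n) for separate chaining at load factor a."""
    return 1 + a / 2, a
```

At alpha = 0.5, for example, linear probing averages 1.5 probes per successful search and 2.5 per unsuccessful one, double hashing about 1.39 and 2.0, and separate chaining 1.25 and 0.5.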
![Page 78: Storage and File Structure](https://reader034.vdocuments.net/reader034/viewer/2022042609/56812b25550346895d8f2645/html5/thumbnails/78.jpg)
78
Index Definition in SQL
Create an index:
create index <index_name> on <relation_name> (<attribute_list>)
Example: create index b-index on branch (branch-name)
use create unique index to indirectly specify and enforce the condition that the search key is a candidate key
To drop an index:
drop index <index_name>