
Upload: byron-daniel

Post on 31-Dec-2015


Page 1: 1 Storage and File Structure 1. Classification of physical storage media 2. Storage access 3. File organization 4. Indexing 5. B + - trees 6. Static hashing

1

Storage and File Structure

1. Classification of physical storage media

2. Storage access

3. File organization

4. Indexing

5. B+ - trees

6. Static hashing

Page 2

1. Classification of Physical Storage Media

Criteria:

speed with which data can be accessed

cost per unit of data

reliability

data loss on power failure or system crash

physical failure of the storage device

Volatile storage: loses contents when power is switched off

Non-volatile storage: contents persist even when power is switched off.

Includes secondary and tertiary storage, as well as battery-backed up main memory

Page 3

Cache: the fastest and most costly form of storage

volatile

managed by the hardware/operating system

Main memory: sometimes referred to as core memory

volatile - contents of main memory are usually lost if a power failure or system crash occurs

general-purpose machine instructions operate on data resident in main memory

fast access, but generally too small to store the entire database

Flash memory: non-volatile memory (data survives power failure)

reads are roughly as fast as main memory

can support only a limited number of write/erase cycles

Page 4

Magnetic-disk storage

primary medium for the long-term storage of data

typically stores entire database

data must be moved from disk to main memory for access and written back for storage

direct access - possible to read data on disk in any order

usually survives power failures and system crashes

disk failure can destroy data, but is much less frequent than system crashes

Optical storage: non-volatile

the most popular: CD-ROM

write-once, read-many (WORM) optical disks: used for archival storage

Page 5

Tape storage:

non-volatile

used primarily for

backup (to recover from disk failure)

archive data

sequential access - much slower than direct-access disk

very high capacity (5 GB is common)

tapes can be removed from the drive --> storage costs much cheaper than disk

Page 6

Storage hierarchy

Cache

Main memory

Flash memory

Magnetic disk

Optical disk

Magnetic tape

Page 7

Primary storage: fastest media but volatile

CACHE, MAIN MEMORY

Secondary storage: moderately fast access time, non-volatile

also called on-line storage

FLASH MEMORY, MAGNETIC DISKS

Tertiary storage: slow access time, non-volatile

also called off-line storage

MAGNETIC TAPES, OPTICAL STORAGE

Page 8

Magnetic disks:

Read-write head: device positioned close to the platter surface

reads or writes magnetically encoded information

Surface of platter is divided into circular tracks

Each track is divided into sectors

A sector is the smallest unit of data that can be read or written

Cylinder j consists of the j-th track of all the platters

Head-disk assemblies - multiple disk platters on a single spindle, with multiple heads (one per platter) mounted on a common arm

To read/write a sector:

disk arm swings to position head on the right track

platter spins continually; data is read/written when sector comes under head

Page 9

[Figure: moving-head disk mechanism - spindle, platters, track t, sector s, cylinder c, arm assembly, arm, read-write heads]

Page 10

Disk subsystem

[Figure: disks attached to a disk controller, which connects to the system bus]

Disk controller: interfaces between the computer system and the disk drive hardware

Accepts high-level commands to read or write a sector

Initiates actions such as moving the disk arm to the right track and actually reading or writing the data

Page 11

Performance measures of disks

Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:

seek time: time it takes to reposition the arm over the correct track. Average seek time is 1/3 of the worst-case seek time.

rotational latency: time it takes for the sector to be accessed to appear under the head. Average latency is 1/2 of the worst case latency.

Data-transfer rate: the rate at which data can be retrieved from or stored to the disk

Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure
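These quantities combine additively for a single block access. A quick sketch, using assumed example drive parameters (the seek time, RPM, transfer rate, and block size below are illustrative, not from the slides):

```python
# Illustrative access-time estimate for one 4 KB block read.
# All drive parameters below are assumed example values.

avg_seek_ms = 8.0                 # assumed average seek time
rpm = 7200                        # assumed spindle speed
transfer_rate_mb_s = 50.0         # assumed data-transfer rate
block_kb = 4.0

# Average rotational latency = 1/2 of the worst case (one full revolution)
rotation_ms = 60_000 / rpm
avg_latency_ms = rotation_ms / 2

transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000

access_ms = avg_seek_ms + avg_latency_ms + transfer_ms
print(f"avg rotational latency: {avg_latency_ms:.2f} ms")
print(f"total access time:      {access_ms:.2f} ms")
```

Note that seek and rotational delays dominate; the transfer of the block itself is a small fraction of the total.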

Page 12

Optimisation of Disk-Block Access

Block: a contiguous sequence of sectors from a single track

data is transferred between disk and main memory in blocks; sizes range from 512 bytes to several KB

Disk-arm-scheduling algorithms: order accesses to tracks so that disk-arm movement is minimized (e.g. the elevator algorithm)

File organization: optimize block access time by organizing the blocks to correspond to how data will be accessed. Store related information on the same or nearby cylinders

Non-volatile write buffers: speed up disk writes by writing blocks to a non-volatile RAM buffer immediately. The controller then writes to disk whenever the disk has no other requests

Log disk: a disk devoted to writing a sequential log of block updates; this eliminates seek time. Used like non-volatile RAM
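The elevator algorithm mentioned above can be sketched as follows: the arm keeps moving in one direction, servicing the pending track requests it passes, then reverses. The track numbers and request queue are made-up examples:

```python
# Sketch of elevator (SCAN) disk-arm scheduling over pending track requests.

def elevator(head: int, requests: list[int], direction: int = +1) -> list[int]:
    """Return the order in which pending track requests are serviced."""
    up = sorted(t for t in requests if t >= head)                 # tracks ahead
    down = sorted((t for t in requests if t < head), reverse=True)  # tracks behind
    return up + down if direction > 0 else down + up

order = elevator(head=50, requests=[95, 180, 34, 119, 11, 123, 62, 64])
print(order)   # ascending sweep first, then back down
```

Compared with servicing requests in arrival order, this keeps total arm movement close to two sweeps of the surface.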

Page 13

RAID = Redundant Arrays of Inexpensive Disks

a disk organization technique that takes advantage of large numbers of inexpensive, mass-market disks

originally a cost-effective alternative to large, expensive disks

today RAIDs are used for their higher reliability and bandwidth rather than for economic reasons, hence the I is interpreted as independent instead of inexpensive

The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail.

For instance, a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years) will have a system MTTF of 1000 hours (approx. 41 days)
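The arithmetic behind this example, sketched in a few lines:

```python
# The slides' reliability arithmetic: an array of 100 disks sees some disk
# fail, on average, 100 times as often as one disk alone.

disk_mttf_hours = 100_000
n_disks = 100

system_mttf_hours = disk_mttf_hours / n_disks
print(system_mttf_hours)          # hours until some disk in the array fails
print(system_mttf_hours / 24)     # the same figure in days (~41.7)
```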

Page 14

Improvement of Reliability via Redundancy

Redundancy: store extra information that can be used to rebuild information lost in a disk failure.

EX: Mirroring (Shadowing): duplicate every disk.

Logical disk consists of two physical disks

every write is carried out on both disks

if one disk in a pair fails, data is still available on the other

Page 15

Improvement in Performance via Parallelism

Two main goals of parallelism in a disk system:

1. Load balance multiple small accesses to increase throughput

2. Parallelize large accesses to reduce response time

Improve transfer rate by striping data across multiple disks:

1. Bit-level striping: split the bits of each byte across multiple disks

in an array of 8 disks, write bit j of each byte on disk j

each access can read data at 8 times the rate of a single disk

but seek/access time worse than for a single disk

2. Block-level striping: with n disks, block j of a file goes to disk (j mod n) + 1
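The block-placement rule for block-level striping can be illustrated directly (the disk count and block numbers here are arbitrary):

```python
# Block-level striping as defined above: with n disks,
# block j of a file goes to disk (j mod n) + 1 (disks numbered 1..n).

def disk_for_block(j: int, n: int) -> int:
    return (j % n) + 1

n = 4
placement = {j: disk_for_block(j, n) for j in range(8)}
print(placement)   # blocks 0..7 spread round-robin over disks 1..4
```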

Page 16

RAID levels

Schemes to provide redundancy at lower cost by using disk striping combined with parity bits

Different RAID organizations (RAID levels) have differing cost, performance and reliability characteristics

Level 0: striping at the level of blocks

non-redundant

used in high-performance applications where data loss is not critical

Level 1: mirrored disks

offers best write performance

popular for applications such as storing log files in a database system

Page 17

Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping

Level 3: Bit-Interleaved Parity: a single parity bit can be used for error correction, not just detection

When writing data, parity bit must also be computed and written

faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O

subsumes Level 2 (provides all its benefits, at lower cost)

Page 18

Level 4: Block-Interleaved Parity

uses block-level striping

keeps a parity block on a separate disk for corresponding blocks from N other disks

provides higher I/O rates for independent block reads than Level 3 (a block read goes to a single disk, so blocks stored on different disks can be read in parallel)

provides high transfer rates for reads of multiple blocks

however, parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk
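The parity used by Levels 3-5 is a bytewise XOR of the corresponding blocks on the data disks, which is enough to rebuild any single lost block. A minimal sketch with made-up two-byte blocks:

```python
# Parity block = XOR of the data blocks; a lost block is rebuilt by
# XOR-ing the parity block with the surviving data blocks.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"\x01\x02", b"\x0f\x00", b"\xf0\xff"
parity = xor_blocks(d0, d1, d2)

# Suppose the disk holding d1 fails: recover its block from the rest.
recovered = xor_blocks(parity, d0, d2)
print(recovered == d1)   # True
```

This also shows why every block write must touch the parity disk: changing any data block changes the XOR.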

Page 19

Level 5: Block-Interleaved Distributed Parity

partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk

e.g. with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks

higher I/O rates than level 4. (block writes occur in parallel if the blocks and their parity blocks are on different disks).

Subsumes Level 4

Level 6: P + Q Redundancy Scheme

similar to Level 5, but stores extra redundant information to guard against multiple disk failures

better reliability than Level 5 at a higher cost

not used as widely

Page 20

Optical disks

Compact disk read-only memory (CD-ROM)

disks can be loaded into or removed from a drive

high storage capacity (500 MB)

high seek times and latency

lower data-transfer rates than magnetic disks

Digital Video Disk (DVD) - newer optical format

holds 4.7 to 17 GB

WORM disk (write-once read-many)

can be written using the same drive from which they are read

high capacity and long lifetime

used for archival storage

WORM jukebox

Page 21

Magnetic Tapes

hold large volumes of data (5 GB usual)

currently the cheapest storage medium

very slow access time in comparison to magnetic and optical disks

limited to sequential access

used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another

tape jukeboxes used for very large capacities (terabyte, 10^12 bytes, to petabyte, 10^15 bytes)

Page 22

2. Storage Access

Block: a database file is partitioned into fixed-length storage units called blocks

Unit of both storage allocation and data transfer

DBMS seeks to minimize the number of block transfers between the disk and memory by keeping as many blocks as possible in main memory

Buffer: the portion of main memory available to store copies of disk blocks

Buffer manager: the subsystem responsible for allocating buffer space in main memory

Page 23

Buffer Manager

Programs call on the BM when they need a block from disk

if it is already present in the buffer, the requesting program is given the address of the block in main memory

if the block is not in the buffer, the BM allocates space in the buffer for the block, replacing (throwing out) some other block, if required, to make space for the new block

the block that is thrown out is written back to disk only if it was modified since the most recent time it was written to/fetched from the disk

once space is allocated in the buffer, the BM reads in the block from the disk to the buffer, and passes the address of the block in main memory to the requester
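The fetch logic above can be sketched as follows. This is a toy model, not a real DBMS buffer manager: the Disk class is a stand-in for block I/O, LRU is used as the replacement policy, and pinning is omitted:

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> (data, dirty)

    def fetch(self, block_id):
        if block_id in self.frames:              # already buffered
            self.frames.move_to_end(block_id)    # mark as recently used
            return self.frames[block_id][0]
        if len(self.frames) >= self.capacity:    # must evict to make space
            victim, (data, dirty) = self.frames.popitem(last=False)
            if dirty:                            # write back only if modified
                self.disk.write(victim, data)
        data = self.disk.read(block_id)          # read in the requested block
        self.frames[block_id] = (data, False)
        return data

    def mark_dirty(self, block_id):
        data, _ = self.frames[block_id]
        self.frames[block_id] = (data, True)

class Disk:                                      # toy stand-in for block I/O
    def __init__(self): self.blocks = {}
    def read(self, bid): return self.blocks.get(bid, b"")
    def write(self, bid, data): self.blocks[bid] = data

disk = Disk()
disk.blocks = {1: b"a", 2: b"b", 3: b"c"}
bm = BufferManager(disk, capacity=2)
bm.fetch(1); bm.fetch(2); bm.fetch(3)   # third fetch evicts block 1 (LRU)
print(sorted(bm.frames))                # [2, 3]
```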

Page 24

Buffer-Replacement Policies

Most operating systems replace the block least recently used (LRU)

LRU - use past pattern of block references as a predictor of future references

Queries have well-defined access patterns (such as sequential scans), and a DBMS can use the information in a user’s query to predict future references

LRU can be a bad strategy for certain access patterns involving repeated scans of data

Mixed strategy with hints on replacement strategy provided by the query optimizer is preferable

Page 25

Pinned block: a memory block that is not allowed to be written back to disk

Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block has been processed

Most Recently Used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the MRU block.

Buffer manager (BM) can use statistical information regarding the probability that a request will reference a particular relation.

EX: the data dictionary is frequently accessed

Heuristic: keep data-dictionary blocks in main memory buffer

Page 26

3. File Organization

The database is stored as a collection of files.

Each file is a sequence of records.

A record is a sequence of fields.

Page 27

To delete record j - alternatives:

1. Move records j+1, ..., n to j, ..., n-1

2. Move record n to j

3. Link all free records on a free list

The simplest approach:

record size is fixed (n) -->

store record j starting from byte n*(j-1)

record access is simple but records may cross blocks

each file has records of one particular type only

different files are used for different relations
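The fixed-length layout can be sketched in two lines; the record size of 32 bytes is an assumed example:

```python
# Fixed-length record file: record j (counting from 1) starts at
# byte n*(j-1), where n is the (assumed) record size.

RECORD_SIZE = 32   # n, illustrative

def record_offset(j: int) -> int:
    """Byte offset of record j (1-based), per the formula above."""
    return RECORD_SIZE * (j - 1)

print(record_offset(1))   # 0
print(record_offset(4))   # 96
```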

Page 28

Free list

Store the address of the first deleted record in the file header

Use this first record to store the address of the second available record, and so on

Can think of these stored addresses as pointers since they “point” to the location of a record

[Figure: account file with a free list - the file header points to the first deleted record, each deleted record stores the address of the next, and the chain ends in nil]
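The free-list scheme can be sketched with slot indices standing in for record addresses (None plays the role of the nil pointer; record contents are made up):

```python
# File header holds the index of the first deleted record; each deleted
# record's slot is reused to store the index of the next free slot.

class FixedRecordFile:
    def __init__(self, records):
        self.records = list(records)   # slot -> record, or next-free pointer
        self.free_head = None          # the header's free-list pointer

    def delete(self, slot):
        self.records[slot] = self.free_head   # reuse the slot as a pointer
        self.free_head = slot

    def insert(self, record):
        if self.free_head is not None:        # reuse a deleted slot first
            slot = self.free_head
            self.free_head = self.records[slot]
            self.records[slot] = record
            return slot
        self.records.append(record)           # otherwise grow the file
        return len(self.records) - 1

f = FixedRecordFile(["r0", "r1", "r2", "r3"])
f.delete(1); f.delete(3)
print(f.insert("new"))   # reuses slot 3, the most recently freed
```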

Page 29

More space-efficient representation: reuse the space for normal attributes of free records to store pointers

(no pointers stored in in-use records)

Dangling pointers occur if we move or delete a record to which another record contains a pointer

That pointer no longer points to the desired record

Avoid moving or deleting records that are pointed to by other records.

Such records are pinned.

Page 30

Variable-Length Records

Arise in database systems in several ways:

storage of multiple record types in a file

record types that allow variable lengths for one or more fields

record types that allow repeating fields (used in some older data models)

Byte string representation

attach an end-of-record control character to the end of each record

difficulty with deletion

difficulty with growth

Page 31

Header contains:

number of record entries

end of free space in the block

location and size of each record

Page structure:

[Figure: slotted page - header with #entries, end-of-free-space pointer, and a (location, size) entry per record; free space in the middle; records packed at the end of the block]

Page 32

Records can be moved around within a page to keep them contiguous with no empty space between them; the corresponding entry in the header must then be updated

Pointers should not point directly to a record - instead, they should point to the record's entry in the header
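The slotted-page structure can be sketched as follows; the page size is an assumed example, and the header is kept as Python fields rather than packed bytes for clarity:

```python
# Slotted page: header tracks the entry count, the end of free space, and
# a (location, size) entry per record; records grow from the end of the
# block toward the header.

class SlottedPage:
    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.free_end = size          # records are placed just before this
        self.slots = []               # header entries: (location, size)

    def insert(self, record: bytes) -> int:
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1    # slot number stays stable if records move

    def get(self, slot: int) -> bytes:
        loc, size = self.slots[slot]
        return bytes(self.data[loc:loc + size])

page = SlottedPage()
s = page.insert(b"hello")
print(page.get(s))   # b'hello'
```

External pointers hold the slot number, so compacting records inside the page only requires updating the header entries.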

Fixed-length representation:

reserved space: can use fixed-length records of a known maximum length

unused space in shorter records filled with a null or end-of-record symbol

pointers: a variable-length record is represented by a list of fixed-length records, chained together via pointers

Page 33

Disadvantage to pointer structure:

space is wasted in all records, except the first in a chain

Solution is to allow two kinds of block in a file:

1. Anchor block: contains the first record of a chain

2. Overflow block: contains records other than those that are the first records of chains

Page 34

Organization of records in Files

Heap: a record can be placed anywhere in the file where there is space

Sequential: store records in sequential order, based on the value of the search key of each record

Hashing: a hash function is computed on some attribute of each record

the result specifies in which block of the file the record should be placed

Clustering: records of several different relations can be stored in the same file

related records are stored on the same block
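A minimal sketch of hash-based placement; the hash function and block count below are illustrative choices, not a recommendation:

```python
# Hashing organization: a hash function on some attribute of the record
# determines the block in which the record is placed.

N_BLOCKS = 8   # assumed number of blocks

def block_for(key: str) -> int:
    # simple illustrative hash: sum of character codes modulo the block count
    return sum(ord(c) for c in key) % N_BLOCKS

for name in ["Brighton", "Downtown", "Redwood"]:
    print(name, "->", block_for(name))
```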

Page 35

Sequential File Organization

Suitable for applications that require sequential processing of the entire file

The records in the file are ordered by a search-key

INSERTION: must locate the position in the file where the record is to be inserted:

if there is free space, insert the record there

if no free space, insert the record in an overflow block

in either case, pointer chain must be updated

DELETION: use pointer chains

Need to reorganize the file from time to time to restore sequential order

Page 36

Example: account file stored sequentially by branch name

Brighton A217 750
Downtown A101 540
Downtown A110 400
Linberg A215 200
Perryton A123 100
Perryton A130 500
Redwood A213 700
Roundhill A334 112

Page 37

Clustering File Organization

Simple file structure: stores each relation in a separate file

Alternative: store several relations in one file using a clustering file organization

Good for queries involving all relations

Bad for queries involving a single relation

Results in variable size records

Page 38

Data Dictionary Storage (System Catalog)

It stores metadata (data about data):

information about relations

names of relations, names and types of attributes, physical file organization structure, statistical data (e.g. number of tuples in each relation)

integrity constraints

view definitions

user and accounting information

information about indices

Page 39

Catalog structure: can use either

specialized data structures designed for efficient access, OR

a set of relations, with existing system features used to ensure efficient access (preferred)

EXAMPLE.

System-catalog-schema = (relation-name, number-of-attributes)

Attribute-schema = (attribute-name, relation-name, domain-type, position, length)

User-schema = (user-name, encrypted-password, group)

Index-schema = (index-name, relation-name, index-type, index-attributes)

View-schema = (view-name, definition)

Page 40

4. Indexing

Indexing mechanisms: used to speed up access to desired data

ex: author catalog in a library

search key: (set of) attribute(s) used to look up records in a file

index file: consists of records (index entries) of the form <search-key, pointer>

Index files are typically much smaller than the original file

Two basic kinds of indices:

ordered indices: search keys are stored in sorted order

hash indices: search keys are distributed uniformly across “buckets” using a “hash function”

Page 41

Index Evaluation metrics

Indexing techniques are evaluated on basis of:

• access types supported efficiently, e.g.

records with a specified value in an attribute, or records with an attribute value falling in a specified range of values

• access time

• insertion time

• deletion time

• space overhead

Page 42

Ordered Indices

Index entries are stored sorted on the search-key value

Primary index: in a sequentially ordered file:

the index whose search key specifies the sequential order of the file

also called clustering index

the search key of a primary index is usually but not necessarily the primary key

Secondary index: an index whose search key specifies an order different from the sequential order of the file.

also called non-clustering index

Index-sequential file: an ordered sequential file with a primary index

Page 43

Dense Index Files:

an index record appears for every search-key value in the file

Sparse Index Files:

• index records appear for some search-key values -->

• less space & maintenance overhead for insertions & deletions

• generally slower than dense index for locating records

• to locate a record with search-key value K we

1. Find index record with largest search-key value < K

2. Search file sequentially starting at the record to which the index record points

• good tradeoff: sparse index with an index entry for every block in file (the least search-key value in the block)
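The two sparse-index lookup steps can be sketched over a small sorted file; the records and index entries mirror the branch-account examples used in these slides:

```python
# Sparse-index lookup: find the index record with the largest search-key
# value < K, then search the file sequentially from where it points.

file_records = [("Brighton", "A217"), ("Downtown", "A101"), ("Downtown", "A110"),
                ("Linberg", "A215"), ("Perryton", "A123"), ("Perryton", "A130"),
                ("Redwood", "A213"), ("Roundhill", "A334")]

# sparse index: (search-key value, position of that record in the file)
sparse_index = [("Brighton", 0), ("Linberg", 3), ("Redwood", 6)]

def lookup(k: str):
    # 1. find the index record with the largest search-key value < k
    start = 0
    for key, pos in sparse_index:
        if key < k:
            start = pos
    # 2. search the file sequentially starting at that record
    for key, rec in file_records[start:]:
        if key == k:
            return rec
        if key > k:
            break
    return None

print(lookup("Perryton"))   # first matching record: A123
```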

Page 44

[Figure: dense index - one index entry per branch name (Brighton, Downtown, Linberg, Perryton, Redwood, Roundhill), each pointing into the sequential account file]

Example of dense index: index record appears for every search-key value in the file

Page 45

[Figure: sparse index - index entries for Brighton, Linberg, and Redwood only, each pointing into the sequential account file]

Example of sparse index: index records appear for some search-key values in the file

Page 46

Multilevel Index

• If the primary index does not fit in memory, access becomes expensive

• to reduce the number of disk accesses to index records:

treat the primary index (inner index) kept on disk as a sequential file

construct a sparse index (outer index) on it

• if even the outer index is too large to fit in main memory, another level of index can be created, and so on

• indices at all levels must be updated on insertion or deletion from the file

Page 47

[Figure: two-level sparse index - outer index blocks point to inner index blocks, which point to data blocks 0 through n-1]

Page 48

Index Update: Deletion

If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index too.

Single-level index deletion:

1. Dense indices:

deletion of the search key is similar to file record deletion

2. Sparse indices:

if an entry for the search key exists in the index:

it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order)

if the next search-key value already has an index entry, the entry is deleted instead of being replaced

Page 49

Index Update: Insertion

Single-level index insertion:

1. Perform a lookup using the search-key value appearing in the record to be inserted

2a. Dense indices: if the search-key value does not appear in the index, insert it

2b. Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.

Multi-level index insertion:

(as well as deletion) algorithms are simple extensions of the single-level algorithms

Page 50

Secondary Indices

Problem: find all the records whose values in a certain field satisfy some condition

1. if field = search key of the primary index: no problem

2. if field <> search key of the primary index: --> secondary index

Examples:

in the account database stored sequentially by account number, we may want to find all accounts in a particular branch

as above, but we want to find all accounts with a specified balance or range of balances

Secondary index: an index record for each search-key value

index record points to a bucket that contains pointers to all the actual records with that particular search-key value

Page 51

[Figure: secondary index on the balance field - index entries 350, 400, 500, 600, 700, 750, 900, each pointing to a bucket of pointers to the account records with that balance]

Secondary index on balance field of accounts

Page 52

Primary vs. Secondary indices

•Secondary indices have to be dense

•Indices offer substantial benefits when searching for records

•When a file is modified, every index on the file must be updated.

•Updating indices imposes overhead on database modification.

•Sequential scan using primary index is efficient.

•Sequential scan using secondary index is expensive: each record access may fetch a new block from disk

Page 53

5. B+-Tree Index Files

B+-tree indices are an alternative to indexed-sequential files.

Disadvantage of indexed-sequential files:

performance degrades as file grows, since many overflow blocks get created. Periodic reorganization of entire file is required.

Advantage of B+-tree index file:

automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance.

Disadvantage of B+-tree index file:

extra insertion and deletion overhead, space overhead

A B+-tree is a rooted tree satisfying the following properties:

1. All paths from root to leaves are of the same length

2. Each node that is not a root or a leaf has between ⌈n/2⌉ and n children

3. A leaf node has between ⌈(n-1)/2⌉ and n-1 values

4. Special cases:

4a. If the root is not a leaf, it has at least 2 children

4b. If the root is a leaf (it is the single node in the tree), it can have between 0 and n-1 values

P1 | K1 | P2 | K2 | ... | Pn-1 | Kn-1 | Pn

Pi : pointers to children (for non leaf nodes) or

pointers to records or buckets of records (for leaf nodes)

Ki : the search-key values, in ascending order
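As a sanity check on these occupancy rules, here is a small sketch (function name assumed) that computes the permitted bounds for a given fan-out n, using math.ceil for the ⌈x⌉ brackets:

```python
import math

# Occupancy bounds for a B+-tree of fan-out n, restating the rules above.
def node_bounds(n):
    return {
        "internal_children": (math.ceil(n / 2), n),      # non-root internal nodes
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),  # leaf nodes
        "root_min_children": 2,                          # if the root is not a leaf
    }

print(node_bounds(5))
```

For n = 5 this yields 3 to 5 children per internal node and 2 to 4 values per leaf, matching the worked example later in this section.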

Leaf Nodes in B+- trees

For j = 1, 2, ..., n-1, pointer Pj either points to a file record with search-key value Kj, or to a bucket of pointers to file records, each record having search-key value Kj. The bucket structure is needed only if the search key does not form a primary key.

If Li and Lj are leaf nodes and i < j, then Li's search-key values are less than Lj's search-key values.

Pn points to next leaf node in search-key order

Figure: a leaf node of the tree holding the search-key values Brighton and Downtown, whose pointers lead to the corresponding records of the account file above; the last pointer leads to the next leaf node in search-key order.

Non-Leaf Nodes in B+ - Trees

Non leaf nodes form a multi-level sparse index on the leaf nodes.

For a non-leaf node with m pointers:

1. All the search keys in the subtree to which P1 points are less than K1

2. For 2 ≤ j ≤ m-1, all the search keys in the subtree to which Pj points have values greater than or equal to Kj-1 and less than Kj

3. All the search keys in the subtree to which Pm points are greater than or equal to Km-1

P1 | K1 | P2 | K2 | ... | Pm-1 | Km-1 | Pm
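These invariants make lookup a single root-to-leaf walk. The following in-memory sketch (class and function names are invented; a real B+-tree lives in disk blocks) follows the rule that subtree Pj holds keys ≥ Kj-1 and < Kj:

```python
import bisect

class Leaf:
    def __init__(self, keys, pointers):
        self.keys, self.pointers = keys, pointers

class Internal:
    def __init__(self, keys, children):   # len(children) == len(keys) + 1
        self.keys, self.children = keys, children

def search(node, key):
    while isinstance(node, Internal):
        # bisect_right picks the subtree holding values >= the matched key
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.pointers[i]
    return None

left = Leaf(["Brighton", "Downtown"], ["r1", "r2"])
right = Leaf(["Linberg", "Redwood"], ["r3", "r4"])
root = Internal(["Linberg"], [left, right])
print(search(root, "Linberg"))  # r3
```

The loop visits one node per level, which is why search cost is proportional to the (small) height of the tree.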

Example of a B+-tree, n = 3

Figure: a B+-tree for the account file with n = 3. Leaf values (in order): Brighton, Downtown, Linberg, Perryridge, Redwood, Roundhill; internal-node keys: Linberg, Redwood; root key: Perryridge.

Example of a B+-tree, n = 5

Figure: a B+-tree for the same file with n = 5. Root key: Perryridge; leaf values: Brighton, Downtown, Linberg, Perryridge, Redwood, Roundhill.

Leaf nodes must have between 2 and 4 values ( ⌈(n-1)/2⌉ and n-1 )

Non-leaf nodes other than the root must have between 3 and 5 children

The root must have at least 2 children

Observations about B+ - Trees

Since inter-node connections are made by pointers, there is no assumption that blocks that are "logically" close in the tree are "physically" close on disk.

The non-leaf levels of the B+-tree form a hierarchy of sparse indices

The B+-tree contains a relatively small number of levels, so searches can be conducted efficiently.

Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time

6. Static Hashing

A bucket is a unit of storage containing one or more records (typically a disk block).

In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.

Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.

Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record.

The worst hash function maps all search-key values to the same bucket; access time is then proportional to the number of search-key values in the file.
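A toy hash file organization in Python (the hash function and all names here are assumed for illustration): B buckets, with lookups scanning one bucket sequentially.

```python
B = 10  # number of buckets

def h(key):                       # an assumed, simple hash function
    return sum(ord(c) for c in key) % B

buckets = [[] for _ in range(B)]

def insert(record, key):
    buckets[h(key)].append(record)

def lookup(key):
    # other keys may share the bucket, so filter it sequentially
    return [r for r in buckets[h(key)] if r[0] == key]

insert(("Perryton", "A123", 400), "Perryton")
insert(("Brighton", "A217", 750), "Brighton")
print(lookup("Perryton"))  # [('Perryton', 'A123', 400)]
```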

An ideal hash function is uniform, i.e. each bucket is assigned the same number of search-key values from the set of all possible values.

An ideal hash function is also random: each bucket has the same expected number of records assigned to it, irrespective of the actual distribution of search-key values in the file.

Typical hash functions perform computation on the internal binary representation of the search-key.

Examples: mod p, p prime; folding; adding;

Bucket overflow can occur because of insufficient buckets or skew in the distribution of records (multiple records have the same search-key value, or the hash function is non-uniform). It is handled by using overflow buckets, usually chained in a linked list.

Hash indices

Hashing can be used not only for file organization but also for index-structure creation.

A hash index organizes the search keys with their associated record pointers into a hash file structure.

Hash indices are always secondary indices:

if the file itself is organized using hashing, a separate primary index on it using the same search-key is unnecessary.

However, we use the term hash index to refer to both secondary index structures and hash-organized files.

Hashing Functions

Several kinds of uniform hashing functions are in use.

1. Direct hashing:

The key is the address, without any algorithmic manipulation. The data structure must therefore contain an element for every possible key.

While the situations in which direct hashing can be used are limited, when applicable it is very powerful, because it guarantees that there are no collisions.

Limitation: large key values require a correspondingly large table.

2. Mid-Square (middle of square)

The key is squared and the middle digits of the result are used as the address.

9452 * 9452 = 89340304 → address 3403 (the middle four digits)

As a variation on the mid-square method, we can select a portion of the key, such as three of its digits, and square that rather than the whole key. This allows the method to be used when the key is too large to square.

379452: 379 * 379 = 143641 → address 364

121267: 121 * 121 = 014641 → address 464
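The mid-square computation can be sketched as follows (function name assumed); zero-padding the square reproduces the 014641 case above:

```python
def mid_square(key, digits):
    # square the key, zero-pad to twice its width, take the middle digits
    s = str(key * key).zfill(2 * len(str(key)))
    mid = len(s) // 2
    return int(s[mid - digits // 2 : mid + (digits + 1) // 2])

print(mid_square(9452, 4))  # 3403
print(mid_square(379, 3))   # 364
print(mid_square(121, 3))   # 464
```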

3. Modulo-Division

Also known as Division-remainder.

Address = Key MOD Table_size

While this algorithm works with any table size, a table size that is a prime number produces fewer collisions than other sizes.

Table size = 11

keys:      4  7  12  33  64  75  89

addresses: 4  7   1   0   9   9   1

collisions: 64 and 75 collide at address 9; 12 and 89 collide at address 1
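The collisions in this example are easy to reproduce (a quick illustrative check):

```python
M = 11  # prime table size
keys = [4, 7, 12, 33, 64, 75, 89]
addresses = [k % M for k in keys]
print(addresses)  # [4, 7, 1, 0, 9, 9, 1]
```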

4. Folding

There are two folding methods in use: fold shift and fold boundary.

In fold shift, the key value is divided into parts whose size matches the size of the required address; the left and right parts are then shifted and added with the middle part.

In fold boundary, the left and right numbers are folded on a fixed boundary between them and the center number (their digits are reversed) before adding.

a. Fold Shift Key: 123456789

123

456

789

---

1368 ( 1 is discarded)

b. Fold Boundary Key: 123456789

321 (digit reversed)

456

987 (digit reversed)

---

1764 ( 1 is discarded)
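Both foldings can be sketched as follows (function names assumed; the final modulo drops the carry digit, matching "1 is discarded" above):

```python
def fold_shift(key, width=3):
    s = str(key)
    parts = [int(s[i:i + width]) for i in range(0, len(s), width)]
    return sum(parts) % 10 ** width          # discard any carry digit

def fold_boundary(key, width=3):
    s = str(key)
    parts = [s[i:i + width] for i in range(0, len(s), width)]
    parts[0] = parts[0][::-1]                # outer parts are folded over,
    parts[-1] = parts[-1][::-1]              # i.e. their digits reversed
    return sum(int(p) for p in parts) % 10 ** width

print(fold_shift(123456789))     # 368  (123 + 456 + 789 = 1368)
print(fold_boundary(123456789))  # 764  (321 + 456 + 987 = 1764)
```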

5. Digit-Extraction

Using digit extraction, selected digits are extracted from the

key and used as the address.

For example, using a six-digit employee number to hash to

a three-digit address (000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.

379452 → 394

121267 → 112
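A one-line sketch of digit extraction (the function name and the chosen positions are just this example's):

```python
def digit_extract(key, positions=(0, 2, 3)):   # 1st, 3rd, 4th digits from left
    s = str(key)
    return int("".join(s[p] for p in positions))

print(digit_extract(379452))  # 394
print(digit_extract(121267))  # 112
```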

6. Non-Numeric Keys

Collision Resolution

•With the exception of the direct method, none of the methods used for hashing is a one-to-one mapping. This means that when we hash a new key to an address, we may create a collision.

•There are several methods for handling collisions, each of them independent of the hashing algorithm.

•Before we discuss the collision resolution methods, we need to cover a few basic concepts:

Load Factor

The load factor alpha of a hash table of size M with N occupied entries is defined by

alpha = N / M

Clustering

Some hashing algorithms tend to cause data to group within the list. This tendency of data to build up unevenly across a hashed table is known as clustering.

1. Primary clustering: occurs when data become clustered around a home address.

2. Secondary clustering: occurs when data become grouped along a collision path throughout the list.

Open Addressing

The first collision resolution method, open addressing, resolves collisions in the home area. When a collision occurs, the home-area addresses are searched for an open or unoccupied element where the new data can be placed.

Examples of open addressing methods:

1. Linear Probe: i = H(key) is the home address. If it is available we store the record; otherwise, we increase i by k: i = (i + k) mod M (k = 1, 2, 3, ...).

Linear probing gives rise to a phenomenon called primary clustering.
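A linear-probe insertion sketch (names assumed; step k = 1, table size 11):

```python
M = 11
table = [None] * M

def insert(key):
    i = key % M                   # home address H(key)
    while table[i] is not None:   # occupied: probe the next slot, wrapping
        i = (i + 1) % M
    table[i] = key
    return i

print(insert(12))  # 1
print(insert(89))  # 2: home address 1 is taken, so probe to slot 2
```

Note how 12 and 89 end up in adjacent slots around their shared home address: a small instance of primary clustering.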

2. Quadratic Probe

If there is a collision at hash address h, this method probes the table at locations h+1, h+4, h+9, ..., that is, at locations h + i^2 (mod table size) for i = 1, 2, .... The increment function is i^2.

• Quadratic probing substantially reduces clustering, but it is not obvious that it will probe all locations in the table, and in fact it does not.

• For some values of hash_size the function will probe relatively few positions in the table.
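For example (a quick illustrative check, names assumed), with table size 16 the quadratic probe sequence reaches only four distinct slots no matter how long it runs:

```python
def quadratic_probes(h, table_size, tries):
    # locations h + i^2 (mod table_size) for i = 1 .. tries
    return [(h + i * i) % table_size for i in range(1, tries + 1)]

print(sorted(set(quadratic_probes(0, 16, 16))))  # [0, 1, 4, 9]
```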

3. Double Hashing

• Double hashing uses nonlinear probing by computing different probe increments for different keys.

• It uses two functions. The first computes the original address; if the slot is available (or the record is found) we stop there. Otherwise, we apply the second hashing function to compute the step value:

i = H1(key) — the home address

step = H2(key) = Max(1, key DIV M) MOD M

i = (i + step) MOD M

We repeat this until we find a place or find the record. Double hashing avoids both primary and secondary clustering.
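A sketch of double-hashing insertion using the step formula above (names assumed; a zero step is bumped to 1 so the probe always advances):

```python
M = 11
table = [None] * M

def insert(key):
    i = key % M                       # H1: home address
    step = max(1, key // M) % M       # H2: per-key step value
    step = step or 1                  # guard: a zero step would never advance
    while table[i] is not None:
        i = (i + step) % M
    table[i] = key
    return i

print(insert(12))  # 1
print(insert(89))  # 9: home 1 is taken, step = 8, so (1 + 8) mod 11 = 9
```

Unlike linear probing, the two colliding keys land far apart, because each key carries its own step value.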

Chaining

One way of resolving collisions is to maintain M linked lists, one for each possible address in the hash table. A key K hashes to an address i = h(K) in the table. At address i we find the head of a list containing all records whose keys have hashed to i. This list is then searched for a record containing key K.
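A separate-chaining sketch (names assumed), with two keys colliding at the same address:

```python
M = 11
chains = [[] for _ in range(M)]       # one list per table address

def insert(key, value):
    chains[key % M].append((key, value))

def find(key):
    for k, v in chains[key % M]:      # search only the one chain
        if k == key:
            return v
    return None

insert(12, "A101")
insert(89, "A217")                    # 12 and 89 both hash to address 1
print(find(89))  # A217
```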

Buckets

•Suppose we divide a table into M groups of records, with each group containing exactly b records.

•Each group of b records is called a bucket.

•The hash function h(K) computes a bucket number from the key K, and the record containing K is stored in the bucket whose bucket number is h(K).

If a particular bucket overflows, an overflow policy is invoked: a chaining technique can be used to link to an "overflow" bucket, with the link planted at the end of the overflowed bucket. It is convenient to keep overflow buckets on the same cylinder, or in a separate cylinder reserved for overflows.

Performance Formulas

Suppose hash table T of size M has exactly N occupied entries, so that its load factor alpha is N/M. Define two quantities, Cn and C'n, where Cn is the average number of probe addresses examined during a successful search, and C'n is the average number of probes examined during an unsuccessful search.

Efficiency of Linear Probing

Successful search:   Cn  = (1 + 1/(1 - alpha)) / 2

Unsuccessful search: C'n = (1 + (1/(1 - alpha))^2) / 2

Double Hashing

Successful search:   Cn  = ln(1/(1 - alpha)) / alpha

Unsuccessful search: C'n = 1/(1 - alpha)

Separate Chaining

Successful search:   Cn  = 1 + alpha/2

Unsuccessful search: C'n = alpha
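Coding the formulas makes the comparison at a given load factor concrete (function names assumed; a stands for alpha = N/M):

```python
import math

def linear_success(a):  return (1 + 1 / (1 - a)) / 2
def linear_fail(a):     return (1 + (1 / (1 - a)) ** 2) / 2
def double_success(a):  return math.log(1 / (1 - a)) / a
def double_fail(a):     return 1 / (1 - a)
def chain_success(a):   return 1 + a / 2
def chain_fail(a):      return a

# At 50% load: linear probing averages 1.5 probes per successful search,
# double hashing about 1.39, separate chaining 1.25.
print(linear_success(0.5), round(double_success(0.5), 2), chain_success(0.5))
```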

Index Definition in SQL

Create an index:

create index <index_name> on <relation_name> (<attribute_list>)

Example: create index b-index on branch (branch-name)

Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.

To drop an index:

drop index <index_name>