Data Storage and Access Methods Min Song IS698

Post on 21-Dec-2015


TRANSCRIPT

Page 1: Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual

Data Storage and Access Methods

Min Song, IS698

Page 2

Database Design Process

[Figure: database design process. Labels: conceptual requirements (one set for each of Applications 1 to 4), Conceptual Model, Logical Model, Internal Model, Physical Design, and an External Model for each of Applications 1 to 4.]

Page 3

Physical Database Design

• Many physical database design decisions are implicit in the technology adopted.
• Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations.
• We will be concerned with some of the possible physical implementation issues.

Page 4

Physical Database Design

The primary goal of physical database design is data processing efficiency

We will concentrate on choices often available to optimize performance of database services

Physical Database Design requires information gathered during earlier stages of the design process

Page 5

Physical Design Information

Information needed for physical file and database design includes:
• Normalized relations plus size estimates for them
• Definitions of each attribute
• Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
• Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
• Descriptions of the technologies used to implement the database

Page 6

Physical Design Decisions

There are several critical decisions that will affect the integrity and performance of the system:
• Storage format
• Physical record composition
• Data arrangement
• Indexes
• Query optimization and performance tuning

Page 7

Storage Format

Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database

Data Type (format) is chosen to minimize storage space and maximize data integrity

Page 8

Objectives of data type selection

• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations

The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g. numerical or string operations).

Page 9

Access Data Types

• Numeric (1, 2, 4, 8 bytes, fixed or float)
• Text (255 max)
• Memo (64000 max)
• Date/Time (8 bytes)
• Currency (8 bytes, 15 digits + 4 digits decimal)
• Autonumber (4 bytes)
• Yes/No (1 bit)
• OLE (limited only by disk space)
• Hyperlinks (up to 64000 chars)

Page 10

Access Numeric types

• Byte: stores numbers from 0 to 255 (no fractions). 1 byte
• Integer: stores numbers from -32,768 to 32,767 (no fractions). 2 bytes
• Long Integer (default): stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes
• Single: stores numbers from -3.402823E38 to -1.401298E-45 for negative values and from 1.401298E-45 to 3.402823E38 for positive values. 4 bytes
• Double: stores numbers from -1.79769313486231E308 to -4.94065645841247E-324 for negative values and from 4.94065645841247E-324 to 1.79769313486231E308 for positive values. Decimal precision: 15. 8 bytes
• Replication ID: globally unique identifier (GUID). Decimal precision: N/A. 16 bytes
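As a rough illustration of "minimal space, every possible value", the sketch below picks the smallest of the three integer field sizes listed above for a given range of values. The ranges are copied from the slide; the function name and the table layout are illustrative assumptions, not part of Access itself.

```python
# Integer field sizes and ranges as listed on the slide:
# (name, lowest value, highest value, storage size in bytes)
INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(min_value, max_value):
    """Return (name, size_in_bytes) of the smallest listed type that can
    hold every whole number in [min_value, max_value], or None if none can."""
    for name, low, high, size in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name, size
    return None


print(smallest_integer_type(0, 120))    # an age column fits in a Byte
print(smallest_integer_type(-5, 300))   # needs a 2-byte Integer
```

A range that exceeds Long Integer returns None, which signals that a different data type has to be considered.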

Page 11

Designing Physical Records

A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit

Fixed-length and variable-length fields

Page 12

Data Storage

• Storing data: disks
• Buffer manager
• Representing relational data on a disk

Page 13

The Memory Hierarchy

[Figure: memory hierarchy, from fastest to slowest level.]

Processor Cache:
• Access time: 10 nanoseconds
• 512K

Main Memory = Disk Cache:
• Volatile
• 256M-1G
• Access time: 10-100 nanoseconds

Disk:
• Persistent
• 10-100 GB storage
• Speed: rate = 5-10 MB/s, access time = 10-15 msecs

Tape:
• 1.5 MB/s transfer rate
• 280 GB typical capacity
• Only sequential access
• Not for operational data

Page 14

Main Memory

• Fastest, most expensive (excluding cache)
• Today: 512 MB is common even on PCs
• Many databases could fit in memory
    • New industry trend: Main Memory Database
    • E.g. TimesTen
• Main issue is volatility

Page 15

Secondary Storage

• Disks
• Slower, cheaper than main memory
• Persistent!
• The unit of disk I/O = block
    • Typically 1 block = 4k
    • A disk block is also called a disk page or simply a page
• Used with a main memory buffer

Page 16

Block

• The blocking factor (bfr) for a file is the average number of records stored in a disk block.
• Suppose the block size of a database system is 2000 bytes. The Customer table has an average record length of 190 bytes. Assume the overhead of a block for the data is 100 bytes. What is the blocking factor?
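One way to work the question above, assuming records are not split across blocks so that only whole records count: subtract the block overhead from the block size and divide by the record length, rounding down. The function name is illustrative.

```python
def blocking_factor(block_size, record_length, block_overhead=0):
    """Whole records that fit in one block, assuming records do not span
    blocks and `block_overhead` bytes per block are unavailable for data."""
    usable_bytes = block_size - block_overhead
    return usable_bytes // record_length


# Figures from the slide: 2000-byte block, 100 bytes of overhead,
# 190-byte average Customer record -> (2000 - 100) // 190
print(blocking_factor(2000, 190, 100))  # 10
```

With these numbers the 1900 usable bytes hold exactly ten 190-byte records, so no space is left over in the block.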

Page 17

The Mechanics of Disk

Mechanical characteristics:
• Rotation speed (5400 RPM)
• Number of platters (1-30)
• Number of tracks (<= 10000)
• Number of sectors (256/track)
• Number of bytes / sector (2^9 = 512)
• Block size (2^12 = 4096)

[Figure: disk assembly. Labels: platters, spindle, disk head, arm movement, arm assembly, tracks, sector, cylinder.]

Page 18

Important Disk Access Characteristics

• Block access time = disk latency + transfer time
• Disk latency = seek time + rotational latency
• Seek time = time for the head to reach the right track
    • 10 ms to 40 ms
• Rotational latency = rotation time to get to the right sector
    • Time for one rotation = 10 ms
    • Average rotational latency = 10 ms / 2 = 5 ms
• Transfer rate = typically 5-10 MB/s (transfer time is the block size divided by this rate)
• Disks read/write one block at a time (typically 4 kB)
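The formulas above can be combined into a small calculation. This is a sketch under stated assumptions: the 5-10 MB/s figure is treated as a transfer rate, 1 MB is taken as 10^6 bytes, and the function and parameter names are illustrative.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_bytes_per_s):
    """Block access time = seek time + average rotational latency + transfer time.

    The average rotational latency is taken as half of one full rotation.
    """
    rotational_latency_ms = rotation_ms / 2
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Best case from the slide: 10 ms seek, 10 ms per rotation,
# one 4096-byte block at 10 MB/s (assumed to mean 10**7 bytes per second).
best = block_access_time_ms(10, 10, 4096, 10_000_000)
# Worst case from the slide: 40 ms seek and 5 MB/s.
worst = block_access_time_ms(40, 10, 4096, 5_000_000)
print(f"{best:.2f} ms to {worst:.2f} ms per block")
```

In both cases the transfer itself takes under one millisecond, so the seek and the rotation dominate the cost of reading a single block.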

Page 19

Representing Data Elements

Relational database elements:

    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name)
    )

A tuple is represented as a record

Page 20

Record Formats: Fixed Length

• Information about field types is the same for all records in a file; it is stored in system catalogs.
• Finding the i'th field does not require a scan of the record: its address is the base address plus the lengths of the preceding fields.
• Note the importance of schema information!

[Figure: a record starting at base address B with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4. Address of F3 = B + L1 + L2.]
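The address calculation on this slide can be sketched as follows. The field lengths used in the example are hypothetical (a 4-byte first field followed by fields of 20, 200 and 10 bytes); in a real system they would come from the schema information in the catalog.

```python
def field_address(base, lengths, i):
    """Address of the i-th field (1-based) of a fixed-length record.

    `base` is the record's base address B and `lengths` holds the field
    lengths L1, L2, ... taken from the schema, so no scan of the record
    itself is needed.
    """
    if not 1 <= i <= len(lengths):
        raise IndexError(f"record has no field {i}")
    return base + sum(lengths[: i - 1])


lengths = [4, 20, 200, 10]              # hypothetical L1..L4
print(field_address(1000, lengths, 3))  # B + L1 + L2 = 1024
```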

Page 21

Record Header

[Figure: a record consisting of a header followed by fields F1, F2, F3, F4 of lengths L1, L2, L3, L4. The header holds a pointer to the schema, the record length, and a timestamp.]

Need the header because:
• The schema may change; for a while new and old may coexist
• Records from different relations may coexist

Page 22

Variable Length Records

[Figure: a record consisting of a header (record length and other header information) followed by fields F1, F2, F3, F4 of lengths L1, L2, L3, L4.]

• Place the fixed fields first: F1, F2
• Then the variable length fields: F3, F4
• Null values take 2 bytes only
• Sometimes they take 0 bytes (when at the end)

Page 23

Records With Referencing Fields

[Figure: a record consisting of a header (record length and other header information) followed by fields F1, F2, F3 of lengths L1, L2, L3.]

E.g. to represent one-many or many-many relationships

Page 24

Storing Records in Blocks

Blocks have fixed size (typically 4k)

[Figure: one block holding records R1, R2, R3, R4.]

Page 25

Spanning Records Across Blocks

• When records are very large
• Or even medium size: saves space in blocks

[Figure: two blocks, each with a block header. The first holds R1 and the first part of R2; the second holds the rest of R2 and R3.]

Page 26

BLOB

• Binary large objects
• Supported by modern database systems
• E.g. images, sounds, etc.
• Storage: attempt to cluster blocks together

Page 27

Modifications: Insertion

• File is unsorted: add the record to the end
• File is sorted:
    • Is there space in the right block? Yes: we are lucky, store it there
    • Is there space in a neighboring block? Look 1-2 blocks to the left/right, shift records
    • If all else fails, create an overflow block

Page 28

Overflow Blocks

After a while the file starts being dominated by overflow blocks: time to reorganize

[Figure: Block n-1, Block n, Block n+1 in sequence, with an overflow block attached.]

Page 29

Modifications: Deletions

• Free space in block, shift records
• May be able to eliminate an overflow block

Page 30

Modifications: Updates

If new record is shorter than previous, easy

If it is longer, need to shift records, create overflow blocks

Page 31

Physical Addresses

Each block and each record have a physical address that consists of:
• The host
• The disk
• The cylinder number
• The track number
• The block within the track
• For records: an offset in the block
    • Sometimes this is in the block’s header

Page 32

Logical Addresses

• Logical address: a string of bytes (10-16)
• More flexible: can move blocks/records around
• But need translation table:

    Logical address    Physical address
    L1                 P1
    L2                 P2
    L3                 P3

Page 33

Main Memory Address

• When the block is read into main memory, it receives a main memory address
• Buffer manager has another translation table:

    Memory address    Logical address
    M1                L1
    M2                L2
    M3                L3
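The two translation tables (logical to physical, and the buffer manager's memory table) can be pictured as two dictionaries. The sketch below is a simplification with assumed names: the class, its methods, and the choice to key the buffer table by logical address are illustrative, not how any particular DBMS implements it.

```python
class AddressMap:
    """Toy model of the logical/physical and buffer translation tables."""

    def __init__(self):
        # Logical address -> physical address (table from the slide).
        self.logical_to_physical = {"L1": "P1", "L2": "P2", "L3": "P3"}
        # Logical address -> main memory address, filled in as blocks
        # are read into the buffer.
        self.logical_to_memory = {}

    def load_into_buffer(self, logical, memory_address):
        self.logical_to_memory[logical] = memory_address

    def resolve(self, logical):
        """Prefer the in-memory copy; otherwise fall back to the disk address."""
        if logical in self.logical_to_memory:
            return ("memory", self.logical_to_memory[logical])
        return ("disk", self.logical_to_physical[logical])

    def relocate(self, logical, new_physical):
        """Moving a block only changes the table, not the logical address."""
        self.logical_to_physical[logical] = new_physical


addresses = AddressMap()
addresses.load_into_buffer("L1", "M1")
addresses.relocate("L2", "P9")
print(addresses.resolve("L1"))  # ('memory', 'M1')
print(addresses.resolve("L2"))  # ('disk', 'P9')
```

Anything that stores the logical address L2 keeps working after the move, which is the flexibility the logical address buys at the cost of one extra lookup.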

Page 34

Designing Physical/Internal Model

Overview terminology Access methods

Page 35

Physical Design

[Figure: path of a request through the system. Labels: User request, External Model, DBMS, Internal Model Access Methods, Internal Model/Physical Model, Operating System, Access Methods, Data Base, with Interface 1, Interface 2 and Interface 3 marked between components.]

Page 36

Physical Design

• Interface 1: User request to the DBMS. The user presents a query, and the DBMS determines which physical DBs are needed to resolve the query.
• Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.
• Interface 3: The internal model access methods and OS access methods access the physical records of the database.

Page 37

Physical File Design

• Physical file: a portion of secondary storage (disk space) allocated for the purpose of storing physical records
• Pointer: a field of data that can be used to locate a related field or record of data
• Access method: an operating system algorithm for storing and locating data in secondary storage
• Page: the amount of data read or written in one disk input or output operation

Page 38

Internal Model Access Methods

Many types of access methods:
• Physical Sequential
• Indexed Sequential
• Indexed Random
• Inverted
• Direct
• Hashed

Differences in:
• Access efficiency
• Storage efficiency

Page 39

Physical Sequential

• Key values of the physical records are in logical sequence
• Main use is for “dump” and “restore”
• Access method may be used for storage as well as retrieval
• Storage efficiency is near 100%
• Access efficiency is poor (unless fixed size physical records)

Page 40

Indexed Sequential

• Key values of the physical records are in logical sequence
• Access method may be used for storage and retrieval
• Index of key values is maintained with entries for the highest key values per block(s)
• Access efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow
• Storage efficiency depends on size of index and volatility of database

Page 41

Indexed Sequential

[Figure: a one-level index over a sorted data file.]

Index (Actual Value -> Address Block Number):
    Dumpling -> 1
    Harty    -> 2
    Texaci   -> 3
    ...

Data File:
    Block 1: Adams, Becker, Dumpling
    Block 2: Getta, Harty
    Block 3: Mobile, Sunoci, Texaci
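A lookup against this index can be sketched with a binary search over the highest key of each block. The data is copied from the figure; the function names and the in-memory dictionaries are illustrative assumptions standing in for the index and data blocks on disk.

```python
from bisect import bisect_left

# Index entries from the figure: highest key in each block, and its block number.
INDEX_KEYS = ["Dumpling", "Harty", "Texaci"]
INDEX_BLOCKS = [1, 2, 3]

# Data file from the figure: block number -> records in key order.
DATA_FILE = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}


def find_block(key):
    """Block whose highest key is the first one >= key, or None if the
    key is larger than every key in the index."""
    position = bisect_left(INDEX_KEYS, key)
    if position == len(INDEX_KEYS):
        return None
    return INDEX_BLOCKS[position]


def lookup(key):
    """True if the key is stored in the one block the index points to."""
    block = find_block(key)
    return block is not None and key in DATA_FILE[block]


print(find_block("Getta"))  # 2
print(lookup("Sunoci"))     # True
print(lookup("Cole"))       # False
```

Only one data block is read per lookup, which is why access efficiency depends on the index rather than on the size of the data file.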

Page 42

Indexed Sequential: Two Levels

[Figure: a two-level index over a sorted data file.]

Top-level index (Key Value -> Address):
    385 -> 7
    678 -> 8
    805 -> 9

Second-level index blocks (Key Value -> Address):
    Block 7: 150 -> 1, 385 -> 2
    Block 8: 536 -> 3, 678 -> 4
    Block 9: 785 -> 5, 805 -> 6

Data blocks:
    Block 1: 001, 003, ..., 150
    Block 2: 251, ..., 385
    Block 3: 455, 480, ..., 536
    Block 4: 605, 610, ..., 678
    Block 5: 705, 710, ..., 785
    Block 6: 791, ..., 805

Page 43

Indexed Random

• Key values of the physical records are not necessarily in logical sequence
• Index may be stored and accessed with Indexed Sequential Access Method
• Index has an entry for every database record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence.
• Access method may be used for storage and retrieval

Page 44

Indexed Random

[Figure: an index with one entry per record over an unsorted data file.]

Index (Actual Value -> Address Block Number):
    Adams    -> 2
    Becker   -> 1
    Dumpling -> 3
    Getta    -> 2
    Harty    -> 1

Data File:
    Block 1: Becker, Harty
    Block 2: Adams, Getta
    Block 3: Dumpling

Page 45

B-tree

[Figure: a B-tree index over a data file of team names.]

Root node: F | P | Z

Second-level nodes:
    B | D | F
    H | L | P
    R | S | Z

Data blocks:
    Aces, Boilers, Cars
    Devils
    Flyers
    Hawkeyes, Hoosiers
    Minors, Panthers
    Seminoles

Page 46

Inverted

• Key values of the physical records are not necessarily in logical sequence
• Access method is better used for retrieval
• An index for every field to be inverted may be built
• Access efficiency depends on number of database records, levels of index, and storage allocated for index

Page 47

Inverted

[Figure: an inverted index on course number over a student file.]

Index (Actual Value -> Address Block Number):
    CH 145 -> 1
    CS 201 -> 2
    CS 623 -> 3
    PH 345 (block number not shown)

Index blocks (course number: record addresses):
    CH 145: 101, 103, 104
    CS 201: 102
    CS 623: 105, 106

Data file (Student name, Course Number):
    Adams     CH 145
    Becker    CS 201
    Dumpling  CH 145
    Getta     CH 145
    Harty     CS 623
    Mobile    CS 623
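Building the inverted lists from the student file can be sketched as below. It assumes the six student records sit at addresses 101 to 106 in the order shown, which is consistent with the address lists in the figure but is not stated on the slide; the course codes are normalized to one spelling, and the function name is illustrative.

```python
from collections import defaultdict

# Record address -> (student name, course number).
# Addresses 101-106 are an assumption inferred from the figure.
STUDENTS = {
    101: ("Adams", "CH 145"),
    102: ("Becker", "CS 201"),
    103: ("Dumpling", "CH 145"),
    104: ("Getta", "CH 145"),
    105: ("Harty", "CS 623"),
    106: ("Mobile", "CS 623"),
}


def build_inverted_index(records):
    """Map each course number to the sorted list of record addresses
    whose course field holds that value."""
    index = defaultdict(list)
    for address in sorted(records):
        _name, course = records[address]
        index[course].append(address)
    return dict(index)


index = build_inverted_index(STUDENTS)
print(index["CH 145"])  # [101, 103, 104]
print([STUDENTS[a][0] for a in index["CS 623"]])  # ['Harty', 'Mobile']
```

A query on the inverted field reads the address list first and then fetches only the matching records, without scanning the data file.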

Page 48

Direct

• Key values of the physical records are not necessarily in logical sequence
• There is a one-to-one correspondence between a record key and the physical address of the record
• May be used for storage and retrieval
• Access efficiency always 1
• Storage efficiency depends on density of keys
• No duplicate keys permitted
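One simple way to get a one-to-one correspondence between key and address is to reserve a fixed-size slot for every possible key value. The sketch below assumes integer keys and fixed-length records; the slide does not prescribe a particular mapping, and the function names are illustrative.

```python
def direct_address(key, min_key, record_size, base=0):
    """Address of the slot reserved for `key`: one slot per possible key
    value, so a record is found with a single access."""
    return base + (key - min_key) * record_size


def storage_efficiency(keys):
    """Fraction of reserved slots actually used, which depends on how
    densely the keys fill the range from the smallest to the largest."""
    used = len(set(keys))
    reserved = max(keys) - min(keys) + 1
    return used / reserved


print(direct_address(1003, min_key=1000, record_size=190))  # 570
print(storage_efficiency([1000, 1001, 1003]))               # 0.75
print(storage_efficiency([1000, 1999]))                     # 0.002
```

The last line shows the cost of sparse keys: two records reserve a thousand slots.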

Page 49

Hashing

• Key values of the physical records are not necessarily in logical sequence
• Many key values may share the same physical address (block)
• May be used for storage and retrieval
• Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated
• Storage efficiency depends on distribution of keys and algorithm used for key transformation
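The effect of key distribution can be seen in a small sketch. It assumes integer keys, a division-remainder transformation (key modulo the number of blocks), and a single shared overflow area; these are illustrative choices, not the only possible hashing scheme.

```python
def block_for(key, num_blocks):
    """Key transformation: division-remainder hashing on an integer key."""
    return key % num_blocks


def load(keys, num_blocks, capacity):
    """Place each key in its home block, or in the overflow area when
    the home block already holds `capacity` records."""
    blocks = [[] for _ in range(num_blocks)]
    overflow = []
    for key in keys:
        home = blocks[block_for(key, num_blocks)]
        if len(home) < capacity:
            home.append(key)
        else:
            overflow.append(key)
    return blocks, overflow


# Evenly spread keys: every record is found in its home block.
print(load([1, 2, 3, 4, 5], num_blocks=5, capacity=2))
# Clustered keys: multiples of 5 all hash to block 0 and spill over.
print(load([10, 15, 20, 25, 7], num_blocks=5, capacity=2))
```

In the second call two of the five records end up in overflow while three blocks stay empty, so both access and storage efficiency suffer from the same poor distribution.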

Page 50

Comparative Access Methods

Storage space
    Sequential: No wasted space
    Indexed:    No wasted space for data but extra space for index
    Hashed:     More space needed for addition and deletion of records after initial load

Sequential retrieval on primary key
    Sequential: Very fast
    Indexed:    Moderately fast
    Hashed:     Impractical

Random retrieval
    Sequential: Impractical
    Indexed:    Moderately fast
    Hashed:     Very fast

Multiple key retrieval
    Sequential: Possible but needs a full scan
    Indexed:    Very fast with multiple indexes
    Hashed:     Not possible

Deleting records
    Sequential: Can create wasted space
    Indexed:    OK if dynamic
    Hashed:     Very easy

Adding records
    Sequential: Requires rewriting file
    Indexed:    OK if dynamic
    Hashed:     Very easy

Updating records
    Sequential: Usually requires rewriting file
    Indexed:    Easy but requires maintenance of indexes
    Hashed:     Very easy