Data Storage John Ortiz

Upload: darrius-paskin

Post on 15-Dec-2015


Page 1: Data Storage John Ortiz. Lecture 17Data Storage2 Overview  Database stores data on secondary storage  Disk has distinct storage and access characteristics

Data Storage

John Ortiz


Lecture 17 Data Storage 2

Overview
- A database stores its data on secondary storage.
- Disk has distinct storage and access characteristics.
- The DBMS must bring data from disk into a main-memory buffer for processing, and write data from the memory buffer back to disk for storage.
- Low-level DBMS software must manage disk and buffer space effectively.
- How are disks accessed? How should a buffer be managed?


Storage Hierarchy
- Architecture of a typical computer
  [Figure: processor (P) with cache (C), main memory (M), and secondary storage]
- Processor: fast, slow, RISC, cache, pipelined. Typical speed: 100, 500, 1000 MIPS
- Main memory: fast, slow, volatile, read-only. Access time: 1 µs to 1 ns


Secondary Storage Devices
- Disk:
  - Floppy (hard, soft)
  - Removable packs
  - Winchester
  - RAM disks
  - Optical, CD-ROM, DVD-ROM
  - Disk arrays
- Tape:
  - Reel, cartridge, robots


Disks
- A DBMS stores information on (“hard”) disks.
- This has major implications for DBMS design!
  - Data must be transferred from disk to main memory (for a read) and vice versa (for a write).
  - Transfer unit: block (= 1 or more sectors)
- Why not store everything in main memory?
  - It costs too much: $147 will buy you 128 MB of RAM or 40 GB of disk (Oct. 2000).
  - Main memory is volatile. We want data to be saved between runs.


Components of a Disk
[Figure: top view of a disk, labeling the platters, spindle, disk head, arm assembly, arm movement, tracks, and a sector]


Components of a Disk (cont.)
- Terms: platter, head, actuator, cylinder, track, sector, block (page), gap, cluster
- Common measurements:
  - Diameter: 1 inch to 15 inches
  - Cylinders: 100 to 16383
  - Surfaces: 1 (CDs) to 10
  - Tracks/cyl: 2 (floppies) to 30
  - Sector size: 512 B to 64 KB
  - Capacity: 360 KB (old floppy) to 200 GB


Components of a Disk (cont.)
- The division of tracks into sectors is hard-coded on the disk surface.
- A sector is subdivided into one or more blocks.
- Formatting sets the block size.
- Interblock gaps make the difference between formatted and unformatted capacity.


Disk Access Time
[Figure: a request "I want block X" goes to the disk and ends with block X in memory; how long does that take?]
- Access Time = Seek Time + Rotational Delay + Transfer Time + Other


Seek Time
- Time to move the arm so that the disk head is positioned on a given track.
[Figure: seek time versus cylinders traveled (1 to N); the time axis runs from x up to 3x or 5x]


Average Random Seek Time
- Typical S: 8 ms to 40 ms
- N = number of tracks per surface (= number of cylinders)

$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{Seektime}(i \to j)}{N(N-1)}$$
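The average over all ordered pairs of distinct cylinders can be sketched in Python. The helper names and the linear seek model (a fixed startup cost plus a per-cylinder cost) are assumptions made for illustration; the slides do not give a seek-time function.

```python
def average_random_seek(n, seek_time):
    """Average seek_time(i, j) over all ordered pairs i != j of n cylinders."""
    total = sum(
        seek_time(i, j)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if j != i
    )
    return total / (n * (n - 1))


def linear_seek_ms(i, j, startup_ms=1.0, per_cylinder_ms=0.01):
    """Hypothetical seek model: startup cost plus a cost per cylinder crossed."""
    return startup_ms + per_cylinder_ms * abs(i - j)


if __name__ == "__main__":
    print(average_random_seek(1000, linear_seek_ms))
```

The double loop is quadratic in the number of cylinders, which is fine for a classroom-sized N but not meant as an efficient method.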


Rotational Delay
- Time to wait for the block to rotate under the head; also called rotational latency.
- Average R = 1/2 revolution
- Typically R = 4.2 ms (7200 RPM)
- Complication: may need to wait for the start of the track.
[Figure: a track showing where the head is, the start of the track, and the block I want]
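As a quick check of the 7200 RPM figure, the average rotational delay is half of one revolution time. The function name below is an assumption for illustration.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay: half of the time for one full revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    print(round(avg_rotational_delay_ms(7200), 2))  # about 4.17 ms
```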


Transfer Time
- Time for actually moving data to/from the disk surface.
- Typical transfer rate tr: 16 to 100 MB/s
- Block transfer time btt = block size / tr
- Typically, for 4 KB: about 1 ms

Other Delay
- CPU time to issue I/O
- Contention for controller
- Contention for bus, memory
- Typical value: 0


Formulas
- tr = track size * rpm (transfer rate)
- rd = 1/2 * (1/rpm) (rotational delay)
- btt = B / tr (block transfer time), where B = block size
- btr = (B / (B + G)) * tr (bulk transfer rate), where G = interblock gap size
- Read K consecutive blocks: s + rd + (B/btr) * K
- Read K random blocks: (s + rd + (B/btr)) * K
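The formulas above can be put together in a short Python sketch that compares the two read patterns. The function name, the choice of milliseconds as the working unit, and the sample disk parameters are assumptions for illustration only.

```python
def read_times_ms(k, seek_ms, rpm, track_bytes, block_bytes, gap_bytes):
    """Return (consecutive, random) read times in ms for k blocks."""
    ms_per_rev = 60_000.0 / rpm
    tr = track_bytes / ms_per_rev                        # transfer rate, bytes/ms
    rd = 0.5 * ms_per_rev                                # rotational delay, ms
    btr = block_bytes / (block_bytes + gap_bytes) * tr   # bulk transfer rate
    per_block = block_bytes / btr                        # B / btr, ms
    consecutive = seek_ms + rd + per_block * k
    random = (seek_ms + rd + per_block) * k
    return consecutive, random


if __name__ == "__main__":
    # Hypothetical disk: 8 ms seek, 7200 RPM, 32 KB tracks, 4 KB blocks, 128 B gaps.
    seq, rnd = read_times_ms(100, 8.0, 7200, 32768, 4096, 128)
    print(f"consecutive: {seq:.1f} ms, random: {rnd:.1f} ms")
```

Converting rpm to a time per revolution first keeps every term in milliseconds, which follows the advice to let the units check the result.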


Formulas
- Log base N of X = log X / log N
- If no base is indicated, assume base 10.
- log2 500 = log 500 / log 2
- bfr = floor(B / record size)
- File size (in blocks) = ceiling(# records / bfr) (unspanned)
                        = ceiling(# bytes in file / B) (spanned)


Units!!!
- The units are just as important as the value.
- Use the units to check your result.
- Example: tr = track size * rpm
  - Given track size = 32768 bytes and 3600 rpm, what is the transfer rate?
  - 32768 * 3600 = 117,964,800, which SEEMS very unreasonable!
  - However, it is 117,964,800 bytes per minute!
  - That is 1,966,080 bytes per second = 1.875 MB/sec.
  - Pretty slow for a modern disk!
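The same unit conversion, written out step by step in Python with the unit carried in each variable name (the variable names are chosen here for illustration):

```python
track_size_bytes = 32768
rpm = 3600

bytes_per_minute = track_size_bytes * rpm      # bytes/rev * rev/min
bytes_per_second = bytes_per_minute / 60       # 60 seconds per minute
mb_per_second = bytes_per_second / 1024**2     # 1 MB = 1024^2 bytes

if __name__ == "__main__":
    print(bytes_per_minute, bytes_per_second, mb_per_second)
```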


How Many Bytes?
- 1 kilobyte = 1024 bytes (most of the time!)
- 1 megabyte = 1024^2 = 1,048,576 bytes
- 1 gigabyte = 1024^3 bytes
- 1 terabyte = 1024^4 bytes
- Disk manufacturers often use 1000 bytes as a kilobyte for marketing purposes!
- My 200 GB disk drive is actually 186 GB, though it's 200 billion bytes.
- RAM marketing has so far not followed suit.
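The 200 GB versus 186 GB gap follows directly from the two definitions of a gigabyte. A small sketch (the function name is an assumption):

```python
def marketing_gb_to_binary_gb(marketing_gb):
    """Convert gigabytes of 1000^3 bytes into gigabytes of 1024^3 bytes."""
    total_bytes = marketing_gb * 1000**3
    return total_bytes / 1024**3


if __name__ == "__main__":
    print(round(marketing_gb_to_binary_gb(200), 2))  # about 186.26
```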


What about Reading the Next Block?
- If we do things right (double buffering, staggered blocks, ...): Time = T + negligible
  - Negligible: skip gap, change track, change cylinder
- Rule of thumb: random I/O is expensive; sequential I/O costs much less.
- Example, for a 1 KB block:
  - Random I/O: about 20 ms
  - Sequential I/O: about 1 ms


What About Writing and Updating?
- The cost of writing is similar to that of reading, unless we need to verify the write.
  - Verifying adds a full rotation + T.
- Cost of modifying a block:
  (a) Read the block
  (b) Modify it in memory
  (c) Write the block
  [(d) Verify?]
- Block address: device, cylinder #, surface #, sector


Disk Example: IBM Ultrastar 36LP
- Formatted capacity: 36.9 GB
- Sector size: 512 to 528 bytes, variable (2-byte increments)
- Platters: 10
- Max. recording density: 350000 BPI
- Track density: 18400 TPI (per surface)
- Rotation speed: 7200 RPM
- Average rotational delay: 4.17 ms
- Sustained data rate: 19.5 to 31.9 MB/s
- Seek time: average 6.8 ms, next track 0.6 ms


Arranging Pages on Disk
- Blocks in a file should be arranged sequentially on disk (by next block), to minimize seek and rotational delay.
- The concept of next block:
  1. Next block on the same track
  2. 1st block on the next track of the same cylinder
  3. 1st block on the 1st track of the next cylinder
- For a sequential scan, pre-fetching several pages at a time is a big win!
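The three "next block" rules amount to a nested iteration order: blocks within a track, then tracks within a cylinder, then cylinders. A minimal sketch, assuming a simplified geometry with the same number of blocks on every track (the function and parameter names are hypothetical):

```python
def blocks_in_sequential_order(num_cylinders, tracks_per_cylinder, blocks_per_track):
    """Yield (cylinder, track, block) addresses in 'next block' order."""
    for cylinder in range(num_cylinders):
        for track in range(tracks_per_cylinder):
            for block in range(blocks_per_track):
                yield (cylinder, track, block)


if __name__ == "__main__":
    for address in blocks_in_sequential_order(2, 2, 2):
        print(address)
```

The innermost loop changes fastest, so the arm only has to move when a whole cylinder has been read.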


Disk Space Management
- The lowest layer of DBMS software manages space on disk.
- Higher levels call upon this layer to:
  - allocate/de-allocate a page
  - read/write a page
- One such “higher level” is the buffer manager, which receives a request to bring a page into memory and then, if needed, requests the disk space layer to read the page into the buffer pool.


Buffer Management in a DBMS
[Figure: page requests from higher levels arrive at the buffer pool in main memory; the pool holds disk pages and free frames; pages move between the pool and the DB on disk; the choice of frame is dictated by the replacement policy]
- Data must be in RAM for the DBMS to operate on it!
- A table of <frame#, pageid> pairs is maintained.


When a Page is Requested ...
- If the requested page is not in the pool:
  - Choose a frame for replacement.
  - If the frame is dirty, write it to disk.
  - Read the requested page into the chosen frame.
- In either case, pin the frame (so it cannot be replaced) and return its address.
- If requests can be predicted (e.g., sequential scans), several pages can be pre-fetched at a time!
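The request steps can be sketched as a toy buffer pool in Python. Everything here is an assumption for illustration: the class and method names, the use of a dict to stand in for the disk space layer, and the choice of LRU among unpinned frames as the replacement policy.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool; `disk` is a dict standing in for the disk space layer."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                 # page_id -> page contents
        self.frames = OrderedDict()      # page_id -> frame, least recently used first
        self.reads = 0
        self.writes = 0

    def request(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:
            if len(self.frames) >= self.num_frames:
                self._evict()
            frame = {"data": self.disk[page_id], "pin_count": 0, "dirty": False}
            self.reads += 1
            self.frames[page_id] = frame
        else:
            self.frames.move_to_end(page_id)   # mark as most recently used
        frame["pin_count"] += 1                # pinned frames cannot be replaced
        return frame

    def release(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Pick the least recently used frame that nobody has pinned.
        for victim_id, victim in self.frames.items():
            if victim["pin_count"] == 0:
                break
        else:
            raise RuntimeError("all frames are pinned")
        if victim["dirty"]:
            self.disk[victim_id] = victim["data"]   # write back before reuse
            self.writes += 1
        del self.frames[victim_id]
```

A caller would pair every `request` with a `release`, passing `dirty=True` after modifying the page, so that the frame becomes a candidate for replacement again.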


Buffer Replacement Policy
- A frame is chosen for replacement by a replacement policy:
  - Least-recently-used (LRU), Clock, MRU, etc.
- The policy can have a big impact on the number of I/Os; it depends on the access pattern.
- Sequential flooding: a nasty situation caused by LRU + repeated sequential scans.
  - # buffer frames < # pages in file means each page request causes an I/O.
  - MRU is much better in this situation (but not in all situations, of course).
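Sequential flooding is easy to reproduce with a small simulation. The sketch below counts page misses for repeated scans of a file under LRU and MRU; the function name and the sample sizes (3 frames, 4 pages, 5 scans) are assumptions for illustration, and pinning is ignored.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count page reads for repeated sequential scans under 'LRU' or 'MRU'."""
    pool = []   # page ids ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)              # hit: re-appended below as most recent
            else:
                ios += 1                       # miss: page must be read from disk
                if len(pool) >= num_frames:
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios


if __name__ == "__main__":
    print("LRU:", count_ios("LRU", 3, 4, 5))
    print("MRU:", count_ios("MRU", 3, 4, 5))
```

With fewer frames than pages, LRU always evicts exactly the page the scan will need soonest, so every request misses; MRU keeps most of the file resident between scans.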


Representing Records
- A record is a collection of related data items (called fields).
  - Fields of an Employee record: id, name, salary, date-of-hire, ...
- Main choices:
  - Format: fixed vs variable
  - Length: fixed vs variable
- A schema (of a record) specifies the number of fields, the type of each field, the order in the record, and the meaning of each field.


Example: Fixed Format and Length
Employee record schema:
  (1) Eid, 2-byte integer
  (2) Ename, 10 chars
  (3) Dept, 2-byte code
Records:
  55 | smith | 02
  83 | jones | 01
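A fixed-format, fixed-length record like this maps naturally onto Python's `struct` module. The sketch below assumes a little-endian unsigned Eid, a NUL-padded ASCII Ename, and a Dept code stored as two ASCII characters; the slide does not specify these encodings, and the helper names are hypothetical.

```python
import struct

# 2-byte unsigned integer, 10-byte string, 2-byte string; no alignment padding.
EMPLOYEE = struct.Struct("<H10s2s")


def pack_employee(eid, ename, dept):
    """Encode one record into exactly EMPLOYEE.size bytes."""
    return EMPLOYEE.pack(eid, ename.encode("ascii"), dept.encode("ascii"))


def unpack_employee(raw):
    """Decode one record, stripping the NUL padding from the name."""
    eid, ename, dept = EMPLOYEE.unpack(raw)
    return eid, ename.rstrip(b"\x00").decode("ascii"), dept.decode("ascii")


if __name__ == "__main__":
    raw = pack_employee(55, "smith", "02")
    print(len(raw), unpack_employee(raw))
```

Because every record has the same size, the i-th record in a block starts at byte offset `i * EMPLOYEE.size`.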


File of Records
- Logically, a file is a sequence of records: R1, R2, R3, R4, ..., Rn
- Physically, a file is a set of blocks.
- Assume fixed-length blocks.
- Assume a single file (for now).


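Packing Records into Blocks
- Unspanned: each record must lie within one block.
  [Figure: records R1 to R5 stored whole in Block 1 and Block 2]
- Spanned: a record may be split between blocks.
  [Figure: records R1 to R7 stored in Block 1 and Block 2, with R3 split into R3(a) and R3(b) across the block boundary and R7(a) at the end of Block 2]
- Unspanned: simple, but may waste space.
- Spanned: necessary if record size > block size.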


File Size: Spanned vs Unspanned
- Blocking factor (bfr): number of records per block.
- Example:
  - Block size B = 2048 bytes
  - Number of records R = 200,000
  - Record size = 600 bytes (fixed length)
  - Blocking factor = floor(2048 / 600) = 3 records per block (unspanned, 248 bytes unused per block)
  - File size = ceiling(200000 / 3) = 66667 blocks (unspanned)
              = ceiling(200000 * 600 / 2048) = 58594 blocks (spanned)
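The example numbers can be checked with a few lines of Python. The function names are assumptions for illustration, and integer arithmetic is used for the ceilings to avoid floating-point rounding.

```python
def blocking_factor(block_size, record_size):
    """Whole records that fit in one block (unspanned)."""
    return block_size // record_size


def unspanned_blocks(num_records, block_size, record_size):
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)                  # ceiling division


def spanned_blocks(num_records, block_size, record_size):
    total_bytes = num_records * record_size
    return -(-total_bytes // block_size)           # ceiling division


if __name__ == "__main__":
    print(blocking_factor(2048, 600))
    print(unspanned_blocks(200_000, 2048, 600))
    print(spanned_blocks(200_000, 2048, 600))
```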


Summary
- Disks provide cheap, non-volatile storage.
  - Random access, but the cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.
- The buffer manager brings pages into RAM.
  - A page stays in RAM until released by its requestor(s).


Summary (cont.)
- A page is written to disk when its frame is chosen for replacement (which is after all requestors release the page), or earlier.
- The choice of frame to replace is based on the replacement policy.
- The file layer keeps track of the pages in a file, and supports the abstraction of a collection of records.


Look Ahead
- Next topic: Hashing and Indexing
- Read textbook: Chapter 6.1-6.3