CS4432: Database Systems II Data Storage 1

Upload: thomas-cox

Post on 13-Dec-2015


CS4432: Database Systems II

Data Storage


Storage in DBMSs

• DBMSs manage large amounts of data.
• How does a DBMS store and manage large amounts of data?
  – This has a significant impact on performance.
• Design decisions:
  – What representations and data structures best support efficient manipulation of this data?
• To understand why a DBMS applies specific strategies:
  – We must first understand how disks work.


Disks and Files

• A DBMS stores information on ("hard") disks.
• Main memory is only for processing.
• This has major implications for DBMS design!
  – READ: transfer data from disk to main memory (RAM).
  – WRITE: transfer data from RAM to disk.
  – Both are high-cost operations relative to in-memory operations, so they must be planned carefully!


DBMS vs. OS: Who's in Control?

• The DBMS is in control of managing its data.
  – It knows more about the structure of the data.
  – It knows more about the access patterns.


That is why a DBMS has a Storage Manager and a Buffer Manager


Understanding Disks


Storage Hierarchy

Levels, from fastest to slowest:

• Cache (all levels)
  – Avg. size: 256 KB – 1 MB
  – Read/write time: 10^-8 seconds
  – Random access
  – Smallest of all memory, and also the most costly
  – Usually on the same chip as the processor
  – Easy to manage in single-processor environments, more complicated in multiprocessor systems

• Main Memory
  – Avg. size: 128 MB – 1 GB
  – Read/write time: 10^-7 to 10^-8 seconds
  – Random access
  – Becoming more affordable
  – Volatile

• Secondary Storage
  – Avg. size: 30 GB – 160 GB
  – Read/write time: 10^-2 seconds
  – NOT random access
  – Extremely affordable: $0.68/GB
  – Can be used for a file system, virtual memory, or raw data access
  – Blocking (needs buffering)

• Tertiary Storage
  – Avg. size: gigabytes to terabytes
  – Read/write time: 10^1 to 10^2 seconds
  – NOT random access, or even remotely close
  – Extremely affordable: pennies/GB
  – Not efficient for any real-time database purposes; could be used in an offline processing environment
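To get a feel for the gaps between levels, here is a small Python sketch that turns the read/write times above into ratios. The dictionary and the `slowdown` helper are illustrative names, and where the slide gives a range the slower end is used; both are assumptions, not part of the original slides.

```python
# Approximate read/write times in seconds, taken from the slide above.
# Where the slide gives a range, the slower end is used (an assumption).
ACCESS_TIME_SEC = {
    "cache": 1e-8,
    "main memory": 1e-7,
    "secondary storage (disk)": 1e-2,
    "tertiary storage": 1e2,
}


def slowdown(slower: str, faster: str) -> float:
    """Return how many times slower one level is than another."""
    return ACCESS_TIME_SEC[slower] / ACCESS_TIME_SEC[faster]


if __name__ == "__main__":
    for level in ACCESS_TIME_SEC:
        ratio = slowdown(level, "cache")
        print(f"{level:26s} {ratio:>18,.0f}x the cache access time")
```

With these rough figures, a disk access costs about five orders of magnitude more than a main-memory access, which is why the DBMS plans its reads and writes so carefully.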


Memory Hierarchy Summary

[Figure: typical capacity (bytes, 10^3 to 10^15) versus access time (sec, 10^-9 to 10^3) for cache, electronic main memory, electronic secondary storage, magnetic and optical disks, online tape, nearline tape and optical disks, and offline tape.]


Memory Hierarchy Summary

[Figure: cost (dollars/MB, 10^-4 to 10^4) versus access time (sec, 10^-9 to 10^3) for cache, electronic main memory, electronic secondary storage, magnetic and optical disks, online tape, nearline tape and optical disks, and offline tape.]


Why Not Store Everything in Main Memory?

• Costs too much. $100 will buy you either 16GB of RAM or 360GB of disk today.

• Main memory is volatile. We want data to be saved between runs. (Obviously!)

• Typical hierarchy:
  – Main memory (RAM): processing
  – Disks (secondary storage): persistent storage
  – Tapes & DVDs: archival


Motivation

Consider the following algorithm:

For each tuple r in relation R {
    Read the tuple r
    For each tuple s in relation S {
        Read the tuple s
        Append the entire tuple s to r
    }
}

What is the time complexity of this algorithm?


Motivation

• Complexity:
  – This algorithm is O(n^2)! Is it always?
  – Yes, if we assume random access of data.
• Hard disks are not efficient at random access!
• Unless the data is organized efficiently, the actual running time may be much worse than the O(n^2) operation count suggests.
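A minimal Python sketch of the nested-loop algorithm from the previous slide, counting tuple reads. The function name `nested_loop_append`, the sample relation sizes, and the reading of "append s to r" as emitting the concatenated tuple are assumptions for illustration. The per-read costs reuse the rough 10^-7 s (memory) and 10^-2 s (disk) figures from the storage hierarchy slide and assume that every tuple read is a separate random access.

```python
def nested_loop_append(R, S):
    """Pair every tuple of R with every tuple of S and count tuple reads."""
    reads = 0
    result = []
    for r in R:
        reads += 1                # read the tuple r
        for s in S:
            reads += 1            # read the tuple s
            result.append(r + s)  # append the entire tuple s to r
    return result, reads


if __name__ == "__main__":
    # Hypothetical relations with 200 one-column tuples each.
    R = [(i,) for i in range(200)]
    S = [(j,) for j in range(200)]
    _, reads = nested_loop_append(R, S)

    MEMORY_READ_SEC = 1e-7  # rough figure from the hierarchy slide
    DISK_READ_SEC = 1e-2    # rough figure from the hierarchy slide

    print(f"tuple reads: {reads}")  # |R| + |R| * |S|
    print(f"all reads from memory: {reads * MEMORY_READ_SEC:.4f} s")
    print(f"all reads as random disk accesses: {reads * DISK_READ_SEC:.0f} s")
```

The operation count is the same in both cases; only the cost per read changes, and that is what makes the disk-resident version so much slower.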


Disks: Some Facts

• Data is stored and retrieved in units called disk blocks.
  – A disk block is typically 512 bytes to 4 KB or 8 KB.
• Movement to main memory:
  – Must read or write one block at a time.
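Because transfers happen in whole blocks, even a tiny read costs a full block. A short Python sketch of that rounding, where `blocks_needed` and the 4 KB default block size are illustrative assumptions:

```python
def blocks_needed(num_bytes: int, block_size: int = 4096) -> int:
    """Number of whole disk blocks that must be transferred for num_bytes."""
    if num_bytes < 0 or block_size <= 0:
        raise ValueError("num_bytes must be >= 0 and block_size must be > 0")
    # Integer ceiling division: a partial block still costs a full block.
    return -(-num_bytes // block_size)


if __name__ == "__main__":
    for size in (100, 4096, 4097, 1_000_000):
        print(f"{size:>9} bytes -> {blocks_needed(size)} block(s)")
```

Reading a 100-byte record still moves one full 4096-byte block, so it pays to place related records in the same block.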


Disk Components

[Figure label: platter (2 surfaces)]


Virtual Cylinder

[Figure labels: disk head, platter, cylinder]


Tracks divided into Sectors

[Figure labels: track, sector, gap]

• Gaps ≈ 10%
• Sectors ≈ 90%


Movements

• The arm moves in and out.
  – The time this takes is called seek time.
  – Mechanical
• The platter rotates.
  – The time spent waiting for the rotation is called latency time.
  – Mechanical


Actual Disk


Disk Controller

[Figure labels: Processor, Memory, Disk Controller, Disk 1, Disk 2, ...]

The disk controller:

1. Controls the mechanical movement

2. Transfers the data from disks to memory

3. Performs smart buffering and scheduling


How big is the disk if:

• There are 4 platters
• There are 8192 tracks per surface
• There are 256 sectors per track
• There are 512 bytes per sector

Size = 2 * (number of platters) * (tracks per surface) * (sectors per track) * (bytes per sector)

Size = 2 * 4 * 8192 * 256 * 512 = 2^1 * 2^2 * 2^13 * 2^8 * 2^9 = 2^33 bytes

Size = 2^33 bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)

Size = 2^33 = 2^3 * 2^30 = 8 GB

Remember 1 KB = 1024 bytes, not 1000!
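The same calculation as a short Python sketch. The function and parameter names are illustrative assumptions; the numbers are the ones from the slide.

```python
def disk_capacity_bytes(platters: int,
                        tracks_per_surface: int,
                        sectors_per_track: int,
                        bytes_per_sector: int,
                        surfaces_per_platter: int = 2) -> int:
    """Total capacity: surfaces * tracks * sectors * bytes per sector."""
    surfaces = surfaces_per_platter * platters
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    size = disk_capacity_bytes(platters=4,
                               tracks_per_surface=8192,
                               sectors_per_track=256,
                               bytes_per_sector=512)
    print(f"{size} bytes")
    print(f"equals 2^33: {size == 2 ** 33}")
    print(f"{size // 2 ** 30} GB (with 1 GB = 2^30 bytes)")
```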


Scale of Bytes


More Disk Terminology

• Rotation speed:
  – The speed at which the disk rotates, e.g., 5400 RPM
• Number of tracks:
  – Typically 10,000 to 15,000
• Bytes per track:
  – ~10^5 bytes per track
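Rotation speed translates directly into waiting time. The Python sketch below converts RPM into the time for one full rotation and an average rotational latency. The function names are illustrative, and the "half a rotation on average" figure assumes the requested sector is equally likely to be anywhere on the track.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half a rotation (uniform assumption)."""
    return rotation_time_ms(rpm) / 2


if __name__ == "__main__":
    rpm = 5400  # rotation speed from the slide
    print(f"one rotation: {rotation_time_ms(rpm):.2f} ms")
    print(f"average rotational latency: {average_rotational_latency_ms(rpm):.2f} ms")
```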


Big Question: What about access time?

[Figure: a request "I want block X" is issued; after some time (?), block X is in memory.]

Time = Disk Controller Processing Time + Disk Delay (seek & rotation) + Transfer Time
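The formula can be written as a small Python function. The name `access_time_ms` and every number in the example call are hypothetical placeholders chosen for illustration, not measurements from the slides.

```python
def access_time_ms(controller_ms: float,
                   seek_ms: float,
                   rotation_ms: float,
                   transfer_ms: float) -> float:
    """Time = controller processing + disk delay (seek + rotation) + transfer."""
    disk_delay_ms = seek_ms + rotation_ms
    return controller_ms + disk_delay_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical component times, in milliseconds.
    total = access_time_ms(controller_ms=0.5,
                           seek_ms=9.0,
                           rotation_ms=5.5,
                           transfer_ms=0.5)
    print(f"time to get block X into memory: {total} ms")
```

In a breakdown like this one, the mechanical disk delay is the largest term by far.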