CS4432: Database Systems II
Data Storage
1
Storage in DBMSs
• DBMSs manage large amounts of data
• How does a DBMS store and manage large amounts of data?
– Has significant impact on performance
• Design decisions:
– What representations and data structures best support efficient manipulations of this data?
• To understand why a DBMS applies specific strategies
– Must first understand how disks work
2
Disks and Files
• DBMS stores information on (“hard”) disks.
• Main memory is only for processing
• This has major implications for DBMS design!
– READ: transfer data from disk to main memory (RAM).
– WRITE: transfer data from RAM to disk.
– Both are high-cost operations, relative to in-memory operations, so must be planned carefully!
3
DBMS vs. OS: Who’s in Control?
• DBMS is in control of managing its data
– It knows more about structure
– It knows more about access pattern
4
That is why a DBMS has a Storage Manager & a Buffer Manager
5
Understanding Disks
6
Storage Hierarchy
(ordered from fastest to slowest)

Cache (all levels) [Fastest]
• Avg. Size: 256 KB – 1 MB
• Read/Write Time: 10^-8 seconds
• Random Access
• Smallest of all memory, and also the most costly.
• Usually on same chip as processor.
• Easy to manage in Single Processor Environments, more complicated in Multiprocessor Systems.

Main Memory
• Avg. Size: 128 MB – 1 GB
• Read/Write Time: 10^-7 to 10^-8 seconds
• Random Access
• Becoming more affordable.
• Volatile

Secondary Storage
• Avg. Size: 30 GB – 160 GB
• Read/Write Time: 10^-2 seconds
• NOT Random Access
• Extremely Affordable: $0.68/GB!!!
• Can be used for File System, Virtual Memory, or for raw data access.
• Blocking (need buffering)

Tertiary Storage [Slowest]
• Avg. Size: Gigabytes – Terabytes
• Read/Write Time: 10^1 to 10^2 seconds
• NOT Random Access, or even remotely close
• Extremely Affordable: pennies/GB!!!
• Not efficient for any real-time database purposes, could be used in an offline processing environment
7
Storage Hierarchy
8
Memory Hierarchy Summary
[Figure: typical capacity (bytes, 10^3 to 10^15) versus access time (sec, 10^-9 to 10^3). Levels plotted: cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape.]
9
Memory Hierarchy Summary
[Figure: cost (dollars/MB, 10^-4 to 10^4) versus access time (sec, 10^-9 to 10^3). Levels plotted: cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape.]
10
Why Not Store Everything in Main Memory?
• Costs too much. $100 will buy you either 16GB of RAM or 360GB of disk today.
• Main memory is volatile. We want data to be saved between runs. (Obviously!)
• Typical hierarchy:
– Main memory (RAM): Processing
– Disks (secondary storage): Persistent Storage
– Tapes & DVDs: Archival
11
Motivation
Consider the following algorithm:
    For each tuple r in relation R {
        read the tuple r
        For each tuple s in relation S {
            read the tuple s
            append the entire tuple s to r
        }
    }
What is the time complexity of this algorithm?
12
Motivation
• Complexity:
– This algorithm is O(n^2)! Is it always?
– Yes, if we assume random access of data.
• Hard disks are not efficient at random access!
• Unless the data is organized efficiently, the real cost of this algorithm may be much worse than the O(n^2) operation count suggests.
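To make the gap concrete, here is a minimal Python sketch that counts the tuple reads issued by the nested loop and multiplies them by a per-read cost. The access times are the rough figures from the Storage Hierarchy slide (about 10^-7 seconds for main memory, about 10^-2 seconds for disk); the function names and the relation sizes are hypothetical, and the model assumes the worst case in which every tuple read on disk costs one full random access.

```python
# Rough per-access costs taken from the storage hierarchy slide.
MEMORY_ACCESS_SEC = 1e-7   # main memory: about 10^-7 s per access
DISK_ACCESS_SEC = 1e-2     # disk: about 10^-2 s per random access


def tuple_reads(num_r: int, num_s: int) -> int:
    """Count tuple reads in the nested loop: one per r, plus one per (r, s) pair."""
    return num_r + num_r * num_s


def estimated_seconds(num_r: int, num_s: int, seconds_per_read: float) -> float:
    """Estimate run time, assuming every tuple read costs seconds_per_read."""
    return tuple_reads(num_r, num_s) * seconds_per_read


if __name__ == "__main__":
    n = 1_000  # hypothetical: 1,000 tuples in each relation
    print("tuple reads:", tuple_reads(n, n))
    print("all in memory (s):", estimated_seconds(n, n, MEMORY_ACCESS_SEC))
    print("random disk access per read (s):", estimated_seconds(n, n, DISK_ACCESS_SEC))
```

Under these assumptions the number of reads is identical in both cases, but the estimate grows from about 0.1 seconds in memory to about 10,000 seconds on disk. The asymptotic class stays the same; the constant factor is what hurts.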
13
Disks: Some Facts
• Data is stored and retrieved in units called disk blocks.
– Disk block size: 512 bytes to 4 KB or 8 KB
• Movement to main memory
– Must read or write one block at a time
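Because transfers happen in whole blocks, even a tiny request can cost more than one block read. The sketch below is an illustration only: `blocks_touched` is a hypothetical helper, and the 4096-byte default is one of the block sizes mentioned above, not a fixed property of any disk.

```python
def blocks_touched(offset: int, length: int, block_size: int = 4096) -> int:
    """Number of whole disk blocks that must be transferred to access
    `length` bytes starting at byte `offset`."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return last_block - first_block + 1


if __name__ == "__main__":
    print(blocks_touched(0, 1))       # a single byte still costs one block
    print(blocks_touched(4095, 2))    # two bytes straddling a block boundary
    print(blocks_touched(0, 10_000))  # a larger record spanning several blocks
```

This is one reason record placement matters: a record that straddles a block boundary doubles the I/O needed to fetch it.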
14
Disk Components
Platter (2 surfaces)
15
Virtual Cylinder
Disk Head
Platter
Cylinder
16
Tracks divided into Sectors
Track
Sector
Gap
Gaps ≈ 10%
Sectors ≈ 90%
17
Movements
• Arm moves in-out
– Called seek time
– Mechanical
• Platter rotates
– Called latency time
– Mechanical
18
Actual Disk
19
Disk Controller
[Figure labels: Processor, Memory, Disk Controller, Disk 1, Disk 2, ...]
1. Controls the mechanical movement
2. Transfers the data from disks to memory
3. Performs smart buffering and scheduling
20
How big is the disk if:
• There are 4 platters
• There are 8192 tracks per surface
• There are 256 sectors per track
• There are 512 bytes per sector
Size = 2 * num of platters * tracks * sectors * bytes per sector
Size = 2 * 4 * 8192 * 256 * 512
Size = 2^33 bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)
Size = 2^33 = 2^3 * 2^30 = 8 GB
Remember 1 KB = 1024 bytes, not 1000!
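The same arithmetic can be checked with a few lines of Python. This is only a sketch of the slide's formula; the function and parameter names are hypothetical, and the factor of 2 reflects the two surfaces per platter.

```python
def disk_capacity_bytes(platters: int, tracks_per_surface: int,
                        sectors_per_track: int, bytes_per_sector: int,
                        surfaces_per_platter: int = 2) -> int:
    """Capacity = surfaces * tracks per surface * sectors per track * bytes per sector."""
    surfaces = surfaces_per_platter * platters
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    capacity = disk_capacity_bytes(platters=4, tracks_per_surface=8192,
                                   sectors_per_track=256, bytes_per_sector=512)
    print("bytes:", capacity)
    print("equals 2^33:", capacity == 2 ** 33)
    print("GB (1 GB = 1024^3 bytes):", capacity / 1024 ** 3)
```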
21
Scale of Bytes
22
More Disk Terminology
• Rotation Speed:
– The speed at which the disk rotates: 5400 RPM
• Number of Tracks:
– Typically 10,000 to 15,000.
• Bytes per track:
– ~10^5 bytes per track
23
Big Question: What about access time?
[Figure: request "I want block X" → ? → block X in memory]
Time = Disk Controller Processing Time + Disk Delay {seek & rotation} + Transfer Time
24
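The formula above can be turned into a rough calculator. The sketch below is a simplified model, not a description of any particular drive: it takes the average rotational delay as half of one rotation, estimates transfer time as the fraction of a rotation the block occupies (ignoring gaps), and uses hypothetical values for controller processing time and seek time. The 5400 RPM speed, the 256 sectors of 512 bytes per track, and the 4 KB block size come from the earlier slides.

```python
def avg_rotational_latency_sec(rpm: float) -> float:
    """Average wait for the block to rotate under the head: half a rotation."""
    return 0.5 * 60.0 / rpm


def transfer_time_sec(block_bytes: int, bytes_per_track: int, rpm: float) -> float:
    """Time for the block to pass under the head, as a fraction of one rotation.
    Simplification: gaps between sectors are ignored."""
    return (block_bytes / bytes_per_track) * (60.0 / rpm)


def access_time_sec(controller_sec: float, seek_sec: float, rpm: float,
                    block_bytes: int, bytes_per_track: int) -> float:
    """Time = controller processing + seek + rotational delay + transfer."""
    return (controller_sec + seek_sec
            + avg_rotational_latency_sec(rpm)
            + transfer_time_sec(block_bytes, bytes_per_track, rpm))


if __name__ == "__main__":
    total = access_time_sec(
        controller_sec=0.001,        # hypothetical: 1 ms of controller processing
        seek_sec=0.010,              # hypothetical: 10 ms average seek
        rpm=5400,                    # rotation speed from the terminology slide
        block_bytes=4096,            # one 4 KB block
        bytes_per_track=256 * 512,   # 256 sectors of 512 bytes per track
    )
    print(f"estimated access time: {total * 1000:.2f} ms")
```

Even with these made-up seek and controller values, the mechanical terms (seek and rotation) dominate, while the transfer of the block itself is a small fraction of the total.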