Post on 20-Dec-2015
CS 4432 lecture #6 1
CS4432: Database Systems II
Lecture #6
Professor Elke A. Rundensteiner
PROJECT 1
• Project-1 teams assigned (my.wpi)
• If any issues, let us know sooner rather than later …
• Must state each member’s contributions …
Storage Layout: How to lay out data on disk (Chapter 3)
Placing records into blocks
[Figure: a file drawn as a sequence of blocks]
• Assume fixed-length blocks
• Assume a single file (for now)
Options for storing records in blocks:
(1) Separating records
(2) Spanned vs. unspanned
(3) Mixed record types – clustering
(4) Split records
(5) Sequencing
(6) Indirection
(4) Split records
• Fixed part in one block
• Variable part in another block
• Typically for hybrid format
[Figure: a block with fixed recs. and a block with variable recs., holding the record parts R1 (a), R1 (b), R2 (a), R2 (b), R2 (c)]
(5) Sequencing
• Ordering records in file (and block) by some key value
– Sequential file (sequenced file)
• Why sequencing?
– Typically to make it possible to efficiently read records in order
Sequencing Options
(a) Next record physically contiguous
[Figure: R1 stored directly before Next (R1) ...]
(b) Linked
[Figure: R1 holding a pointer to Next (R1)]
What about INSERT/DELETE?
Sequencing Options
(c) Overflow area
[Figure: a header followed by records in sequence R1, R2, R3, R4, R5; an overflow area holding R2.1, R1.3, R4.7]
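The overflow idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: records are reduced to their keys, the class name `SequencedFile` and its methods are hypothetical, and the overflow area is a plain unsorted list rather than records chained from the main area.

```python
class SequencedFile:
    """Toy sequenced file: a sorted main area plus an unsorted overflow area."""

    def __init__(self, keys):
        self.main = sorted(keys)   # records kept physically in key order
        self.overflow = []         # later inserts land here, out of order

    def insert(self, key):
        # Avoid shifting records in the main area: append to the overflow area.
        self.overflow.append(key)

    def scan_in_order(self):
        # An ordered read has to merge the overflow records back in.
        return sorted(self.main + self.overflow)

    def reorganize(self):
        # Periodic reorganization folds the overflow area into the main area.
        self.main = self.scan_in_order()
        self.overflow = []


f = SequencedFile([1, 2, 3, 4, 5])
for key in (2.1, 1.3, 4.7):        # mirrors R2.1, R1.3, R4.7 in the figure
    f.insert(key)
```

Inserts stay cheap, but every ordered scan pays for the merge until the file is reorganized.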
(6) Indirection: Addressing
• How does one refer to records?
[Figure: a pointer to record Rx]
• Problem: Records can be on disk or in (virtual) memory. Need a common address, but they have different physical locations.
• Many options, on a spectrum from Physical to Indirect
Purely Physical Addressing
E.g., Record Address (ID) =
  Device ID
  Cylinder #
  Track #
  Block #
  Offset in block
Block ID = the Device ID, Cylinder #, Track #, and Block # components
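As a rough illustration of a purely physical address, the components can be packed into a fixed-size byte string. The field widths below (1-byte device, 2-byte cylinder, 1-byte track, 2-byte block, 2-byte offset) and the names `PhysAddr`, `pack_addr`, and `unpack_addr` are assumptions chosen for the sketch, not values from the lecture.

```python
import struct
from collections import namedtuple

# Hypothetical layout of a physical record address.
PhysAddr = namedtuple("PhysAddr", "device cylinder track block offset")

# ">" = big-endian with no padding; B = 1 byte, H = 2 bytes (assumed widths).
ADDR_FORMAT = ">BHBHH"


def pack_addr(addr):
    """Encode a physical address as 8 bytes."""
    return struct.pack(ADDR_FORMAT, *addr)


def unpack_addr(raw):
    """Decode 8 bytes back into the address components."""
    return PhysAddr(*struct.unpack(ADDR_FORMAT, raw))


addr = PhysAddr(device=1, cylinder=120, track=3, block=17, offset=256)
raw = pack_addr(addr)
```

Such an address leads straight to the record, but it becomes invalid as soon as the record moves.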
Fully Indirect Addressing
Solution: Record ID (Oracle: ROWID) as global address, maintain a map table.
[Figure: map table with columns Rec ID and Physical addr.; rec ID r maps to address a]
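A minimal sketch of such a map table, assuming an in-memory dictionary stands in for the table; the class `MapTable`, its methods, and the address tuples are hypothetical names used only for illustration.

```python
class MapTable:
    """Toy map table: record ID -> current physical address."""

    def __init__(self):
        self._addr = {}
        self._next_id = 1

    def register(self, phys_addr):
        # Hand out a new record ID for a freshly stored record.
        rec_id = self._next_id
        self._next_id += 1
        self._addr[rec_id] = phys_addr
        return rec_id

    def lookup(self, rec_id):
        # Every access through a record ID costs one extra lookup.
        return self._addr[rec_id]

    def move(self, rec_id, new_addr):
        # Moving a record only updates the table; pointers holding rec_id stay valid.
        self._addr[rec_id] = new_addr


table = MapTable()
rid = table.register(("block 7", 40))
table.move(rid, ("block 9", 0))
```

The record ID stays the same across the move, which is the flexibility that the extra lookup pays for.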
Tradeoff
Flexibility to move records (for deletions, insertions) vs. cost of indirection (lookup)
What to do: options in between?
[Figure: spectrum from Physical to Indirect]
Ex. #1: Indirection in block
[Figure: a block with a block header, records R1, R2, R3, R4, and free space]
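One common way to realize indirection inside a block is a slot directory in the block header. The sketch below assumes that design; the class `Block` and its methods are hypothetical, and for brevity the header is kept as a Python list instead of being stored in the block's own bytes.

```python
class Block:
    """Toy block: the header maps slot numbers to (offset, length)."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []            # block header: slot -> (offset, length) or None
        self.free_end = size       # records are packed from the end of the block

    def insert(self, rec):
        start = self.free_end - len(rec)
        if start < 0:
            raise ValueError("block full")
        self.data[start:self.free_end] = rec
        self.free_end = start
        self.slots.append((start, len(rec)))
        return len(self.slots) - 1  # outside pointers store (block, slot number)

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            return None
        off, length = entry
        return bytes(self.data[off:off + length])

    def delete(self, slot):
        self.slots[slot] = None

    def compact(self):
        # Slide surviving records together; slot numbers do not change.
        packed = bytearray(len(self.data))
        end = len(self.data)
        for i, entry in enumerate(self.slots):
            if entry is None:
                continue
            off, length = entry
            packed[end - length:end] = self.data[off:off + length]
            self.slots[i] = (end - length, length)
            end -= length
        self.data, self.free_end = packed, end
```

Records can move within the block during compaction while references of the form (block, slot number) remain valid.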
Ex. #2: Use logical block #’s understood by file system instead of direct disk access
REC ID = File ID, Block #, Record # or Offset
File System Map: (File ID, Block #) -> Physical Block ID
Recap: Storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
Other Topics in Chapter 3
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes
Deletion
[Figure: a block from which record Rx is deleted]
Options:
(a) Delete and immediately reclaim space
(b) Mark deleted
– May need chain of deleted records (for re-use)
– Need a way to mark:
  • special characters
  • delete field
  • in map
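Option (b) with a chain of deleted records can be sketched as follows, assuming fixed-length slots. The names `SlotFile` and `FREE` are hypothetical; a deleted slot is marked with a tuple that also stores the link to the next free slot.

```python
FREE = "FREE"  # hypothetical marker for a deleted slot


class SlotFile:
    """Toy file of fixed-length slots with a chain of deleted slots."""

    def __init__(self):
        self.slots = []          # each entry: a record, or (FREE, next_free_slot)
        self.free_head = None    # head of the chain of deleted slots

    def delete(self, i):
        # Mark the slot as deleted and push it onto the free chain.
        self.slots[i] = (FREE, self.free_head)
        self.free_head = i

    def insert(self, rec):
        if self.free_head is not None:
            # Re-use the most recently deleted slot.
            i = self.free_head
            self.free_head = self.slots[i][1]
            self.slots[i] = rec
            return i
        self.slots.append(rec)   # no deleted slot available: grow the file
        return len(self.slots) - 1


f = SlotFile()
a, b, c = f.insert("r1"), f.insert("r2"), f.insert("r3")
f.delete(a)
f.delete(c)
reused = f.insert("r4")          # lands in slot c, the head of the chain
```

The chain costs no extra space because the link is stored inside the deleted slot itself.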
As usual, many tradeoffs...
• How expensive is it to move a valid record to free space for immediate reclaim?
• How much space is wasted?
– e.g., deleted records, delete fields, free space chains, ...
Concern with deletions
Dangling pointers
[Figure: a pointer to the former location of R1, now marked "?"]
Note: If pointers point to physical locations (rather than ROWIDs), storing new data in the space of a deleted record leaves old pointers referring to the wrong data.
Solution #1: Do not worry
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
• Physical IDs
[Figure: a block with two labels: "This space never re-used" and "This space can be re-used"]
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
• Logical IDs
[Figure: map with columns ID and LOC; the entry for ID 7788 holds the mark]
Never reuse ID 7788 nor space in map...
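A small sketch of tombstones with logical IDs, assuming a dictionary as the map. The names `TOMBSTONE` and `TombstoneMap` and the location strings are hypothetical.

```python
TOMBSTONE = object()  # the "MARK" left behind in the map


class TombstoneMap:
    """Toy ID -> location map that leaves a tombstone on deletion."""

    def __init__(self):
        self.loc = {}
        self.next_id = 1

    def add(self, location):
        rec_id = self.next_id    # IDs only grow, so an ID is never handed out twice
        self.next_id += 1
        self.loc[rec_id] = location
        return rec_id

    def delete(self, rec_id):
        # Keep the map entry forever; only the record's own space is freed.
        self.loc[rec_id] = TOMBSTONE

    def lookup(self, rec_id):
        location = self.loc[rec_id]
        return None if location is TOMBSTONE else location


m = TombstoneMap()
rid = m.add("block 3, offset 16")
m.delete(rid)
new_rid = m.add("block 3, offset 16")   # the space is reused under a new ID
```

A stale pointer holding the old ID finds the tombstone instead of the new record; the cost is a map entry that is never reclaimed.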
Solution #3 (?):
• Place record ID within every record
• When you follow a pointer, check if it leads to correct record
[Figure: a pointer "to 3-77" leading to a record with rec-id: 3-77]
Does this work??? If space reused, won’t new record have same ID?
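The check, and the way it can fail, can be sketched as below. The dictionary `storage`, the function `follow`, and the ID "9-12" are hypothetical; the sketch assumes a pointer carries both the expected record ID and a location.

```python
storage = {}  # location -> (rec_id, payload); a stand-in for disk blocks


def follow(target_id, location):
    """Follow a pointer and accept the record only if its stored ID matches."""
    rec = storage.get(location)
    if rec is None or rec[0] != target_id:
        return None              # dangling pointer detected
    return rec[1]


# A record with ID "3-77" lives at location (3, 77), and a pointer refers to it.
storage[(3, 77)] = ("3-77", "old payload")
pointer = ("3-77", (3, 77))

# The record is deleted and the space is reused by a record with a fresh ID.
storage[(3, 77)] = ("9-12", "new payload")
detected = follow(*pointer) is None          # the mismatch is caught

# If the new record is given the ID "3-77" again, the check wrongly succeeds.
storage[(3, 77)] = ("3-77", "new payload")
fooled = follow(*pointer) == "new payload"   # the stale pointer goes unnoticed
```

So the check only helps if IDs are never reused, which is the question the slide raises.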
Insert
Easy case: Records fixed length/not in sequence
– Insert new record at end of file or in deleted slot
A little harder: If records are variable size, not as easy
– May not be able to reuse space (fragmentation)
Hard case: Records in sequence
– If free space “close by”, not too bad...
– Or, use overflow idea...
– Or worst case, reorganize file...
Free space
Interesting problems:
• How much free space to leave in each block, track, cylinder?
• How often do I reorganize file + overflow?
Buffer Management
• DB features needed
• Why LRU may be bad
• Pinned blocks
• Forced output
• Double buffering
• Swizzling (in Notes03)
Read Textbook!
Pointer Swizzling
[Figure: Memory and Disk side by side; block 1 (holding Rec A) and block 2 appear on both sides]
Issue: If records (objects) contain pointers to other objects, translate locations when loading objects into memory.
One Option:
Solution: Insert fields that represent pointers into map table. Translate pointers as needed.
Translation Table:
  DB Addr    Mem Addr
  Rec-A      Rec-A-inMem
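A minimal sketch of this option, assuming records are Python dictionaries and pointer fields keep their DB address. The names `translation`, `load`, and `deref` and the address "Rec-B" are hypothetical.

```python
translation = {}  # translation table: DB address -> in-memory object


def load(db_addr, record):
    """Bring a record into memory and enter it into the translation table."""
    translation[db_addr] = record
    return record


def deref(db_addr):
    """Translate a DB address when a pointer is followed.

    Returns None if the target has not been loaded into memory yet.
    """
    return translation.get(db_addr)


rec_a = load("Rec-A", {"name": "A"})
rec_b = load("Rec-B", {"name": "B", "next": "Rec-A"})  # pointer field keeps the DB address
target = deref(rec_b["next"])                           # resolves to rec_a in memory
```

Pointers stay in their disk form, so nothing has to be converted back on write, but every dereference pays for a table lookup.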
Another Option:
In-memory pointers need a “type” bit
[Figure: a pointer whose type bit says whether it points to disk or to memory (M)]
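The type-bit option can be sketched as a pointer object that carries a flag next to its target. The class `Ptr` and the helpers `swizzle` and `unswizzle` are hypothetical; the reverse map keyed by `id(...)` is an assumption made so that plain dictionaries can serve as records.

```python
from dataclasses import dataclass


@dataclass
class Ptr:
    in_memory: bool   # the "type" bit: False = DB address, True = memory reference
    target: object    # a DB address or an in-memory object, depending on the bit


def swizzle(ptr, db_to_mem):
    """Replace a DB address by a memory reference if the target is loaded."""
    if not ptr.in_memory and ptr.target in db_to_mem:
        ptr.target = db_to_mem[ptr.target]
        ptr.in_memory = True
    return ptr


def unswizzle(ptr, mem_to_db):
    """Turn a memory reference back into a DB address before writing to disk."""
    if ptr.in_memory:
        ptr.target = mem_to_db[id(ptr.target)]
        ptr.in_memory = False
    return ptr


rec_a = {"name": "A"}
p = Ptr(in_memory=False, target="Rec-A")
swizzle(p, {"Rec-A": rec_a})            # p now refers directly to rec_a
```

After swizzling, following the pointer needs no lookup, at the price of converting it back before the record is written out.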
Swizzling Options
• Automatic
• On-demand
• No swizzling / program control
Swizzling Issues
• Must ‘unswizzle’
• Updating/writing of records
Comparison
• There are 1,000,001 ways to organize my data on disk… Which is right for me?
Issues:
• Flexibility
• Space Utilization
• Complexity
• Performance
To evaluate a given strategy, compute the following parameters:
-> space used for expected data (on average)
-> expected time to:
   - fetch record given key
   - fetch record with next key
   - insert/delete/update record
   - read complete file
   - reorganize file (maybe sort)
-> usage patterns / workload:
   - how many/which user queries/updates
Example: What about Project 1?
Design decisions for CAPE queue storage system?
– What data types?
– Variable/Fixed length records?
– Fixed/Variable format?
– Record IDs?
– Sequencing? What’s special here?
– Deletions?
– Insertions?
– Headers/Meta-data for Queues?
– Buffering/Disk Pushing?
– Reorganize files and Overflow?
– Defragmentation?
– …
TOMORROW
Chapter 4 (Chapter 13 in ‘complete book’)
How to find a record quickly, given a key