cs 245: database system principles - emory...
TRANSCRIPT
![Page 1: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/1.jpg)
CS 245 Notes 3 1
CS 245: Database System Principles
Notes 03: Disk Organization
Hector Garcia-Molina
![Page 2: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/2.jpg)
CS 245 Notes 3 2
• How to lay out data on disk
• How to move it to memory
Topics for today
![Page 3: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/3.jpg)
CS 245 Notes 3 3
What are the data items we want to store?
• a salary
• a name
• a date
• a picture
![Page 4: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/4.jpg)
CS 245 Notes 3 4
What are the data items we want to store?
• a salary
• a name
• a date
• a picture
What we have available: Bytes
8 bits
![Page 5: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/5.jpg)
CS 245 Notes 3 5
To represent:
• Integer (short): 2 bytes
e.g., 35 is
00000000 00100011
• Real, floating point n bits for mantissa, m for exponent….
![Page 6: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/6.jpg)
CS 245 Notes 3 6
• Characters
various coding schemes suggested,
most popular is ascii
To represent:
Example: A: 1000001 a: 1100001 5: 0110101 LF: 0001010
![Page 7: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/7.jpg)
CS 245 Notes 3 7
• Boolean
e.g., TRUE FALSE
1111 1111
0000 0000
To represent:
• Application specific e.g., RED 1 GREEN 3 BLUE 2 YELLOW 4 …
![Page 8: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/8.jpg)
CS 245 Notes 3 8
• Boolean
e.g., TRUE FALSE
1111 1111
0000 0000
To represent:
• Application specific e.g., RED 1 GREEN 3 BLUE 2 YELLOW 4 …
Can we use less than 1 byte/code?
Yes, but only if desperate...
![Page 9: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/9.jpg)
CS 245 Notes 3 9
• Dates
e.g.: - Integer, # days since Jan 1, 1900
- 8 characters, YYYYMMDD
- 7 characters, YYYYDDD
(not YYMMDD! Why?)
• Time
e.g. - Integer, seconds since midnight
- characters, HHMMSSFF
To represent:
![Page 10: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/10.jpg)
CS 245 Notes 3 10
• String of characters
– Null terminated
e.g.,
– Length given
e.g.,
- Fixed length
c t a
c t a 3
To represent:
![Page 11: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/11.jpg)
CS 245 Notes 3 11
• Bag of bits
Length Bits
To represent:
![Page 12: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/12.jpg)
CS 245 Notes 3 12
Key Point
• Fixed length items
• Variable length items - usually length given at beginning
![Page 13: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/13.jpg)
CS 245 Notes 3 13
• Type of an item: Tells us how to
interpret
(plus size if fixed)
Also
![Page 14: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/14.jpg)
CS 245 Notes 3 14
Data Items
Records
Blocks
Files
Memory
Overview
![Page 15: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/15.jpg)
CS 245 Notes 3 15
Record - Collection of related data items (called FIELDS)
E.g.: Employee record:
name field,
salary field,
date-of-hire field, ...
![Page 16: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/16.jpg)
CS 245 Notes 3 16
Types of records:
• Main choices:
– FIXED vs VARIABLE FORMAT
– FIXED vs VARIABLE LENGTH
![Page 17: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/17.jpg)
CS 245 Notes 3 17
A SCHEMA (not record) contains
following information
- # fields
- type of each field
- order in record
- meaning of each field
Fixed format
![Page 18: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/18.jpg)
CS 245 Notes 3 18
Example: fixed format and length
Employee record
(1) E#, 2 byte integer
(2) E.name, 10 char. Schema
(3) Dept, 2 byte code
55 s m i t h 02
83 j o n e s 01
Records
![Page 19: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/19.jpg)
CS 245 Notes 3 19
• Record itself contains format
“Self Describing”
Variable format
![Page 20: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/20.jpg)
CS 245 Notes 3 20
Example: variable format and length
4 I 5 2 4 S D R O F 46
Field name codes could also be strings, i.e. TAGS
# F
ield
s
Code identify
ing
f
ield
as
E#
In
teger
type
Code for
Enam
e
Str
ing t
ype
Length
of st
r.
![Page 21: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/21.jpg)
CS 245 Notes 3 21
Variable format useful for:
• “sparse” records
• repeating fields
• evolving formats
But may waste space...
![Page 22: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/22.jpg)
CS 245 Notes 3 22
• EXAMPLE: var format record with
repeating fields
Employee one or more children
3 E_name: Fred Child: Sally Child: Tom
![Page 23: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/23.jpg)
CS 245 Notes 3 23
Note: Repeating fields does not imply
- variable format, nor
- variable size
John Sailing Chess --
![Page 24: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/24.jpg)
CS 245 Notes 3 24
Note: Repeating fields does not imply
- variable format, nor
- variable size
John Sailing Chess --
• Key is to allocate maximum number of
repeating fields (if not used null)
![Page 25: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/25.jpg)
CS 245 Notes 3 25
Many variants between fixed - variable format:
Example: Include record type in record
record type record length
tells me what
to expect
(i.e. points to schema)
5 27 . . . .
![Page 26: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/26.jpg)
CS 245 Notes 3 26
Record header - data at beginning that describes record
May contain:
- record type
- record length
- time stamp
- other stuff ...
![Page 27: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/27.jpg)
CS 245 Notes 3 27
Next: placing records into blocks
blocks ...
a file
![Page 28: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/28.jpg)
CS 245 Notes 3 28
Next: placing records into blocks
blocks ...
a file
assume fixed
length blocks
assume a single file (for now)
![Page 29: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/29.jpg)
CS 245 Notes 3 29
(1) separating records
(2) spanned vs. unspanned
(3) sequencing
(4) indirection
Options for storing records in blocks:
![Page 30: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/30.jpg)
CS 245 Notes 3 30
Block
(a) no need to separate - fixed size recs.
(b) Separate using special marker
(c) Separate using record lengths (or offsets)
- within each record
- in block header
(1) Separating records
R2 R1 R3
![Page 31: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/31.jpg)
CS 245 Notes 3 31
• Unspanned: records must be within one block
block 1 block 2
...
• Spanned block 1 block 2
...
(2) Spanned vs. Unspanned
R1 R2
R1
R3 R4 R5
R2 R3 (a)
R3 (b)
R6 R5 R4 R7 (a)
![Page 32: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/32.jpg)
CS 245 Notes 3 32
need indication need indication
of partial record of continuation
“pointer” to rest (+ from where?)
of the record
R1 R2 R3 (a)
R3 (b)
R6 R5 R4 R7 (a)
With spanned records:
![Page 33: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/33.jpg)
CS 245 Notes 3 33
• Unspanned is much simpler, but may waste space…
• Spanned essential if
record size > block size
Spanned vs. unspanned:
![Page 34: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/34.jpg)
CS 245 Notes 3 34
• Ordering records in file (and block) by some key value
Sequential file ( sequenced)
(3) Sequencing
![Page 35: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/35.jpg)
CS 245 Notes 3 35
Why sequencing?
Typically to make it possible to efficiently read records in order
(e.g., to do a merge-join — discussed later)
![Page 36: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/36.jpg)
CS 245 Notes 3 36
Sequencing Options
(a) Next record physically contiguous
...
(b) Linked
Next (R1) R1
R1 Next (R1)
![Page 37: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/37.jpg)
CS 245 Notes 3 37
(c) Overflow area
Records
in sequence
R1
R2
R3
R4
R5
Sequencing Options
![Page 38: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/38.jpg)
CS 245 Notes 3 38
(c) Overflow area
Records
in sequence
R1
R2
R3
R4
R5
Sequencing Options
header
R2.1
R1.3
R4.7
![Page 39: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/39.jpg)
CS 245 Notes 3 39
• How does one refer to records?
(4) Indirection
Rx
![Page 40: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/40.jpg)
CS 245 Notes 3 40
• How does one refer to records?
(4) Indirection
Rx
Many options:
Physical Indirect
![Page 41: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/41.jpg)
CS 245 Notes 3 41
Purely Physical
Device ID
E.g., Record Cylinder #
Address = Track #
or ID Block #
Offset in block
Block ID
![Page 42: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/42.jpg)
CS 245 Notes 3 42
Fully Indirect
E.g., Record ID is arbitrary bit string
map
rec ID
r address
a Physical
addr. Rec ID
![Page 43: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/43.jpg)
CS 245 Notes 3 43
Tradeoff
Flexibility Cost
to move records of indirection
(for deletions, insertions)
![Page 44: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/44.jpg)
CS 245 Notes 3 44
Physical Indirect
Many options in between …
![Page 45: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/45.jpg)
CS 245 Notes 3 45
Example: Indirection in block
Header
A block: Free space
R3
R4
R1 R2
![Page 46: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/46.jpg)
CS 245 Notes 3 46
Block header - data at beginning that describes block
May contain: - File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g. contains recs type 4;
is overflow, …)
- Pointer to other blocks “like it”
- Timestamp ...
![Page 47: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/47.jpg)
CS 245 Notes 3 47
(1) separating records
(2) spanned vs. unspanned
(3) sequencing
(4) indirection
Options for storing records in blocks:
![Page 48: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/48.jpg)
CS 245 Notes 3 48
Case Study: salesforce.com
• salesforce.com provides CRM services
• salesforce customers are tenants
• Tenants run apps and DBMS as service
tenant A
tenant B
tenant C
salesforce.com
data
CRM App
![Page 49: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/49.jpg)
CS 245 Notes 3 49
Options for Hosting
• Separate DBMS per tenant
• One DBMS, separate tables per tenant
• One DBMS, shared tables
![Page 50: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/50.jpg)
CS 245 Notes 3 50
Tenants have similar data
customer A B C D E F a1 b1 c1 d1 e1 - a2 b2 c2 - e2 f2
customer A B C D G a3 b3 c2 - - a1 b1 c1 - g1 a4 - - d1
tenant 1:
tenant 2:
![Page 51: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/51.jpg)
CS 245 Notes 3 51
salesforce.com solution
customer tenant A B C 1 a1 b1 c1 1 a2 b2 c2 2 a3 b3 c2 2 a1 b1 c1
cust-other tenant A f1 v1 f2 v2 ... 1 a1 D d1 E e1 1 a2 E e2 F f2 2 a1 G g1 3 a4 D d1
fixed schema for all tenants
var schema for all tenants
![Page 52: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/52.jpg)
CS 245 Notes 3 52
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes
Other Topics
![Page 53: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/53.jpg)
CS 245 Notes 3 53
Block
Deletion
Rx
![Page 54: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/54.jpg)
CS 245 Notes 3 54
Options:
(a) Immediately reclaim space
(b) Mark deleted
![Page 55: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/55.jpg)
CS 245 Notes 3 55
Options:
(a) Immediately reclaim space
(b) Mark deleted
– May need chain of deleted records
(for re-use)
– Need a way to mark:
• special characters
• delete field
• in map
![Page 56: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/56.jpg)
CS 245 Notes 3 56
As usual, many tradeoffs...
• How expensive is to move valid record to free space for immediate reclaim?
• How much space is wasted?
– e.g., deleted records, delete fields, free space chains,...
![Page 57: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/57.jpg)
CS 245 Notes 3 57
Dangling pointers
Concern with deletions
R1 ?
![Page 58: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/58.jpg)
CS 245 Notes 3 58
Solution #1: Do not worry
![Page 59: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/59.jpg)
CS 245 Notes 3 59
E.g., Leave “MARK” in map or old location
Solution #2: Tombstones
![Page 60: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/60.jpg)
CS 245 Notes 3 60
E.g., Leave “MARK” in map or old location
Solution #2: Tombstones
• Physical IDs
A block
This space This space can never re-used be re-used
![Page 61: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/61.jpg)
CS 245 Notes 3 61
• Logical IDs
ID LOC
7788
map
Never reuse ID 7788 nor
space in map...
E.g., Leave “MARK” in map or old location
Solution #2: Tombstones
![Page 62: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/62.jpg)
CS 245 Notes 3 62
Easy case: records not in sequence
Insert new record at end of file or in deleted slot
If records are variable size, not as easy...
Insert
![Page 63: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/63.jpg)
CS 245 Notes 3 63
Hard case: records in sequence
If free space “close by”, not too bad...
Or use overflow idea...
Insert
![Page 64: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/64.jpg)
CS 245 Notes 3 64
Interesting problems:
• How much free space to leave in each block, track, cylinder?
• How often do I reorganize file + overflow?
![Page 65: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/65.jpg)
CS 245 Notes 3 65
Free space
![Page 66: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/66.jpg)
CS 245 Notes 3 66
• DB features needed
• Why LRU may be bad Read
• Pinned blocks Textbook!
• Forced output
• Double buffering
• Swizzling
Buffer Management
in Notes02
![Page 67: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/67.jpg)
CS 245 Notes 3 67
Swizzling
Memory Disk
Rec A
block 1
block 2
block 1
![Page 68: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/68.jpg)
CS 245 Notes 3 68
Swizzling
Memory Disk
Rec A
block 1
Rec A block 2 block 2
block 1
![Page 69: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/69.jpg)
CS 245 Notes 3 69
Row vs Column Store
• So far we assumed that fields of a record are stored contiguously (row store)...
• Another option is to store like fields together (column store)
![Page 70: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/70.jpg)
CS 245 Notes 3 70
• Example: Order consists of
– id, cust, prod, store, price, date, qty
Row Store
id1 cust1 prod1 store1 price1 date1 qty1
id2 cust2 prod2 store2 price2 date2 qty2
id3 cust3 prod3 store3 price3 date3 qty3
![Page 71: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/71.jpg)
CS 245 Notes 3 71
• Example: Order consists of
– id, cust, prod, store, price, date, qty
Column Store
id1 cust1
id2 cust2
id3 cust3
id4 cust4
... ...
id1 prod1
id2 prod2
id3 prod3
id4 prod4
... ...
id1 price1 qty1
id2 price2 qty2
id3 price3 qty3
id4 price4 qty4
... ... ...
ids may or may not be stored explicitly
![Page 72: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/72.jpg)
CS 245 Notes 3 72
Row vs Column Store
• Advantages of Column Store
– more compact storage (fields need not start at byte boundaries)
– efficient reads on data mining operations
• Advantages of Row Store
– writes (multiple fields of one record)more efficient
– efficient reads for record access (OLTP)
![Page 73: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/73.jpg)
CS 245 Notes 3 73
Interesting paper to read:
• Mike Stonebreaker, Elizabeth (Betty) O'Neil, Pat O’Neil, Xuedong Chen, et al. " C-Store: A Column-oriented DBMS," Presented at the 31st VLDB Conference, September 2005.
• http://www.cs.umb.edu/%7Eponeil/ vldb05_cstore.pdf
![Page 74: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/74.jpg)
CS 245 Notes 3 74
• There are 10,000,000 ways to organize my data on disk…
Which is right for me?
Comparison
![Page 75: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/75.jpg)
CS 245 Notes 3 75
Issues:
Flexibility Space Utilization
Complexity Performance
![Page 76: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/76.jpg)
CS 245 Notes 3 76
To evaluate a given strategy, compute following parameters:
-> space used for expected data -> expected time to - fetch record given key
- fetch record with next key - insert record - append record - delete record - update record - read all file - reorganize file
![Page 77: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/77.jpg)
CS 245 Notes 3 77
Example
How would you design Megatron 3000 storage system? (for a relational DB, low end)
– Variable length records? – Spanned? – What data types? – Fixed format? – Record IDs ? – Sequencing? – How to handle deletions?
![Page 78: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/78.jpg)
CS 245 Notes 3 78
• How to lay out data on disk
Data Items
Records
Blocks
Files
Memory
DBMS
Summary
![Page 79: CS 245: Database System Principles - Emory Universitycheung/Courses/554/Syllabus/Slides/Notes03.pdfCS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina](https://reader031.vdocuments.net/reader031/viewer/2022042004/5e6f0ceaf382de6744014d4e/html5/thumbnails/79.jpg)
CS 245 Notes 3 79
How to find a record quickly,
given a key
Next