Inside Database System
Takashi HOSHINO, Cybozu Labs
1
Overview
• Control/Data Flow
• DBMS
  – Query Processor
  – Storage Engine
    • Transaction Management
    • Buffer Cache Management
    • Data Structures
• Storage
2
Control/Data Flow
Application
  ↕ SQL/Records
DBMS
  ↕ RW/Blocks
OS
Storage
3
DBMS
Query Processor
Storage Engine
4
Query Processor
Hector 2010
SQL query
→ parse → parse tree
→ convert → logical query plan
→ apply laws → “improved” l.q.p
→ estimate result sizes (using statistics) → l.q.p. + sizes
→ consider physical plans → {P1,P2,…..}
→ estimate costs → {(P1,C1),(P2,C2)...}
→ pick best → Pi
→ execute → answer
5
Query Plan Example
Hector 2010
π B,D
  ⋈ (natural join)
    σ R.A = “c” (R)
    σ S.E = 2 (S)
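The plan on this slide can be read as three operators applied bottom-up: filter R and S, join the filtered rows, then keep columns B and D. The following is a minimal sketch of that evaluation order. The sample rows and the shared attribute C are assumptions made for illustration, since the slide gives no data, and the helper names `select`, `natural_join`, and `project` are hypothetical.

```python
# Hypothetical sample relations; the slide does not give any data.
R = [
    {"A": "c", "B": 1, "C": 10},
    {"A": "a", "B": 2, "C": 20},
    {"A": "c", "B": 3, "C": 30},
]
S = [
    {"C": 10, "D": "x", "E": 2},
    {"C": 30, "D": "y", "E": 5},
    {"C": 30, "D": "z", "E": 2},
]

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[k] == r[k] for k in shared):
                result.append({**l, **r})
    return result

def project(rows, attrs):
    """Projection: keep only the listed attributes, as tuples."""
    return [tuple(row[a] for a in attrs) for row in rows]

# Selections run before the join, so the join sees fewer rows.
plan_result = project(
    natural_join(
        select(R, lambda r: r["A"] == "c"),
        select(S, lambda s: s["E"] == 2),
    ),
    ["B", "D"],
)
print(plan_result)
```

Running the selections first is what makes this plan attractive: the nested-loop join only compares the rows that survive the filters.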
6
Which Plan is Good?
Hector 2010
(R ⋈ S) ⋈ T
(T ⋈ R) ⋈ S
(S ⋈ T) ⋈ R
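One way to answer the question on this slide is to estimate the cost of each join order and pick the cheapest. The sketch below uses a deliberately simple cost model, the estimated size of the first intermediate result, which is an assumption and not necessarily what a real optimizer uses. The relation sizes and selectivities are hypothetical statistics.

```python
from itertools import combinations

# Hypothetical statistics: relation sizes and join selectivities.
sizes = {"R": 1000, "S": 100, "T": 10}
selectivity = {
    frozenset("RS"): 0.01,
    frozenset("ST"): 0.1,
    frozenset("RT"): 0.001,
}

def intermediate_size(a, b):
    """Estimated number of rows produced by joining relations a and b."""
    return sizes[a] * sizes[b] * selectivity[frozenset((a, b))]

def enumerate_plans():
    """Yield (plan, cost) for each choice of the first pair to join."""
    for a, b in combinations(sorted(sizes), 2):
        (third,) = set(sizes) - {a, b}
        yield f"({a} join {b}) join {third}", intermediate_size(a, b)

plans = list(enumerate_plans())
best_plan, best_cost = min(plans, key=lambda p: p[1])
print(best_plan, best_cost)
```

With these made-up numbers the smallest intermediate result comes from joining R and T first, so that order is picked.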
7
Storage Engine
Transaction Management
Buffer Cache Management
Data Structures
8
Transaction Management
• Preserve the ACID properties of data
  – Atomicity
  – Consistency
  – Isolation
  – Durability
• Concurrency Control
• Logging & Recovery
9
Concurrency Control by Locking
• Target resources
  – Database
  – Table
  – Block
  – Record
• Locking algorithm
  – Shared/exclusive lock
  – Intention lock for fine granularity
10
Shared/Exclusive Lock
• S: shared lock for read
• X: exclusive lock for write
[Figure: Trn 1, Trn 2, and Trn 3 with an S lock; Trn 1, Trn 2, and Trn 3 with an X lock]
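The S/X rule can be sketched as a small lock table: any number of transactions may hold S on a resource at once, while X is granted only when no other transaction holds a lock on it. The class and method names below are hypothetical, and the sketch refuses a conflicting request instead of queueing it, which a real lock manager would do.

```python
class LockTable:
    """Tracks S/X locks per resource (illustrative only, no wait queue)."""

    def __init__(self):
        self.holders = {}  # resource -> {transaction: mode}

    def try_lock(self, trn, resource, mode):
        """Grant the lock if it is compatible with other holders, else refuse."""
        held = self.holders.setdefault(resource, {})
        others = [m for t, m in held.items() if t != trn]
        if mode == "S":
            ok = all(m == "S" for m in others)   # S coexists only with S
        else:
            ok = not others                      # X coexists with nothing
        if ok and held.get(trn) != "X":          # never downgrade an X we hold
            held[trn] = mode
        return ok

    def unlock(self, trn, resource):
        self.holders.get(resource, {}).pop(trn, None)

table = LockTable()
print(table.try_lock("Trn 1", "A", "S"))  # readers can share
print(table.try_lock("Trn 2", "A", "S"))
print(table.try_lock("Trn 3", "A", "X"))  # writer is refused while readers hold S
```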
11
Intention Lock
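Lock compatibility (O: compatible, _: conflict)

      IS   IX   S    X
IS    O    O    O    _
IX    O    O    _    _
S     O    _    O    _
X     _    _    _    _

http://dev.mysql.com/doc/refman/5.5/en/innodb-lock-modes.html

[Figure: lock hierarchy example with IX at the top level, IX and IS at the next level, and X and S at the lowest level]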
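The IS/IX/S/X compatibility rules can be written down as a lookup table, which is roughly how a lock manager decides whether a request can be granted. The sketch below encodes the standard matrix described in the MySQL manual page linked on this slide. The function names and the two-level table/record hierarchy are assumptions made for illustration.

```python
# COMPATIBLE[held][requested] is True when both modes can coexist.
COMPATIBLE = {
    "IS": {"IS": True,  "IX": True,  "S": True,  "X": False},
    "IX": {"IS": True,  "IX": True,  "S": False, "X": False},
    "S":  {"IS": True,  "IX": False, "S": True,  "X": False},
    "X":  {"IS": False, "IX": False, "S": False, "X": False},
}

def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode already held by others."""
    return all(COMPATIBLE[held][requested] for held in held_modes)

def locks_for_record_access(write):
    """Locks taken top-down: an intention lock on the table, then the record lock."""
    if write:
        return [("table", "IX"), ("record", "X")]
    return [("table", "IS"), ("record", "S")]

# Two writers on different records both hold IX on the table: allowed.
print(can_grant("IX", ["IX"]))
# A full-table S lock must wait while any writer holds IX on the table.
print(can_grant("S", ["IX"]))
```

The intention modes let a table-level request detect record-level activity by checking only the table's lock, without scanning every record lock.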
12
Logging with Redo Log
Hector 2010
T1: Read(A,t); t ← t×2; Write(A,t);
    Read(B,t); t ← t×2; Write(B,t);
    Output(A); Output(B)

memory: A: 8 → 16, B: 8 → 16
DB:     A: 8 → 16, B: 8 → 16 (after output)

LOG:
<T1, start>
<T1, A, 16>
<T1, B, 16>
<T1, commit>
<T1, end>
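Because a redo log records the new value of each item, recovery after a crash can replay the updates of committed transactions and ignore the rest. The following is a minimal sketch of that idea using the T1 example from this slide. The tuple-based log format and the function name `redo_recover` are assumptions, and checkpointing and the `<T1, end>` record are left out.

```python
def redo_recover(log, db):
    """Replay the updates of committed transactions, in log order."""
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    for rec in log:
        if len(rec) == 3 and rec[0] in committed:
            _trn, item, new_value = rec
            db[item] = new_value      # redo: install the logged new value
    return db

log = [
    ("T1", "start"),
    ("T1", "A", 16),
    ("T1", "B", 16),
    ("T1", "commit"),
]

# Assume the crash happened after commit but before Output(A) and Output(B).
db_on_disk = {"A": 8, "B": 8}
redo_recover(log, db_on_disk)
print(db_on_disk)
```

If the commit record had not reached the log, the same function would leave A and B at 8, which is the intended behaviour for an uncommitted transaction.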
13
Buffer Cache Management
• Are dirty cache blocks allowed?
  – No: write through
  – Yes: write back
• Eviction strategy
  – LRU: least recently used
  – …
• Prefetch
  – Sequential
  – …
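The write-back and LRU choices above can be combined in a few lines. The sketch below keeps blocks in an `OrderedDict` ordered by recency, marks written blocks dirty, and flushes a dirty block only when it is evicted. The class name `BufferCache` is hypothetical, a plain dict stands in for storage, and prefetching is not modelled.

```python
from collections import OrderedDict

class BufferCache:
    """Write-back block cache with LRU eviction (illustrative only)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict: block number -> data, stands in for storage
        self.blocks = OrderedDict()   # block number -> [data, dirty], oldest first

    def _load(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)        # now the most recently used
        else:
            if len(self.blocks) >= self.capacity:
                victim, (data, dirty) = self.blocks.popitem(last=False)  # LRU block
                if dirty:
                    self.disk[victim] = data         # write back on eviction
            self.blocks[block_no] = [self.disk.get(block_no), False]
        return self.blocks[block_no]

    def read(self, block_no):
        return self._load(block_no)[0]

    def write(self, block_no, data):
        entry = self._load(block_no)
        entry[0], entry[1] = data, True   # write back: update cache, mark dirty

disk = {1: "a", 2: "b", 3: "c"}
cache = BufferCache(2, disk)
cache.write(1, "A")
print(disk[1])      # still the old value: the write has not reached storage
cache.read(2)
cache.read(3)       # evicts block 1, which is dirty, so it is flushed
print(disk[1])
```

A write-through variant would assign to `self.disk` inside `write` and never need the dirty flag.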
14
Data Structures
Dictionary
Table, Index, …
Log, Log, Log
Table, Index
Statistics
15
Inside Data Block
Hector 2010
[Figure: a data block containing a Header, records R1, R2, R3, and R4, and Free space]
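A block laid out as header, records, and free space can be sketched with a fixed-size byte array. The layout below is an assumption chosen for brevity and is not taken from the slide: a 4-byte header holding the record count and the offset where free space starts, followed by length-prefixed records. The 64-byte block size is hypothetical; real blocks are typically several KiB.

```python
import struct

BLOCK_SIZE = 64                 # hypothetical, kept small for the example
HEADER = struct.Struct("<HH")   # (record count, offset where free space starts)

def new_block():
    block = bytearray(BLOCK_SIZE)
    HEADER.pack_into(block, 0, 0, HEADER.size)
    return block

def insert_record(block, payload):
    """Append a length-prefixed record into free space; False if it does not fit."""
    count, free_start = HEADER.unpack_from(block, 0)
    needed = 1 + len(payload)               # 1-byte length prefix
    if len(payload) > 255 or free_start + needed > len(block):
        return False
    block[free_start] = len(payload)
    block[free_start + 1:free_start + needed] = payload
    HEADER.pack_into(block, 0, count + 1, free_start + needed)
    return True

def records(block):
    """Yield the records stored in the block, in insertion order."""
    count, _ = HEADER.unpack_from(block, 0)
    pos = HEADER.size
    for _ in range(count):
        length = block[pos]
        yield bytes(block[pos + 1:pos + 1 + length])
        pos += 1 + length

def free_space(block):
    return len(block) - HEADER.unpack_from(block, 0)[1]

block = new_block()
for name in (b"R1", b"R2", b"R3", b"R4"):
    insert_record(block, name)
print(list(records(block)), free_space(block))
```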
16
Structures for Index
Tree
Hash (with a Hash Function)
17
B+tree Example
Hector 2010
Root: 100
Internal nodes: [30] [120 150 180]
Leaves: [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200]
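A lookup in this example tree descends from the root to one leaf, and a range scan then walks the leaves from left to right. The sketch below hard-codes the keys from the slide, with root 100, internal nodes [30] and [120, 150, 180], and six leaves. It assumes the usual convention that keys greater than or equal to a separator go to the right child. The dict-based node format is hypothetical, and a plain list of leaves stands in for the sibling pointers a real B+tree would keep.

```python
from bisect import bisect_right

leaves = [[3, 5, 11], [30, 35], [100, 101, 110], [120, 130], [150, 156, 179], [180, 200]]
left = {"keys": [30], "children": leaves[0:2]}
right = {"keys": [120, 150, 180], "children": leaves[2:6]}
root = {"keys": [100], "children": [left, right]}

def find_leaf(node, key):
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, dict):
        # bisect_right sends keys equal to a separator to the right child.
        node = node["children"][bisect_right(node["keys"], key)]
    return node

def lookup(key):
    return key in find_leaf(root, key)

def range_scan(lo, hi):
    """Return all keys in [lo, hi], starting at the leaf that may contain lo."""
    start = next(i for i, leaf in enumerate(leaves) if leaf is find_leaf(root, lo))
    out = []
    for leaf in leaves[start:]:
        for k in leaf:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
    return out

print(lookup(156), lookup(157))
print(range_scan(30, 120))
```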
18
Hash Example
Hector 2010
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

h(e) = 1: e is added to bucket 1 as overflow
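The insert sequence on this slide can be replayed with four buckets and an overflow list per bucket. In the sketch below, the hash values are copied from the slide instead of being computed, and the bucket capacity of two records is an assumption made so that inserting e overflows bucket 1.

```python
NUM_BUCKETS = 4
BUCKET_CAPACITY = 2   # assumption: two records fit in one bucket block

# Hash values taken from the slide; a real h would compute them from the key.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

buckets = [[] for _ in range(NUM_BUCKETS)]
overflow = [[] for _ in range(NUM_BUCKETS)]

def insert(key):
    """Place key in its bucket, or in that bucket's overflow list when full."""
    i = H[key]
    target = buckets[i] if len(buckets[i]) < BUCKET_CAPACITY else overflow[i]
    target.append(key)

def contains(key):
    """Single-record lookup: only one bucket and its overflow are examined."""
    i = H[key]
    return key in buckets[i] or key in overflow[i]

for k in "abcde":
    insert(k)

print(buckets)
print(overflow)
```

A lookup touches a single bucket regardless of how many keys are stored, which is where the O(1) retrieval cost comes from. Neighbouring keys land in unrelated buckets, so there is no order to scan.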
19
Tree vs Hash for Indexing
• Tree
  – O(log N) for single record retrieval
  – Efficient range scan is available
• Hash
  – O(1) on average for single record retrieval
  – Range scan is not supported
20
Storage
OS Functionalities for Storage IO
• File System, Buffer Cache Manager
• Logical Unit/Software RAID Manager
• SCSI Protocol Stack/HBA Drivers
Storage devices (each with its own Controller and Cache)
• Hard Disk Drive
• Solid State Drive
• RAID Storage Unit
21
Hard Disk Drive
Track, Sector, Disk Platter
T_access = T_headseek + T_rotation + T_transfer
[Figure: IO response versus lseek size, in two panels: small lseek and large lseek (smoothed)]
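The access time of a hard disk drive is the sum of head seek time, rotational delay, and transfer time. The sketch below puts numbers on that sum. All drive parameters are hypothetical, and the rotational delay is taken as half a revolution on average, a common approximation that the slide does not state.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """T_access = T_headseek + T_rotation + T_transfer, in milliseconds."""
    rotation_ms = 0.5 * 60_000 / rpm     # average wait: half a revolution
    transfer_ms = request_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms

# Hypothetical drive: 7200 rpm, 8 ms average seek, 100 MB/s transfer rate.
random_4k = access_time_ms(seek_ms=8, rpm=7200, transfer_mb_per_s=100, request_kb=4)
no_seek_4k = access_time_ms(seek_ms=0, rpm=7200, transfer_mb_per_s=100, request_kb=4)
print(round(random_4k, 3), round(no_seek_4k, 3))
```

For small requests the seek and rotation terms dominate and the transfer term is almost negligible, which is why response time depends so strongly on how far the head has to move.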
22
Summary
• DBMS
  – Query Processor
  – Storage Engine
• Storage
23
References
• Database System Implementation, lecture notes at Stanford University
  – http://infolab.stanford.edu/~ullman/dbsi.html
• MySQL InnoDB Internal
  – http://www.innodb.com/wp/wp-content/uploads/2009/05/innodb-file-formats-and-source-code-structure.pdf
• MySQL Reference Manual
  – http://dev.mysql.com/doc/
24
For Further Study
• Fundamentals of Database Systems
  – http://www.amazon.com/Fundamentals-Database-Systems-Ramez-Elmasri/dp/0136086209
• Books recommended by Leo’s Chronicle
  – http://leoclock.blogspot.com/2009/01/blog-post_07.html
25
Fundamentals of Database Systems
26