TRANSCRIPT
Database Architecture 2 & Storage
Instructor: Matei Zaharia
cs245.stanford.edu
Outline
Relational DBMS architecture
Alternative architectures & tradeoffs
Storage hardware
CS 245 2
Summary from Last Time
System R mostly matched the architecture of a modern RDBMS:
» SQL
» Many storage & access methods
» Cost-based optimizer
» Lock manager
» Recovery
» View-based access control
Differentiating by Workload
Two big classes of commercial RDBMS today
Transactional DBMS: focus on concurrent, small, low-latency transactions (e.g. MySQL, Postgres, Oracle, DB2) → real-time apps
Analytical DBMS: focus on large, parallel but mostly read-only analytics (e.g. Teradata, Redshift, Vertica) → “data warehouses”
How To Design Components for Transactional vs Analytical DBMS?
Component      Transactional DBMS                  Analytical DBMS
Data storage   B-trees, row-oriented storage       Column-oriented storage
Locking        Fine-grained, very optimized        Coarse-grained (few writes)
Recovery       Log data writes, minimize latency   Log queries
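The data-storage row of the table above can be illustrated with a small sketch. This is a minimal illustration of the two layouts (hypothetical helper functions, not any system's actual on-disk format): a row store keeps each record's fields together, while a column store keeps each attribute contiguous.

```python
# The same 3-record table stored two ways (illustrative sketch).
rows = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 20.0},
    {"id": 3, "name": "c", "price": 30.0},
]

# Row-oriented: each record's fields are contiguous -- good for point
# lookups and single-record updates (transactional workloads).
def lookup_row(rows, rid):
    return next(r for r in rows if r["id"] == rid)

# Column-oriented: each attribute is a contiguous array -- a scan that
# touches one column reads only that column (analytical workloads).
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "price": [10.0, 20.0, 30.0],
}

def avg_price(columns):
    return sum(columns["price"]) / len(columns["price"])

print(lookup_row(rows, 2)["name"])  # b
print(avg_price(columns))           # 20.0
```

An analytical query like the average above only needs to touch the `price` array in the column layout, which is why column stores dominate data warehouses.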
Outline
Relational DBMS architecture
Alternative architectures & tradeoffs
Storage hardware
How Can We Change the DBMS Architecture?
Decouple Query Processing from Storage Management

Example: “data lake” architecture (Hadoop, S3, etc)

[Figure: layered stack — large-scale file systems or blob stores (e.g. GFS); file formats & metadata; processing engines (e.g. MapReduce)]
Decouple Query Processing from Storage Management

Pros:
» Can scale compute independently of storage (e.g. in datacenter or public cloud)
» Let different orgs develop different engines
» Your data is “open” by default to new tech

Cons:
» Harder to guarantee isolation, reliability, etc
» Harder to co-optimize compute and storage
» Can’t optimize across many compute engines
» Harder to manage if too many engines!
Change the Data Model
Key-value stores: data is just key-value pairs, don’t worry about record internals
Message queues: data is only accessed in a specific FIFO order; limited operations
ML frameworks: data is tensors, models, etc
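The key-value model described above can be sketched in a few lines. This is a hypothetical minimal interface (not any particular system's API): keys and values are opaque bytes, and the store never interprets record internals.

```python
# Minimal sketch of the key-value data model: opaque byte keys mapped to
# opaque byte values, with no notion of schema or record internals.
class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if absent

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.put(b"user:42", b'{"name": "ada"}')
print(store.get(b"user:42"))  # b'{"name": "ada"}'
```

Because the system knows nothing about value structure, any interpretation (JSON parsing, secondary indexing, etc) is pushed up to the application.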
Change the Compute Model
Stream processing: Apps run continuously and system can manage upgrades, scaleup, recovery, etc
Eventual consistency: handle it at app level
Different Hardware Setting
Distributed databases: need to distribute your lock manager, storage manager, etc, or find system designs that eliminate them
Public cloud: “serverless” databases that can scale compute independently of storage (e.g. AWS Aurora, Google BigQuery)
Example: AWS Aurora Serverless
Outline
Relational DBMS architecture
Alternative architectures & tradeoffs
Storage hardware
Typical Server

[Figure: one or more CPUs connected to DRAM and to an I/O controller, which attaches the storage devices and a network card]
Storage Performance Metrics

» Latency (s)
» Throughput (bytes/s)
» Storage capacity (bytes, bytes/$)
Storage Latency

[Figure: latency of each tier below the CPU, scaled to clock ticks, with Jim Gray’s human-scale analogy:
» Registers / L1 cache / L2 cache: ~1–10 ticks (“my head” / “this room”: 1 min–10 min)
» Memory: ~150 ticks (“this campus” / Sacramento: ~2 hr)
» Disk: ~10^6 ticks (Pluto: ~2 years)
» Tape / optical robot: ~10^9 ticks (Andromeda: ~2,000 years)]
Max Attainable Throughput

Varies significantly by device:
» 100 GB/s for RAM
» 2 GB/s for NVMe SSD
» 130 MB/s for hard disk

Assumes large reads (≫1 block)!
Storage Cost

$1000 at NewEgg today buys:
» 0.25 TB of RAM
» 9 TB of NVMe SSD
» 50 TB of magnetic disk
Hardware Trends over Time
Capacity/$ grows exponentially at a fast rate (e.g. double every 2 years)
Throughput grows at a slower rate (e.g. 5% per year), but new interconnects help
Latency does not improve much over time
Most Common Permanent Storage: Hard Disks

Terms: platter, head, actuator, cylinder, track, sector (physical), block (logical), gap
Top View

[Figure: top view of a disk platter; to read block X into memory, the head must wait for the block to pass under it]

Disk Access Time

Time = Seek Time + Rotational Delay + Transfer Time + Other
Seek Time

[Figure: seek time vs cylinders traveled; roughly X for a short seek, rising to 3–5X across all N cylinders]

Typical seek time ranges from:
» 4 ms for high-end drives
» 15 ms for mobile devices

In contrast, SSD access time ranges from:
» 0.02 ms: NVMe
» 0.16 ms: SATA
Rotational Delay

[Figure: the head must wait for the block it wants to rotate under it; on average R = 1/2 revolution]

Average Rotational Delay

Typical HDD figures:

HDD spindle [rpm]    Average rotational latency [ms]
 4,200               7.14
 5,400               5.56
 7,200               4.17
10,000               3.00
15,000               2.00

R = 0 for SSDs

Source: Wikipedia, “Hard disk drive performance characteristics”
Transfer Rate

Transfer rate T is around 50–130 MB/s

Transfer time: size / T for contiguous read

Block size: usually 512–4096 bytes
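Putting the three components together, a back-of-the-envelope estimator makes the formula Time = Seek + Rotational Delay + Transfer concrete. The default figures below are illustrative assumptions (a generic 7,200 rpm drive), not any particular drive's datasheet.

```python
# Rough disk access time: seek + average rotational delay + transfer.
def disk_access_ms(size_bytes, seek_ms=9.0, rpm=7200, transfer_mb_s=100.0):
    rotational_ms = 0.5 * 60_000 / rpm            # average: half a revolution
    transfer_ms = size_bytes / (transfer_mb_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms

# A random 4 KB read is dominated by seek + rotation:
print(round(disk_access_ms(4096), 2))             # ~13.21 ms
# A 100 MB contiguous read amortizes them over the transfer:
print(round(disk_access_ms(100 * 10**6), 2))      # ~1013.17 ms
```

Note how the mechanical terms (seek + rotation, ~13 ms) are essentially fixed overhead per random access, which is why sequential reads dominate on disk.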
So Far: Random Block Access
What about reading the “next” block?
If We Do Things Right (Double Buffer, etc)

Time to get = block size / T + negligible

Potential slowdowns:
» Skip gap
» Next track
» Discontinuous block placement
Sequential access generally much faster than random access

… unless we want to verify! Then we need to add a (full) rotation + block size / T
Cost of Writing: Similar to Reading

Cost To Modify a Block?

To modify a block:
(a) Read block
(b) Modify in memory
(c) Write block
[(d) Verify?]
Performance of DRAM
The same basic issues with “lookup time” vs throughput apply to DRAM
Min read from DRAM is a cache line (64 bytes)
Even 64-byte random reads may not be as fast as sequential ones due to prefetching, page tables, memory controllers, etc
Place co-accessed data together!
Example

Suppose we’re accessing 8-byte records in a DRAM with 64-byte cache lines

How much slower is random vs sequential?

In the random case, we read 64 bytes for every 8 bytes we need, so we expect to max out throughput at least 8x sooner.
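The arithmetic in this example can be written out directly; a minimal sketch of the reasoning, using the record and cache-line sizes from the slide:

```python
# 8-byte records in a DRAM with 64-byte cache lines.
record = 8          # bytes actually needed per access
cache_line = 64     # bytes DRAM moves per cache miss

# Sequential: 8 consecutive records share one line, so on average
# bytes moved == bytes needed.
sequential_bytes_moved = record

# Random: each record likely lands in a different line, so we move a
# full 64-byte line to use only 8 bytes of it.
random_bytes_moved = cache_line

slowdown = random_bytes_moved / sequential_bytes_moved
print(slowdown)  # 8.0
```

This is the "at least 8x" figure: the random workload saturates memory bandwidth eight times sooner for the same useful data, before even counting prefetching effects.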
Storage Hierarchy

Typically want to cache frequently accessed data at a high level of the storage hierarchy to improve performance

[Figure: pyramid — CPU; CPU cache (KBs–MBs); DRAM (GBs); disk (TBs)]
Sizing Storage Tiers
How much high-tier storage should we have?
Can determine based on workload & cost
“The 5 Minute Rule for Trading Memory for Disc Accesses”
Jim Gray & Franco Putzolu
May 1985
The Five Minute Rule

Say a page is accessed every X seconds

Assume a disk costs D dollars and can do I operations/sec; the cost of keeping this page on disk is:

C_disk = C_iop / X = D / (I · X)

Assume 1 MB of RAM costs M dollars and holds P pages; then the cost of keeping the page in DRAM is:

C_mem = M / P
Five Minute Rule

This tells us that the page is worth caching when C_mem < C_disk, i.e.

X < D · P / (I · M)
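Plugging numbers into the break-even inequality X < D · P / (I · M) shows where the "five minutes" comes from. The figures below are assumptions roughly in the spirit of the original paper's era, not a current price survey:

```python
# Break-even interval: cache a page in RAM when X < D * P / (I * M).
def break_even_seconds(D, I, M, P):
    """Largest access interval X (sec) at which RAM beats disk on cost."""
    return D * P / (I * M)

# Assumed era-appropriate figures:
#   disk: D = $15,000, doing I = 15 accesses/sec
#   RAM:  M = $5,000 per MB, holding P = 1,000 one-KB pages
x = break_even_seconds(D=15_000, I=15, M=5_000, P=1_000)
print(x, "seconds =", round(x / 60, 1), "minutes")  # 200 seconds = 3.3 minutes
```

A break-even around a few minutes is why the rule got its name; rerunning it with today's prices shifts the threshold, which is what the "Thirty Years Later" follow-up cited below examines.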
Source: The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy
Disk Arrays

Many flavors of “RAID”: striping, mirroring, etc, to increase performance and reliability

[Figure: several physical disks presented as logically one disk]
Common RAID Levels

» Striping across 2 disks: adds performance but not reliability
» Mirroring across 2 disks: adds reliability but not performance (except for reads)
» Striping + 1 parity disk: adds performance and reliability at lower storage cost

Image source: Wikipedia
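The parity scheme can be shown in a few lines: the parity block is the bitwise XOR of the data blocks, so any single lost block can be reconstructed from the survivors. A minimal sketch (two data blocks for brevity; real arrays stripe across more disks):

```python
# RAID-style parity: parity = XOR of all data blocks.
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0 = b"\x01\x02\x03\x04"       # block on disk 0
d1 = b"\x10\x20\x30\x40"       # block on disk 1
parity = xor_blocks(d0, d1)    # block on the parity disk

# Disk 1 fails: recover its block from disk 0 and the parity disk.
recovered = xor_blocks(d0, parity)
print(recovered == d1)  # True
```

Because XOR is its own inverse, one parity disk protects against any single disk failure at the cost of only one disk's worth of extra capacity.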
Coping with Disk Failures

Detection
» E.g. checksum

Correction
» Requires redundancy

At What Level Do We Cope?

Single disk
» E.g., error-correcting codes on read

Disk array
» [Figure: each logical block mapped to physical copies A and B on different disks]

Operating system
» E.g., network-replicated storage

Database system
» E.g., log + current DB + last week’s DB
Summary

Storage devices offer various tradeoffs in terms of latency, throughput and cost

In all cases, data layout and access pattern matter because random access ≪ sequential access

Most systems will combine multiple devices
Assignment 1

Explores the effect of data layout for a simple in-memory database:
» Fixed set of supported queries
» Implement a row store, column store, indexed store, and your own custom store!

Now posted on website!