The end of an architectural era: (it’s time for a complete rewrite)
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland
VLDB, 2007
Presented by: Suprio Ray
The I/O Gap
• Disk capacity doubles every 18 months
• Memory size doubles every 18 months
• Disk bandwidth doubles every 10 years (R. Freitas et al., FAST, 2008)
• Memory (latency) is ~6000 times faster than disk
• Avoid accessing disk (if possible)
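A rough back-of-the-envelope sketch of why the gap widens, using the doubling periods quoted above. It assumes clean exponential growth, which is an idealization, and the helper name `growth_factor` is made up for illustration:

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years`, assuming steady exponential doubling."""
    return 2 ** (years / doubling_period_years)


years = 10
capacity = growth_factor(years, 1.5)    # disk capacity: doubles every 18 months
bandwidth = growth_factor(years, 10.0)  # disk bandwidth: doubles every 10 years

# The time needed to read a whole disk scales as capacity / bandwidth.
scan_time_ratio = capacity / bandwidth

print(f"capacity grows by       ~{capacity:.0f}x")
print(f"bandwidth grows by      ~{bandwidth:.0f}x")
print(f"full-scan time grows by ~{scan_time_ratio:.0f}x")
```

Under these assumptions, a decade gives roughly 100 times the capacity but only twice the bandwidth, so reading an entire disk takes about 50 times longer.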
One size does not fit all
• OLTP
  – Amazon: 42 TB
  – Typical: less than a TB
• Data Warehouse
  – Yahoo: 2 PB
  – eBay: 1.4 PB
• Search engines (text)
  – Google: 850 TB
• Scientific
  – US Department of Energy (NERSC): 3.5 PB
• Stream processing
Goal: Build a custom, high-performance OLTP database
Overview
• Motivation
• OLTP overheads
• System architecture
• Transaction management
• Evaluation
• Conclusion and discussion
Database System Architecture
[Figure: architecture of a traditional database system. Query processing components: SQL query, Parser, Query Rewriter and Optimizer, Execution Engine (labels: relational algebra, query execution plan, Statistics & Catalogs & System Data). Transaction management components: Transaction Manager (calls from transactions: read, write), Concurrency Controller, Lock Table, Recovery Manager, Log. Shared components: Buffer Manager, Data + Indexes.]
OLTP Overheads
• Logging - Log records must be written to disk for durability
• Locking - Locks must be acquired to read or write a record
• Latching - Protects updates to shared data structures
• Buffer management - Caches disk pages in memory
Design considerations to remove overheads
Optimization                  | Advantage
Memory-resident database      | Remove buffer management
Partitioning and replication  | High availability, remove logging
Single-threaded execution     | Remove locking and latching
Transaction variants          | Remove concurrency control
H-Store system architecture
• Shared-nothing, main-memory, row-store relational database
• Node
  – hosts one or more sites
• Site
  – single-threaded
  – one site per core
• Relation
  – divided into one or more partitions, or
  – cloned
• Partition
  – replicated and hosted on multiple sites
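A minimal sketch of the partitioned versus cloned layout described above. Hash partitioning with `crc32` is an assumption made for illustration (the slides do not say how rows are assigned to partitions), and `Site`, `site_for_key`, `insert_partitioned`, and `insert_cloned` are hypothetical names. Replication of partitions is left out.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Site:
    """One single-threaded execution site (the slides place one site per core)."""
    site_id: int
    tables: dict = field(default_factory=dict)  # table name -> {key: row}

    def put(self, table: str, key: str, row: dict) -> None:
        self.tables.setdefault(table, {})[key] = row


def site_for_key(key: str, num_sites: int) -> int:
    # Assumption: hash partitioning. crc32 is used because it is stable across runs.
    return crc32(key.encode("utf-8")) % num_sites


def insert_partitioned(sites: list, table: str, key: str, row: dict) -> None:
    """Store the row at exactly one site, chosen by its partitioning key."""
    sites[site_for_key(key, len(sites))].put(table, key, row)


def insert_cloned(sites: list, table: str, key: str, row: dict) -> None:
    """Store a full copy of the row at every site."""
    for site in sites:
        site.put(table, key, row)


sites = [Site(i) for i in range(4)]
for wid in ("w1", "w2", "w3", "w4", "w5"):
    insert_partitioned(sites, "warehouse", wid, {"id": wid})
insert_cloned(sites, "item", "item-42", {"name": "widget"})

for site in sites:
    print(site.site_id, {name: sorted(rows) for name, rows in site.tables.items()})
```

Cloning suits small, read-mostly tables: every site can answer reads locally, at the cost of updating every copy.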
Runtime model
• Stored procedure interface for transactions
  – Unique name
  – Control and SQL commands
• SQL command execution
  – the execution plan is annotated
  – and passed to the transaction manager
  – plans are transmitted
  – results are passed back to the initiator
System deployment
• Cluster deployment framework (CDF) accepts
  – a set of stored procedures
  – database schema
  – sample workload
  – available sites
• CDF produces
  – a set of compiled stored procedures
  – physical DB layout
Transaction variants
• Single-sited - All queries can be executed on just one site
• One-shot - Individual queries can each be executed on a single site
• Two-phase - Phase 1 is read-only and may abort; phase 2 can be executed without integrity violation
• Strongly two-phase - Either all replicas continue or all abort
• Sterile - Order of execution doesn’t matter
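The "order of execution doesn’t matter" property can be illustrated with a small commutativity check. This is only a sketch: it tests one initial state rather than proving the property for a whole transaction class, and `commute` and the toy transactions are hypothetical names.

```python
from copy import deepcopy
from itertools import permutations


def commute(txns, initial_state) -> bool:
    """True if every execution order of `txns` yields the same final state."""
    finals = []
    for order in permutations(txns):
        state = deepcopy(initial_state)  # each ordering starts from a fresh copy
        for txn in order:
            txn(state)
        finals.append(state)
    return all(final == finals[0] for final in finals)


def deposit_10(state):
    state["balance"] += 10


def deposit_5(state):
    state["balance"] += 5


def double_balance(state):
    state["balance"] *= 2


start = {"balance": 100}
print(commute([deposit_10, deposit_5], start))       # additions commute
print(commute([deposit_10, double_balance], start))  # add-then-double differs from double-then-add
```

Transactions that pass such a check in every combination can be run at each site in whatever order they arrive, which is why this class needs no concurrency control.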
Transaction management
• Replica synchronization
  – Read any replica; update all replicas
• Transaction ordering
  – Each transaction is timestamped: (site_id, local_unique_timestamp)
• Concurrency control considerations
  – OLTP transactions are very short-lived
  – Single-threaded execution avoids page latching
  – Not needed for some transaction classes (single-sited/one-shot/sterile)
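A minimal sketch of how (site_id, local_unique_timestamp) pairs can give a total order over transactions. Two things here are assumptions rather than details from the slides: a simple per-site counter stands in for a synchronized local clock, and ties on the local timestamp are broken by site_id. `Timestamp` and `SiteClock` are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    site_id: int
    local_unique_timestamp: int

    def sort_key(self):
        # Assumption: compare the local timestamp first, then break ties by site_id.
        return (self.local_unique_timestamp, self.site_id)


class SiteClock:
    """Issues timestamps that are unique within one site."""

    def __init__(self, site_id: int, start: int = 0):
        self.site_id = site_id
        self._counter = itertools.count(start)  # stand-in for a synchronized clock

    def next_timestamp(self) -> Timestamp:
        return Timestamp(self.site_id, next(self._counter))


a, b = SiteClock(site_id=0), SiteClock(site_id=1)
stamps = [b.next_timestamp(), a.next_timestamp(), a.next_timestamp(), b.next_timestamp()]
ordered = sorted(stamps, key=Timestamp.sort_key)
for ts in ordered:
    print(ts)
```

Because the site_id component differs across sites and the local component is unique within a site, no two transactions share a timestamp, so every site sorts them identically.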
Concurrency control strategy
• Basic strategy
  – Wait for a small time for conflicting transactions with lower timestamps
  – If none are found, execute the subplan and send the result
  – Else, issue an abort
• Intermediate strategy
  – Wait for a length of time approximated by MaxD * average_round_trip_message_delay
• Advanced strategy
  – If needed, abort a transaction using optimistic concurrency control (OCC) rules
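The basic and intermediate strategies can be sketched as a small decision function. This is a simplified, single-site illustration, not H-Store's implementation: a plain integer stands in for the timestamp, a "conflict" is assumed to mean overlapping record keys, `pending` is assumed to hold only transactions not yet executed at this site, and the MaxD value is arbitrary. `Txn`, `wait_time`, and `basic_decision` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    timestamp: int        # stand-in for the (site_id, local_unique_timestamp) pair
    arrival_ms: float     # when this site learns about the transaction
    keys: frozenset       # records the transaction touches


def wait_time(max_d: int, avg_round_trip_ms: float) -> float:
    """Intermediate strategy: wait MaxD * average round-trip message delay."""
    return max_d * avg_round_trip_ms


def basic_decision(txn: Txn, pending: list, wait_ms: float) -> str:
    """Return 'execute' or 'abort' for `txn` under the basic strategy."""
    deadline = txn.arrival_ms + wait_ms
    for other in pending:
        conflicts = bool(txn.keys & other.keys)  # assumption: conflict = shared record
        older = other.timestamp < txn.timestamp
        seen_in_time = other.arrival_ms <= deadline
        if conflicts and older and seen_in_time:
            return "abort"
    return "execute"


wait_ms = wait_time(max_d=2, avg_round_trip_ms=1.0)
txn = Txn(timestamp=5, arrival_ms=0.0, keys=frozenset({"w1"}))

older_conflict = Txn(timestamp=3, arrival_ms=1.5, keys=frozenset({"w1"}))
unrelated = Txn(timestamp=2, arrival_ms=0.5, keys=frozenset({"w9"}))
too_late = Txn(timestamp=1, arrival_ms=5.0, keys=frozenset({"w1"}))

print(basic_decision(txn, [older_conflict], wait_ms))
print(basic_decision(txn, [unrelated], wait_ms))
print(basic_decision(txn, [too_late], wait_ms))
```

The third case shows the trade-off in choosing the wait: a conflicting older transaction that arrives after the window is not noticed, so a longer wait reduces missed conflicts at the cost of added latency.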
Evaluation – experimental setup
• Benchmark: a variant of TPC-C
  – all transaction classes made one-shot and strongly two-phase
  – all transaction classes implemented as stored procedures
• Databases
  – H-Store
  – a popular commercial RDBMS, X
• Hardware
  – Dual-core 2.8 GHz system
  – 4 GB RAM
  – 4 x 250 GB SATA disk drives
Evaluation – results
• Metric: Transactions/second per core
• H-Store is 82 times faster than X
* performance record published by TPC-C
H-Store limitations
• The database must fit into the available memory
• A cluster-wide power failure can cause the loss of committed transactions
• A limited subset of SQL '99 is supported
  – DDL operations like ALTER and DROP aren't supported
• Challenging operations model
  – Changing the schema or reconfiguring hardware requires first saving and shutting down the system
• No WAN support (single data-center)
  – In case of a network partition, some queries will not execute
Conclusion
• Demise of general purpose database (prediction)
• H-Store is a custom, main-memory database optimized for OLTP
• H-Store shows a significant performance advantage over a popular relational database
Discussion
• Raw speed vs. ease of use
  – Limited DDL support; changing the schema or nodes requires a reboot
• “Separation of concerns”
  – Is it a good idea to embed application logic in stored procedures?
• Custom vs. general-purpose query language
  – SQL to be replaced with Ruby on Rails?
• No WAN support: single data-center assumption
  – CAP theorem
• Catastrophic failure scenario