edbt2015: transactional replication in hybrid data store architectures
TRANSCRIPT
Transactional Replication in Hybrid Data Store Architectures
Hojjat Jafarpour, Junichi Tatemura, Hakan Hacigumus
NEC Labs America, Cupertino, CA
Overview
• Goal: Replicate the RDBMS with a KVS to offload read-only workloads
• Issue: Concurrent execution over the KVS for high throughput
• Solution: Custom concurrency control that guarantees the execution-defined order in the transaction log
• Our implementation and experiments
[Diagram: Web applications send read/write workloads via SQL to the RDBMS (high-end server); read-only workloads go through SQL over KVS (NEC IERS) to the Key-Value Store (commodity cluster)]
Background: Solutions for Scalable Data Serving
• Approaches that offload read-only workloads
[Diagram: Web applications issue read/write workloads against the database, and read-only workloads against an elastic cluster of commodity servers holding copies of the database data]
Caching vs. Replication
• Our Choice: Replication – more predictable offloading
– Cache: easier/cheaper to introduce, but cache misses make offloading less predictable
– Replica: upfront cost, but offloading is more predictable
Full-Replication vs. Partitioning
• Our Choice: Partitioning – good for serving small lookups with a commodity cluster
– Full replication: each node has full capability, but catching up on updates is expensive
[Diagram: one master (M) replicated to several slave nodes (S)]
Our Scalability Solution: Hybrid Data Store Replication
• Using a KVS to replicate the master RDBMS
• SQL over KVS (NEC InfoFrame Relational Store)
[Diagram: read/write workloads hit the RDBMS (high-end server) via the SQL API; read-only workloads hit the Key-Value Store (commodity cluster) via SQL over KVS (NEC IERS)]
Issue: Concurrency Control to Replay Transactions
• Concurrent execution for high throughput
• Requirement: CC with an execution-defined order guarantee
– It must be serializable to the order in the log
– Generic CC does not work: both T2→T1 and T1→T2 are allowed
[Diagram: the transaction log T1, T2, … from the RDBMS is replayed concurrently through SQL over KVS onto the Key-Value Store (commodity cluster)]
Related Work 1: Deterministic Transaction Ordering
• [Thomson and Abadi, VLDB 2010]
• A preprocessor determines the transaction order
– Reduces coordination among sites → high performance
– "Dependent transactions" are restarted, so the order can change (OK for its purpose)
• Our case: the predefined order is a requirement
Related Work 2: Transactional Remote Backup
• [Polyzois and Garcia-Molina, TODS 1994]
• Consistent replay, at the backup data center, of the master data center's transactions from multiple logs delivered asynchronously, with its own concurrency control
– The log ships data → applicable to homogeneous systems
• Our case: the log ships SQL statements to a heterogeneous store
Why?
• No direct solution found… Why?
• Because… it was NOT a problem!
Unique Problem in Hybrid Store Replication
• Parallelizing serialized transactions did not make much sense in traditional settings
– Replaying update statements at a slave RDBMS is I/O bound (I/O on the local HDD)
– Better to save CPU resources for the read-only workloads
[Diagram: the master RDBMS log T1…T7 replayed sequentially at a slave RDBMS]
Unique Problem in Hybrid Store Replication
• Parallelism is essential for high throughput on the KVS side
– One update statement may need multiple remote read/write operations
– Sequential replay can take longer than local execution at the master DB
[Diagram: the master RDBMS log T1…T7 replayed against the KVS]
The Rest of the Talk
• SQL over KVS
• System architecture
• Concurrency Control
• Experiments
Background: SQL over KVS
• Partiqle [SIGMOD 2012 Demo]; Product: NEC IERS
• Key = primary key, Value = tuple

bids(id, item_id, user_id, bid):
  (1, 1, 3, 10)
  (2, 1, 1, 20)
  (3, 3, 2, 20)
  (4, 2, 1, 10)
  (5, 3, 3, 30)

SELECT * FROM bids WHERE id = 1
SELECT * FROM bids WHERE item_id = 1
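As a minimal sketch (a hypothetical in-memory dict standing in for the KVS; Partiqle's actual storage layout is more involved), the key = primary key / value = tuple mapping behaves like this:

```python
# Hypothetical in-memory KVS: key = primary key, value = tuple.
kvs = {
    1: {"id": 1, "item_id": 1, "user_id": 3, "bid": 10},
    2: {"id": 2, "item_id": 1, "user_id": 1, "bid": 20},
    3: {"id": 3, "item_id": 3, "user_id": 2, "bid": 20},
    4: {"id": 4, "item_id": 2, "user_id": 1, "bid": 10},
    5: {"id": 5, "item_id": 3, "user_id": 3, "bid": 30},
}

# SELECT * FROM bids WHERE id = 1  -> a single GET on the primary key.
row = kvs[1]

# SELECT * FROM bids WHERE item_id = 1 -> without a secondary index,
# this would require scanning every tuple.
matches = [t for t in kvs.values() if t["item_id"] == 1]
```

The second query is what motivates the secondary index objects on the next slide.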
Secondary Indices on KVS
• Index objects as another set of key-value objects

bids(id, item_id, user_id, bid):
  (1, 1, 3, 10)
  (2, 1, 1, 20)
  (3, 3, 2, 20)
  (4, 2, 1, 10)
  (5, 3, 3, 30)

Index on bids(item_id): 1 → [1,2], 2 → [4], 3 → [3,5]
Index on bids(user_id): 1 → [2,4], 2 → [3], 3 → [1,5]
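A sketch of the index objects above, again using a plain dict as a stand-in KVS (the `build_index` helper is hypothetical, not part of Partiqle's API):

```python
from collections import defaultdict

def build_index(kvs, column):
    """Build a secondary index as another key-value mapping:
    key = indexed column value, value = list of primary keys."""
    index = defaultdict(list)
    for pk, tup in sorted(kvs.items()):
        index[tup[column]].append(pk)
    return dict(index)

kvs = {
    1: {"item_id": 1, "user_id": 3, "bid": 10},
    2: {"item_id": 1, "user_id": 1, "bid": 20},
    3: {"item_id": 3, "user_id": 2, "bid": 20},
    4: {"item_id": 2, "user_id": 1, "bid": 10},
    5: {"item_id": 3, "user_id": 3, "bid": 30},
}

item_index = build_index(kvs, "item_id")  # item_id 1 -> [1, 2], 2 -> [4], 3 -> [3, 5]
user_index = build_index(kvs, "user_id")  # user_id 1 -> [2, 4], 2 -> [3], 3 -> [1, 5]

# SELECT * FROM bids WHERE item_id = 1 now costs one GET on the index
# object plus one GET per matching primary key.
rows = [kvs[pk] for pk in item_index[1]]
```

Note that every update to a tuple must also update the affected index objects, which is where the extra remote read/write operations per statement come from.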
Range Index
[Diagram: a B-Link tree index stored on the KVS; each index node (e.g. the key sequence 2, 6, 10) is itself a key-value object, with leaf entries pointing to tuples in the Key-Value Store]
SQL over KVS: What Matters to Our Problem
• One SQL statement involves multiple KV read/write operations
• Concurrency control can be done at the key-value level
Concurrency Control at the Key-Value Level
• SQL update statements can be seen as read/write key-value sets after execution

DELETE FROM R1 WHERE b < 100
UPDATE R1 SET b = 60 WHERE b = 120
UPDATE R1 SET b = 80 WHERE id = 1

• R/W conflicts occur on index nodes (= key-value objects)
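A sketch of conflict detection at key-value granularity. The key names below (`tuple:k1`, `idx:b:node1`, …) are hypothetical; the real system derives the read/write sets from the tuples and B-Link tree index nodes actually touched while executing each statement:

```python
def conflicts(txn_a, txn_b):
    """Two transactions conflict iff one's write set intersects the
    other's read or write set (checked at key-value granularity)."""
    reads_a, writes_a = txn_a
    reads_b, writes_b = txn_b
    return bool(writes_a & (reads_b | writes_b)) or bool(writes_b & reads_a)

# UPDATE R1 SET b = 60 WHERE b = 120:
#   reads an index node to find b = 120, writes the tuple plus the
#   index nodes for the old and new positions of the key.
upd1 = ({"idx:b:node2"}, {"tuple:k1", "idx:b:node2", "idx:b:node1"})

# UPDATE R1 SET b = 80 WHERE id = 1:
#   reads the tuple by primary key, writes the tuple and an index node.
upd2 = ({"tuple:k1"}, {"tuple:k1", "idx:b:node1"})

assert conflicts(upd1, upd2)  # both touch tuple:k1 and idx:b:node1
```

This is why conflicts can arise between statements that look independent at the SQL level: they may collide on shared index nodes.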
System Architecture
• QT: Query Translator
• TM: Transaction Manager
[Diagram: update transactions T1…T6 flow through QT, which executes them with GET/PUT operations against the KVS and then SUBMITs their read/write sets to TM; TM validates and COMMITs, and committed writes are applied to the KVS with PUT]
States of Transactions
• Start → Commit → Complete
• Logical timestamps are taken at Start and Complete
[Diagram: QT issues START at time S1, performs GETs/PUTs, and SUBMITs the read/write sets to TM; the transaction is ACTIVE until COMMITTED, and COMPLETED once all writes are applied at time C1, giving T1 the interval [S1, C1]]
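A minimal sketch of this life cycle (hypothetical Python, not the actual TM implementation), with a shared logical clock stamping each transaction at Start and Complete:

```python
import itertools

clock = itertools.count(1)  # global logical clock shared by all transactions

class Txn:
    """ACTIVE -> COMMITTED -> COMPLETED, with interval [S, C]."""
    def __init__(self):
        self.start = next(clock)   # S: logical timestamp at START
        self.complete = None       # C: assigned at COMPLETE; None means "still infinity"
        self.state = "ACTIVE"

    def commit(self):              # SUBMIT(RW sets) passed validation
        self.state = "COMMITTED"

    def finish(self):              # all PUTs applied to the KVS
        self.complete = next(clock)
        self.state = "COMPLETED"

t1 = Txn()        # T1: [S1, infinity)
t1.commit()
t1.finish()       # T1: [S1, C1]
t2 = Txn()        # t2.start > t1.complete: intervals do not overlap
```

Because T2 starts after T1 completes, their intervals cannot overlap and no conflict check between them is needed, which is exactly the case the next slide's check rules out.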
Conflict Check
• Commit is tested in the sequence order (ENQUEUE/DEQUEUE through a priority queue at TM)
• Transactions whose [S, C] intervals overlap must be checked: Ci > S
[Diagram: a committing T6 [S6,∞) must be checked against COMPLETED transactions with overlapping intervals, e.g. T3 [S3,C3] and T4, and against ALL COMMITTED-but-not-COMPLETED transactions, e.g. T2 [S2,∞) and T5 [S5,∞); T1 [S1,C1] completed before S6 and needs no check]
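The overlap rule can be sketched as follows (hypothetical Python; the dict-based transaction records are an assumption, with `C = None` standing for ∞ while a transaction is still running):

```python
import math

def must_check(committing, earlier):
    """`earlier` was committed before `committing` in the sequence order.
    A conflict check is needed iff their [S, C] intervals overlap,
    i.e. earlier's C > committing's S (C is infinity until COMPLETE)."""
    c = earlier["C"] if earlier["C"] is not None else math.inf
    return c > committing["S"]

t6 = {"S": 60, "C": None}   # T6 is the transaction being validated
t1 = {"S": 10, "C": 15}     # completed long before T6 started: skip
t3 = {"S": 30, "C": 65}     # completed after T6 started: check
t5 = {"S": 50, "C": None}   # COMMITTED but not COMPLETED: always check

assert not must_check(t6, t1)
assert must_check(t6, t3)
assert must_check(t6, t5)
```

Validating commits strictly in the log's sequence order is what makes the outcome serializable to the execution-defined order, which a generic validation order would not guarantee.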
Experiments
• Implemented system
– MySQL (RDBMS), ActiveMQ, Voldemort (KVS)
– QT based on Partiqle (NEC IERS)
• Setup
– KVS = Voldemort: 5 nodes
  • Strong read consistency: N < W + R
  • Intel Xeon machines with 2.4 GHz CPU, 16 GB memory
– QT/TM: 20 threads for KV updates, 1–20 threads for QT
  • Intel Core 2 Duo 3.16 GHz CPU, 4 GB memory
– Benchmark: TPC-W (7.2 GB)
Throughput Gain from Concurrency
• More QT threads → throughput increases to some degree (up to 2.5x)
Number of Conflicts
• More QT threads → more conflicts
Impact of Conflicts
• Synthetic workload from TPC-W with
different degrees of conflict
Key-Value Cluster Size
• More KV nodes → (1) larger KV throughput, (2) shorter completion time (fewer conflicts)
Summary
• Concurrent transaction execution on a KVS to replicate the master RDBMS
• We developed an optimistic concurrency control mechanism that guarantees the execution order in the transaction log
• Implementation and experiments
• Future work: better control of the degree of concurrency
– Adaptive approach based on failure frequency
– Predictive/cost-based approach based on static analysis