edbt2015: transactional replication in hybrid data store architectures
TRANSCRIPT
Transactional Replication in Hybrid Data Store Architectures
Hojjat Jafarpour, Junichi Tatemura, Hakan Hacigumus
NEC Labs America, Cupertino, CA
Overview
• Goal: Replicate the RDBMS with a KVS to offload read-only workloads
• Issue: Concurrent execution over the KVS for high throughput
• Solution: Custom concurrency control that guarantees the execution-defined order in the transaction log
• Our implementation and experiments
[Diagram: Web applications send read/write workloads via SQL to the RDBMS (high-end server); read-only workloads go through SQL over KVS (NEC IERS) to the Key-Value Store (commodity cluster)]
Background: Solutions for Scalable Data Serving
• Approaches that offload read-only workloads
[Diagram: Web applications issue read/write workloads against the database, and read-only workloads against an elastic cluster of commodity servers holding copies of the database data]
Caching vs. Replication
• Our Choice: Replication – more predictable offloading
– Cache: easier/cheaper to introduce, but cache misses make offloading less predictable
– Replica: upfront cost, but offloading is more predictable
Full-Replication vs. Partitioning
• Our Choice: Partitioning – good for serving small lookups with a commodity cluster
– Full replication: each node has full capability, but catching up on updates is expensive
[Diagram: one master (M) replicated to several slave nodes (S)]
Our Scalability Solution: Hybrid Data Store Replication
• Using a KVS to replicate the master RDBMS
• SQL over KVS (NEC InfoFrame Relational Store)
[Diagram: read/write workloads hit the RDBMS (high-end server) via the SQL API; read-only workloads hit the Key-Value Store (commodity cluster) via SQL over KVS (NEC IERS)]
Issue: Concurrency Control to Replay Transactions
• Concurrent execution for high throughput
• Requirement: CC with an execution-defined order guarantee
– It must be serializable to the order in the log
– Generic CC does not work: both T2→T1 and T1→T2 are allowed
[Diagram: the transaction log T1, T2, … from the RDBMS is replayed concurrently through SQL over KVS onto the Key-Value Store (commodity cluster)]
Related Work 1: Deterministic Transaction Ordering
• [Thomson and Abadi, VLDB 2010]
• A preprocessor determines the transaction order
– Reduces coordination among sites → high performance
– "Dependent transactions" are restarted, so the order can change (OK for its purpose)
• Our case: the predefined order is a requirement
Related Work 2: Transactional Remote Backup
• [Polyzois and Garcia-Molina, TODS 1994]
• Consistent replay, at the backup data center, of the master data center's transactions from multiple logs delivered asynchronously, with its own concurrency control
– The log ships data → applicable to homogeneous systems
• Our case: the log ships SQL statements to a heterogeneous store
Why?
• No direct solution found… Why?
• Because… it was NOT a problem!
Unique Problem in Hybrid Store Replication
• Parallelizing serialized transactions did not make much sense in traditional settings
– Replaying update statements at a slave RDBMS is I/O bound (I/O on the local HDD)
– Better to save CPU resources for the read-only workloads
[Diagram: the master RDBMS log T1…T7 replayed sequentially at a slave RDBMS]
Unique Problem in Hybrid Store Replication
• Parallelism is essential for high throughput on the KVS side
– One update statement may need multiple remote read/write operations
– Sequential replay can take longer than local execution at the master DB
[Diagram: the master RDBMS log T1…T7 replayed against the KVS]
The Rest of the Talk
• SQL over KVS
• System architecture
• Concurrency Control
• Experiments
Background: SQL over KVS
• Partiqle [SIGMOD 2012 Demo]; Product: NEC IERS
• Key = primary key, Value = tuple

bids(id, item_id, user_id, bid):
  (1, 1, 3, 10)
  (2, 1, 1, 20)
  (3, 3, 2, 20)
  (4, 2, 1, 10)
  (5, 3, 3, 30)

SELECT * FROM bids WHERE id = 1
SELECT * FROM bids WHERE item_id = 1
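As a minimal sketch (a hypothetical in-memory dict standing in for the KVS; Partiqle's actual storage layout is more involved), the key = primary key / value = tuple mapping behaves like this:

```python
# Hypothetical in-memory KVS: key = primary key, value = tuple.
kvs = {
    1: {"id": 1, "item_id": 1, "user_id": 3, "bid": 10},
    2: {"id": 2, "item_id": 1, "user_id": 1, "bid": 20},
    3: {"id": 3, "item_id": 3, "user_id": 2, "bid": 20},
    4: {"id": 4, "item_id": 2, "user_id": 1, "bid": 10},
    5: {"id": 5, "item_id": 3, "user_id": 3, "bid": 30},
}

# SELECT * FROM bids WHERE id = 1  -> a single GET on the primary key.
row = kvs[1]

# SELECT * FROM bids WHERE item_id = 1 -> without a secondary index,
# this would require scanning every tuple.
matches = [t for t in kvs.values() if t["item_id"] == 1]
```

The second query is what motivates the secondary index objects on the next slide.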
Secondary Indices on KVS
• Index objects as another set of key-value objects

bids(id, item_id, user_id, bid):
  (1, 1, 3, 10)
  (2, 1, 1, 20)
  (3, 3, 2, 20)
  (4, 2, 1, 10)
  (5, 3, 3, 30)

Index on bids(item_id): 1 → [1,2], 2 → [4], 3 → [3,5]
Index on bids(user_id): 1 → [2,4], 2 → [3], 3 → [1,5]
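A sketch of the index objects above, again using a plain dict as a stand-in KVS (the `build_index` helper is hypothetical, not part of Partiqle's API):

```python
from collections import defaultdict

def build_index(kvs, column):
    """Build a secondary index as another key-value mapping:
    key = indexed column value, value = list of primary keys."""
    index = defaultdict(list)
    for pk, tup in sorted(kvs.items()):
        index[tup[column]].append(pk)
    return dict(index)

kvs = {
    1: {"item_id": 1, "user_id": 3, "bid": 10},
    2: {"item_id": 1, "user_id": 1, "bid": 20},
    3: {"item_id": 3, "user_id": 2, "bid": 20},
    4: {"item_id": 2, "user_id": 1, "bid": 10},
    5: {"item_id": 3, "user_id": 3, "bid": 30},
}

item_index = build_index(kvs, "item_id")  # item_id 1 -> [1, 2], 2 -> [4], 3 -> [3, 5]
user_index = build_index(kvs, "user_id")  # user_id 1 -> [2, 4], 2 -> [3], 3 -> [1, 5]

# SELECT * FROM bids WHERE item_id = 1 now costs one GET on the index
# object plus one GET per matching primary key.
rows = [kvs[pk] for pk in item_index[1]]
```

Note that every update to a tuple must also update the affected index objects, which is where the extra remote read/write operations per statement come from.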
Range Index
[Diagram: a B-Link tree index stored on the KVS; each index node (e.g. the key sequence 2, 6, 10) is itself a key-value object, with leaf entries pointing to tuples in the Key-Value Store]
SQL over KVS: What Matters to Our Problem
• One SQL statement involves multiple KV read/write operations
• Concurrency control can be done at the key-value level
Concurrency Control at the Key-Value Level
• SQL update statements can be seen as read/write key-value sets after execution

DELETE FROM R1 WHERE b < 100
UPDATE R1 SET b = 60 WHERE b = 120
UPDATE R1 SET b = 80 WHERE id = 1

• R/W conflicts occur on index nodes (= key-value objects)
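A sketch of conflict detection at key-value granularity. The key names below (`tuple:k1`, `idx:b:node1`, …) are hypothetical; the real system derives the read/write sets from the tuples and B-Link tree index nodes actually touched while executing each statement:

```python
def conflicts(txn_a, txn_b):
    """Two transactions conflict iff one's write set intersects the
    other's read or write set (checked at key-value granularity)."""
    reads_a, writes_a = txn_a
    reads_b, writes_b = txn_b
    return bool(writes_a & (reads_b | writes_b)) or bool(writes_b & reads_a)

# UPDATE R1 SET b = 60 WHERE b = 120:
#   reads an index node to find b = 120, writes the tuple plus the
#   index nodes for the old and new positions of the key.
upd1 = ({"idx:b:node2"}, {"tuple:k1", "idx:b:node2", "idx:b:node1"})

# UPDATE R1 SET b = 80 WHERE id = 1:
#   reads the tuple by primary key, writes the tuple and an index node.
upd2 = ({"tuple:k1"}, {"tuple:k1", "idx:b:node1"})

assert conflicts(upd1, upd2)  # both touch tuple:k1 and idx:b:node1
```

This is why conflicts can arise between statements that look independent at the SQL level: they may collide on shared index nodes.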
System Architecture
• QT: Query Translator
• TM: Transaction Manager
[Diagram: update transactions T1…T6 flow through QT, which executes them with GET/PUT operations against the KVS and then SUBMITs their read/write sets to TM; TM validates and COMMITs, and committed writes are applied to the KVS with PUT]
States of Transactions
• Start → Commit → Complete
• Logical timestamps are taken at Start and Complete
[Diagram: QT issues START at time S1, performs GETs/PUTs, and SUBMITs the read/write sets to TM; the transaction is ACTIVE until COMMITTED, and COMPLETED once all writes are applied at time C1, giving T1 the interval [S1, C1]]
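A minimal sketch of this life cycle (hypothetical Python, not the actual TM implementation), with a shared logical clock stamping each transaction at Start and Complete:

```python
import itertools

clock = itertools.count(1)  # global logical clock shared by all transactions

class Txn:
    """ACTIVE -> COMMITTED -> COMPLETED, with interval [S, C]."""
    def __init__(self):
        self.start = next(clock)   # S: logical timestamp at START
        self.complete = None       # C: assigned at COMPLETE; None means "still infinity"
        self.state = "ACTIVE"

    def commit(self):              # SUBMIT(RW sets) passed validation
        self.state = "COMMITTED"

    def finish(self):              # all PUTs applied to the KVS
        self.complete = next(clock)
        self.state = "COMPLETED"

t1 = Txn()        # T1: [S1, infinity)
t1.commit()
t1.finish()       # T1: [S1, C1]
t2 = Txn()        # t2.start > t1.complete: intervals do not overlap
```

Because T2 starts after T1 completes, their intervals cannot overlap and no conflict check between them is needed, which is exactly the case the next slide's check rules out.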
Conflict Check
• Commit is tested in the sequence order (ENQUEUE/DEQUEUE through a priority queue at TM)
• Transactions whose [S, C] intervals overlap must be checked: Ci > S
[Diagram: a committing T6 [S6,∞) must be checked against COMPLETED transactions with overlapping intervals, e.g. T3 [S3,C3] and T4, and against ALL COMMITTED-but-not-COMPLETED transactions, e.g. T2 [S2,∞) and T5 [S5,∞); T1 [S1,C1] completed before S6 and needs no check]
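The overlap rule can be sketched as follows (hypothetical Python; the dict-based transaction records are an assumption, with `C = None` standing for ∞ while a transaction is still running):

```python
import math

def must_check(committing, earlier):
    """`earlier` was committed before `committing` in the sequence order.
    A conflict check is needed iff their [S, C] intervals overlap,
    i.e. earlier's C > committing's S (C is infinity until COMPLETE)."""
    c = earlier["C"] if earlier["C"] is not None else math.inf
    return c > committing["S"]

t6 = {"S": 60, "C": None}   # T6 is the transaction being validated
t1 = {"S": 10, "C": 15}     # completed long before T6 started: skip
t3 = {"S": 30, "C": 65}     # completed after T6 started: check
t5 = {"S": 50, "C": None}   # COMMITTED but not COMPLETED: always check

assert not must_check(t6, t1)
assert must_check(t6, t3)
assert must_check(t6, t5)
```

Validating commits strictly in the log's sequence order is what makes the outcome serializable to the execution-defined order, which a generic validation order would not guarantee.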
Experiments
• Implemented system
– MySQL (RDBMS), ActiveMQ, Voldemort (KVS)
– QT based on Partiqle (NEC IERS)
• Setup
– KVS = Voldemort: 5 nodes
  • Strong read consistency: N < W + R
  • Intel Xeon machines with 2.4 GHz CPU, 16 GB memory
– QT/TM: 20 threads for KV updates, 1–20 threads for QT
  • Intel Core 2 Duo 3.16 GHz CPU, 4 GB memory
– Benchmark: TPC-W (7.2 GB)
Throughput Gain from Concurrency
• More QT threads → throughput increases to some degree (up to 2.5x)
Number of Conflicts
• More QT threads → more conflicts
Impact of Conflicts
• Synthetic workload from TPC-W with
different degrees of conflict
Key-Value Cluster Size
• More KV nodes → (1) larger KV throughput, (2) shorter completion time (fewer conflicts)
Summary
• Concurrent transaction execution on a KVS to replicate the master RDBMS
• We developed an optimistic concurrency control mechanism that guarantees the execution order in the transaction log
• Implementation and experiments
• Future work: better control of the degree of concurrency
– Adaptive approach based on failure frequency
– Predictive/cost-based approach based on static analysis