ACID Transactions in Apache Phoenix with Apache Tephra™ (Incubating), by Poorna Chandra


Upload: cask-data-inc

Post on 07-Jan-2017


TRANSCRIPT

ACID Transactions

Poorna Chandra

Why Transactions?
• All-or-none semantics simplify the developer's life
  – Ensures every client has a consistent view of data
  – Protects against concurrent updates
  – No need to reason about what state the data is left in if a write fails
  – Guaranteed consistency between data and index


Apache Tephra
• Transactions on Apache HBase
  – Across regions, tables and RPC calls
• ACID semantics
• Tephra powers
  – CDAP (Cask Data Application Platform)
  – Apache Phoenix (4.7 onwards)
• Apache Incubator project


Architecture


[Architecture diagram. Components shown: Client 1, Client 2, ... Client N; Tx Manager (active) and Tx Manager (standby); ZooKeeper; HBase with Master 1, Master 2 and region servers RS 1 to RS 4.]

Tephra Components
• TransactionAware client
  – Coordinates transaction lifecycle with manager
  – Communicates directly with HBase for reads and writes
• Transaction Manager
  – Assigns transaction IDs
  – Maintains state on in-progress, committed and invalid transactions
  – Detects conflicts
• Transaction Processor coprocessor
  – Applies server-side filtering for reads
  – Cleans up data from failed transactions and versions that are no longer visible
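The Transaction Manager's responsibilities can be illustrated with a toy, in-memory model. This is a sketch, not Tephra's implementation: the class `ToyTxManager` and all of its method names are hypothetical, and it assumes a single counter supplies both start and commit IDs. It shows only the three duties listed above: assigning IDs, tracking in-progress, committed and invalid transactions, and detecting write-write conflicts at commit.

```python
import itertools


class ToyTxManager:
    """Hypothetical in-memory stand-in for a transaction manager."""

    def __init__(self):
        self._next_id = itertools.count(1)  # monotonically increasing IDs
        self.in_progress = {}               # tx_id -> set of keys written
        self.committed = {}                 # tx_id -> (commit_id, keys written)
        self.invalid = set()                # IDs whose writes must be filtered

    def start(self):
        """Assign a new transaction ID and track it as in progress."""
        tx_id = next(self._next_id)
        self.in_progress[tx_id] = set()
        return tx_id

    def record_write(self, tx_id, key):
        """Remember which keys a transaction has written (its change set)."""
        self.in_progress[tx_id].add(key)

    def commit(self, tx_id):
        """Return True on success, False if a conflicting commit exists."""
        keys = self.in_progress.pop(tx_id)
        for commit_id, other_keys in self.committed.values():
            # Conflict: another transaction committed after this one
            # started and wrote at least one of the same keys.
            if commit_id > tx_id and keys & other_keys:
                return False
        self.committed[tx_id] = (next(self._next_id), keys)
        return True

    def invalidate(self, tx_id):
        """Mark a transaction (for example, one that timed out) as invalid."""
        self.in_progress.pop(tx_id, None)
        self.invalid.add(tx_id)
```

With two transactions that both write `row1`, the first `commit` succeeds and the second returns `False`; a transaction started after the first commit can write `row1` without conflict.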


Transaction Lifecycle


[Lifecycle diagram. Client and Tx Manager interact over an RPC API. States: none, in progress, try commit, try abort, complete, invalid. Labeled transitions: start tx, data ops (do work), commit (check conflicts), succeeded, failed, roll back, abort, timeout, invalidate.]
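The transaction lifecycle can be read as a small state machine. The sketch below is one plausible reading of the state and transition labels on this slide, not an authoritative copy of the diagram: the transition table, the `TxState` enum and the `step`/`run` helpers are all assumptions made for illustration.

```python
from enum import Enum


class TxState(Enum):
    NONE = "none"
    IN_PROGRESS = "in progress"
    TRY_COMMIT = "try commit"
    TRY_ABORT = "try abort"
    COMPLETE = "complete"
    INVALID = "invalid"


# (current state, event) -> next state. Reconstructed from the slide's
# labels; treat this table as an assumption, not a specification.
TRANSITIONS = {
    (TxState.NONE, "start tx"): TxState.IN_PROGRESS,
    (TxState.IN_PROGRESS, "commit"): TxState.TRY_COMMIT,
    (TxState.IN_PROGRESS, "abort"): TxState.TRY_ABORT,
    (TxState.IN_PROGRESS, "timeout"): TxState.INVALID,
    (TxState.TRY_COMMIT, "succeeded"): TxState.COMPLETE,
    (TxState.TRY_COMMIT, "failed"): TxState.TRY_ABORT,   # roll back
    (TxState.TRY_ABORT, "succeeded"): TxState.COMPLETE,
    (TxState.TRY_ABORT, "failed"): TxState.INVALID,      # invalidate
}


def step(state, event):
    """Apply one event, rejecting transitions the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.value!r}"
        ) from None


def run(events, state=TxState.NONE):
    """Fold a sequence of events into a final state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["start tx", "commit", "succeeded"])` ends in `TxState.COMPLETE`, while a commit that fails followed by a failed rollback ends in `TxState.INVALID`.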

Snapshot Isolation
• Multi-version concurrency control
  – Cell version (timestamp) = transaction ID
  – All writes in the same transaction use the transaction ID as timestamp
  – Reads exclude other uncommitted transactions (for isolation)
  – Own uncommitted writes are visible
• Optimistic Concurrency Control
  – Transactions proceed concurrently
  – Conflict detection at commit of transaction
  – Rollback of committing transaction in case of conflict
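The read-side rules of snapshot isolation can be sketched as a visibility check over cell versions. This is a simplified model, not Tephra's filter: the function names `visible` and `read_latest` and their parameters are hypothetical, and it assumes that a reader carries the set of transaction IDs that were in progress when it started plus the set of invalid IDs.

```python
def visible(cell_version, reader_tx_id, excluded, invalid):
    """Decide whether a cell version is visible to a reading transaction.

    cell_version -- transaction ID that was used as the cell timestamp
    reader_tx_id -- ID of the transaction performing the read
    excluded     -- IDs still in progress when the reader started
    invalid      -- IDs of invalidated transactions
    """
    if cell_version == reader_tx_id:
        return True    # own uncommitted writes are visible
    if cell_version > reader_tx_id:
        return False   # written by a transaction that started later
    return cell_version not in excluded and cell_version not in invalid


def read_latest(versions, reader_tx_id, excluded, invalid):
    """Return the value of the newest visible version, or None.

    versions -- dict mapping cell version (transaction ID) -> value
    """
    for version in sorted(versions, reverse=True):
        if visible(version, reader_tx_id, excluded, invalid):
            return versions[version]
    return None
```

Given versions written by transactions 10, 20, 30 and 40, a reader with ID 35 that excludes in-progress transaction 30 and invalid transaction 20 falls back to the value written by transaction 10.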


Optimistic Concurrency Control
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback is higher
• Good if conflicts are rare: short transactions, disjoint partitioning of work


Transactions in Phoenix
• Client hbase-site.xml

<property>
  <name>phoenix.transactions.enabled</name>
  <value>true</value>
</property>


Transactions in Phoenix
• Server hbase-site.xml

<property>
  <name>data.tx.snapshot.dir</name>
  <value>/tmp/tephra/snapshots</value>
</property>

• Set $HBASE_HOME and start the transaction manager

./bin/tephra


Transactional Table
• Enable transactions on a new table

CREATE TABLE my_table (k BIGINT PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true;

• Enable transactions on an existing table
  – A transactional table cannot be converted back to non-transactional

ALTER TABLE my_other_table SET TRANSACTIONAL=true;


Use Transactions
• Transaction started implicitly

SELECT * FROM my_table; -- This will start a transaction
UPSERT INTO my_table VALUES (1,'A');
SELECT count(*) FROM my_table WHERE k=1; -- Will see uncommitted row
DELETE FROM my_other_table WHERE k=2;
!commit -- Other transactions will now see your updates

• Exception thrown on conflicts

java.sql.SQLException: ERROR 523 (42900): Transaction aborted due to conflict with other mutations. Conflict detected for transaction 1454112544975000000.
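Because a conflict surfaces as an exception at commit time, a client usually has to redo the whole transaction. The sketch below shows one common way to do that, a bounded retry loop with randomized backoff. It is language-neutral in spirit and not a Phoenix API: `ConflictError` is a hypothetical stand-in for the conflict `SQLException` shown above, and `run_with_retries`, `max_attempts` and `base_delay` are illustrative names.

```python
import random
import time


class ConflictError(Exception):
    """Hypothetical error raised when commit detects a conflict."""


def run_with_retries(tx_body, max_attempts=3, base_delay=0.01):
    """Run tx_body(), retrying the whole transaction on conflict.

    tx_body must perform every read, write and the commit itself, so
    that each retry starts a fresh transaction with a fresh snapshot.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return tx_body()
        except ConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Randomized exponential backoff so that competing clients
            # are less likely to collide again on the next attempt.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

Retrying only makes sense when the body is safe to re-execute from the start; side effects outside the transaction are not rolled back by this loop.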


Secondary Index
• Index creation and updates are transactional

CREATE INDEX my_index ON my_table (v); -- my_index is an illustrative name; my_table is transactional

• Data operations and index updates happen in a single transaction
• Transaction is rolled back if either
  – the data write fails
  – the index update fails
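The guarantee that a data write and its index update succeed or fail together can be sketched with plain in-memory structures. This is an illustration of the all-or-none behavior only, not how Phoenix or Tephra implement it: `upsert_with_index`, `write_index` and `IndexWriteError` are hypothetical names, the table is a dict, and the index is a set of `(value, key)` pairs.

```python
class IndexWriteError(Exception):
    """Hypothetical failure raised by an index write."""


def write_index(index, old_value, new_value, key):
    """Maintain a set-based index of (value, key) pairs."""
    if old_value is not None:
        index.discard((old_value, key))
    index.add((new_value, key))


def upsert_with_index(data, index, key, value, index_writer=write_index):
    """Apply a row write and its index update as one unit.

    Returns True if both writes succeed; otherwise restores both
    structures to their previous contents and returns False.
    """
    data_before, index_before = dict(data), set(index)
    try:
        old_value = data.get(key)
        data[key] = value
        index_writer(index, old_value, value, key)
    except IndexWriteError:
        # Roll back: neither the row nor the index keeps partial changes.
        data.clear()
        data.update(data_before)
        index.clear()
        index.update(index_before)
        return False
    return True
```

If the index writer fails after the row has already been changed, the row reverts as well, so a reader never sees a row without its matching index entry.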


Performance


Performance


Test setup: 2 write threads per client, 1000-row batch size, 15-column table

Future Work
• Partitioned Transaction Manager
• Automatic pruning of invalid transaction list
• Read-only transactions
• Performance optimizations
  – Conflict detection
  – Appends to transaction edit log
