Distributed STMs
STMs are being employed in new scenarios:
- Database caches in three-tier web apps (FénixEDU)
- HPC programming languages (X10)
- In-memory cloud data grids (Coherence, Infinispan)

New challenges:
- Scalability
- Fault-tolerance
Euro-TM Workshop on Transactional Memory (WTM 2012), Bern, Switzerland
REPLICATION
Partial Replication
Each site stores a partial copy of the data.
Genuine partial replication schemes maximize scalability by ensuring that only the sites replicating data items read or written by a transaction T exchange messages to execute/commit T.

Existing 1-Copy Serializable implementations enforce distributed validation even of read-only transactions [SRDS10], which causes considerable overhead in typical workloads.
Issues with Partial Replication
Extending existing local multiversion (MV) STMs is not enough.
Local MV STMs rely on a single global counter to track version advancement.
Problem: committing a transaction would then involve ALL nodes.

No genuineness = poor scalability.
GMU: Genuine Multiversion Update-Serializable Replication [ICDCS12]
In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved.

It uses multiple versions for each data item.

It builds visible snapshots, i.e., the freshest consistent snapshots, taking into account:
1. causal dependencies on transactions committed before the transaction began;
2. previous reads executed by the same transaction.

Vector clocks are used to establish visible snapshots.
High Level Overview (i)

Transactions commit using a vector clock.
Each node stores a log of committed vector clocks.
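This per-node log can be sketched as follows; note that the class and method names here are illustrative assumptions, not the actual GMU data structures:

```python
class VCLog:
    """Per-node log of the vector clocks of committed transactions.

    Minimal sketch: names and representation are assumptions for
    illustration, not taken from the GMU implementation.
    """

    def __init__(self, n_nodes):
        # The log starts with the all-zeros clock.
        self.entries = [[0] * n_nodes]

    def append(self, commit_vc):
        # Called when a transaction commits on this node.
        self.entries.append(list(commit_vc))

    def most_recent(self):
        # A transaction beginning on this node acquires this clock
        # as the initial view of its visible snapshot.
        return list(self.entries[-1])


log = VCLog(3)
log.append([1, 1, 1])
log.append([1, 2, 2])
print(log.most_recent())  # [1, 2, 2]
```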
High Level Overview (ii)

Initial view of the visible snapshot: when a transaction T begins on a node N, it acquires the most recent vector clock in N's commit log.

View extension of the visible snapshot: when T reads on a node N, T's vector clock can be modified according to N's commit log. Three reading rules are applied using T's vector clock.
Write operation: when a transaction T writes value V on data item O, it inserts <O,V> into T's write-set.

Commit operation: read-only transactions always commit. Update transactions run a genuine Two-Phase Commit:
- Upon receiving the prepare message (participant side): acquire read/write locks, validate the read-set, and send back a tentative commit vector clock.
- If all replies are positive (coordinator side): multicast the write-set and the final commit vector clock.
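The coordinator-side step above can be condensed into how the final commit vector clock is derived from the participants' tentative clocks. A hedged sketch: the function name and the use of `None` as an abort vote are illustrative assumptions, and the entry-wise merge is the intuition, not the paper's exact rule:

```python
def final_commit_vc(proposals):
    """Merge the participants' tentative commit vector clocks.

    Each proposal is the tentative vector clock a participant sent
    back after acquiring its locks and validating the read-set;
    None stands for a negative (abort) vote. The final commit
    vector clock is the entry-wise maximum of all proposals.
    """
    if any(p is None for p in proposals):
        return None  # at least one participant voted abort
    return [max(entries) for entries in zip(*proposals)]


# Two participants propose (1, 2, 2) and (1, 1, 3):
print(final_commit_vc([[1, 2, 2], [1, 1, 3]]))  # [1, 2, 3]
```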
Rule 1: Reading Lower Bound

[Figure: message diagram over three nodes, where Node 1 stores X and Node 2 stores Y. T0 writes X and Y and commits with vector clock (1,2,2), producing versions X(2) and Y(2). T1 begins with T1.VC = (1,1,1); on reading X at Node 1, the most recent VC in Node 1's VCLog, (1,2,2), advances T1.VC to (1,2,2), so T1 reads X(2) and subsequently the consistent version Y(2) at Node 2.]
Rule 2: Reading Upper Bound

[Figure: message diagram over the same three nodes. T1 begins with T1.VC = (1,1,1) and reads version X(1) at Node 1, while T0 concurrently writes X and Y and commits versions X(3) and Y(3) with vector clock (1,3,3). T1's earlier read of X(1) upper-bounds its visible snapshot: when T1 later reads Y at Node 2 its vector clock is (1,1,2), so it skips T0's version, observes an older version of Y consistent with X(1), and commits.]
Rule 3: Selection of Data Versions
Informally: observe the most recent consistent version of data item id on node i, based on T's history (previous reads).

Formally: iterate over the versions of id and return the most recent one such that

id.version.VN <= T.VC[i]
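The formal rule translates almost directly into code. A minimal sketch, assuming each item keeps its version chain sorted newest-first and each version carries the version number (VN) assigned at commit; all names here are illustrative:

```python
from collections import namedtuple

Version = namedtuple("Version", ["vn", "value"])

def read_version(versions, txn_vc, i):
    """Rule 3: return the most recent version of an item stored on
    node i whose version number does not exceed T.VC[i]."""
    for v in versions:  # assumed sorted newest-first
        if v.vn <= txn_vc[i]:
            return v
    raise LookupError("no visible version")


# Item X on node 1 has versions with VN 3, 2, 1; T's clock is (1, 2, 2):
chain = [Version(3, "x3"), Version(2, "x2"), Version(1, "x1")]
print(read_version(chain, [1, 2, 2], 1).value)  # x2
```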
Building the commit Vector Clock
Based on a variant of Skeen's total order multicast algorithm [SKEEN85].

Intuition: serialize all-and-only conflicting transactions, tracking:
- direct and transitive conflict dependencies;
- causal relationships.
Consistency Criterion
GMU ensures Extended Update Serializability.

Update Serializability [ICDT86] ensures:
- 1-Copy Serializability (1CS) on the history restricted to committed update transactions;
- 1CS on the history restricted to committed update transactions and any single read-only transaction.

But it can admit non-1CS histories containing at least 2 read-only transactions.

Extended Update Serializability [Adya99]:
- ensures the US property also for executing transactions;
- analogous to opacity in STMs.
Experiments on a private cluster

[Figure: throughput (committed tx/sec, up to ~9000) as a function of the number of nodes (2 to 20), for read & write transactions (TPC-C), comparing the GMURR and NGM configurations.]

Setup: 8-core physical nodes; TPC-C with 90% read-only and 10% update transactions; 4 threads per node; moderate contention (15% abort rate at 20 nodes).
Thanks for your attention
References
[Adya99] A. Adya. "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions." PhD thesis, Massachusetts Institute of Technology, 1999.

[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication." Proc. of the 32nd IEEE International Conference on Distributed Computing Systems, June 2012.

[ICDT86] R. C. Hansdah and L. M. Patnaik. "Update Serializability in Locking." International Conference on Database Theory, vol. 243 of Lecture Notes in Computer Science, pp. 171–185, Springer Berlin / Heidelberg, 1986.

[SKEEN85] D. Skeen. Unpublished communication, 1985. Referenced in K. Birman and T. Joseph, "Reliable Communication in the Presence of Failures," ACM Transactions on Computer Systems, pp. 47–76, 1987.

[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. "P-Store: Genuine Partial Replication in Wide Area Networks." Proc. of the 29th IEEE Symposium on Reliable Distributed Systems, 2010.