Telex/IceCube: a distributed memory with tunable consistency

Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group)
Nuno Preguiça (Universidade Nova de Lisboa)
Telex/IceCube

It is:
• A distributed shared memory
• Object-based (high-level operations)
• Transactional (ACID… for some definition of ID)
• Persistent

Top-level design goal: availability
• ∞-optimistic ⇒ (reconcile ⇒ cascading aborts)
• Minimize aborts ⇒ non-sequential schedules
• (Only) consistent enough to enforce invariants
• Partial replication
Distributed TMs — 22-Feb-2012
Scheduling (∞-optimistic execution)

Sequential execution
• In (semantic) dependence order
• Dynamic checks for preconditions: if OK, application approves schedule

Conflict ⇒ fork
• Latest checkpoint + replay
• Independent actions not rolled back

Semantics:
• Conflict = non-commuting, antagonistic
• Independent
Telex lifecycle
Application operations

[Diagram: the Shared Calendar application sends +action(appt) to Telex]

User requests > application: actions, dependence
> Telex: add to ACG, transmit
Telex lifecycle
Schedule

[Diagram: Telex proposes a schedule (sched) to the Shared Calendar application, which approves]

ACG > Telex: valid path = sound schedule
> application: tentative execute
> application: if OK, approve
Telex lifecycle
Conflict ⇒ multiple schedules

[Diagram: on conflict, Telex proposes alternative schedules (sched1, sched2); the application approves one]

ACG > Telex: valid path = sound schedule
> application: tentative execute
> application: if OK, approve
Constraints (semantics-based conflict detection)

Action: reified operation
Constraint: scheduling invariant

Binary relations:
• NotAfter
• Enables (implication)
• NonCommuting

Combinations:
• Antagonistic
• Atomic
• Causal dependence

Action-constraint graph (ACG)
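The constraint vocabulary above can be sketched as a small graph structure. This is an illustrative sketch, not the Telex API; all names (`AcgSketch`, `isSound`, the example actions) are hypothetical. Antagonism is modelled, as on the slide, as the combination NotAfter(a, b) + NotAfter(b, a), and causal dependence as NotAfter + Enables.

```java
import java.util.*;

// Illustrative sketch of an action-constraint graph (ACG). Hypothetical names,
// not the Telex API. Constraints follow the slide: NotAfter (ordering),
// Enables (implication), NonCommuting.
public class AcgSketch {
    public enum Constraint { NOT_AFTER, ENABLES, NON_COMMUTING }

    private final Set<String> actions = new HashSet<>();
    // edges.get(a).get(b) holds the constraints directed from a to b
    private final Map<String, Map<String, EnumSet<Constraint>>> edges = new HashMap<>();

    public void addConstraint(String a, String b, Constraint c) {
        actions.add(a);
        actions.add(b);
        edges.computeIfAbsent(a, k -> new HashMap<>())
             .computeIfAbsent(b, k -> EnumSet.noneOf(Constraint.class))
             .add(c);
    }

    // Antagonistic actions can never both appear in a sound schedule:
    // NotAfter in both directions is unsatisfiable when both are present.
    public void addAntagonism(String a, String b) {
        addConstraint(a, b, Constraint.NOT_AFTER);
        addConstraint(b, a, Constraint.NOT_AFTER);
    }

    private boolean has(String a, String b, Constraint c) {
        return edges.getOrDefault(a, Map.of())
                    .getOrDefault(b, EnumSet.noneOf(Constraint.class))
                    .contains(c);
    }

    // A schedule is sound when NotAfter(a, b) never places a after b, and
    // Enables(a, b) never schedules b without a.
    public boolean isSound(List<String> schedule) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++) pos.put(schedule.get(i), i);
        for (String a : actions) {
            for (String b : actions) {
                if (a.equals(b)) continue;
                if (has(a, b, Constraint.NOT_AFTER) && pos.containsKey(a)
                        && pos.containsKey(b) && pos.get(a) > pos.get(b)) return false;
                if (has(a, b, Constraint.ENABLES)
                        && pos.containsKey(b) && !pos.containsKey(a)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AcgSketch g = new AcgSketch();
        // Causal dependence = NotAfter + Enables (per the slide's combinations).
        g.addConstraint("create", "edit", Constraint.NOT_AFTER);
        g.addConstraint("create", "edit", Constraint.ENABLES);
        g.addAntagonism("edit", "delete");
        System.out.println(g.isSound(List.of("create", "edit")));           // true
        System.out.println(g.isSound(List.of("edit", "create")));           // false
        System.out.println(g.isSound(List.of("create", "edit", "delete"))); // false
    }
}
```

A schedule violating an ordering or enabling edge is rejected outright; antagonistic pairs simply cannot coexist in any sound schedule, which is what forces the fork on the next slide.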
Single-site scheduling

Sound schedule:
• Path in the ACG that satisfies constraints

Fork:
• Antagonism
• NonCommuting + dynamic checks

Several possible schedules:
• Penalise lost work
• Optimal schedule: NP-hard
• IceCube heuristics
Minimizing aborts (IceCube heuristics)

Iteratively pick a sound schedule; the application executes, checks, approves:
• Check invariants
• Reject if violation
• Request alternative schedule
• Restart from previous

Approved schedule:
• Preferred for future
• Proposed to agreement
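A minimal sketch of the pick-execute-check-approve loop. The names and the drop-the-last-action repair strategy are hypothetical simplifications of my own; the slide notes that finding an optimal schedule is NP-hard, so the real IceCube engine relies on heuristics rather than this naive rule.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch of the iterative pick/check/approve loop, not the real
// IceCube optimizer. On rejection it requests an alternative by greedily
// dropping the most recent action (penalising as little lost work as possible).
public class ScheduleLoop {
    public static List<String> approve(List<String> candidate,
                                       Predicate<List<String>> invariantHolds) {
        List<String> schedule = new ArrayList<>(candidate);
        // Reject-and-retry: shrink until the application approves.
        while (!schedule.isEmpty() && !invariantHolds.test(schedule)) {
            schedule.remove(schedule.size() - 1);
        }
        return schedule; // approved: preferred for the future, proposed to agreement
    }

    public static void main(String[] args) {
        // Application invariant: a bank balance never goes negative.
        Map<String, Integer> delta = Map.of("dep50", 50, "wd30", -30, "wd40", -40);
        Predicate<List<String>> nonNegative = s -> {
            int bal = 0;
            for (String a : s) {
                bal += delta.get(a);
                if (bal < 0) return false;
            }
            return true;
        };
        // wd30 + wd40 overdraws, so wd40 is dropped: [dep50, wd30] is approved.
        System.out.println(approve(List.of("dep50", "wd30", "wd40"), nonNegative));
    }
}
```

The application supplies only the invariant check (business logic); the scheduler owns the search, matching the separation of concerns in the closing slides.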
Telex lifecycle
Receive remote operations

[Diagram: a remote site's +cstrt(antg) is merged into the Telex ACG; the Shared Calendar application retrieves it via getCstrt]

User requests > application: actions, dependence
> Telex: add to ACG, transmit, merge
Eventual consistency

Common stable prefix:
• Diverge beyond

Combine approved schedules:
• ⇒ Consensus on next extension of prefix
• Equivalence, not equality
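"Equivalence, not equality" can be illustrated with a hypothetical check: two schedules over the same actions count as equivalent when every pair they order differently commutes, i.e. carries no NonCommuting constraint. This is a sketch of the idea only, not the actual Telex criterion; all names are invented.

```java
import java.util.*;

// Sketch: schedules agree up to reordering of commuting actions.
// nonCommuting holds unordered pairs of actions that do NOT commute.
public class ScheduleEquivalence {
    public static boolean equivalent(List<String> s1, List<String> s2,
                                     Set<Set<String>> nonCommuting) {
        if (!new HashSet<>(s1).equals(new HashSet<>(s2))) return false;
        for (int i = 0; i < s1.size(); i++)
            for (int j = i + 1; j < s1.size(); j++) {
                String a = s1.get(i), b = s1.get(j); // a before b in s1
                // If s2 reverses this pair, the pair must commute.
                if (s2.indexOf(a) > s2.indexOf(b)
                        && nonCommuting.contains(Set.of(a, b))) return false;
            }
        return true;
    }

    public static void main(String[] args) {
        // Writes to the same object do not commute; writes to different objects do.
        Set<Set<String>> nc = Set.of(Set.of("writeX1", "writeX2"));
        List<String> s1 = List.of("writeX1", "writeY", "writeX2");
        List<String> s2 = List.of("writeY", "writeX1", "writeX2");
        List<String> s3 = List.of("writeX2", "writeX1", "writeY");
        System.out.println(equivalent(s1, s2, nc)); // true: only commuting pairs swapped
        System.out.println(equivalent(s1, s3, nc)); // false: writeX1/writeX2 reversed
    }
}
```

Consensus then only has to agree on one representative per equivalence class when extending the stable prefix, rather than on one exact total order.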
Telex lifecycle
Convergence protocol

[Diagram: Telex agreement between sites; Shared Calendar application]

approved schedules > Telex: exchange, agree
> commit/abort, serialise
FGGC

Special-case commutative commands ⇒ collision recovery
Improvements higher on WAN.

[Plot: Generalised Paxos, Fast Paxos, Paxos, and FGGC at typical WAN latencies]
FGGC: Varying commutativity

Each command reads or writes a randomly-chosen register; WAN.

[Plot: Paxos vs FGGC as the number of registers varies from 1 to 16384]
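The benchmark's conflict rule, as the caption describes it, can be sketched: each command reads or writes one randomly-chosen register, and presumably two commands commute unless they touch the same register and at least one of them is a write. The `Cmd` type and `commute` predicate below are hypothetical illustrations, not FGGC code.

```java
// Sketch of the benchmark's commutativity rule (assumed from the slide caption).
public class RegisterCommands {
    public record Cmd(int register, boolean isWrite) {}

    // Two commands commute unless they access the same register
    // and at least one of them writes it.
    public static boolean commute(Cmd a, Cmd b) {
        if (a.register() != b.register()) return true; // disjoint registers
        return !a.isWrite() && !b.isWrite();           // two reads commute
    }

    public static void main(String[] args) {
        Cmd w0 = new Cmd(0, true), r0 = new Cmd(0, false), r1 = new Cmd(1, false);
        System.out.println(commute(w0, r0)); // false: write/read on same register
        System.out.println(commute(r0, r0)); // true: reads always commute
        System.out.println(commute(w0, r1)); // true: different registers
    }
}
```

With many registers, randomly-chosen commands rarely collide, which is why the plot shows a generalized consensus that exploits commutativity pulling ahead of classical Paxos as the register count grows.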
Example: Sakura Shared Calendar over Telex

Private calendar + common meetings. Example proposals:
• M1: Marc & Lamia & JeanMi, Monday | Tuesday | Friday
• M2: Lamia & Marc & Pierre, Tuesday | Wed. | Friday
• M3: JeanMi & Pierre & Marc, Monday | Tues. | Thurs.
• Change M3 to Friday

Telex:
• (Local) Explore solutions, approve
• (Global) Combine, commit

[Diagram: run-time check over Marc, Lamia, Tues., Fri. for M1, M2]
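The run-time check in this example can be sketched as a small backtracking search: pick one of the proposed alternative days per meeting so that no participant is double-booked. This is hypothetical illustration code, not Sakura's implementation; the meeting data is taken from the slide.

```java
import java.util.*;

// Hypothetical run-time check for the Sakura example: each meeting offers
// alternative days; an assignment is valid when no two meetings sharing a
// participant land on the same day.
public class MeetingCheck {
    public record Meeting(String name, Set<String> people, List<String> days) {}

    public static Map<String, String> solve(List<Meeting> ms) {
        return solve(ms, 0, new HashMap<>());
    }

    private static Map<String, String> solve(List<Meeting> ms, int i,
                                             Map<String, String> pick) {
        if (i == ms.size()) return pick; // all meetings placed
        Meeting m = ms.get(i);
        for (String day : m.days()) {
            boolean clash = false;
            for (int j = 0; j < i; j++) {
                Meeting other = ms.get(j);
                if (day.equals(pick.get(other.name()))
                        && !Collections.disjoint(m.people(), other.people())) clash = true;
            }
            if (!clash) {
                pick.put(m.name(), day);
                Map<String, String> done = solve(ms, i + 1, pick);
                if (done != null) return done;
                pick.remove(m.name()); // backtrack
            }
        }
        return null; // antagonism: no assignment satisfies every meeting
    }

    public static void main(String[] args) {
        List<Meeting> ms = List.of(
            new Meeting("M1", Set.of("Marc", "Lamia", "JeanMi"), List.of("Mon", "Tue", "Fri")),
            new Meeting("M2", Set.of("Lamia", "Marc", "Pierre"), List.of("Tue", "Wed", "Fri")),
            new Meeting("M3", Set.of("JeanMi", "Pierre", "Marc"), List.of("Mon", "Tue", "Thu")));
        // Finds M1=Mon, M2=Tue, M3=Thu with this search order.
        System.out.println(solve(ms));
    }
}
```

When `solve` returns null the proposals are antagonistic, which in Telex terms means no sound schedule contains all of them and a fork is forced.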
Example ACG: calendar application
Schedule: sound cut in the graph
Status

Open source:
• 35,000 lines of well-commented Java code
• Available at gforge.inria.fr, BSD license

Applications:
• Sakura co-operative calendar (INRIA)
• Decentralised Collaborative Environment (UPC)
• Database, STMBench7 (UOC)
• Co-operative text editor (UNL, INRIA, UOC)
• Co-operative ontology editor (U. Aegean)
• Co-operative UML editor (LIP6)
Lessons learned

Separation of concerns:
• Application: business logic
  – Semantics: constraints, run-time checks
• System: persistence, replication, consistency
  – Optimistic: availability, WAN operation
  – Adapts to application requirements
  – Constraints + run-time machine learning

Modular ⇒ efficiency is an issue
• High latency, availability ⇒ reduce messages

Commutative operations ⇒ CRDTs
Partial replication is hard!
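The CRDT point can be illustrated with the classic grow-only counter: increments commute, so replicas converge by merging per-replica maxima, with no constraints, rollback, or consensus needed. This is a generic textbook sketch, not Telex code.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal G-counter CRDT sketch: one entry per replica, merge takes the max.
public class GCounter {
    private final Map<String, Long> counts = new HashMap<>(); // replicaId -> count
    private final String replicaId;

    public GCounter(String replicaId) { this.replicaId = replicaId; }

    public void increment() { counts.merge(replicaId, 1L, Long::sum); }

    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    // Merge is commutative, associative, and idempotent, so replicas may
    // exchange state in any order, any number of times, and still converge.
    public void merge(GCounter other) {
        other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
    }

    public static void main(String[] args) {
        GCounter a = new GCounter("A"), b = new GCounter("B");
        a.increment(); a.increment(); // A counts 2
        b.increment();                // B counts 1
        a.merge(b); b.merge(a);       // exchange state in either order
        System.out.println(a.value() + " " + b.value()); // 3 3
    }
}
```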
Application design

MVC-like execution loop + rollback

Designing for replication is an effort:
• Commute (no constraints) >> Antagonistic >> Non-Commuting
• Weaken invariants
• Make invariants explicit