Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loosely Synchronized Physical Clocks
Jiaqing Du, Daniele Sciascia, Sameh Elnikety, Willy Zwaenepoel, Fernando Pedone
EPFL, University of Lugano, Microsoft Research
Replicated State Machines (RSM)
• Strong consistency
  – Execute same commands in same order
  – Reach same state from same initial state
• Fault tolerance
  – Store data at multiple replicas
  – Failure masking / fast failover
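The strong-consistency bullets can be illustrated with a minimal sketch. It assumes a key-value state machine with deterministic `put` and `delete` commands; the function names and the command tuple layout are hypothetical and not taken from the slides.

```python
def apply_command(state, command):
    """Apply one deterministic command and return the new state."""
    op, key, value = command
    new_state = dict(state)
    if op == "put":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    return new_state


def replay(initial_state, commands):
    """Execute a command log in order, starting from initial_state."""
    state = dict(initial_state)
    for command in commands:
        state = apply_command(state, command)
    return state


log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3), ("delete", "y", None)]

# Two replicas that run the same log from the same initial state agree.
replica_a = replay({}, log)
replica_b = replay({}, log)

# A replica that runs the same commands in a different order may diverge.
reordered = replay({}, [log[2], log[0], log[1], log[3]])
```

Here `replica_a` and `replica_b` end in the same state, while `reordered` ends with a different value for `x`, which is why the replicas must agree on one order.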
Geo-Replication
[Figure: five data centers]
• High latency among replicas
• Messaging dominates replication latency
Leader-Based Protocols
• Order commands by a leader replica
• Require extra ordering messages at follower
[Figure: timelines for a Leader and a Follower; between client request and client reply at the follower the steps are Ordering, Replication, Ordering]
High latency for geo replication
Clock-RSM
• Orders commands using physical clocks
• Overlaps ordering and replication
[Figure: between client request and client reply there is a single combined step, Ordering + Replication]
Low latency for geo replication
Property and Assumption
• Provides linearizability
• Tolerates failure of a minority of replicas
• Assumptions
  – Asynchronous FIFO channels
  – Non-Byzantine faults
  – Loosely synchronized physical clocks
Protocol Overview
cmd1.ts = Clock()
cmd2.ts = Clock()
[Figure: two clients each send a request to Clock-RSM and receive a reply; each command is stamped with a clock reading; all five replicas log cmd1 and cmd2 in the same order; message label shown: PrepOK]
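Ordering by timestamp can be sketched as a sort on the clock reading. The sketch below uses the timestamps that appear on the later slides (cmd1.ts = 24, cmd2.ts = 23). Breaking ties by replica id is an assumption added here so that the order is total, and the `Command` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Command:
    ts: int                               # physical clock reading at the originating replica
    replica_id: int                       # assumed tie-breaker for equal timestamps
    payload: str = field(compare=False)   # not part of the ordering


def execution_order(commands):
    """Return commands sorted by (ts, replica_id)."""
    return sorted(commands)


cmd1 = Command(ts=24, replica_id=0, payload="cmd1")
cmd2 = Command(ts=23, replica_id=4, payload="cmd2")

# Every replica that sorts the same set of commands gets the same order.
order = [c.payload for c in execution_order([cmd1, cmd2])]
```

With these values `cmd2` sorts before `cmd1`, regardless of which command a replica received first.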
Major Message Steps
• Prep: Ask everyone to log a command
• PrepOK: Tell everyone after logging a command
[Figure: replicas R0 to R4. A client request is stamped cmd1.ts = 24; Prep is sent and PrepOK messages follow. A second client request is stamped cmd2.ts = 23. Question posed on the slide: cmd1 committed?]
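The two message steps can be sketched as a pair of handlers. This is a simplified in-memory illustration, assuming instantaneous delivery and no failures; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prep:
    cmd: str
    ts: int
    origin: int


@dataclass(frozen=True)
class PrepOK:
    cmd: str
    ts: int
    sender: int


@dataclass
class Replica:
    rid: int
    log: list = field(default_factory=list)
    acks: dict = field(default_factory=dict)  # cmd -> ids of replicas known to have logged it

    def on_prep(self, msg):
        """Log the command, then return the PrepOK to broadcast to everyone."""
        self.log.append((msg.ts, msg.cmd))
        return PrepOK(msg.cmd, msg.ts, self.rid)

    def on_prep_ok(self, msg):
        """Record that msg.sender has logged the command."""
        self.acks.setdefault(msg.cmd, set()).add(msg.sender)


replicas = [Replica(rid) for rid in range(5)]
prep = Prep(cmd="cmd1", ts=24, origin=0)

# Prep: every replica (the origin included) logs the command.
prep_oks = [r.on_prep(prep) for r in replicas]

# PrepOK: every replica tells every replica that it has logged the command.
for ok in prep_oks:
    for r in replicas:
        r.on_prep_ok(ok)
```

Because PrepOK goes to everyone rather than only to the origin, each replica can decide locally whether a command is replicated by a majority.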
Commit Conditions
• A command is committed if
  – Replicated by a majority
  – All commands ordered before are committed
• Wait until three conditions hold
  C1: Majority replication
  C2: Stable order
  C3: Prefix replication
C1: Majority Replication
• More than half of the replicas log cmd1
[Figure: replicas R0 to R4. The client request is stamped cmd1.ts = 24; Prep is sent and PrepOK messages return; cmd1 is replicated by R0, R1, R2]
1 RTT: between R0 and majority
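Condition C1 reduces to a counting check. A minimal sketch, assuming the replica tracks the set of replica ids from which it has seen the command logged (the function names are hypothetical):

```python
def majority(n_replicas):
    """Smallest number of replicas that is more than half."""
    return n_replicas // 2 + 1


def majority_replicated(ack_ids, n_replicas):
    """C1: the command is logged by more than half of the replicas."""
    return len(set(ack_ids)) >= majority(n_replicas)


# With five replicas, cmd1 logged at R0, R1, R2 satisfies C1.
c1_holds = majority_replicated({0, 1, 2}, n_replicas=5)
c1_too_few = majority_replicated({0, 1}, n_replicas=5)
```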
C2: Stable Order
• Replica knows all commands ordered before cmd1
  – Receives a greater timestamp from every other replica
[Figure: replicas R0 to R4 exchanging Prep / PrepOK / ClockTime messages. The client request is stamped cmd1.ts = 24; timestamps 23 and 25 arrive from the other replicas; cmd1 is stable at R0]
0.5 RTT: between R0 and farthest peer
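Condition C2 can be sketched as a check over the latest timestamp seen from each peer. The argument relies on the FIFO channels assumed earlier: once a peer has sent a timestamp greater than cmd1.ts, it cannot later deliver a command with a smaller one. The function name and the mapping of timestamps to particular replicas are assumptions made for illustration.

```python
def is_stable(cmd_ts, latest_ts_from_peer):
    """C2: every other replica has sent a timestamp greater than cmd_ts.

    latest_ts_from_peer maps a peer id to the highest timestamp received
    from that peer in any Prep, PrepOK, or ClockTime message.
    """
    return all(ts > cmd_ts for ts in latest_ts_from_peer.values())


# cmd1.ts = 24 at R0. While one peer's latest timestamp is still 23,
# R0 cannot rule out an earlier command from that peer.
stable_before = is_stable(24, {1: 25, 2: 25, 3: 25, 4: 23})

# Once that peer has also sent 25, cmd1 is stable at R0.
stable_after = is_stable(24, {1: 25, 2: 25, 3: 25, 4: 25})
```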
C3: Prefix Replication
• All commands ordered before cmd1 are replicated by a majority
[Figure: replicas R0 to R4. Client requests are stamped cmd1.ts = 24 and cmd2.ts = 23; Prep and PrepOk messages for cmd2 are exchanged; cmd2 is replicated by R1, R2, R3]
1 RTT: R4 to majority + majority to R0
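Condition C3 can be sketched as the majority check applied to every command with a smaller timestamp. The bookkeeping structure (a map from timestamp to the set of replicas known to have logged that command) is an assumption made for illustration.

```python
def prefix_replicated(cmd_ts, acks_by_ts, n_replicas):
    """C3: every command ordered before cmd_ts is logged by a majority.

    acks_by_ts maps a command timestamp to the set of replica ids
    known to have logged that command.
    """
    need = n_replicas // 2 + 1
    return all(
        len(acks) >= need
        for ts, acks in acks_by_ts.items()
        if ts < cmd_ts
    )


# cmd2 (ts = 23) is ordered before cmd1 (ts = 24).
# With cmd2 logged by R1, R2, R3, the prefix of cmd1 is replicated.
c3_holds = prefix_replicated(24, {23: {1, 2, 3}, 24: {0, 1, 2}}, n_replicas=5)

# If only the originating replica has logged cmd2 so far, cmd1 must wait.
c3_waits = prefix_replicated(24, {23: {4}, 24: {0, 1, 2}}, n_replicas=5)
```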
Overlapping Steps
[Figure: replicas R0 to R4 handling a client request at R0 with cmd1.ts = 24. Majority replication (Prep, Log(cmd1), PrepOK), stable order (timestamps 23, 24, 25), and prefix replication (Prep, PrepOk) run concurrently until the client reply]
Latency of cmd1: about 1 RTT to majority
Commit Latency
Step                  Latency
Majority replication  1 RTT (majority1)
Stable order          0.5 RTT (farthest)
Prefix replication    1 RTT (majority2)
Overall latency = MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }
If 0.5 RTT (farthest) < 1 RTT (majority), then overall latency ≈ 1 RTT (majority).
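The overall-latency formula can be evaluated directly. In the sketch below the function names are hypothetical and the RTT values (in milliseconds) are made up for illustration, not measurements from the talk.

```python
def rtt_to_majority(peer_rtts):
    """RTT needed to hear back from enough peers to form a majority.

    The local replica counts toward the majority, so with n replicas
    in total it needs answers from n // 2 peers.
    """
    n = len(peer_rtts) + 1
    return sorted(peer_rtts)[n // 2 - 1]


def clock_rsm_commit_latency(rtt_majority1, rtt_farthest, rtt_majority2):
    """MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }."""
    return max(rtt_majority1, 0.5 * rtt_farthest, rtt_majority2)


# Hypothetical RTTs from R0 to its four peers.
peer_rtts = [80, 90, 170, 250]
majority1 = rtt_to_majority(peer_rtts)   # second-closest peer
farthest = max(peer_rtts)

# Here half the RTT to the farthest peer exceeds the majority RTT,
# so stable order dominates (majority2 = 100 is also hypothetical).
far_dominates = clock_rsm_commit_latency(majority1, farthest, 100)

# With a closer farthest peer, latency is about 1 RTT to a majority.
majority_dominates = clock_rsm_commit_latency(90, 150, 90)
```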
Topology Examples
[Figure: two example topologies of replicas R0 to R4, each with a client request and with the Majority1 and Farthest distances from R0 marked]
Paxos 1: Multi-Paxos
• Single leader orders commands
  – Logical clock: 0, 1, 2, 3, ...
[Figure: replicas R0 to R4 with R2 as leader; messages between client request and client reply: Forward, Prep, PrepOK, Commit]
Latency at followers: 2 RTTs (leader & majority)
Paxos 2: Paxos-bcast
• Every replica broadcasts PrepOK
  – Trades off message complexity for latency
[Figure: replicas R0 to R4 with R2 as leader; messages between client request and client reply: Forward, Prep, PrepOK]
Latency at followers: 1.5 RTTs (leader & majority)
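The follower latencies of the two Paxos variants can be reproduced with a simple delay model. This is a sketch under stated assumptions: `d[i][j]` is a hypothetical one-way delay matrix in milliseconds, processing and logging time are ignored, and the function names are not from the slides.

```python
def kth_smallest(values, k):
    return sorted(values)[k - 1]


def multi_paxos_follower_latency(d, follower, leader):
    """Forward to leader, leader round trip to a majority, Commit back."""
    need = len(d) // 2 + 1
    # The leader counts itself (round trip 0); peers answer after one round trip.
    round_trips = [d[leader][k] + d[k][leader] for k in range(len(d))]
    return d[follower][leader] + kth_smallest(round_trips, need) + d[leader][follower]


def paxos_bcast_follower_latency(d, follower, leader):
    """Forward to leader, then wait for PrepOK broadcasts from a majority."""
    need = len(d) // 2 + 1
    # PrepOK from replica k reaches the follower after leader->k plus k->follower.
    arrivals = [d[leader][k] + d[k][follower] for k in range(len(d))]
    return d[follower][leader] + kth_smallest(arrivals, need)


# Five replicas with a uniform 50 ms one-way delay, so 1 RTT = 100 ms.
d = [[0 if i == j else 50 for j in range(5)] for i in range(5)]

multi_paxos = multi_paxos_follower_latency(d, follower=0, leader=2)
paxos_bcast = paxos_bcast_follower_latency(d, follower=0, leader=2)
```

Under uniform delays the model gives 2 RTTs for Multi-Paxos and 1.5 RTTs for Paxos-bcast at a follower, matching the figures stated on the two slides.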
Clock-RSM vs. Paxos
• With realistic topologies, Clock-RSM has
  – Lower latency at Paxos follower replicas
  – Similar / slightly higher latency at Paxos leader
Protocol     Latency
Clock-RSM    All replicas: 1 RTT (majority), if 0.5 RTT (farthest) < 1 RTT (majority)
Paxos-bcast  Leader: 1 RTT (majority)
             Follower: 1.5 RTTs (leader & majority)
Experiment Setup
• Replicated key-value store
• Deployed on Amazon EC2
[Figure: replica locations: California (CA), Virginia (VA), Ireland (IR), Singapore (SG), Japan (JP)]
Overlapping vs. Separate Steps
[Figure: two panels over the replicas CA, VA, IR, SG, JP, each starting from a client request; one panel marks VA with (L)]
Clock-RSM latency: max of three
Paxos-bcast latency: sum of three
Also in the Paper
• A reconfiguration protocol
• Comparison with Mencius
• Latency analysis of protocols