CS 347 Notes08 1
CS 347: Distributed Databases and Transaction Processing
Data Replication
Hector Garcia-Molina
CS 347 Notes08 2
Replication Space
• Updates
– at any copy
– at fixed (primary) copy
– at one copy but control can migrate
– no updates
CS 347 Notes08 3
Replication Space
• Correctness
– no consistency
– local consistency
– order preserving
– serializable schedule
– 1-copy serializability
CS 347 Notes08 4
Replication Space
• Expected Failures
– processors: fail-stop, byzantine?
– network: reliable, partitions, in-order msgs?
– storage: stable disk?
CS 347 Notes08 5
Replication Space
• Implementation Details
– update propagation
  – physical log records
  – logical log records
  – sql updates
  – transactions
– reads at backup?
– architecture
  – cross backups
  – multi-computer copy
– initialization of backup copy
CS 347 Notes08 6
Cross Backups
[Figure: site A holds the primary copy of DB1 and the backup copy of DB2; site B holds the primary copy of DB2 and the backup copy of DB1]
CS 347 Notes08 7
Multi-Computer Sites
[Figure: primary site with nodes P1, P2, P3 (each Pi labeled with Li and Xi) matched to a backup site with nodes B1, B2, B3 (each Bi labeled with Li’ and Yi)]
CS 347 Notes08 8
1-Safe Backups
– Transactions commit at primary
– Redo log records propagated
– Transactions commit at backup
[Figure: primary node P1 (L1, X1) and backup node B1 (L1’, Y1)]
CS 347 Notes08 9
1-Safe Backups
– Transactions can get lost
[Figure: P1 (L1, X1) and B1 (L1’, Y1), shown twice. First: T1, T2, T3 at the primary and T1, T2 at the backup. Second: T1, T2, T3 at the primary and T1, T2, T4, T5 at the backup.]
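The lost-transaction case can be reproduced with a toy model. This is only a sketch: the class `OneSafePrimary`, its methods, and the point at which the primary fails are assumptions made for illustration, not part of the notes.

```python
class OneSafePrimary:
    """Toy 1-safe primary: commit locally first, ship redo records later."""

    def __init__(self):
        self.committed = []   # transactions committed at the primary
        self.unshipped = []   # redo records not yet sent to the backup

    def commit(self, txn):
        # The commit completes without waiting for the backup.
        self.committed.append(txn)
        self.unshipped.append(txn)

    def ship(self, backup_log):
        # Propagate all pending redo records to the backup's log.
        backup_log.extend(self.unshipped)
        self.unshipped.clear()


primary = OneSafePrimary()
backup_log = []

primary.commit("T1")
primary.commit("T2")
primary.ship(backup_log)   # backup now holds T1, T2
primary.commit("T3")       # committed at the primary, never shipped

# Assumed failure point: the primary fails here, the backup takes over
# and runs new transactions T4 and T5.
backup_log += ["T4", "T5"]
lost = [t for t in primary.committed if t not in backup_log]
```

Here `lost` contains only T3, the transaction that committed at the primary but never reached the backup.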
CS 347 Notes08 10
2-Safe Backups
– Transactions do 2-phase commit
– Redo log records propagated in prepare
– Transactions not lost, but
  • longer delay, contention
  • cannot process unless both sites are up
– After failure, go to 1-safe (no backup)
[Figure: primary node P1 (L1, X1) and backup node B1 (L1’, Y1)]
CS 347 Notes08 11
What is Correctness?
• In 2-safe
• In 1-safe
CS 347 Notes08 12
What is in Paper You Read?
• Specific Scenario
– updates at fixed primary site
– each site has multiple computers
– primary-backup sites are matched
– clean site failures; stable storage; reliable network
– log shipping
– no reads at backup
– no initialization
CS 347 Notes08 13
Main Problem: Update Dependencies
[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site.]
data dependency: Ta → Tb
CS 347 Notes08 14
Main Problem: Update Dependencies
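[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site. The backup site shows Ta(1) and Tb at one node and "?" at the other.]
data dependency: Ta → Tb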
CS 347 Notes08 15
Main Problem: Update Dependencies
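[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site. The backup site shows Ta(1) and Tb at one node and "?" at the other.]
data dependency: Ta → Tb
• should not install Ta
• should not install Tb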
CS 347 Notes08 16
Dependency Reconstruction Algorithm
• Locking at backup to detect dependencies
• Ensure locks granted in same order as they were granted at primary
CS 347 Notes08 17
Example: Dependency Reconstruction
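[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site with tickets 5, 6 and 18.]
data dependency: Ta → Tb
tickets reflect local commit order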
CS 347 Notes08 18
Example: Dependency Reconstruction
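[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site with tickets 5, 6 and 18. The backup site shows Ta(1) and Tb with tickets 5 and 6 at one node and "?" at the other.]
data dependency: Ta → Tb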
CS 347 Notes08 19
Example: Dependency Reconstruction
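[Figure: primary site (P1 with L1, X1; P2 with L2, X2) and backup site (B1 with L1’, Y1; B2 with L2’, Y2). Transactions Ta(1), Ta(2) and Tb are shown at the primary site with tickets 5, 6 and 18. The backup site shows Ta(1) and Tb with tickets 5 and 6 at one node and "?" at the other.]
data dependency: Ta → Tb
• Say Tb requests lock first at B1;
• Tb request delayed until all locks with tickets <6 have been granted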
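The delay rule can be sketched as a small grant queue at one backup node. This is an illustration under stated assumptions, not the algorithm from the paper: the class `TicketOrderedGranter` is hypothetical, tickets at a node are assumed to be consecutive integers (so "all locks with tickets < t have been granted" reduces to a counter), and the assignment of ticket 5 to Ta(1) and ticket 6 to Tb follows one reading of the figure.

```python
import heapq


class TicketOrderedGranter:
    """Grants lock requests at one backup node strictly in ticket order."""

    def __init__(self, first_ticket):
        self.next_ticket = first_ticket   # lowest ticket not yet granted
        self.waiting = []                 # min-heap of (ticket, txn)
        self.granted = []                 # grant order, for inspection

    def request(self, ticket, txn):
        # Buffer the request, then grant every request that is now in order.
        heapq.heappush(self.waiting, (ticket, txn))
        while self.waiting and self.waiting[0][0] == self.next_ticket:
            _, name = heapq.heappop(self.waiting)
            self.granted.append(name)
            self.next_ticket += 1


b1 = TicketOrderedGranter(first_ticket=5)
b1.request(6, "Tb")              # arrives first, but ticket 5 is outstanding
granted_after_tb = list(b1.granted)
b1.request(5, "Ta(1)")           # ticket 5 arrives; 5 and then 6 are granted
```

After the first request nothing has been granted; after the second, the grant order is Ta(1) followed by Tb, matching the order at the primary.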
CS 347 Notes08 20
Epoch Algorithm
• Backup updates are installed in batches
• Epoch delimiters written on log
CS 347 Notes08 21
Writing Delimiters at Primary
[Figure: log timelines (axis: log time) for one master and two slaves, each marked with epoch delimiters 15 and 16]
CS 347 Notes08 22
Problem with Commits
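[Figure: log timelines (axis: log time) for one master and two slaves, each marked with epoch delimiters 15 and 16; the prepare and commit of transaction T are shown across the logs]
T’s commit record is in Epoch 15 in some logs and in Epoch 16 in others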
CS 347 Notes08 23
Solution: Bump Epoch
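[Figure: log timelines (axis: log time) for one master and two slaves, each marked with epoch delimiters 15 and 16; the prepare and commit of transaction T are shown across the logs]
Prepare ack reports epoch number; coordinator bumps epoch if necessary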
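One reading of the bump rule is that the coordinator advances to the highest epoch reported in the prepare acks before it logs the commit record. The function below is a minimal sketch of that reading; the name `epoch_for_commit` and the plain-integer epochs are assumptions, and the notes do not say what participants do after the commit message arrives.

```python
def epoch_for_commit(coordinator_epoch, ack_epochs):
    """Epoch the coordinator should be in before logging T's commit record.

    ack_epochs holds the epoch numbers reported in the prepare acks.
    The coordinator never moves backwards; it bumps only if some
    participant is already in a later epoch.
    """
    return max([coordinator_epoch, *ack_epochs])


bumped = epoch_for_commit(15, [15, 16])      # one participant is already in 16
unchanged = epoch_for_commit(16, [15, 16])   # coordinator is already in 16
```

In the first call the coordinator moves from epoch 15 to 16 before committing; in the second it stays where it is.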
CS 347 Notes08 24
Installing an Epoch at Backup
[Figure: log timelines (axis: log time) for one master and two slaves with epoch delimiters 15 and 16; labels: "end of 16", "install 16"]
CS 347 Notes08 25
To Install Epoch X at Backup J
• Redo transactions:
– If commit(T) ≤ X, commit T
– If prepare(T) ≤ X but commit(T) > X:
  • If T’s primary peer was coordinator, do not commit;
  • Else check with the backup of T’s coordinator B’:
    – If B’ committing T in epoch X, then we commit T
    – Else do not commit T
– Otherwise do not commit T (defer to next epoch)
commit(T) ≤ X means that T’s commit record is found in epoch X (or earlier) at node J.
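The install rules can be written as a single decision function. This is a sketch, not the paper's code: the function name, the parameter names, and the use of `None` for a record that has not been seen at node J (treated as lying in a later epoch) are assumptions.

```python
def install_decision(prepare_epoch, commit_epoch, x,
                     peer_was_coordinator, coordinator_backup_commits):
    """Return True if backup J should commit T when installing epoch x.

    prepare_epoch, commit_epoch: epoch in which J's log holds T's prepare
        or commit record, or None if no such record has been seen
        (assumption: an unseen record counts as later than x).
    peer_was_coordinator: True if J's primary peer coordinated T.
    coordinator_backup_commits: answer from B', the backup of T's
        coordinator, on whether it commits T in epoch x.
    """
    if commit_epoch is not None and commit_epoch <= x:
        return True                        # commit(T) <= X
    if prepare_epoch is not None and prepare_epoch <= x:
        if peer_was_coordinator:
            return False                   # our own peer decided after X
        return coordinator_backup_commits  # follow B'
    return False                           # defer to the next epoch


# T prepared in epoch 15 at J, commit record only in 16, installing epoch 15.
follow_b_prime = install_decision(15, 16, 15, False, True)
own_coordinator = install_decision(15, 16, 15, True, True)
```

With these inputs `follow_b_prime` is True because B’ commits T in epoch 15, while `own_coordinator` is False because J's primary peer was the coordinator and its commit record falls after epoch 15.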
CS 347 Notes08 26
Why Do We Need Coordinator Check?
• Assignment: Construct 2 scenarios that look the same to backup J:
– In Scenario 1, T should be installed
– In Scenario 2, T should not be installed
CS 347 Notes08 27
Scenario 1
[Figure: log timelines (axis: log time) for B’ and a slave, with epoch delimiters 15 and 16 and the positions of P(T) and C(T) in each log]
CS 347 Notes08 28
Scenario 2
[Figure: log timelines (axis: log time) for B’ and a slave, with epoch delimiters 15 and 16 and the positions of P(T) and C(T) in each log]
CS 347 Notes08 29
Scenario 3: Possible?
[Figure: log timelines (axis: log time) for B’ and a slave, with epoch delimiters 15, 16 and 17 and the positions of P(T) and C(T) in each log]
Note that T commits at slave but not at B’!!
CS 347 Notes08 30
Scenario 4: Possible?
[Figure: log timelines (axis: log time) for B’ and a slave, with epoch delimiters 15, 16 and 17 and the positions of P(T) and C(T) in each log]
Note that T commits at B’ but not at slave!!
CS 347 Notes08 31
Comparison of Options
• 2-safe
• 1-safe
– dependency reconstruction
– epoch
• Specific Scenario
– updates at fixed primary site
– each site has multiple computers
– primary-backup sites are matched
– clean site failures; stable storage; reliable network
– log shipping
– no reads at backup
– no initialization
CS 347 Notes08 32
How to Evaluate
• What system?
– actual system(s)
– simulation
– testbed
• What transactions?
– real transactions
– synthetic transactions
CS 347 Notes08 33
Metrics
• IO utilization
• CPU utilization
• Throughput (given max delay?)
• Transaction commit delay
• Backup copy lag
• Network overhead
• Probability of inconsistency
CS 347 Notes08 34
Sample Results
CS 347 Notes08 35
Sample Results
CS 347 Notes08 36
And Now For Something Completely Different:
• Updates
– at any copy
– at fixed (primary) copy
– at one copy but control can migrate
– no updates
have seen
next: available copies
CS 347 Notes08 37
PC-lock available copies
• Transactions write lock at all available copies
• Transactions read lock at any available copy
• Primary site (static) manages U – set of available copies
[Figure: copies X1, X2, X3, X4; labels: "primary" (*), "down"]
CS 347 Notes08 38
Update Transaction
(1) Get U from primary
(2) Get write locks from U nodes
(3) Commit at U nodes
[Figure: nodes C0 (Primary), C1 (Backup), C2 (Backup); Trans T3 with U={C0, C1}; the primary holds U={C0, C1}; labels: "updates, 2PC"]
CS 347 Notes08 39
A potential problem - example
Now: U={C0, C1}
[Figure: nodes C0 (Primary), C1 (Backup), C2 (Backup, recovering); Trans T3 with U={C0, C1}; message: "I am recovering"]
CS 347 Notes08 40
A potential problem - example
Later: U={C0, C1, C2}
[Figure: nodes C0 (Primary), C1 (Backup), C2 (Backup, recovering); Trans T3 with U={C0, C1}; message: "You missed T0, T1, T2"; two "T3 updates" messages]
CS 347 Notes08 41
Solution:
• Initially transaction T gets a copy U’ of U from primary (or uses cached value)
• At commit of T, check U’ with current U at primary (if different, abort T)
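The commit-time check can be sketched as a comparison between the transaction's snapshot U’ and the primary's current U. The class `Primary`, its methods, and the function `can_commit` are hypothetical names for illustration; locking and the commit protocol itself are left out.

```python
class Primary:
    """Toy primary that manages U, the set of available copies."""

    def __init__(self, available):
        self.U = frozenset(available)

    def get_U(self):
        return self.U

    def set_U(self, available):
        # Called when a node recovers or a failure is detected.
        self.U = frozenset(available)


def can_commit(primary, u_snapshot):
    """T may commit only if its snapshot U' still equals the current U."""
    return u_snapshot == primary.get_U()


primary = Primary({"C0", "C1"})
u_prime = primary.get_U()              # T3 starts with U' = {C0, C1}

ok_before = can_commit(primary, u_prime)

primary.set_U({"C0", "C1", "C2"})      # C2 recovers while T3 is running
ok_after = can_commit(primary, u_prime)
```

Before C2 recovers the check passes; afterwards U’ differs from U, so T3 is aborted instead of committing without updating C2.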
CS 347 Notes08 42
Solution Continued
• When CX recovers:
– request missed and pending transactions from primary (primary updates U)
– set write locks for pending transactions
• Primary polls nodes to detect failures (updates U)
CS 347 Notes08 43
Example Revisited
[Figure: nodes C0 (Primary), C1 (Backup), C2 (Backup, recovering). Trans T3 holds U={C0, C1} while the primary now holds U={C0, C1, C2}. Messages: "I am recovering", "You missed T0, T1, T2", two "prepare" messages, and "reject".]
CS 347 Notes08 44
Available Copies — No Primary
• Let all nodes have a copy of U (not just primary)
• To modify U, run a special atomic transaction at all available sites (use commit protocol)
– E.g.: U1={C1, C2} → U2={C1, C2, C3}: only C1, C2 participate in this transaction
– E.g.: U2={C1, C2, C3} → U3={C1, C2}: only C1, C2 participate in this transaction
CS 347 Notes08 45
• Details are tricky...
• What if commit of U-change blocks?
CS 347 Notes08 46
Node Recovery (no primary)
• Get missed updates from any active node
• No unique sequence of transactions
• If all nodes fail, wait for
– all to recover
– majority to recover
CS 347 Notes08 47
Example
[Figure: three nodes, one labeled "recovering node". Node states shown: (Committed: A,B,C,D,E,F; Pending: G), (Committed: A,C,B,E,D; Pending: F,G,H), (Committed: A,B).]
How much information (update values) must be remembered? By whom?
CS 347 Notes08 48
Correctness with replicated data
S1: r1[X1] r2[X2] w1[X1] w2[X2]
Is this schedule serializable?
[Figure: copies X1 and X2]
CS 347 Notes08 49
One copy serializable (1SR)
A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database
CS 347 Notes08 50
To check 1SR
• Take schedule
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X], where Xj is a copy of X
• Compute P(S)
• If P(S) acyclic, S is 1SR
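The test above can be sketched in a few lines. The parser assumes operations are written as in these notes (for example `r1[X1]` or `w2[X]`), with a letter naming the logical item and an optional trailing copy number; all function names are hypothetical. The code follows the slide's test as stated and does not model failures or unavailable copies.

```python
import re
from itertools import combinations

# kind (r/w), transaction number, logical item; the copy number is dropped.
OP = re.compile(r"([rw])(\d+)\[([A-Za-z]+)\d*\]")


def parse(schedule):
    """Map each ri[Xj] / wi[Xj] to (kind, i, X), keeping schedule order."""
    return [(kind, int(txn), item) for kind, txn, item in OP.findall(schedule)]


def precedence_edges(ops):
    """Edge Ti -> Tj for each conflicting pair where Ti's operation is first."""
    edges = set()
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Repeatedly remove nodes with no incoming edge; fail if stuck."""
    nodes = {n for e in edges for n in e}
    edges = set(edges)
    while nodes:
        sources = {n for n in nodes if not any(v == n for _, v in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True


def passes_slide_test(schedule):
    return is_acyclic(precedence_edges(parse(schedule)))


s1 = "r1[X1] r2[X2] w1[X1] w2[X2]"
s2 = "r1[X1] w1[X1] w1[X2] r2[X1] w2[X1] w2[X2]"
s1_edges = precedence_edges(parse(s1))
s2_edges = precedence_edges(parse(s2))
```

For S1 the edges run in both directions between T1 and T2, so the test fails; for S2 the only edge is T1 → T2, so the test passes.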
CS 347 Notes08 51
Example
S1: r1[X1] r2[X2] w1[X1] w2[X2]
S1’: r1[X] r2[X] w1[X] w2[X]
P(S1): T1 → T2 and T2 → T1
S1 is not 1SR!
CS 347 Notes08 52
Second example
S2: r1[X1] w1[X1] w1[X2]
r2[X1] w2[X1] w2[X2]
S2’: r1[X] w1[X] w1[X]
r2[X] w2[X] w2[X]
P(S2): T1 → T2
S2 is 1SR
CS 347 Notes08 53
Second example
S2: r1[X1] w1[X1] w1[X2]
r2[X1] w2[X1] w2[X2]
S2’: r1[X] w1[X] w1[X]
r2[X] w2[X] w2[X]
• Equivalent serial schedule
SS: r1[X] w1[X]
r2[X] w2[X]
CS 347 Notes08 54
Summary
• Updates
– at any copy
– at fixed (primary) copy
– at one copy but control can migrate
– no updates
have seen
available copies