CS 347: Distributed Databases and Transaction Processing - Data Replication

CS 347 Notes08 1 CS 347: Distributed Databases and Transaction Processing Data Replication Hector Garcia-Molina


Post on 03-Jan-2016


TRANSCRIPT

Page 1: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 1

CS 347: Distributed Databases and Transaction Processing

Data Replication

Hector Garcia-Molina

Page 2: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 2

Replication Space

• Updates
  – at any copy
  – at fixed (primary) copy
  – at one copy but control can migrate
  – no updates

Page 3: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 3

Replication Space

• Correctness
  – no consistency
  – local consistency
  – order preserving
  – serializable schedule
  – 1-copy serializability

Page 4: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 4

Replication Space

• Expected Failures
  – processors: fail-stop, byzantine?
  – network: reliable, partitions, in-order msgs?
  – storage: stable disk?

Page 5: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 5

Replication Space

• Implementation Details
  – update propagation
      – physical log records
      – logical log records
      – SQL updates
      – transactions
  – reads at backup?
  – architecture
      – cross backups
      – multi-computer copy
  – initialization of backup copy

Page 6: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 6

Cross Backups

[Figure: two sites, A and B. One site holds the primary copy of DB1 and the backup copy of DB2; the other holds the primary copy of DB2 and the backup copy of DB1.]

Page 7: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 7

Multi-Computer Sites

[Figure: a primary site with computers P1, P2, P3 (labels L1, L2, L3 and X1, X2, X3) matched to a backup site with computers B1, B2, B3 (labels L1’, L2’, L3’ and Y1, Y2, Y3).]

Page 8: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 8

1-Safe Backups

– Transactions commit at primary
– Redo log records propagated
– Transactions commit at backup

[Figure: primary P1 (L1, X1) and backup B1 (L1’, Y1).]

Page 9: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 9

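1-Safe Backups

– Transactions can get lost

[Figure: primary P1 (L1, X1) and backup B1 (L1’, Y1), shown twice. First: the primary has T1, T2, T3 and the backup has T1, T2. Second: the primary has T1, T2, T3 and the backup has T1, T2, T4, T5.]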
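The loss shown on this slide can be reproduced with a small simulation. This is only a sketch under assumptions: the class `OneSafePair` and its method names are hypothetical, and redo-record shipping is modeled as a simple in-memory queue rather than a real network.

```python
from collections import deque


class OneSafePair:
    """Toy model of a 1-safe primary/backup pair (hypothetical names)."""

    def __init__(self):
        self.primary_log = []      # transactions committed at the primary
        self.backup_log = []       # transactions committed at the backup
        self.in_flight = deque()   # redo records sent but not yet delivered

    def commit_at_primary(self, txn):
        # 1-safe: the primary commits first and ships redo records afterwards.
        self.primary_log.append(txn)
        self.in_flight.append(txn)

    def deliver(self, n=1):
        # Deliver up to n pending redo records to the backup, in order.
        for _ in range(min(n, len(self.in_flight))):
            self.backup_log.append(self.in_flight.popleft())

    def primary_fails(self):
        # Records still in flight never reach the backup: they are lost.
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost

    def commit_at_backup(self, txn):
        # After takeover the backup processes new transactions itself.
        self.backup_log.append(txn)


pair = OneSafePair()
for t in ["T1", "T2", "T3"]:
    pair.commit_at_primary(t)
pair.deliver(2)                    # only T1 and T2 reach the backup
lost = pair.primary_fails()        # T3 is lost
pair.commit_at_backup("T4")
pair.commit_at_backup("T5")
```

After the run, the two logs match the slide: the old primary holds T1, T2, T3 while the backup holds T1, T2, T4, T5.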

Page 10: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 10

2-Safe Backups

– Transactions do 2-phase commit
– Redo log records propagated in prepare
– Transactions not lost, but
    • longer delay, contention
    • cannot process unless both sites are up
– After failure, go to 1-safe (no backup)

[Figure: primary P1 (L1, X1) and backup B1 (L1’, Y1).]

Page 11: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 11

What is Correctness?

• In 2-safe
• In 1-safe

Page 12: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 12

What is in Paper You Read?

• Specific Scenario
  – updates at fixed primary site
  – each site has multiple computers
  – primary-backup sites are matched
  – clean site failures; stable storage; reliable network
  – log shipping
  – no reads at backup
  – no initialization

Page 13: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 13

Main Problem: Update Dependencies

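[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Transaction Ta has parts Ta(1) and Ta(2) at the primary site, and Tb also runs there. Data dependency: Ta → Tb.]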

Page 14: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 14

Main Problem: Update Dependencies

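[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Transaction Ta has parts Ta(1) and Ta(2) at the primary site, and Tb also runs there. Data dependency: Ta → Tb. At the backup site, one log has received Ta(1) and Tb; the other log's contents are unknown (?).]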

Page 15: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 15

Main Problem: Update Dependencies

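[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Transaction Ta has parts Ta(1) and Ta(2) at the primary site, and Tb also runs there. Data dependency: Ta → Tb. At the backup site, one log has received Ta(1) and Tb; the other log's contents are unknown (?).]

• should not install Ta
• should not install Tb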

Page 16: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 16

Dependency Reconstruction Algorithm

• Locking at backup to detect dependencies

• Ensure locks granted in same order as they were granted at primary

Page 17: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 17

Example: Dependency Reconstruction

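[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Ta(1), Ta(2), and Tb carry tickets 5, 6, and 18 at the primary site. Data dependency: Ta → Tb.]

Tickets reflect local commit order.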

Page 18: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 18

Example: Dependency Reconstruction

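[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Ta(1), Ta(2), and Tb carry tickets 5, 6, and 18 at the primary site. Data dependency: Ta → Tb. At the backup site, one log has received Ta(1) with ticket 5 and Tb with ticket 6; the other log's contents are unknown (?).]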

Page 19: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 19

Example: Dependency Reconstruction

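[Figure: primary site with computers P1 (L1, X1) and P2 (L2, X2); backup site with computers B1 (L1’, Y1) and B2 (L2’, Y2). Ta(1), Ta(2), and Tb carry tickets 5, 6, and 18 at the primary site. Data dependency: Ta → Tb. At the backup site, one log has received Ta(1) with ticket 5 and Tb with ticket 6; the other log's contents are unknown (?).]

• Say Tb requests lock first at B1;
• Tb request delayed until all locks with tickets < 6 have been granted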
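The delay rule above can be sketched as a tiny grant queue. This is an illustration under assumptions, not the algorithm from the paper: the class name `TicketOrderedGranter` is hypothetical, it models a single backup computer, and it assumes tickets issued by the matching primary computer are consecutive integers.

```python
class TicketOrderedGranter:
    """Grants lock requests in ticket order at one backup computer.

    Assumption (not from the slides): tickets from the matching primary
    computer are consecutive integers starting at first_ticket.
    """

    def __init__(self, first_ticket):
        self.next_ticket = first_ticket
        self.waiting = {}    # ticket -> transaction waiting for its turn
        self.granted = []    # transactions in the order locks were granted

    def request(self, ticket, txn):
        # Park the request, then grant every request whose turn has come.
        self.waiting[ticket] = txn
        while self.next_ticket in self.waiting:
            self.granted.append(self.waiting.pop(self.next_ticket))
            self.next_ticket += 1
        return list(self.granted)


b1 = TicketOrderedGranter(first_ticket=5)
after_tb = b1.request(6, "Tb")       # Tb arrives first and must wait
after_ta = b1.request(5, "Ta(1)")    # ticket 5 arrives; both are granted
```

Here Tb (ticket 6) is held back until the request with ticket 5 has been granted, so the backup reproduces the primary's grant order Ta(1), Tb.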

Page 20: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 20

Epoch Algorithm

• Backup updates are installed in batches

• Epoch delimiters written on log

Page 21: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 21

Writing Delimiters at Primary

[Figure: log timelines (axis: log time) for one master and two slaves. Each log contains delimiters separating epoch 15 from epoch 16.]

Page 22: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 22

Problem with Commits

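[Figure: log timelines (axis: log time) for one master and two slaves, each with delimiters separating epoch 15 from epoch 16. The prepare and commit of transaction T fall around the epoch boundary.]

T’s commit record is in Epoch 15 in some logs and in Epoch 16 in others.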

Page 23: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 23

Solution: Bump Epoch

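[Figure: log timelines (axis: log time) for one master and two slaves, each with delimiters separating epoch 15 from epoch 16, with the prepare and commit of transaction T.]

Prepare ack reports epoch number; coordinator bumps epoch if necessary.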
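One way to read the bump rule is as a maximum over reported epoch numbers. The sketch below is an interpretation, not the paper's exact protocol: the function name `epoch_for_commit` is hypothetical, and "bumps epoch if necessary" is assumed to mean "advance to the largest epoch reported in any prepare ack".

```python
def epoch_for_commit(coordinator_epoch, acked_epochs):
    """Return the epoch in which the coordinator writes T's commit record.

    Assumption: the coordinator advances to the largest epoch number
    reported in a prepare ack if that is ahead of its own epoch.
    """
    return max([coordinator_epoch, *acked_epochs])


# Coordinator is still in epoch 15, but one participant already prepared in 16.
bumped = epoch_for_commit(15, [15, 16])
# Coordinator is already ahead of every participant: no bump needed.
unchanged = epoch_for_commit(16, [15, 15])
```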

Page 24: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 24

Installing an Epoch at Backup

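[Figure: log timelines (axis: log time) for one master and two slaves at the backup, with epoch delimiters 15 and 16. Labels mark "end of 16" on the logs and the point "install 16".]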

Page 25: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 25

To Install Epoch X at Backup J

• Redo transactions:
  – If commit(T) ≤ X, commit T
  – If prepare(T) ≤ X but commit(T) > X:
      • If T’s primary peer was coordinator, do not commit;
      • Else check with the backup of T’s coordinator B’:
          – If B’ committing T in epoch X, then we commit T
          – Else do not commit T
  – Otherwise do not commit T (defer to next epoch)

commit(T) ≤ X means that T’s commit record is found in epoch X (or earlier) at node J.
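The decision rules for installing epoch X can be written as a single function. This is a minimal sketch: the function and parameter names are hypothetical, a record that has not been seen in the log is represented as `None`, and the answer from the coordinator's backup B’ is passed in as a boolean instead of being fetched over the network.

```python
from typing import Optional


def should_commit(prepare_epoch: Optional[int],
                  commit_epoch: Optional[int],
                  x: int,
                  peer_was_coordinator: bool,
                  coordinator_backup_commits: bool) -> bool:
    """Decide whether backup J commits T while installing epoch x.

    prepare_epoch / commit_epoch: epoch in which J's log holds T's prepare
    or commit record, or None if the record has not been seen (treated
    as later than x).
    """
    if commit_epoch is not None and commit_epoch <= x:
        return True                      # commit(T) <= X
    if prepare_epoch is not None and prepare_epoch <= x:
        if peer_was_coordinator:
            return False                 # coordinator's own log has no commit yet
        return coordinator_backup_commits  # follow B', the coordinator's backup
    return False                         # defer T to the next epoch


committed_in_epoch = should_commit(15, 15, 15, False, False)
own_coordinator = should_commit(15, 16, 15, True, True)
follows_b_prime = should_commit(15, 16, 15, False, True)
deferred = should_commit(16, None, 15, False, True)
```

The third call is the case that motivates the coordinator check: J sees only the prepare record in epoch 15, so it commits T only because B’ is committing T in that epoch.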

Page 26: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 26

Why Do We Need Coordinator Check?

• Assignment: Construct 2 scenarios that look the same to backup J:
  – In Scenario 1, T should be installed
  – In Scenario 2, T should not be installed

Page 27: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 27

Scenario 1

[Figure: log timelines (axis: log time) for B’ and a slave, each with epoch delimiters 15 and 16 and the positions of the records P(T) and C(T).]

Page 28: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 28

Scenario 2

[Figure: log timelines (axis: log time) for B’ and a slave, each with epoch delimiters 15 and 16 and the positions of the records P(T) and C(T).]

Page 29: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 29

Scenario 3: Possible?

[Figure: log timelines (axis: log time) for B’ and a slave, each with epoch delimiters 15, 16, and 17 and the positions of the records P(T) and C(T).]

Note that T commits at slave but not at B’!!

Page 30: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 30

Scenario 4: Possible?

[Figure: log timelines (axis: log time) for B’ and a slave, each with epoch delimiters 15, 16, and 17 and the positions of the records P(T) and C(T).]

Note that T commits at B’ but not at slave!!

Page 31: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 31

Comparison of Options

• 2-safe
• 1-safe
  – dependency reconstruction
  – epoch

• Specific Scenario
  – updates at fixed primary site
  – each site has multiple computers
  – primary-backup sites are matched
  – clean site failures; stable storage; reliable network
  – log shipping
  – no reads at backup
  – no initialization

Page 32: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 32

How to Evaluate

• What system?
  – actual system(s)
  – simulation
  – testbed
• What transactions?
  – real transactions
  – synthetic transactions

Page 33: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 33

Metrics

• IO utilization
• CPU utilization
• Throughput (given max delay?)
• Transaction commit delay
• Backup copy lag
• Network overhead
• Probability of inconsistency

Page 34: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 34

Sample Results

Page 35: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 35

Sample Results

Page 36: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 36

And Now For Something Completely Different:

• Updates
  – at any copy (next: available copies)
  – at fixed (primary) copy (have seen)
  – at one copy but control can migrate
  – no updates

Page 37: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 37

PC-lock available copies

• Transactions write lock at all available copies
• Transactions read lock at any available copy
• Primary site (static) manages U, the set of available copies

[Figure: copies X1, X2, X3, X4; labels mark one copy as the primary (*) and one copy as down.]

Page 38: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 38

Update Transaction

(1) Get U from primary
(2) Get write locks from U nodes
(3) Commit at U nodes

[Figure: C0 (primary), C1 (backup), C2 (backup). Transaction T3 gets U={C0, C1} from the primary, then sends its updates to the nodes in U and runs 2PC.]

Page 39: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 39

A potential problem - example

Now: U={C0, C1}

[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). Transaction T3 holds U={C0, C1}. C2 announces "I am recovering".]

Page 40: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 40

A potential problem - example

Later: U={C0, C1, C2}

[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). C2 is told "You missed T0, T1, T2". Transaction T3, still holding U={C0, C1}, sends its updates to C0 and C1 only.]

Page 41: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 41

Solution:

• Initially transaction T gets a copy U’ of U from primary (or uses cached value)
• At commit of T, check U’ against current U at primary (if different, abort T)
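The commit-time check can be sketched as a comparison between the transaction's snapshot U’ and the primary's current U. The names `Primary`, `node_recovered`, `node_failed`, and `try_commit` are hypothetical, and locking and 2PC are left out so that only the U’ versus U test is shown.

```python
class Primary:
    """Toy primary that manages U, the set of available copies."""

    def __init__(self, available):
        self.U = frozenset(available)

    def get_U(self):
        # A transaction takes this value as its snapshot U'.
        return self.U

    def node_recovered(self, node):
        self.U = self.U | {node}

    def node_failed(self, node):
        self.U = self.U - {node}


def try_commit(primary, u_prime):
    """Commit only if the snapshot U' still equals the current U."""
    return u_prime == primary.get_U()


primary = Primary({"C0", "C1"})
u_prime = primary.get_U()            # T3 starts with U' = {C0, C1}
primary.node_recovered("C2")         # C2 rejoins: U = {C0, C1, C2}
t3_commits = try_commit(primary, u_prime)   # U' != U, so T3 must abort

u_retry = primary.get_U()            # a retry of T3 sees the new U
retry_commits = try_commit(primary, u_retry)
```

Because T3 aborts, it cannot commit updates that skip the recovered copy C2; a retry picks up the new U and writes to all three copies.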

Page 42: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 42

Solution Continued

• When CX recovers:
  – request missed and pending transactions from primary (primary updates U)
  – set write locks for pending transactions
• Primary polls nodes to detect failures (updates U)

Page 43: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 43

Example Revisited

[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). Transaction T3 starts with U={C0, C1}. C2 announces "I am recovering" and is told "You missed T0, T1, T2"; U becomes {C0, C1, C2}. T3 sends prepare to C0 and C1 and is rejected.]

Page 44: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 44

Available Copies — No Primary

• Let all nodes have a copy of U (not just primary)
• To modify U, run a special atomic transaction at all available sites (use commit protocol)
  – E.g.: U1={C1, C2} → U2={C1, C2, C3}: only C1, C2 participate in this transaction
  – E.g.: U2={C1, C2, C3} → U3={C1, C2}: only C1, C2 participate in this transaction

Page 45: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 45

• Details are tricky...
• What if commit of U-change blocks?

Page 46: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 46

Node Recovery (no primary)

• Get missed updates from any active node
• No unique sequence of transactions
• If all nodes fail, wait for
  – all to recover
  – majority to recover

Page 47: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 47

Example

[Figure: three nodes, one of them labeled as the recovering node. Their states are: Committed: A,B,C,D,E,F with Pending: G; Committed: A,C,B,E,D with Pending: F,G,H; Committed: A,B.]

How much information (update values) must be remembered? By whom?

Page 48: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 48

Correctness with replicated data

S1: r1[X1] r2[X2] w1[X1] w2[X2]

Is this schedule serializable?

[Figure: two copies, X1 and X2.]

Page 49: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 49

One copy serializable (1SR)

A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database

Page 50: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 50

To check 1SR

• Take schedule
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X], where Xj is a copy of X
• Compute P(S)
• If P(S) acyclic, S is 1SR
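The procedure on this slide can be tried out in a few lines. This is a sketch under assumptions: all function names are hypothetical, schedules are written as strings such as `r1[X1] w2[X2]`, a copy is named by a letter followed by its copy number, and P(S) is taken to be the usual precedence graph with an edge Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj.

```python
import re
from collections import defaultdict

# Matches e.g. "r1[X1]": operation, transaction number, item name, copy number.
OP = re.compile(r"([rw])(\d+)\[([A-Za-z]+)\d*\]")


def to_one_copy(schedule):
    """Parse the schedule and drop copy numbers, so X1 and X2 both become X."""
    return [(op, int(txn), item) for op, txn, item in OP.findall(schedule)]


def precedence_edges(ops):
    """Edge (Ti, Tj) if an op of Ti precedes a conflicting op of Tj."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # back edge: cycle found
        state[node] = "visiting"
        ok = all(visit(nxt) for nxt in graph[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in list(graph))


def passes_slide_check(schedule):
    return is_acyclic(precedence_edges(to_one_copy(schedule)))


s1_ok = passes_slide_check("r1[X1] r2[X2] w1[X1] w2[X2]")
s2_ok = passes_slide_check("r1[X1] w1[X1] w1[X2] r2[X1] w2[X1] w2[X2]")
```

For the first schedule the graph contains both T1 → T2 and T2 → T1, so the check fails; for the second it contains only T1 → T2, so the check passes.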

Page 51: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 51

Example

S1: r1[X1] r2[X2] w1[X1] w2[X2]
S1’: r1[X] r2[X] w1[X] w2[X]

P(S1): T1 → T2 and T2 → T1

S1 is not 1SR!

Page 52: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 52

Second example

S2: r1[X1] w1[X1] w1[X2]

r2[X1] w2[X1] w2[X2]

S2’: r1[X] w1[X] w1[X]

r2[X] w2[X] w2[X]

P(S2): T1 → T2

S2 is 1SR

Page 53: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 53

Second example

S2: r1[X1] w1[X1] w1[X2]

r2[X1] w2[X1] w2[X2]

S2’: r1[X] w1[X] w1[X]

r2[X] w2[X] w2[X]

• Equivalent serial schedule

SS: r1[X] w1[X]

r2[X] w2[X]

Page 54: CS 347:  Distributed Databases and Transaction Processing Data Replication

CS 347 Notes08 54

Summary

• Updates
  – at any copy (have seen: available copies)
  – at fixed (primary) copy (have seen)
  – at one copy but control can migrate
  – no updates