Database Replication Using Generalized Snapshot Isolation

Sameh Elnikety, EPFL; Fernando Pedone, USI; Willy Zwaenepoel, EPFL





Snapshot Isolation (SI)

• Snapshot = committed state of database

1. On begin:
  – Snapshot(T) = latest snapshot at start(T)

2. On read or write operation:
  – T reads from and writes to its snapshot

3. On commit:
  – Read-only T commits immediately
  – Update T commits if no conflicting writes between its start & commit times
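The three rules can be made concrete with a toy, single-node, in-memory store. This is a minimal sketch, not how a real engine stores versions: it assumes a whole-state copy per committed version, and the names SIDatabase, Txn, begin, read, write and commit are hypothetical. The commit check is the usual first-committer-wins reading of rule 3: an update T aborts if a transaction that committed after T’s start wrote an item T also wrote.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    start_version: int                          # Snapshot(T) = state at this version
    writes: dict = field(default_factory=dict)  # T's private, buffered writes


class SIDatabase:
    """Toy single-node snapshot isolation (hypothetical, in memory)."""

    def __init__(self):
        self.version = 0
        self.history = {0: {}}    # version -> full committed state at that version
        self.commit_log = []      # (commit version, set of keys written)

    def begin(self):
        # Rule 1: T gets the latest committed snapshot at start(T)
        return Txn(start_version=self.version)

    def read(self, txn, key):
        # Rule 2: T sees its own writes first, otherwise its snapshot
        if key in txn.writes:
            return txn.writes[key]
        return self.history[txn.start_version].get(key)

    def write(self, txn, key, value):
        # Rule 2: writes stay private to T until it commits
        txn.writes[key] = value

    def commit(self, txn):
        # Rule 3a: a read-only T commits immediately
        if not txn.writes:
            return True
        # Rule 3b: abort if a T' that committed after start(T) wrote a common key
        for commit_version, keys in self.commit_log:
            if commit_version > txn.start_version and keys & set(txn.writes):
                return False
        # Install T's writes as a new committed version
        self.version += 1
        new_state = dict(self.history[self.version - 1])
        new_state.update(txn.writes)
        self.history[self.version] = new_state
        self.commit_log.append((self.version, set(txn.writes)))
        return True
```

For example, if two transactions begin together and both write x, whichever commits first succeeds and the other’s commit returns False, while a transaction that began before a later commit keeps reading its original snapshot.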


Advantages of SI

• Read-only T’s never block or abort

• Read-only T’s never cause update T’s to block or abort

• Compared with 2PL:
  – No read locks are used in SI

• Important for read-dominated workloads


Drawbacks of SI

• Not serializable
  – Permits certain anomalies (e.g., write skew)

But

• Anomalies are rare in practice

• Conditions on the workload can identify and avoid them

• In practice, developers already use SI in ways that keep executions serializable
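The best-known anomaly SI permits is write skew. The sketch below assumes two hypothetical flags x and y (say, two doctors on call) and an application invariant x + y >= 1; each transaction checks the invariant on its own snapshot and then clears one flag. The functions are illustrative only.

```python
def serial_run(order):
    """Run the two withdrawals one after another in the given order."""
    state = {"x": 1, "y": 1}               # invariant the app relies on: x + y >= 1
    for key in order:
        if state["x"] + state["y"] >= 2:   # withdraw only if the other stays on
            state[key] = 0
    return state


def concurrent_run_under_si():
    """Run both withdrawals concurrently, each on its own SI snapshot."""
    committed = {"x": 1, "y": 1}
    snap1 = dict(committed)                # T1 starts
    snap2 = dict(committed)                # T2 starts before T1 commits
    ws1 = {"x": 0} if snap1["x"] + snap1["y"] >= 2 else {}
    ws2 = {"y": 0} if snap2["x"] + snap2["y"] >= 2 else {}
    # SI only checks write-write conflicts; the write sets {x} and {y} are
    # disjoint, so both T's pass the commit check and both commit
    if ws1.keys() & ws2.keys():
        raise RuntimeError("write-write conflict: SI would abort one T")
    committed.update(ws1)
    committed.update(ws2)
    return committed
```

Neither serial order leaves both flags at 0, so the concurrent outcome is not serializable. Because the two write sets do not overlap, the write-write check in rule 3 cannot catch it; this is the kind of pattern that conditions on the workload are meant to identify and avoid.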


Summary of SI

• SI is here to stay

• Used in several databases, e.g.,
  – Oracle
  – PostgreSQL
  – Microsoft SQL Server (2PL & SI)
  – Borland InterBase


SI Replication

• Replicate SI to scale the performance of dynamic-content Web servers
  – E.g., e-commerce, bulletin boards

• Workload is suitable for SI
  – Read-only T’s dominate the workload
  – Update T’s are short & few

• How to maintain SI properties across replicas?


SI in Replicated Database

1. On begin:
  – Snapshot(T) = latest snapshot at start(T)

2. On read or write operation:
  – T reads from and writes to its snapshot

3. On commit:
  – Read-only T commits immediately
  – Update T commits if no conflicting writes between its start & commit times


Strict SI in Replicated Database

• The same three rules, taken literally across replicas:
  – Snapshot(T) = latest snapshot in the whole system at start(T), not just the latest snapshot at T’s replica


Generalized Snapshot Isolation (GSI)

1. On begin:
  – Snapshot(T) = (latest) older snapshot
  – At a replica, use the latest local snapshot

2. On read or write operation:
  – T reads from and writes to its snapshot

3. On commit (certification for update T):
  – Read-only T commits immediately
  – Update T commits if no conflicting writes between its (start) snapshot & commit times
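Certification differs from SI in one place: the conflict window starts at T’s snapshot, which may be older than T’s start. A minimal sketch, assuming the certifier keeps a log of (commit version, writeset) pairs; the function name gsi_certify and the version numbering are hypothetical.

```python
def gsi_certify(snapshot_version, writeset, commit_log):
    """Certify an update T under GSI; return True if T may commit.

    snapshot_version: version of the (possibly older) snapshot T read from
    writeset: set of items T wrote
    commit_log: list of (commit version, set of items written), one entry
                per committed update transaction
    """
    for commit_version, committed_writeset in commit_log:
        # The window starts at T's snapshot, not at T's start time, so writes
        # that committed before T began but after its snapshot still count
        if commit_version > snapshot_version and committed_writeset & writeset:
            return False
    return True
```

For instance, with the log [(1, {a}), (2, {b}), (3, {c})], a T that begins after version 3 at a replica whose latest local snapshot is version 1 and writes b is aborted, whereas the same T on the latest snapshot (version 3) would commit. This is where GSI’s potentially higher update abort rate comes from.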


Advantages of GSI

• All of T’s reads and writes are local
  – Important for replicated databases

• Read-only T’s never block or abort

• Read-only T’s never cause update T’s to block or abort
  – Important for read-dominated workloads


A - GSI Serializability

• Not serializable
  – Permits certain anomalies, as in SI

But

• Anomalies are rare in practice

• Two serializability conditions (in the paper)
  – Static: examine transaction templates
  – Dynamic: check at run time

• Easy to verify that a workload is serializable

• Easy to modify a workload to be serializable
  – Similar to what many Oracle DBAs already do


B - GSI Older Snapshots

1- On begin: Snapshot(T) = (latest) older snapshot

• GSI uses older snapshots

But

• Clear definition, always consistent data

• No new anomalies (same as in SI)

• In a replicated database
  – Transparent: the database appears to run SI
  – Efficient: reads are non-blocking
  – Staleness: can be bounded
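The slide states only that staleness can be bounded. One possible policy, offered as an assumption rather than the paper’s mechanism: before giving T its snapshot, a replica that has not synchronized with the certifier for longer than a hypothetical max_age first applies the missing writesets. The function name begin_with_bounded_staleness and the sync callback are hypothetical.

```python
def begin_with_bounded_staleness(local_version, last_sync, now, max_age, sync):
    """Choose Snapshot(T) at a replica under a hypothetical staleness bound.

    local_version: version of the latest local snapshot
    last_sync: time of the replica's last synchronization with the certifier
    now: current time (same clock as last_sync)
    max_age: hypothetical bound on how long the replica may go without syncing
    sync: hypothetical callback that applies missing writesets and returns
          the new local version
    Returns (snapshot version for T, updated last_sync).
    """
    if now - last_sync > max_age:
        # Too stale: catch up first, paying one round trip at begin time
        local_version = sync()
        last_sync = now
    # Otherwise T starts immediately on the latest local snapshot
    return local_version, last_sync
```

Choosing max_age trades snapshot freshness against extra communication when transactions begin.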


C - GSI Abort Rates

3- On commit (certification for update T):
  – Read-only T commits immediately
  – Update T commits if no conflicting writes between its (start) snapshot & commit times

• Potentially higher abort rate for update T’s

But

• Abort rates are small in target workloads

• GSI abort rates can be higher or lower than those of SI


GSI in Replicated Databases

• System consists of
  – Many SI replicas, full replication
  – Centralized certifier (distributed in the paper)

• A client connects to one replica
  – Issues read-only and update transactions

• Algorithm implements an instance of GSI
  – Snapshot(T) = latest local snapshot at the replica


Algorithm at Replica

1. On begin:
  – Provide T with a local snapshot
  – Record T.version = Snapshot.version

2. On read or write operation:
  – Run the transaction’s reads/writes locally
  – Record T.writeset

3. On commit:
  – IF ( T is read-only ) THEN { commit }
  – ELSE { Invoke certification ( T.version, T.writeset ) . . . }
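The replica side of steps 1 to 3 can be sketched in memory as follows. This is a toy under stated assumptions: a full dict copy stands in for the database’s local snapshot, and the class ReplicaFrontEnd, the ReplicaTxn record and the (T.version, T.writeset) tuple returned by on_commit are hypothetical names, not an API from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaTxn:
    version: int                                  # T.version = Snapshot.version
    snapshot: dict                                # local snapshot T reads from
    writeset: dict = field(default_factory=dict)  # T.writeset, recorded as T runs


class ReplicaFrontEnd:
    """Begin / read / write part of the replica algorithm (toy, in memory)."""

    def __init__(self):
        self.version = 0   # version of the latest local snapshot
        self.state = {}    # committed local state at self.version

    def begin(self):
        # 1. Provide T with a local snapshot and record T.version
        return ReplicaTxn(version=self.version, snapshot=dict(self.state))

    def read(self, txn, key):
        # 2. Reads run locally: T's own writes first, then its snapshot
        return txn.writeset.get(key, txn.snapshot.get(key))

    def write(self, txn, key, value):
        # 2. Writes run locally and are recorded in T.writeset
        txn.writeset[key] = value

    def on_commit(self, txn):
        # 3. A read-only T commits with no communication (None here); an
        #    update T must be certified, so return the certification request
        if not txn.writeset:
            return None
        return (txn.version, dict(txn.writeset))
```

No step in this sketch contacts another node: only the certification request leaves the replica, which is the property GSI relies on.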


Algorithm at Certifier

1. Check for conflicting writes from committed T’s with a version number larger than T.version

2. IF ( yes ) THEN { Reply ( abort ) }

3. ELSE { Advance certifier-version
     Record ( writeset, certifier-version ) to log
     Reply ( 1 - commit, 2 - certifier-version, 3 - “missing” writesets ) }
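The certifier’s steps can be sketched as a single-threaded, in-memory class. Assumptions, all hypothetical: writesets are dicts, the log is a Python list scanned linearly, replies are tuples ("abort",) or ("commit", certifier_version, missing), and “missing” writesets are taken to be those committed after T.version other than T’s own. A real certifier must also process requests one at a time (or in a total order) so that versions are assigned consistently.

```python
class Certifier:
    """Centralized GSI certifier (toy, in memory, one request at a time)."""

    def __init__(self):
        self.version = 0   # certifier-version
        self.log = []      # [(certifier-version, writeset)] in commit order

    def certify(self, txn_version, writeset):
        # 1. Conflicting writes from committed T's with a larger version number?
        for version, committed_ws in self.log:
            if version > txn_version and committed_ws.keys() & writeset.keys():
                return ("abort",)                    # 2. Reply abort
        # 3. Advance certifier-version and record the writeset in the log
        self.version += 1
        self.log.append((self.version, dict(writeset)))
        # "Missing" writesets: committed after T's snapshot, excluding T's own
        missing = [(v, ws) for v, ws in self.log if txn_version < v < self.version]
        return ("commit", self.version, missing)
```

Sending everything after T.version is the simplest choice given that the request carries only T.version; the replica may already hold some of these writesets, so the replica side has to skip duplicates.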


Algorithm at Replica (cont.)

1. On begin: . . .

2. On read or write operation: . . .

3. On commit:
  – IF ( T is read-only ) THEN { commit }
  – ELSE { Invoke certification ( T.version, T.writeset )
      On a commit reply:
      1- Apply “missing” writesets
      2- Commit locally
      3- Advance local version }
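The replica’s handling of the certifier’s reply can be sketched as below, assuming hypothetical reply tuples ("abort",) or ("commit", certifier_version, missing), with missing as a list of (version, writeset) pairs in certifier order. The abort branch is an assumption, since the slide shows only the commit path; ReplicaStore and finish_update are hypothetical names.

```python
class ReplicaStore:
    """Commit path of the replica algorithm (toy, in memory)."""

    def __init__(self):
        self.version = 0   # local version = last certifier-version applied here
        self.state = {}    # committed local state

    def _apply(self, version, writeset):
        self.state.update(writeset)
        self.version = version

    def finish_update(self, txn_writeset, reply):
        """Handle the certifier's reply for an update T; return True if T commits."""
        if reply[0] == "abort":
            # Assumption: T is aborted locally and nothing is applied
            return False
        _, certifier_version, missing = reply
        # 1- Apply "missing" writesets in certifier order, skipping any version
        #    this replica has already applied (e.g. via an earlier reply)
        for version, writeset in missing:
            if version > self.version:
                self._apply(version, writeset)
        # 2- Commit T locally and 3- advance the local version, unless a later
        #    reply already delivered T's writeset as a "missing" one
        if certifier_version > self.version:
            self._apply(certifier_version, txn_writeset)
        return True
```

Applying writesets strictly in certifier-version order is what keeps every local snapshot equal to some prefix of the certifier’s log, so each replica always exposes a consistent (if possibly older) snapshot.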


Performance Tradeoff GSI : SI

• GSI
  – Better response time

• SI
  – “Fresher” data (latest snapshot in the system)
  – Lower abort rate for updates (?)

• Analytical performance model
  – Model used by Jim Gray
  – Replicated database over a WAN


Analytical Model

• GSI
  – Execute T immediately
  – Updates are certified remotely (communication)

• SI
  – Block T to obtain the latest version (communication)
  – Updates are certified remotely (communication)

• Objective: compare GSI : SI
  – Response time
  – Abort rate
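One way to read this model as arithmetic, in units of transaction length: a round trip costs x, and, as added assumptions, a one-way message costs x/2 and an update T’s abort rate grows linearly with the window W between its snapshot and its certification. Under GSI the snapshot is t old when T starts; under SI the snapshot is fixed when T’s request reaches the certifier. This is a sketch under those assumptions, not the paper’s derivation.

```latex
\begin{aligned}
&\text{Read-only:} && R_{GSI} = 1, \quad R_{SI} = x + 1
  && \Rightarrow\ \frac{R_{GSI}}{R_{SI}} = \frac{1}{1+x} \\
&\text{Update:} && R_{GSI} = 1 + x, \quad R_{SI} = x + 1 + x
  && \Rightarrow\ \frac{R_{GSI}}{R_{SI}} = \frac{1+x}{1+2x} \\
&\text{Update window:} && W_{GSI} = t + 1 + \tfrac{x}{2}, \quad
  W_{SI} = \tfrac{x}{2} + 1 + \tfrac{x}{2}
  && \Rightarrow\ \frac{W_{GSI}}{W_{SI}} = \frac{2 + x + 2t}{2 + 2x}
\end{aligned}
```

As x grows, the update response-time ratio tends to 1/2, and the window ratio is below 1 exactly when 2t < x.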

Analytical Equations

• Parameters
  – x = round trip delay / transaction length
  – t = snapshot age / transaction length

• Response time ratio (GSI : SI)
  – Read-only: $\frac{1}{1+x}$
  – Update: $\frac{1+x}{1+2x}$

• Abort rate ratio (GSI : SI)
  – Read-only: never aborted!
  – Update: $\frac{2+x+2t}{2+2x}$
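The three ratios are easy to evaluate numerically. A minimal sketch; the function names are hypothetical, and the update abort-rate formula (2 + x + 2t) / (2 + 2x) is read from a garbled slide, so treat it as an assumption.

```python
def read_only_response_ratio(x):
    """GSI : SI response time for read-only T's: 1 / (1 + x)."""
    return 1 / (1 + x)


def update_response_ratio(x):
    """GSI : SI response time for update T's: (1 + x) / (1 + 2x)."""
    return (1 + x) / (1 + 2 * x)


def update_abort_ratio(x, t):
    """GSI : SI abort rate for update T's: (2 + x + 2t) / (2 + 2x).

    Assumption: this form is reconstructed from a garbled slide.
    """
    return (2 + x + 2 * t) / (2 + 2 * x)


# Sample a few points along the x-axis used in the plots
for x in (0, 1, 5, 20):
    print(f"x={x:>2}  read-only={read_only_response_ratio(x):.3f}  "
          f"update={update_response_ratio(x):.3f}  "
          f"abort(t=0)={update_abort_ratio(x, 0):.3f}  "
          f"abort(t=8)={update_abort_ratio(x, 8):.3f}")
```

Under that formula, the update abort-rate ratio drops below 1 exactly when 2t < x, that is, when the snapshot age is less than half the round-trip delay.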


Analytical Results

• Parameters
  – x = round trip delay / transaction length
  – t = snapshot age / transaction length

• X-axis: x = round trip delay / transaction length
  – x = 0: centralized database
  – x increases as technology advances

• Y-axis
  – Response time ratio (for read-only & update T’s)
  – Abort rate ratio (for update T’s)


Response Time Ratio of GSI : SI

[Figure: response time ratio GSI : SI (y-axis, 0.0 to 1.0) versus x = round trip delay / transaction length (x-axis, 0 to 20), with one curve for read-only T’s, $\frac{1}{1+x}$, and one for update T’s, $\frac{1+x}{1+2x}$; values below 1 mean GSI is better]

Abort Ratio of GSI : SI for Updates

[Figure: abort rate ratio GSI : SI for update T’s, $\frac{2+x+2t}{2+2x}$ (y-axis, 0.0 to 5.0), versus x = round trip delay / transaction length (x-axis, 0 to 20), one curve per value of t = snapshot age / transaction length (t = 0, 2, 4, ..., 16); above 1 SI is better, below 1 GSI is better; decreasing t means a fresher snapshot]


GSI : SI - Summary

• GSI response times are better
  – Read-only T’s ratio: significantly better
  – Update T’s ratio: approaches ½

• GSI abort rate may be higher or lower

• Cost: observing older data in GSI

• Favorable trade-off
  – Distributed environments
  – Read-dominated workloads


Conclusions

• GSI is appealing for replication
  – All of T’s read & write operations are local
  – Read-only T’s never block or abort

• GSI can be made serializable

• Algorithm for GSI in replicated databases

• Analytical results are encouraging