

CMPT 401 Summer 2007

Dr. Alexandra Fedorova

Lecture XII: Replication


Replication


Why Replicate? (I)

• Fault tolerance / high availability
  – As long as one replica is up, the service is available
  – Assume each of n replicas has the same independent probability p of failing. Then availability = 1 − p^n (see the sketch below)

[Figure: Fault-Tolerance: Take-Over]
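
As a quick check of the availability formula, here is a minimal Python sketch; the function name and the example numbers are illustrative, not from the lecture:

```python
def availability(p, n):
    """Availability of n replicas that each fail independently
    with probability p: the service is down only if all n fail."""
    return 1 - p ** n

# Example: three replicas, each with a 1% failure probability.
print(availability(0.01, 3))  # 0.999999
```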


Why Replicate? (II)

• Fast local access (WAN replication)
  – A client can always send requests to the closest replica
  – Goal: no communication to remote replicas necessary during request execution
  – Goal: the client experiences location transparency, since all access is fast local access

[Figure: fast local access: clients in Toronto, Montreal, and Rome each use a nearby replica]


Why Replicate?

• Scalability and load distribution (LAN replication)
  – Requests can be distributed among the replicas
  – Handle increasing load by adding new replicas to the system: a cluster instead of a bigger server


Challenges: Data Consistency

• We will study systems that use data replication
• It is hard, because the data must be kept consistent
• Users submit operations against the logical copies of the data
• These operations must be translated into operations against one, some, or all physical copies of the data
• Nearly all existing approaches follow a ROWA(A) approach (sketched below):
  – Read-one-write-all-(available)
  – An update has to be (eventually) executed at all replicas to keep them consistent
  – A read can be performed at any replica
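
A minimal Python sketch of the ROWA idea, assuming hypothetical replica objects that expose read/write methods; none of these names come from the lecture:

```python
import random

class RowaCoordinator:
    def __init__(self, replicas):
        self.replicas = replicas  # hypothetical replica stubs

    def read(self, key):
        # Read-one: any single replica can serve a read.
        return random.choice(self.replicas).read(key)

    def write(self, key, value):
        # Write-all: the update must reach every replica.
        for replica in self.replicas:
            replica.write(key, value)
```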


Challenges: Fault Tolerance

• The goal is to have the data available despite failures
• If one site fails, the others should continue providing service
• How many replicas should we have? It depends on:
  – How many faults we want to tolerate
  – The types of faults we expect
  – How much we are willing to pay


Roadmap

• Replication architectures
  – Active replication
  – Primary-backup (passive, master-slave) replication
• Design considerations for replicated services
• Surviving failures


Active Replication

[Figure: a client multicasts its request to the replicated servers A, B, and C; each replica executes it and replies]


1. The client sends the request to the servers using totally ordered reliable multicast (built on logical clocks or vector clocks)
2. Server coordination is given by the total-order property (assumption: synchronous system)
3. All replicas execute requests in the order in which they are delivered
4. No additional coordination is necessary (assumption: determinism): all replicas produce the same result
5. All replicas send the result to the client; the client waits for the first answer (see the sketch below)
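
A minimal sketch of a deterministic active replica, assuming a hypothetical totally_ordered_deliver() generator that yields the same request sequence to every replica; the names and the request format are illustrative:

```python
class ActiveReplica:
    def __init__(self, reply):
        self.state = {}
        self.reply = reply  # hypothetical channel back to clients

    def run(self, totally_ordered_deliver):
        # Total order + deterministic execution => identical state
        # and identical results at every replica.
        for req in totally_ordered_deliver():
            if req["op"] == "write":
                self.state[req["key"]] = req["value"]
                result = "ok"
            else:  # read
                result = self.state.get(req["key"])
            # Every replica replies; the client keeps the first answer.
            self.reply(req["client"], result)
```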


Fault Tolerance: Failstop Failures

• As long as at least one replica survives, the client will continue receiving service
• Assuming there are no partitions!
• Suppose B and C are partitioned, so they cannot communicate
• Then they cannot agree on how to order the client's requests


Fault Tolerance: Byzantine Failures

• Active replication can survive Byzantine failures (assuming no partitions)
• The system must have n ≥ 2f + 1 replicas (f is the number of failures)
• The client compares the results of all replicas and chooses the result returned by the majority, i.e., by the f + 1 non-faulty replicas (see the vote sketch below)
• This is the idea used in LOCKSS (Lots of Copies Keep Stuff Safe)
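
The client-side vote fits in a few lines; this is a minimal sketch, assuming each entry of replies is the result reported by one replica:

```python
from collections import Counter

def vote(replies, f):
    """With n >= 2f + 1 replicas and at most f faulty ones, any result
    reported by at least f + 1 replicas comes from non-faulty replicas."""
    result, count = Counter(replies).most_common(1)[0]
    if count >= f + 1:
        return result
    raise RuntimeError("no f+1 majority among the replies")

# Example: n = 3, f = 1; one Byzantine replica returns a wrong answer.
assert vote([42, 42, 7], f=1) == 42
```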


Primary-Backup Replication (PB)

[Figure: the client sends its request to the primary; the backups receive updates from the primary]

Also known as passive replication. If the primary fails, a backup takes over and becomes the primary.


System Requirements

• How do we want the system to behave?
• Just like a single-server system?
  – Must ensure that there is only one primary at a time
• Data is kept consistent:
  – If a client received a response to an update operation and then the system crashed, the client should find the data reflecting that update
  – The results of operations should be the same as they would be if executed on a single-server system
• Can we tolerate loose data consistency?
  – The client eventually gets consistent data, but not right away


Example of Data Inconsistency

• Client operations:
  write(x = 5)
  read(x)   // should return 5 on a single-server system
• On a replicated system:
  write(x = 5)
  The primary responds to the client
  The primary crashes before propagating the update to the other replicas
  A new primary is selected
  read(x)   // may return x ≠ 5: the new primary does not know about the update to x


Design Considerations for Replicated Services

• Where to submit updates?
  – To a designated server or to any server?
• When to propagate updates?
  – Eagerly or lazily?
• How many replicas to install?


Where to Submit Updates?

• Primary copy:
  – Each object has a primary copy
  – Often there is a designated primary server: it holds the primary copies of all objects
  – Updates on an object x have to be submitted to the primary copy of x
  – The primary propagates the changes on x to the secondary copies
  – Secondary copies are read-only
  – Also called the master/slave approach


Where to Submit Updates?

• Update everywhere:
  – Both read and write operations can be submitted to any server
  – That server takes care of executing the operation and propagating the updates to the other copies

[Figure: transactions T1: r(x) w(y) and T2: r(y) w(y) submitted to different servers]


When to Propagate Updates?

• Eager:
  – Within the boundaries of the transaction, for replicated databases
  – Before the response is sent to the client, for non-transactional services
• Lazy:
  – After the commit of the transaction, for replicated databases
  – After the response is sent to the client, for non-transactional services


PB Replication with Eager Updates

1. The client sends the request to the primary
2. There is no initial coordination
3. The primary executes the request
4. The primary coordinates with the other replicas by sending the update information to the backups
5. The primary (or another replica) sends the answer to the client

Updates are propagated eagerly, before we respond to the client (see the sketch below).
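
A minimal sketch of steps 3 to 5 at the primary, assuming hypothetical backup stubs whose apply() call returns only after the backup has installed the update:

```python
class EagerPrimary:
    def __init__(self, backups):
        self.state = {}
        self.backups = backups  # hypothetical backup stubs

    def handle_write(self, key, value):
        # 3. Execute the request.
        self.state[key] = value
        # 4. Replica coordination: push the update to every backup
        #    and wait for it to be applied (eager propagation).
        for backup in self.backups:
            backup.apply(key, value)
        # 5. Only now answer the client.
        return "ok"
```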

Eager Update Propagation

[Figure: message flow for eager update propagation]

Eager Update Propagation for Transactional Services

[Figure: two variants: propagate on every update, or once at the end of the transaction]


When Can a Failure Occur?

• F1: The primary fails before replica coordination
  – The client receives no response; it will retry, and will eventually get the data from the new primary
• F2: The primary fails during replica coordination
  – The replicas may or may not have reached agreement w.r.t. the client's transaction. The client may receive a response after the system recovers. The system may fail to recover (if the agreement protocol blocks).
• F3: The primary fails after replica coordination
  – A new primary responds

[Figure: timeline: Phase 1: client request; Phase 3: execution; Phase 4: replica coordination; Phase 5: client response; the failure points F1, F2, F3 fall between the phases]


Lazy Update Propagation (Transactional Services)

• Primary copy:
  – Upon read: read locally and return to the user
  – Upon write: write locally and return to the user
  – Upon commit/abort: terminate locally
  – Sometime after the commit: multicast the changed objects in a single message to the other sites (in FIFO order; see the sketch below)
• Secondary copy:
  – Upon read: read locally
  – Upon a message from the primary copy: install all the changes (FIFO)
  – Upon a write from a client: refuse (writing clients must submit to the primary copy)
  – Upon a commit/abort request (only for read-only transactions): local commit
• Note: existing systems allow different objects to have different primary copies
  – A transaction that wants to write X (primary copy on site S1) and Y (primary copy on site S2) is usually disallowed
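
A minimal sketch of the primary-copy side of lazy propagation; multicast_fifo is a hypothetical primitive that delivers one message to all secondaries in FIFO order:

```python
class LazyPrimary:
    def __init__(self, multicast_fifo):
        self.state = {}
        self.changed = []            # objects written by the current txn
        self.multicast_fifo = multicast_fifo

    def write(self, key, value):
        self.state[key] = value      # write locally...
        self.changed.append((key, value))
        return "ok"                  # ...and return to the user at once

    def commit(self):
        changes, self.changed = self.changed, []
        # Sometime after the commit: all changed objects go out
        # in a single FIFO-ordered message to the secondaries.
        self.multicast_fifo(changes)
```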


Lazy Update Propagation

A client may end up with an inconsistent view of the system


Lazy Propagation: Discussion

• Lazy replication has no server/agreement coordination within the response time
  – Faster
  – Transactions might be lost if the primary crashes
• Weak data consistency
  – Simple to achieve
  – Secondary copies only need to apply updates in FIFO order
  – Data at the secondary copies might be stale
• Multiple primaries are possible (multi-master replication)
  – More locality


How Many Replicas?

• Properties of a correct PB protocol:
  – Property 1: There is at most one primary at any time
  – Property 2: Each client maintains the identity of the primary, and sends its requests only to the primary
  – Property 3: If a client update arrives at a backup, it is not processed
• When a primary fails, we must elect a new one
• Network partitions may cause the election of more than one primary
• We can avoid partitions by choosing the right number of replicas (under certain failure assumptions)
• How many replicas do we need to tolerate failures?


System Model

• Synchronous system (useful for deriving theoretical results)
• Fully connected network (exactly one FIFO link between any two processes)
• Failure models:
  – Crash failures: also known as failstop failures
  – Crash+link failures: a server may crash, or a link may lose messages (but links do not delay, duplicate, or corrupt messages)
  – Receive-omission failures: a server may crash, and may also omit to receive some of the messages sent over a non-faulty link
  – Send-omission failures: a server may fail not only by crashing, but also by omitting to send some messages over a non-faulty link
  – General-omission failures: a server may exhibit both send-omission and receive-omission failures


Lower Bounds on Replication

• How many replicas n do you need to tolerate f failures?

Failure model        Degree of replication
crash                n > f
crash+link           n > f + 1
receive-omission     n > 3f/2
send-omission        n > f
general-omission     n > 2f
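
The table translates directly into the smallest admissible n for each model; a minimal sketch (the function is illustrative, not from the lecture):

```python
def min_replicas(model, f):
    """Smallest integer n satisfying the lower bound for each model."""
    bounds = {
        "crash":            f + 1,           # n > f
        "crash+link":       f + 2,           # n > f + 1
        "receive-omission": 3 * f // 2 + 1,  # n > 3f/2
        "send-omission":    f + 1,           # n > f
        "general-omission": 2 * f + 1,       # n > 2f
    }
    return bounds[model]

# Example: tolerating f = 2 general-omission failures needs 5 replicas.
assert min_replicas("general-omission", 2) == 5
```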


Crash Failures, Send-Omission Failures: n > f Replicas

[Figure: f replicas fail (crash, or fail to send); one of the surviving replicas becomes the primary]


Other Failure Models

• The rest of the failure models may create partitions
• Partitions: the servers are divided into mutually non-communicating groups
• A primary may emerge in each partition, so we would have more than one primary, which is against the rules
• To avoid partitions, we use more replication


Crash+Link Failures: n > f+1 Replicas

[Figure: Scenario 1: f servers fail; one of the survivors becomes the primary. Scenario 2: f links fail; one server is unreachable but alive, and a primary emerges on each side. Problem: two primaries!]


[Figure: the same link-failure scenario with one more replica: a correct node stays connected to both sides]

• We need another correct node that serves as a link between the two partitions
• We can assume that its links are correct, because we allow no more than f failures


Omission Failures

• Precise definitions of omission failures [Perry-Toueg86]
• Notation:
  – sent(Pj, Pi): a message sent from Pj to Pi
  – received(Pi, Pj): a message received by Pi from Pj
• Receive-omission failure of Pi with respect to Pj: sent(Pj, Pi) ≠ received(Pi, Pj)
• Send-omission failure of Pi with respect to Pj: Pi fails to send a message prescribed by the protocol to Pj
• General-omission failure of Pi w.r.t. Pj: Pi commits both receive-omission and send-omission failures w.r.t. Pj


Receive-Omission Failures: n > 3f/2 Replicas

[Figure: the replicas are split into groups A, B, and C of f/2 servers each. The f servers in B and C fail; a server in A becomes the primary]


[Figure: symmetrically, the f servers in A and C fail; a server in B becomes the primary]


• Now suppose the servers in A commit receive-omission failures w.r.t. the processes outside their partition
• From the A servers' perspective, everyone else has crashed: a partition! A server in A becomes the primary
• The servers in B commit receive-omission failures as well, so a server in B also becomes the primary
• Problem: two primaries!
• We need another non-failed server that links the partitions


General-Omission Failures: n > 2f Replicas

[Figure: 2f servers split into groups A and B of f servers each; each group sees the other as failed, and a primary emerges in each]


• A commits general-omission failures w.r.t. the servers in B
• A's servers think that all the servers in B have failed, so one of them becomes the primary
• B's servers think that all the servers in A have failed, so one of them becomes the primary
• A server in A becomes a primary and a server in B becomes a primary: we have two primaries
• To fix this, we need another non-faulty server that links the two partitions


How Many Replicas? Summary

• We showed how many replicas are needed to prevent partitions in the face of f failures
• However, partitions do happen, due to router failures for example
• In that case, having extra replicas won't help, because they too will be on one of the sides of the faulty router
• Next we'll talk about surviving failures despite network partitions


Surviving Network Partitions

• Most systems operate under the assumption that a partition will eventually be repaired
• Optimistic approach:
  – Allow updates in all partitions
  – When the partition is repaired, eventually synchronize the data
  – OK for a distributed file system (think of your laptop in disconnected mode)
• Pessimistic approach:
  – Allow updates only in a single partition; used where strong consistency is required (e.g., a flight reservation system)
  – Which partition? This is usually decided by quorum consensus
  – After the partition is repaired, update the copies of the data in the other partition


Quorum Consensus

• A quorum is a sub-group of servers whose size gives it the right to carry out the operation
• Usually the majority gets the quorum (see the test below)
• Design/implementation challenges:
  – The replicas must agree that they are behind a partition; they must rely on timeouts and failure detectors (special devices?)
  – If the quorum set does not contain the primary, the replicas must elect a new primary
  – Cost consideration: to tolerate one partition, we must have at least three servers. Implement one of them as a simple witness?
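
A majority quorum reduces to a one-line test; a minimal sketch, where reachable counts the servers a partition can reach, itself included (the names are illustrative):

```python
def has_quorum(reachable, n):
    """A partition may carry out operations only if it holds
    a strict majority of the n replicas."""
    return reachable > n // 2

# With n = 3, a partition of 2 servers has the quorum; a lone server does
# not, so at most one side of a partition can keep accepting updates.
assert has_quorum(2, 3) and not has_quorum(1, 3)
```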


Bringing Replicas Up-to-Date

• Version numbers:
  – Each copy has a version number (or a timestamp)
  – Only copies that are up-to-date have the current version number
  – Operations should be applied only to copies with the current version number
  – How does a failed server find out that it is not up-to-date? Periodically compare all version numbers?
• Log sequence numbers (sketched below):
  – Each operation is written to a log (like a transactional log)
  – Each log record has a log sequence number (LSN)
  – Replica managers compare LSNs to find out whether they are up-to-date
  – Used by the Berkeley DB replication system
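
A minimal sketch of LSN-based catch-up, assuming the primary's log is a list of (lsn, operation) pairs in increasing LSN order; the format is illustrative, not Berkeley DB's:

```python
def catch_up(local_lsn, primary_log):
    """A lagging replica replays every logged operation whose LSN is
    greater than the highest LSN it has already applied."""
    return [(lsn, op) for lsn, op in primary_log if lsn > local_lsn]

log = [(1, "write x=5"), (2, "write y=7"), (3, "delete x")]
# A replica that stopped at LSN 1 must replay operations 2 and 3.
assert catch_up(1, log) == [(2, "write y=7"), (3, "delete x")]
```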


Summary

• Discussed replication
  – Used for performance and high availability
• Active replication
  – The client sends updates to all replicas
  – The replicas coordinate amongst themselves and apply the updates in order
• Passive replication (primary copy, primary-backup)
  – Eager/lazy update propagation
  – The number of replicas needed to prevent partitions
• Handling partitions
  – Optimistic
  – Pessimistic (quorum consensus)
• Next time we will look at real systems that use replication