
Page 1: Lecture XI: Distributed Transactions

CMPT 401 Summer 2007

Dr. Alexandra Fedorova

Lecture XI: Distributed Transactions

Page 2: Lecture XI: Distributed Transactions


A Distributed Transaction

• A transaction is distributed across n processes
• Each process can decide to commit or abort the transaction
• A transaction must commit on all sites or abort on all sites

[Figure: an application program / coordinating server connected to two participating sites; each link is labeled “commit”.]

Page 3: Lecture XI: Distributed Transactions


Comparison With Single-Site Transactions

• For single-site transactions we talked about:
– Committing a transaction
– Recovery (for atomicity, consistency and durability)
– Concurrency control (for atomicity, consistency and isolation)
• For distributed transactions we will talk about:
– Distributed commitment
– Distributed recovery
– Distributed concurrency control

Page 4: Lecture XI: Distributed Transactions


A Distributed Transaction

# Some participant (the invoker) executes:
send [T_START: transaction, Dc, participants] to all participants

# All participants (including the invoker) execute:
upon (receipt of [T_START: transaction, Dc, participants])
    Cknow = local_clock
    # Perform operations requested by transaction
    if (willing and able to make updates permanent) then
        vote = YES
    else
        vote = NO
    # Decide commit or abort for the transaction
    atomic_commitment(transaction, participants)

Page 5: Lecture XI: Distributed Transactions


Atomic Commitment Problem

• All processes must decide to commit or abort
– Is this possible in an asynchronous system?
• No. This is like consensus, and we saw that consensus is impossible
• But it is still used in asynchronous systems under certain assumptions, with no guarantees about termination:
– Communication is reliable
– Processes can crash
– Processes recover from failure
– Processes can log their state in stable storage
– Stable storage survives crashes and is accessible upon restart

Page 6: Lecture XI: Distributed Transactions


Properties of Atomic Commitment

• Property 1: All participants that decide reach the same decision

• Property 2: If any participant decides commit, then all participants must have voted YES

• Property 3: If all participants vote YES and no failures occur, then all participants decide commit

• Property 4: Each participant decides at most once (a decision is irreversible)
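A tiny, hypothetical checker in Python can make Properties 1 and 2 concrete: given each participant's vote and final decision, it flags violations. The function name and the dictionary layout are my own, illustrative only, not from the slides.

    # Illustrative only: check Properties 1 and 2 over recorded votes/decisions.
    def check_atomic_commitment(votes, decisions):
        """votes: participant -> "YES"/"NO"; decisions: participant -> "commit"/"abort"/None."""
        decided = [d for d in decisions.values() if d is not None]
        # Property 1: all participants that decide reach the same decision
        assert len(set(decided)) <= 1, "participants disagree"
        # Property 2: if anyone decides commit, then all participants voted YES
        if "commit" in decided:
            assert all(v == "YES" for v in votes.values()), "commit without unanimous YES"
        return True

    # example: unanimous YES, everyone committed
    check_atomic_commitment({"p1": "YES", "p2": "YES"}, {"p1": "commit", "p2": "commit"})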

Page 7: Lecture XI: Distributed Transactions


Components of Atomic Commitment

• Normal execution
– The steps executed while no failures occur
• Termination protocol
– When a site fails, the correct sites should still be able to decide on the outcome of pending transactions
– They run a termination protocol to decide on all pending transactions
• Recovery
– When a site fails and then restarts, it has to perform recovery for all transactions that it has not yet committed appropriately
– Single-site recovery: abort all transactions that were active at the time of the failure
– Distributed system: might have to ask around; maybe an active transaction was committed in the rest of the system

Page 8: Lecture XI: Distributed Transactions


Blocking vs. Non-Blocking Atomic Commitment

• Blocking Atomic Commitment: correct participants may be prevented from terminating the transaction due to failures in other parts of the system

• Non-Blocking Atomic Commitment: transactions terminate consistently at all participating sites even in the presence of failures

• Non-blocking atomic commitment is not possible in systems with unreliable communication

• Non-blocking algorithms exist if one assumes reliable communication and a synchronous system

Page 9: Lecture XI: Distributed Transactions


Two-Phase Commit Protocol (2PC)

• An atomic commitment protocol
– Phase 1: Decide commit or abort
– Phase 2: Get the final decision from the coordinator, and execute the final decision
• We will study the protocol in a synchronous system
– Assume a message arrives within interval δ
– Assume we can compute Dc – the local time required to complete the transaction
– Assume we can compute Db – the additional delay associated with broadcast

Page 10: Lecture XI: Distributed Transactions


Two-Phase Commit Protocol (I)

# Executed by coordinator
procedure atomic_commitment(transaction, participants)
    send [VOTE_REQUEST] to all participants
    set-timeout-to local_clock + 2δ
    wait-for (receipt of [vote: vote] messages from all participants)
        if (all votes are YES) then
            broadcast (commit, participants)
        else
            broadcast (abort, participants)
    on-timeout
        broadcast (abort, participants)

Page 11: Lecture XI: Distributed Transactions


Two-Phase Commit Protocol (II)

# Executed by all participants (including the coordinator)
set-timeout-to Cknow + Dc + δ
wait-for (receipt of [VOTE_REQUEST] from coordinator)
    send [vote: vote] to coordinator
    if (vote = NO) then
        decide abort
    else
        set-timeout-to Cknow + Dc + 2δ + Db
        wait-for (delivery of decision message)
            if (decision message is abort) then
                decide abort
            else
                decide commit
        on-timeout
            decide according to termination_protocol()   # may block the participant
on-timeout
    decide abort
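To make the coordinator and participant slides above concrete, here is a minimal, self-contained Python sketch of 2PC run as threads exchanging messages over queues. It is an illustration only: the queue-based transport, the single TIMEOUT constant standing in for the δ / Dc / Db bounds, and all names are assumptions of this sketch, not the lecture's pseudocode.

    import queue, threading

    VOTE_REQUEST, COMMIT, ABORT = "VOTE_REQUEST", "COMMIT", "ABORT"
    TIMEOUT = 1.0  # stands in for the synchronous bounds (δ, Dc, Db); an assumption of this sketch

    def coordinator(to_parts, from_parts, n):
        for q in to_parts:                          # Phase 1: request votes
            q.put(VOTE_REQUEST)
        votes = []
        try:
            for _ in range(n):
                votes.append(from_parts.get(timeout=TIMEOUT))
        except queue.Empty:
            votes.append("NO")                      # a missing vote is treated as NO
        decision = COMMIT if all(v == "YES" for v in votes) else ABORT
        for q in to_parts:                          # Phase 2: broadcast the decision
            q.put(decision)
        return decision

    def participant(inbox, to_coord, vote):
        inbox.get(timeout=TIMEOUT)                  # wait for VOTE_REQUEST
        to_coord.put(vote)                          # send the local vote
        if vote == "NO":
            return ABORT                            # a NO voter may abort unilaterally
        try:
            return inbox.get(timeout=TIMEOUT)       # wait for the coordinator's decision
        except queue.Empty:
            return "BLOCKED"                        # would invoke the termination protocol here

    # usage: two participants, both voting YES -> everyone commits
    to_parts, from_parts, results = [queue.Queue(), queue.Queue()], queue.Queue(), {}
    workers = [threading.Thread(target=lambda i=i: results.update({i: participant(to_parts[i], from_parts, "YES")}))
               for i in range(2)]
    for w in workers: w.start()
    print("coordinator decided:", coordinator(to_parts, from_parts, 2))
    for w in workers: w.join()
    print("participants decided:", results)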

Page 12: Lecture XI: Distributed Transactions


The Need for Termination Protocol

• If a participant
– Voted YES
– Sent its vote to the coordinator, but…
– Received no final decision from the coordinator
• A termination protocol must be run (a sketch follows this list)
– Participants cannot simply decide to abort
– If they already said they would commit, they cannot change their minds
– The coordinator might have sent “commit” decisions to some participants and then crashed
– Since those participants might have committed, no other participant can decide “abort”
• A termination protocol will try to contact other participants to find out what they decided, and try to reach a decision
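One common realization is a cooperative termination protocol, in which the undecided participant polls its peers. The Python sketch below is illustrative; the reply names and the ask_peer callback are assumptions, not part of the lecture.

    # Illustrative cooperative termination protocol for one undecided participant.
    def termination_protocol(ask_peer, peers):
        for p in peers:
            answer = ask_peer(p)   # "COMMIT", "ABORT", "NO_VOTE", "UNSURE", or None if unreachable
            if answer == "COMMIT":
                return "COMMIT"    # some participant already committed, so everyone must commit
            if answer in ("ABORT", "NO_VOTE"):
                return "ABORT"     # some participant aborted or voted NO, so abort is safe
        return "BLOCKED"           # everyone reachable is also uncertain: remain blocked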

Page 13: Lecture XI: Distributed Transactions


Blocking Nature of Two-Phase Commit

• All protocols involving a termination protocol are blocking
• A termination protocol may lead to a decision, but there are scenarios when it will not:
– The coordinator crashes during the broadcast of a decision
– Several participants received the decision from the coordinator, applied it, and then crashed
– All correct (not crashed) participants voted “YES”
– Correct participants cannot decide until faulty participants recover
– If some faulty participants had decided “NO”, they would have aborted
– If some faulty participants had received a “commit” decision from the coordinator, they would have committed

Page 14: Lecture XI: Distributed Transactions


Non-Blocking Atomic Commitment

• Property 5 (non-blocking): Every correct participant that executes the atomic commitment protocol eventually decides

• Recall that 2PC was blocking because the coordinator could broadcast its decision to only some of the participants and then crash

• Need a broadcast with stronger properties, such as Uniform Timed Reliable Broadcast (UTRB)

• Property of UTRB: If any participant (correct or not) receives a broadcast message m, then all correct participants eventually receive m

• With UTRB, if a participant times out waiting for final decision, it can unilaterally decide to abort

• The participant knows that if it has not received a message, then no one else has, so no one else could have committed

Page 15: Lecture XI: Distributed Transactions


A Simple UTRB Algorithm

procedure broadcast(m, G)   // G is the set of participants

# Broadcaster executes:
send [m] to all processes in G
deliver m   // local delivery of message m to the process

# Process p ≠ broadcaster in G executes:
upon (first receipt of [m])
    send [m] to all processes in G
    deliver m

Note: the message is not delivered to the local process before it is forwarded to the other participants.
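A hedged Python sketch of this flooding idea: each process relays the first copy of a message to everyone before delivering it locally, so if any process delivers m, every correct process eventually receives it. The send callback and group representation are invented for illustration; they are not part of the lecture's pseudocode.

    # Illustrative flooding UTRB; 'send' is an assumed reliable point-to-point primitive.
    class UTRBProcess:
        def __init__(self, pid, group, send):
            self.pid, self.group, self.send = pid, group, send
            self.seen, self.delivered = set(), []

        def broadcast(self, m):
            for q in self.group:                 # forward to everyone first...
                if q != self.pid:
                    self.send(q, m)
            self.seen.add(m)
            self.delivered.append(m)             # ...then deliver locally

        def on_receive(self, m):
            if m in self.seen:                   # only the first receipt is relayed
                return
            self.seen.add(m)
            for q in self.group:                 # relay before local delivery
                if q != self.pid:
                    self.send(q, m)
            self.delivered.append(m)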

Page 16: Lecture XI: Distributed Transactions


2PC in Asynchronous Systems

• 2PC can be implemented in an asynchronous system with reliable communication channels

• This means that a message eventually gets delivered…
• But we cannot set bounds on delivery time
• So the process might have to wait forever…
• Therefore, you cannot have non-blocking atomic commitment in an asynchronous system
• What if a participant whose message is waited on has crashed?
• The expectation is that the participant will properly recover and continue the protocol
• So now let’s look at distributed recovery

Page 17: Lecture XI: Distributed Transactions


Distributed Recovery

• Remember single-site recovery:
– Transaction log records are kept on stable storage
– Upon reboot, the system “undoes” updates from active (uncommitted) transactions
– It “replays” updates from committed transactions
• In a distributed system we cannot be sure whether a transaction that was active at the time of the crash:
– Is still active
– Has committed
– Has aborted
– Might even have executed more updates while the recovering site was crashing and rebooting

Page 18: Lecture XI: Distributed Transactions


Components of Distributed Recovery

• Actions performed by the recovering site:
– For each transaction that was active before the crash, try to decide unilaterally based on log records
– Ask others what they have decided (a sketch follows this list)
• Actions performed by other participants:
– Others may send the decision
– Or a “don’t know” message
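A minimal sketch of the “ask around” step, under assumed message names (the ask callback and its replies are illustrative): the recovering site queries every other participant about an in-doubt transaction and adopts the first definite answer.

    # Illustrative only: resolve one in-doubt transaction by asking peers.
    def recover_in_doubt(txn_id, peers, ask):
        """ask(peer, txn_id) returns "commit", "abort", or "dont_know"."""
        for peer in peers:
            answer = ask(peer, txn_id)
            if answer in ("commit", "abort"):
                return answer            # adopt a definite decision made elsewhere
        return "still_in_doubt"          # everyone reachable says "don't know": keep waiting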

Page 19: Lecture XI: Distributed Transactions


Crash Before Local Decision

• Suppose a site crashes during the execution of a transaction, before it reaches a local decision (YES or NO)
• The transaction could have completed at other sites
• What are the options?
• Option 1: The crashed site restores its state with the help of other participants (restores the updates made while it was crashing and recovering)
• Option 2: The crashed site realizes that it crashed (by keeping a crash count), and sets its local decision to NO

Page 20: Lecture XI: Distributed Transactions


Logging for Distributed Recovery

• Coordinator: forces the “commit” decision to the log before informing any participants (sketched below)
– Like the redo rule for single-site logging
• Participant: forces its vote (YES or NO) to disk before sending the vote to the coordinator
– This way it knows that it must reach a decision in agreement with the others
• Participant: forces the final decision (received from the coordinator) to the log, then responds to the coordinator
– Once the coordinator receives responses from all participants, it can remove its own decision log record
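As a rough illustration of these rules (the log layout and function names are assumptions of this sketch, not the lecture's), a participant might force its vote and the final decision to an append-only log before acting on them:

    import os

    # Illustrative write-ahead log: append a record and force it to stable storage.
    def force_log(path, record):
        with open(path, "a") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())     # the "force" part: the record survives a crash after this call

    def participant_vote(log_path, txn_id, vote, send_vote):
        force_log(log_path, f"{txn_id} VOTE {vote}")        # log the vote first...
        send_vote(vote)                                      # ...then tell the coordinator

    def participant_decide(log_path, txn_id, decision, ack_coordinator):
        force_log(log_path, f"{txn_id} DECIDE {decision}")   # log the final decision first...
        ack_coordinator(txn_id)                              # ...then acknowledge, so the coordinator
                                                             # may garbage-collect its decision record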

Page 21: Lecture XI: Distributed Transactions


Distributed Concurrency Control

• Multiple servers execute transactions; they share data distributed across sites
• A lock on a data item may be requested by many different servers
• Distributed concurrency control methods:
– Centralized two-phase locking (C-2PL)
– Distributed two-phase locking (D-2PL)
– Optimistic concurrency control

Page 22: Lecture XI: Distributed Transactions


Distributed Concurrency Control: Notation

• SCH – global lock scheduler
• Ti – a transaction
• Oij(X) – an operation requiring a lock on X, executed at server Sj as part of Ti
• S1, S2, … – servers performing a distributed transaction
• TM1, TM2, … – transaction managers, one for each server

Page 23: Lecture XI: Distributed Transactions


Centralized 2PL

• Let S1 be the server to which transaction Ti was submitted
• Let S2 be the server maintaining data item X
• For each operation Oi1(X), TM1 first requests the corresponding lock from SCH (the central 2PL scheduler)
• Once the lock is granted, the operation is forwarded to the server S2 maintaining X (see the sketch below)
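A minimal Python sketch of the centralized scheme, with invented names (CentralScheduler, c2pl_read, and the in-memory data at S2 are all assumptions of this sketch): every lock comes from the single global scheduler before the operation is shipped to the site that holds the data.

    # Illustrative C-2PL flow; every name here is an assumption of this sketch.
    class CentralScheduler:
        def __init__(self):
            self.locks = {}                      # data item -> owning transaction
        def request_lock(self, txn, x):
            assert self.locks.get(x) in (None, txn), "would block/queue here in a real system"
            self.locks[x] = txn
        def release_all(self, txn):
            self.locks = {k: v for k, v in self.locks.items() if v != txn}

    SCH = CentralScheduler()
    DATA_AT_S2 = {"X": 42}

    def c2pl_read(txn, x):
        SCH.request_lock(txn, x)                 # 1. TM1 asks the single global scheduler
        return DATA_AT_S2[x]                     # 2. once granted, the operation runs at S2, which holds X

    # usage: a transaction reads X, then releases its locks at commit (two-phase locking)
    print(c2pl_read("T1", "X"))                  # prints 42
    SCH.release_all("T1")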

Page 24: Lecture XI: Distributed Transactions


Centralized 2PL

[Message sequence chart: for Read(x), S1’s TM1 sends “Request lock” to the central scheduler SCH; SCH replies “Grant lock”; TM1 forwards Read(x) to S2, which answers “Operation executed”; at commit, the locks are released at SCH.]

Page 25: Lecture XI: Distributed Transactions


Distributed 2PL

• Each server has its own local 2PL scheduler SCH
• Let S1 be the server to which transaction Ti was submitted:
– For each operation Oi1(X), TM1 forwards the operation to the TM2 of server S2 maintaining X
– The remote site first acquires a lock on X and then submits the operation for execution (see the sketch below)
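For contrast with the centralized sketch above, here is the distributed variant under similarly invented names (LocalScheduler, RemoteTM): the operation is shipped to the data site, whose local scheduler grants the lock before the operation runs.

    # Illustrative D-2PL flow; every name here is an assumption of this sketch.
    class LocalScheduler:
        def __init__(self):
            self.locks = {}                      # data item -> owning transaction
        def request_lock(self, txn, x):
            assert self.locks.get(x) in (None, txn), "would block/queue here in a real system"
            self.locks[x] = txn

    class RemoteTM:                              # transaction manager at the site that owns X
        def __init__(self, data):
            self.sch, self.data = LocalScheduler(), data
        def receive(self, op, x, txn):
            self.sch.request_lock(txn, x)        # the remote site acquires the lock itself...
            return op(self.data, x)              # ...then executes the operation locally

    # usage: TM1 simply forwards Read(X) to the TM of the site holding X
    tm2 = RemoteTM({"X": 42})
    print(tm2.receive(lambda d, x: d[x], "X", txn="T1"))   # prints 42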

Page 26: Lecture XI: Distributed Transactions


Distributed 2PL

[Message sequence chart: for Read(X), S1’s TM1 forwards the operation to TM2 at S2; TM2 requests the lock from its local scheduler (SCH of S2) and then submits the operation; S2 replies “Operation executed”; at commit, the locks are released at S2.]

Page 27: Lecture XI: Distributed Transactions


Optimistic Concurrency Control

• Locking is conservative
– Locking overhead is paid even if there are no conflicts
– Deadlock detection/resolution is needed (especially problematic in a distributed environment)
– The lock manager can fail independently
• Optimistic concurrency control
– Perform the operation first
– Check for conflicts only later (e.g., at commit time)
• Optimistic CC has rarely been used in centralized systems
• It is an interesting alternative for a distributed environment

Page 28: Lecture XI: Distributed Transactions


Optimistic Concurrency Control

• Working phase:
– If this is the first operation on X, load the last committed version from the DB and cache it
– Otherwise read/write the cached version
– Keep a WriteSet containing the objects written
– Keep a ReadSet containing the objects read
• Validation phase:
– Check whether the transaction conflicts with other transactions
• Update phase:
– Upon successful validation, the cached versions of updated objects are written back to the DB (i.e., the changes are made public)
• Assumption (for simplicity):
– Only one transaction may be in the validation phase and update phase at a time (e.g., implemented as a critical section)

Page 29: Lecture XI: Distributed Transactions


Backward Validation: Working Phase

Maintain a transaction counter Tn, initialized Tn := 0

Upon begin of transaction Ti (could be the first read/write request)
    StartTn(Ti) := Tn
    WriteSet(Ti) := ReadSet(Ti) := { }

Upon a Read(X) or Write(X) request of Ti
    If this is the first operation on X, load the last committed version from the DB and cache it
    Read/write the cached version of X
    If Read(X), then ReadSet(Ti) := ReadSet(Ti) ∪ { X }
    If Write(X), then WriteSet(Ti) := WriteSet(Ti) ∪ { X }

Page 30: Lecture XI: Distributed Transactions


Backward Validation: Validation and Update Phases

At the time of validation of transaction Ti
    For all j where StartTn(Ti) < j ≤ Tn
        Let WriteSet(Tk) be the write set of the transaction Tk with finishTn(Tk) = j
        If WriteSet(Tk) ∩ ReadSet(Ti) ≠ { }, then abort Ti

After validation (update phase)
    Tn := Tn + 1
    finishTn(Ti) := Tn
    Write WriteSet(Ti) to the DB
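The following Python sketch mirrors the two backward-validation slides above; the class layout and the in-memory history of write sets are my own framing for illustration, not the lecture's.

    # Illustrative backward validation (optimistic CC); layout is an assumption of this sketch.
    class Validator:
        def __init__(self):
            self.tn = 0                 # global transaction counter Tn
            self.history = {}           # finishTn -> WriteSet of the committed transaction

        def begin(self):
            return {"start": self.tn, "read": set(), "write": set()}

        def read(self, txn, x):  txn["read"].add(x)     # ReadSet(Ti)  := ReadSet(Ti)  ∪ {X}
        def write(self, txn, x): txn["write"].add(x)    # WriteSet(Ti) := WriteSet(Ti) ∪ {X}

        def validate_and_update(self, txn):
            # Compare against every transaction that committed after Ti started.
            for j in range(txn["start"] + 1, self.tn + 1):
                if self.history[j] & txn["read"]:        # WriteSet(Tk) ∩ ReadSet(Ti) ≠ {}
                    return False                         # conflict: abort Ti
            self.tn += 1                                 # update phase
            self.history[self.tn] = set(txn["write"])
            return True                                  # Ti commits

    # usage: T2 reads X, but T1 commits a write to X after T2 started, so T2 must abort
    v = Validator()
    t2 = v.begin(); t1 = v.begin()
    v.write(t1, "X"); assert v.validate_and_update(t1) is True
    v.read(t2, "X");  assert v.validate_and_update(t2) is False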

Page 31: Lecture XI: Distributed Transactions


Optimistic Concurrency Control

[Timeline figure: transactions T1, T2 and T3 each pass through Working, Validation and Update phases, finishing at finishTN(T1), finishTN(T2) and finishTN(T3); transaction T5 is still active, with StartTN(T5) marked; the counter currently stands at Tn = finishTN(T3).]

Page 32: Lecture XI: Distributed Transactions


Optimistic Concurrency Control: Discussion

• Validation and update phases run in a critical section
– Reduced concurrency
– Optimizations are possible
• Space overhead:
– To validate Ti, we must keep the WriteSets of all Tj where finishTn(Tj) > StartTn(Ti) and Tj was active when Ti began
– Each transaction must record its read/write activity
• No deadlock, but potentially more aborts

Page 33: Lecture XI: Distributed Transactions


Distributed Deadlock Resolution

• Similar remedies as for single-site deadlock resolution:
– Prevention (lock ordering)
– Avoidance (abort a transaction that waits for too long)
– Detection (maintain a wait-for graph, abort transactions involved in a cycle; see the sketch below)
• Deadlock avoidance and detection require keeping dependency graphs, or wait-for graphs (WFGs)
• WFGs are more difficult to construct in a distributed system (it takes more time, and one must use vector clocks or distributed snapshots)
• Deadlock managers can fail independently
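As a small illustration of the detection option, here is a generic cycle check over a wait-for graph in Python; it ignores the distributed-construction problem (vector clocks / snapshots) and only shows what a deadlock detector does once it has assembled a WFG. The graph representation is an assumption of this sketch.

    # Illustrative wait-for-graph cycle detection by depth-first search.
    def find_deadlock(wfg):
        """wfg: dict mapping a transaction to the set of transactions it waits for.
        Returns one transaction on a cycle (a victim candidate), or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {t: WHITE for t in wfg}

        def dfs(t):
            color[t] = GREY
            for u in wfg.get(t, ()):
                if color.get(u, WHITE) == GREY:      # back edge: t transitively waits on itself
                    return u
                if color.get(u, WHITE) == WHITE:
                    victim = dfs(u)
                    if victim is not None:
                        return victim
            color[t] = BLACK
            return None

        for t in wfg:
            if color[t] == WHITE:
                victim = dfs(t)
                if victim is not None:
                    return victim
        return None

    # usage: T1 waits for T2, T2 waits for T1 -> deadlock; abort the returned victim
    print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # prints T1 or T2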

Page 34: Lecture XI: Distributed Transactions


Summary

• Atomic commitment
– Two-phase commit
– A non-blocking implementation is possible in a synchronous system with reliable communication channels
– Possible in an asynchronous system, but not guaranteed to terminate (blocking)
• Distributed recovery
– Keep state on stable storage
– On reboot, ask around to recover the most current state
• Distributed concurrency control
– Centralized lock manager
– Distributed lock manager
– Optimistic concurrency control