ADBMS: Transaction Processing and Concurrency Control, by Renuka Pawar

Post on 21-Apr-2015


DESCRIPTION

By Renuka Pawar. Guide: Prof. Kailas K. Devadkar

Transaction Processing
• Transaction Processing
• Characterizing Schedules based on Recoverability
• Characterizing Schedules based on Serializability
• University Questions

Concurrency Control
• Purpose of Concurrency Control
• Two-Phase locking
• Timestamp based concurrency control
• Validation (Optimistic) Concurrency Control Schemes
• University Questions

TRANSCRIPT

By Renuka Pawar

Guide: Prof. Kailas K. Devadkar

1

2

FIGURE 1 Interleaved processing versus parallel processing of concurrent transactions.

3

A transaction is a logical unit of database processing that includes one or more access operations (read: retrieval; write: insert, update, or delete).

A transaction (a set of operations) may be stand-alone or embedded within a program.

Transaction boundaries: begin transaction and end transaction.

The basic operations are read and write.

4

FIGURE 2 Two sample transactions. (a) Transaction T1. (b) Transaction T2.

5

Why concurrency control is needed:

• The Lost Update Problem.
• The Temporary Update (or Dirty Read) Problem.
• The Incorrect Summary Problem.

6

FIGURE 3 Some problems that occur when concurrent execution is uncontrolled. (a) The lost update problem.

X = 80, N = 5 (T1 transfers 5 seats from X to Y), M = 4 (T2 reserves 4 seats on X).

X should be 79 (80 − 5 + 4), but the interleaved execution leaves X = 84, because the update in T1 that removed 5 seats from X was lost.
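The numbers on this slide can be replayed in a few lines. This is a minimal sketch, assuming T1 computes X − N and T2 computes X + M on private copies before writing back; the function names are illustrative and do not come from the slides.

```python
def serial(x, n, m):
    """Run T1 completely, then T2: no update is lost."""
    x = x - n          # T1: read X, subtract N, write X
    x = x + m          # T2: read X, add M, write X
    return x


def interleaved_lost_update(x, n, m):
    """T2 reads X before T1 writes it, then overwrites T1's result."""
    t1_local = x           # T1: read_item(X)
    t2_local = x           # T2: read_item(X), still the old value
    x = t1_local - n       # T1: write_item(X)
    x = t2_local + m       # T2: write_item(X), T1's update is lost
    return x


print(serial(80, 5, 4))                   # 79
print(interleaved_lost_update(80, 5, 4))  # 84
```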

7

FIGURE 3 (continued) Some problems that occur when concurrent execution is uncontrolled. (b) The temporary update (or Dirty Read) problem.

One transaction updates an item and then fails for some reason; meanwhile, another transaction reads the updated item before it is restored to its original value. Here T1 fails before completion.

8

FIGURE 3 (continued) Some problems that occur when concurrent execution is uncontrolled. (c) The incorrect summary problem.

9

Why recovery is needed (what causes a transaction to fail):

1. A computer failure (system crash)
2. A transaction or system error
3. Local errors or exception conditions detected by the transaction
4. Concurrency control enforcement
5. Disk failure
6. Physical problems and catastrophes

10

A transaction is an atomic unit of work

Transaction states:
• Active state
• Partially committed state
• Committed state
• Failed state
• Terminated state

11

The recovery manager keeps track of the following operations:
• begin_transaction
• read or write
• end_transaction
• commit_transaction
• rollback (or abort)

12

Recovery techniques use the following operators:

• undo: Similar to rollback, except that it applies to a single operation.
• redo: Specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.

13

FIGURE 4 State transition diagram illustrating the states for transaction execution.

14

The System Log: used to recover from failures that affect transactions.

Log or Journal: keeps track of all transaction operations that affect the values of database items.

T is a unique transaction-id that is generated automatically by the system and is used to identify each transaction.

15

The System Log (cont.):

Types of log record:

1. [start_transaction,T]: Records that transaction T has started execution.
2. [write_item,T,X,old_value,new_value]
3. [read_item,T,X]
4. [commit,T]
5. [abort,T]
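To see how these records support the undo and redo operators, here is a minimal sketch. It assumes the log is a Python list of tuples in the record shapes listed above and the database is a plain dict. The `recover` function and the sample values are illustrative, not part of the slides, and the sketch leaves out checkpoints and buffering.

```python
def recover(db, log):
    """Undo writes of uncommitted transactions, then redo committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # UNDO: scan the log backwards and restore old values.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value

    # REDO: scan the log forwards and reapply new values.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 20, 24),
    ("commit", "T1"),
]
# State found after a crash: T1's write never reached disk, T2's did.
print(recover({"X": 80, "Y": 24}, log))   # {'X': 75, 'Y': 20}
```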

16

Desirable Properties of Transactions

ACID properties:

Atomicity: all or nothing; a transaction is either performed in its entirety or not performed at all.

Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.

Isolation: a transaction's updates should be invisible to other transactions until it is committed.

Durability or permanency: committed changes must never be lost because of a subsequent failure.

17

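Schedules of Transactions

A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions:
– the operations of each transaction Ti must appear in S in the same order in which they occur in Ti
– operations from other transactions Tj can be interleaved with the operations of Ti.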

For example :

Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);

Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;

The symbols r,w,c, and a are used for the operations read_item, write_item, commit, and abort respectively.

2. Characterizing Schedules based on Recoverability

18

Schedules of Transactions (continued)

Two operations are said to conflict if:
– they belong to different transactions
– they access the same item X
– at least one of the operations is a write_item(X)

Ex:

S1: r1(x);r2(x);w1(x);r1(y);w2(x);w1(y);

conflicts: [r1(x);w2(x)] – [r2(x);w1(x)] – [w1(x); w2(x)]
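The three conditions translate directly into a pairwise check. This is a small sketch, assuming schedules are written in the shorthand used on these slides, such as r1(x);w2(x). The helper names `parse` and `conflicts` are made up for illustration.

```python
import re


def parse(schedule):
    """Turn 'r1(x);w2(x)' into [('r', 1, 'x'), ('w', 2, 'x')]."""
    return [(op, int(t), item)
            for op, t, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def conflicts(schedule):
    """List every ordered pair of conflicting operations in the schedule."""
    ops = parse(schedule)
    pairs = []
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            # different transactions, same item, at least one write
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                pairs.append((f"{op1}{t1}({x1})", f"{op2}{t2}({x2})"))
    return pairs


print(conflicts("r1(x);r2(x);w1(x);r1(y);w2(x);w1(y);"))
# [('r1(x)', 'w2(x)'), ('r2(x)', 'w1(x)'), ('w1(x)', 'w2(x)')]
```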

19

Characterizing Schedules based on Recoverability

Types of schedules:
– Recoverable schedule: once a transaction T is committed, it should never be necessary to roll back T. A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed.

S’a: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

is recoverable, even though it suffers from the lost update problem.

In a recoverable schedule it is still possible for cascading rollback to occur, when an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.

20

Example: Consider the following schedules:

Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;

Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;

Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;

Sc is not recoverable, because T2 reads item X written by T1, and then T2 commits before T1 commits.

To make Sc recoverable, c2 of Sc must be postponed until after T1 commits, as shown in Sd.

Sd is recoverable, because T2 reads item X written by T1, and then T2 commits after T1 commits, and no other conflicting operations exist in the schedule.

Se is similar to Sd, but T1 is aborted (instead of committing).

In this situation T2 should also abort, because the value of X it read is no longer valid (this phenomenon is known as cascading rollback or cascading abort).
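The recoverability condition can be checked mechanically. The following is a rough sketch, assuming the r/w/c/a shorthand of these slides. The names `parse` and `is_recoverable` are illustrative. As a simplification, a value written by a transaction that later aborts is still treated as the latest write.

```python
import re


def parse(schedule):
    """'r1(X); c2; a1' -> [('r', 1, 'X'), ('c', 2, None), ('a', 1, None)]."""
    pattern = r"([rwca])(\d+)(?:\((\w+)\))?"
    return [(op, int(t), item or None)
            for op, t, item in re.findall(pattern, schedule)]


def is_recoverable(schedule):
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = {}     # T -> transactions whose written values T has read
    committed = set()
    for op, t, item in parse(schedule):
        if op == "w":
            last_writer[item] = t
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != t:
                reads_from.setdefault(t, set()).add(writer)
        elif op == "c":
            # T may commit only after every transaction it read from
            if not reads_from.get(t, set()) <= committed:
                return False
            committed.add(t)
    return True


print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;"))          # Sc
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;"))   # Sd
```

Under these assumptions the first call reports Sc as not recoverable and the second reports Sd as recoverable, which agrees with the discussion on this slide.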

21

Because cascading rollback can be quite time-consuming, we prefer cascadeless schedules.

A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions.

A more restrictive type of schedule is called a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).
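The two definitions differ only in whether writes on a dirty item are also forbidden. This is a hedged sketch in the same shorthand. The names `parse` and `classify` are invented for illustration, and the model ignores how before-images are restored when a transaction aborts.

```python
import re


def parse(schedule):
    """'w1(X); c1' -> [('w', 1, 'X'), ('c', 1, None)]."""
    pattern = r"([rwca])(\d+)(?:\((\w+)\))?"
    return [(op, int(t), item or None)
            for op, t, item in re.findall(pattern, schedule)]


def classify(schedule):
    """Return (cascadeless, strict) for a schedule string."""
    dirty = {}                      # item -> active writers, oldest first
    cascadeless = strict = True
    for op, t, item in parse(schedule):
        if op == "c":
            for writers in dirty.values():
                if t in writers:    # committed value supersedes older dirty ones
                    del writers[: writers.index(t) + 1]
        elif op == "a":
            for writers in dirty.values():
                if t in writers:
                    writers.remove(t)
        else:
            writers = dirty.setdefault(item, [])
            if writers and writers[-1] != t:
                strict = False              # touched an item holding a dirty value
                if op == "r":
                    cascadeless = False     # dirty read
            if op == "w":
                if t in writers:
                    writers.remove(t)
                writers.append(t)
    return cascadeless, strict


print(classify("w1(X); r2(X); c1; c2;"))         # (False, False)
print(classify("w1(X); w2(X); c1; c2;"))         # (True, False)
print(classify("w1(X); c1; r2(X); w2(X); c2;"))  # (True, True)
```

The second schedule shows why strict is the stronger property: no transaction reads a dirty value, yet T2 overwrites X before T1 has terminated.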

22

3. Serializability of Schedules

Serial, Nonserial, and Conflict-Serializable schedules

FIGURE 5 Examples of serial and nonserial schedules involving transactions T1 and T2.

(a) Serial schedule A: T1 followed by T2.

(b) Serial schedule B: T2 followed by T1.

23

The problems of serial schedules:
– They limit concurrency or interleaving of operations.
– If a transaction waits for an I/O operation to complete, we cannot switch the CPU to another transaction.
– If some transaction T is long, the other transactions must wait for T to complete all its operations.

24

Serial, Nonserial, and Conflict-Serializable schedules (continued)

A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.

Two schedules are called result equivalent if they produce the same final state of the database.

Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.

A schedule S is conflict serializable if it is conflict equivalent to some serial schedule S’.

25

Uses of serializability

Note that: being Serializable is distinct from being Serial.

A Serial Schedule leads to inefficient utilization of CPU because of no interleaving of operations from different transactions.

A Serializable Schedule gives the benefits of concurrent execution without giving up any correctness

26

Serializability is hard to check

– Interleaving of operations occurs in an operating system through some scheduler

– Difficult to determine beforehand how the operations in a schedule will be interleaved.

27

Testing for conflict serializability

Algorithm 1:
1. Looks at only read_item(X) and write_item(X) operations.
2. Constructs a precedence graph (serialization graph), a graph with directed edges.
3. An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj.
4. The schedule is serializable if and only if the precedence graph has no cycles.

28

Algorithm for testing conflict serializability of S:

1) For each transaction Ti in schedule S, create a node.

2) If Tj executes a READ_ITEM(X) after Ti executes a WRITE_ITEM(X), create an edge Ti → Tj.

3) If Tj executes a WRITE_ITEM(X) after Ti executes a READ_ITEM(X), create an edge Ti → Tj.

4) If Tj executes a WRITE_ITEM(X) after Ti executes a WRITE_ITEM(X), create an edge Ti → Tj.

5) The schedule S is serializable if and only if the precedence graph has no cycle.

Characterizing Schedules Based on Serializability

29

Q. Consider three transactions T1, T2, T3 and the schedules S1 and S2 given below.

T1: r1(x);w1(x);r1(y);w1(y)

T2: r2(z);r2(y);w2(y);r2(x);w2(x)

T3: r3(y);r3(z);w3(y);w3(z)

S1: r2(z);r2(y);w2(y); r3(y);r3(z); r1(x);w1(x);w3(y);w3(z);r2(x); r1(y);w1(y); w2(x)

S2: r3(y);r3(z);r1(x); w1(x);w3(y);w3(z);r2(z); r1(y);w1(y);r2(y); w2(y);r2(x);w2(x)

Draw the serializability graph for S1 and S2 and state whether each schedule is serializable or not. If serializable, write the equivalent serial schedule(s).

What are the steps used to find the equivalent serial schedule?
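One way to work through this question is to code the precedence-graph test from the earlier slides and run it on S1 and S2. The sketch below assumes Python 3.9 or later, because it uses the standard `graphlib` module. The function names are illustrative. Transactions that take part in no conflict would be missing from the returned order.

```python
import re
from graphlib import CycleError, TopologicalSorter


def precedence_edges(schedule):
    """Edge (Ti, Tj) whenever an operation of Ti precedes a conflicting one of Tj."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((f"T{t1}", f"T{t2}"))
    return edges


def equivalent_serial_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter()
    for ti, tj in precedence_edges(schedule):
        sorter.add(tj, ti)          # Ti must come before Tj
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


S1 = ("r2(z);r2(y);w2(y);r3(y);r3(z);r1(x);w1(x);"
      "w3(y);w3(z);r2(x);r1(y);w1(y);w2(x)")
S2 = ("r3(y);r3(z);r1(x);w1(x);w3(y);w3(z);r2(z);"
      "r1(y);w1(y);r2(y);w2(y);r2(x);w2(x)")

print(equivalent_serial_order(S1))   # None
print(equivalent_serial_order(S2))   # ['T3', 'T1', 'T2']
```

With this sketch, S1 produces edges in both directions between T1 and T2 (for example w1(x) before r2(x), and w2(y) before r1(y)), so it is not conflict serializable. S2 produces only T3 → T1, T3 → T2 and T1 → T2, so it is equivalent to the serial schedule T3, T1, T2.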

30

FIGURE 8 Another example of serializability testing. (a) The READ and WRITE operations of three transactions T1, T2, and T3.

31

FIGURE 8 (continued) Another example of serializability testing. (b) Schedule E.

32

FIGURE 8 (continued) Another example of serializability testing. (c) Schedule F.

33

34

1. Purpose of Concurrency Control

2. Two-Phase locking

3. Timestamp based concurrency control

4. Validation (Optimistic) Concurrency Control Schemes

35

• To enforce Isolation among conflicting transactions.

• To preserve database consistency

• To resolve read-write and write-write conflicts.

36

In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the concurrency control mechanism decides whether T1 or T2 should get A and whether the other transaction is rolled back or waits.

37

Two-Phase Locking Techniques

A lock is an operation that secures
(a) permission to read or
(b) permission to write
a data item for a transaction.

Example: Lock (X) and Unlock (X)

38

Two lock modes: (a) shared (read) and (b) exclusive (write).

Shared mode: shared lock (X). More than one transaction can apply a shared lock on X to read its value, but no write lock can be applied on X by any other transaction.

Exclusive mode: write lock (X). Only one write lock on X can exist at any time, and no shared lock can be applied by any other transaction on X.

Conflict/Compatibility matrix

        Read  Write
Read    Y     N
Write   N     N

2PL Techniques: Essential components

39

2PL Techniques: Essential components (cont.)

Lock Manager: manages locks on data items.

Lock table: the lock manager uses it to store the identity of the transaction locking a data item, the data item, the lock mode, and a pointer to the next data item locked.

One simple way to implement a lock table is through a linked list.

Transaction ID   Data item id   Lock mode   Ptr to next data item
T1               X1             Read        Next
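As an illustration of how a lock table and the shared/exclusive compatibility rule fit together, here is a minimal sketch. It swaps the linked list mentioned above for a Python dict keyed by data item. The `LockTable` class and its method names are hypothetical. Queues of waiting transactions and lock upgrades are left out.

```python
class LockTable:
    """Maps each data item to (mode, holders); mode is 'read' or 'write'."""

    def __init__(self):
        self.entries = {}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        held = self.entries.get(item)
        if held is None:
            self.entries[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "read" and held_mode == "read":
            holders.add(txn)        # shared locks are compatible
            return True
        return False                # every combination involving write conflicts

    def release(self, txn, item):
        """Drop txn's lock on item; remove the entry once nobody holds it."""
        _, holders = self.entries[item]
        holders.discard(txn)
        if not holders:
            del self.entries[item]


table = LockTable()
print(table.request("T1", "X1", "read"))    # True
print(table.request("T2", "X1", "read"))    # True: shared with T1
print(table.request("T3", "X1", "write"))   # False: T3 would have to wait
```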

40

Database Concurrency Control

Rules for locking: a transaction must observe the following rules.

• It must lock the data item before it reads or writes to it.

• It must not lock an already locked data item, and it must not try to unlock a free data item.

41

The following code performs the lock operation:

B: if LOCK (X) = 0 (*item is unlocked*)
      then LOCK (X) ← 1 (*lock the item*)
   else begin
      wait (until LOCK (X) = 0
         and the lock manager wakes up the transaction);
      goto B
   end;

The following code performs the unlock operation:

LOCK (X) ← 0 (*unlock the item*)
if any transactions are waiting then
   wake up one of the waiting transactions;
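The lock and unlock pseudocode can be approximated with a condition variable. This is only a sketch, assuming each transaction runs as a thread. `BinaryLock` and its method names are illustrative and do not belong to any particular DBMS.

```python
import threading


class BinaryLock:
    """Binary lock on one data item, following the lock/unlock pseudocode."""

    def __init__(self):
        self._locked = False                 # LOCK(X) = 0
        self._cond = threading.Condition()

    def lock_item(self):
        with self._cond:
            while self._locked:              # "goto B": re-check after each wake-up
                self._cond.wait()
            self._locked = True              # LOCK(X) <- 1

    def unlock_item(self):
        with self._cond:
            self._locked = False             # LOCK(X) <- 0
            self._cond.notify()              # wake up one waiting transaction


lock = BinaryLock()
order = []


def t2():
    lock.lock_item()          # blocks until T1 unlocks X
    order.append("T2")
    lock.unlock_item()


lock.lock_item()              # T1 locks X first
worker = threading.Thread(target=t2)
worker.start()
order.append("T1")
lock.unlock_item()
worker.join()
print(order)                  # ['T1', 'T2']
```

The `while` loop plays the role of `goto B`: a woken transaction re-tests the lock instead of assuming it is free.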

43

Lock conversion

Lock upgrade: existing read lock to write lock

if Ti has a read-lock (X) and Tj has no read-lock (X) (i ≠ j) then
   convert read-lock (X) to write-lock (X)
else
   force Ti to wait until Tj unlocks X

Lock downgrade: existing write lock to read lock

Ti has a write-lock (X) (*no other transaction can have any lock on X*)
   convert write-lock (X) to read-lock (X)

44

Two-Phase Locking Techniques: The algorithm

Two Phases: (a) Locking (Growing) (b) Unlocking (Shrinking).

Locking (Growing) Phase: A transaction applies locks (read or write) on desired data items one at a time.

Unlocking (Shrinking) Phase: A transaction unlocks its locked data items one at a time.

Requirement: For a transaction these two phases must be mutually exclusive; that is, during the locking phase the unlocking phase must not start, and during the unlocking phase the locking phase must not begin.
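The requirement amounts to "no lock request after the first unlock". This is a small sketch of that check, assuming a transaction is given as a list of operation strings. The operation spellings `read_lock`, `write_lock` and `unlock` are assumptions made for this example.

```python
def obeys_two_phase(operations):
    """True if no lock is requested after the first unlock."""
    shrinking = False
    for op in operations:
        if op.startswith("unlock"):
            shrinking = True                 # shrinking phase has begun
        elif op.startswith(("read_lock", "write_lock")) and shrinking:
            return False                     # growing after shrinking: not 2PL
    return True


two_phase = ["read_lock(Y)", "read_item(Y)", "write_lock(X)", "unlock(Y)",
             "read_item(X)", "write_item(X)", "unlock(X)"]
not_two_phase = ["read_lock(Y)", "read_item(Y)", "unlock(Y)", "write_lock(X)",
                 "read_item(X)", "write_item(X)", "unlock(X)"]

print(obeys_two_phase(two_phase))       # True
print(obeys_two_phase(not_two_phase))   # False
```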

45

Timestamp based concurrency control algorithm

Timestamp (transaction start time)

A monotonically increasing variable (integer) indicating the age of an operation or a transaction. A larger timestamp value indicates a more recent event or operation.

Timestamp based algorithm uses timestamp to serialize the execution of concurrent transactions.

46

Basic Timestamp Ordering

1. Transaction T issues a write_item(X) operation:

a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.

b. If the condition in part (a) does not hold, then execute write_item(X) of T and set write_TS(X) to TS(T).

2. Transaction T issues a read_item(X) operation:

a. If write_TS(X) > TS(T), then a younger transaction has already written to the data item, so abort and roll back T and reject the operation.

b. If write_TS(X) ≤ TS(T), then execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
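The two rules can be written as a pair of small functions. This is a minimal sketch, assuming integer timestamps and one `Item` object per data item. The class and function names are illustrative, and a return value of False stands for "abort and roll back T".

```python
class Item:
    """A data item X with its read_TS(X) and write_TS(X)."""

    def __init__(self):
        self.value = None
        self.read_ts = 0
        self.write_ts = 0


def write_item(item, ts, value):
    """Rule 1: return True if the write is executed, False if T must abort."""
    if item.read_ts > ts or item.write_ts > ts:
        return False                      # a younger transaction got there first
    item.value = value
    item.write_ts = ts
    return True


def read_item(item, ts):
    """Rule 2: return (True, value) if the read is executed, else (False, None)."""
    if item.write_ts > ts:
        return False, None                # a younger transaction already wrote X
    item.read_ts = max(item.read_ts, ts)
    return True, item.value


x = Item()
print(write_item(x, 5, "a"))   # True
print(read_item(x, 7))         # (True, 'a')
print(write_item(x, 6, "b"))   # False: read_TS(X) = 7 > 6
print(read_item(x, 3))         # (False, None): write_TS(X) = 5 > 3
```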

47

Strict Timestamp Ordering

1. Transaction T issues a write_item(X) operation:

If TS(T) > read_TS(X), then delay T until the transaction T’ that wrote or read X has terminated (committed or aborted).

2. Transaction T issues a read_item(X) operation:

If TS(T) > write_TS(X), then delay T until the transaction T’ that wrote or read X has terminated (committed or aborted).

48

Thomas’s Write Rule

1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.

2. If write_TS(X) > TS(T), then just ignore the write operation and continue execution. This is because only the most recent write counts in the case of two consecutive writes.

3. If the conditions given in 1 and 2 above do not occur, then execute write_item(X) of T and set write_TS(X) to TS(T).
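Read as a decision function, the rule has three outcomes. This sketch assumes integer timestamps. The function name and the returned strings are chosen for illustration only.

```python
def thomas_write(read_ts, write_ts, ts):
    """Decide what happens to write_item(X) issued by T with timestamp ts."""
    if read_ts > ts:
        return "abort"     # rule 1: a younger transaction already read X
    if write_ts > ts:
        return "ignore"    # rule 2: the write is obsolete, so skip it
    return "write"         # rule 3: perform the write, set write_TS(X) = ts


print(thomas_write(read_ts=7, write_ts=5, ts=6))   # abort
print(thomas_write(read_ts=3, write_ts=8, ts=6))   # ignore
print(thomas_write(read_ts=3, write_ts=5, ts=6))   # write
```

The middle case is where the rule differs from basic timestamp ordering, which would abort T there.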

49

Validation (Optimistic) Concurrency Control Schemes

In this technique, serializability is checked only at commit time, and transactions are aborted in the case of non-serializable schedules.

Three phases:

1. Read phase:

A transaction can read values of committed data items. However, updates are applied only to local copies (versions) of the data items (in database cache).

50

2. Validation phase:

Serializability is checked before transactions write their updates to the database. For transaction Ti to pass validation, one of the following conditions must hold for each transaction Tj that is committed or is in its validation phase:

1. Tj completes its write phase before Ti starts its read phase.

2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti has no items in common with the write_set of Tj.

3. Both the read_set and write_set of Ti have no items in common with the write_set of Tj, and Tj completes its read phase before Ti completes its read phase.
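The three conditions can be tested pairwise. The following is a sketch under simplifying assumptions: every phase boundary is a known logical time (in a real system Ti's write phase has not started when it is validated), and the `Txn` fields are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    read_set: set
    write_set: set
    read_start: int
    read_end: int
    write_start: int
    write_end: int


def validates(ti, tj):
    """True if Ti passes validation against Tj under one of the three conditions."""
    # 1. Tj finished writing before Ti started reading.
    if tj.write_end < ti.read_start:
        return True
    no_rw = not (ti.read_set & tj.write_set)
    # 2. Ti writes after Tj finished writing, and Ti read nothing Tj wrote.
    if tj.write_end < ti.write_start and no_rw:
        return True
    no_ww = not (ti.write_set & tj.write_set)
    # 3. No overlap with Tj's writes at all, and Tj finished reading first.
    return no_rw and no_ww and tj.read_end < ti.read_end


tj = Txn({"X"}, {"X"}, read_start=1, read_end=2, write_start=3, write_end=4)
late = Txn({"X"}, {"Y"}, read_start=5, read_end=6, write_start=7, write_end=8)
clash = Txn({"X"}, {"Y"}, read_start=2, read_end=5, write_start=6, write_end=7)

print(validates(late, tj))    # True: condition 1
print(validates(clash, tj))   # False: Ti read X while Tj was writing it
```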

51

3. Write phase:

On a successful validation, transactions’ updates are applied to the database; otherwise, transactions are restarted.

52

Reliability

53

A reliable DDBMS is one that can continue to process user requests even when the underlying system is unreliable, i.e., failures occur.

• Failures
– Transaction failures
– System (site) failures, e.g., system crash, power supply failure
– Media failures, e.g., hard disk failures
– Communication failures, e.g., lost/undeliverable messages

• Reliability is closely related to the problem of how to maintain the atomicity and durability properties of transactions.

Distributed Reliability Protocols

54

• Issues in a distributed transaction are commit, termination, and recovery

– Commit protocols
  How to execute a commit command for distributed transactions
  How to ensure atomicity (and durability)?

– Termination protocols
  If a failure occurs at a site, how can the other operational sites deal with it?
  Non-blocking: the occurrence of failures should not force the sites to wait until the failure is repaired to terminate the transaction

Distributed Reliability Protocols

55

– Recovery protocols
  When a failure occurs, how do the sites where the failure occurred deal with it?
  Independent: a failed site can determine the outcome of a transaction without having to obtain remote information

Commit Protocols

56

The primary requirement of commit protocols is that they maintain the atomicity of distributed transactions (atomic commitment)

– i.e., even though the execution of the distributed transaction involves multiple sites, some of which might fail while executing, the effects of the transaction on the distributed DB are all-or-nothing.

57

• In the following we distinguish two roles

– Coordinator: The process at the site where the transaction originates and which controls the execution

– Participant: The process at the other sites that participate in executing the transaction

Centralized Two Phase Commit Protocol (2PC)

58

• Phase 1: The coordinator gets the participants ready to write the results into the database

• Phase 2: Everybody writes the results into the database

• Global Commit Rule
– The coordinator aborts a transaction if at least one participant votes to abort it
– The coordinator commits a transaction if all of the participants vote to commit it

• Centralized since communication is only between coordinator and the participants
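The global commit rule and the two phases can be sketched without any networking. This is a simplified illustration in which participants are modeled as callables that return their vote. The function names and site names are assumptions. Timeouts, logging and acknowledgements are left out.

```python
def global_decision(votes):
    """Global commit rule: commit only if every participant voted to commit."""
    return "commit" if all(v == "commit" for v in votes.values()) else "abort"


def two_phase_commit(participants):
    """participants: mapping of site name -> function returning that site's vote."""
    # Phase 1: the coordinator collects a vote from every participant.
    votes = {site: vote() for site, vote in participants.items()}
    decision = global_decision(votes)
    # Phase 2: the coordinator sends the same decision to every participant.
    return decision, {site: decision for site in participants}


sites = {"site_a": lambda: "commit", "site_b": lambda: "abort"}
print(two_phase_commit(sites))
# ('abort', {'site_a': 'abort', 'site_b': 'abort'})
```

A single abort vote is enough to abort everywhere, which is how the rule keeps the outcome all-or-nothing across sites.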

2PC Protocol and Site Failures

Site failures in the 2PC protocol might lead to timeouts

• Timeouts are served by termination protocols

59

• Coordinator timeouts:

One of the participants is down.

Depending on the state, the coordinator can take the following actions:

– Timeout in INITIAL
  Do nothing
– Timeout in WAIT
  Coordinator is waiting for local decisions
  Cannot unilaterally commit
  Can unilaterally abort and send an appropriate message to all participants
– Timeout in ABORT or COMMIT
  Stay blocked and wait for the acks (indefinitely, if the site is down indefinitely)

60

• Participant timeouts:

The coordinator site is down.

Depending on the state the participant site is in, it can take the following actions:

– Timeout in INITIAL
  Participant waits for “prepare”, thus coordinator must have failed in INITIAL state
  Participant can unilaterally abort
– Timeout in READY
  Participant has voted to commit, but does not know the global decision
  Participant stays blocked (indefinitely, if the coordinator is permanently down), since participant cannot change its vote or unilaterally decide to commit

61

Coordinator site failure: Upon recovery, it takes the following actions:

– Failure in INITIAL
  Start the commit process upon recovery (since coordinator did not send anything to the sites)
– Failure in WAIT
  Restart the commit process upon recovery (by sending “prepare” again to the participants)
– Failure in ABORT or COMMIT
  Nothing special if all the acks have been received from participants
  Otherwise the termination protocol is involved (re-ask the acks)

62

Participant site failure: Upon recovery, the participant takes the following actions:

– Failure in INITIAL
  Unilaterally abort upon recovery, as the coordinator will eventually time out since it will not receive the participant’s decision due to the failure
– Failure in READY
  The coordinator has been informed about the local decision
  Treat as timeout in READY state and invoke the termination protocol (re-ask the status)
– Failure in ABORT or COMMIT
  Nothing special needs to be done

63

Problems with 2PC Protocol

– A protocol is non-blocking if it permits a transaction to terminate at the operational sites without waiting for recovery of the failed site.
  Significantly improves the response time of transactions

– 2PC protocol is blocking
  Ready implies that the participant waits for the coordinator
  If coordinator fails, site is blocked until recovery; independent recovery is not possible
  The problem is that sites might be in both commit and abort phases.

64

Three Phase Commit Protocol (3PC)

3PC is a non-blocking protocol when failures are restricted to single site failures

65

2PC protocol first gets participants ready for the transaction (phase 1), and then asks the participants to write the transaction (phase 2). 2PC is a blocking protocol.

3PC first gets participants ready for the transaction (phase 1), pre-commits/aborts the transaction (phase 2), and then asks the participants to commit/abort the transaction (phase 3).

3PC is non-blocking.

66
