
JUN 02 1997 OSTI


The submitted manuscript has been created by the University of Chicago as Operator of Argonne National Laboratory ("Argonne") under Contract No. W-31-109-ENG-38 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.


This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.


Performance Evaluation for Conflict Resolution Transaction Management Approach

Julia C. Lee* Argonne National Laboratory 9700 S. Cass Ave., DIS/900 Argonne, IL 60439-4832 lee@dis.anl.gov

Lawrence J. Henschen Northwestern University Dept. of EECS 2145 Sheridan Rd. Evanston, IL 60208-3103 henschen@eecs.nwu.edu

ABSTRACT We continue our previous study on the conflict resolution approach to transaction management. We compare both the time and space performance of our approach with those of other transaction management models. We use mathematical abstraction and calculation for the comparison.

Keywords: database, transactions, concurrency, performance.

1. Introduction

In [Lee92], [HL94], [HL95A], and [HL95B], we presented a new transaction management model in which conflicts of the database operations among different transactions are resolved. In the present paper, we address some of the performance questions by comparing the performance of our approach with some of the other approaches, mainly with two-phase locking (2PL) and multiversion approaches.

We do not use simulation as a means of comparison, as other papers have done [BR92][SGS94]. Rather, we use abstract mathematical calculation to compare the performances. Simulation methods may give a more intuitive view for the comparisons, but they normally lack generality. Simulation methods can run only on certain platforms and are restricted to only the transactions in the test set. Simulation methods may be appropriate for many other cases, but we think that a more general method is better in our case. Since theoretical models (or generalized models) are mathematical abstractions of possible implementations, using mathematical abstraction for comparing performance is both more general and more accurate. We argue that, for well-defined models/algorithms, appropriate abstraction and correct mathematical calculation are the preferable methods for comparing performance; simulation can only provide corroborating results. Complexity of an algorithm is the most general and accurate measurement of the performance of an algorithm. Thus in our performance evaluation, we use instruction execution time and the number of instructions executed for a given size of input data stream - the complexity of the algorithm. The results of our comparative calculation, however, are more detailed than “order of complexity.” The results are functions of a set of variables. The impact of each variable on the performance, or difference of performances, can be analyzed by analyzing these functions. Our calculation uses some of the basic parameters about CPU instruction time and disk input/output (I/O) time.

*. The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. W-31-109-ENG-38. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

The remainder of this paper is organized as follows. Section 2 is an overview of our approach, including two different versions. The basic concepts in this section may also be found in [HL94] [HL95A] [HL95B]. Section 3 presents a more detailed comparison of timing performance of our approach and the classical 2PL model. Section 4 presents a comparison of space performance of our approach with the 2PL [BHG87][Ull88] and multiversion approaches [KS88][AK91]. Sections 5 and 6 give the closing remarks and acknowledgment.

2. Overview of Conflict Resolution Methods

Due to space limitations, we do not repeat the basic definitions used by our model. Interested readers are referred to [HL94], [HL95A], and [HL95B]. In [HL95B], we introduced methods to resolve conflicts among sequences of standard operations (READ, INSERT, DELETE, CHANGE). Conflicts between a READ operation and preceding INSERT/DELETE/CHANGE operations (called local conflict) are handled by an O(n) algorithm, where n is the total number of operations in the sequence [HL94, HL95A]. We call the conflict among sequences of INSERT/DELETE/CHANGE global conflict; two algorithms from [Lee92][HL95B] are for handling global conflict. Global conflict affects the final database state, and its resolution requires modification of the original sequence of operations. Conflict in existing systems generally delays some operations, for example waiting for tuples to become unlocked, thus affecting the overall performance. Our modified operations are not delayed by locked tuples, so the question of increased performance naturally arises.

The basic global conflict resolution method uses a global buffer in high-speed memory to simulate the effect of the sequence of operations. The algorithm follows (full details and examples are found in [HL95B]).

ALGORITHM:

INPUT: a sequence O1 ... On of operations over a relation R

OUTPUT: 1. a set of delete expressions
        2. a modified sequence of operations over a buffer B that also generates exceptions for the delete expressions

Let the number of DELETE and CHANGE operations in O1 ... On be m, and let D1 ... Dm be m expressions that are initially empty. Let these expressions correspond in order to the DELETE and CHANGE operations in O1 ... On. Let there be a high-speed buffer B that is initially null. Let the CHANGE operations in O1 ... On be CHANGE R Si to Ti for i=1 ... k. Execute (in parallel, if desired) the operations READ R Si to B for i=1 ... k. For i=1 ... n, do the following:

a. If Oi is INSERT R S, then execute INSERT B S and add S to the exception list of each delete expression corresponding to an Oj, j<i.

b. If Oi is DELETE R S, then let Dj be the expression S (with no exceptions), where Dj is the delete expression corresponding to Oi.

c. If Oi is CHANGE R S to T, then execute CHANGE B S to T. Also, let Dj be S (with no exceptions), where Dj is the delete expression corresponding to Oi. Finally, tuples generated for T are added to the exception list for each Dl, l<i.

Insert B to R and apply all the delete-with-exception expressions to R (in parallel, if desired).

The algorithm is O(n) where n is the number of operations in the sequence. This method translates a set of update operations into a set of DELETE operations and a set of buffer operations on a common buffer. The final contents of the buffer are inserted into the database, and the intermediate contents of the buffer are used to form the exception lists of the DELETE operations. These new operations are globally conflict free and can be executed in parallel.
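The translation can be illustrated with a toy sketch in Python that follows the listed steps literally. Everything in it is an assumption made for illustration: the relation R, the buffer B, and the operands S and T are modeled as explicit sets of tuples (the paper's S and T are expressions), a CHANGE is given as an old-to-new mapping, the final step applies the insert before the deletes, and the names `DeleteExpr`, `translate`, `apply_translation`, and `run_sequentially` are hypothetical. It is compared with plain sequential execution on two small sequences only and is not a substitute for the full algorithm in [HL95B].

```python
from dataclasses import dataclass, field


@dataclass
class DeleteExpr:
    """Delete `tuples` from R, except those listed in `exceptions`."""
    tuples: frozenset
    exceptions: set = field(default_factory=set)


def translate(ops, relation):
    """Return (buffer, delete expressions) for a sequence of operations.

    ops holds ("INSERT", S), ("DELETE", S) or ("CHANGE", {old: new}).
    """
    buffer = set()
    # Preload step: READ R Si to B for every CHANGE operation.
    for kind, arg in ops:
        if kind == "CHANGE":
            buffer |= set(arg) & set(relation)
    deletes = []
    for kind, arg in ops:
        if kind == "INSERT":      # step a
            buffer |= set(arg)
            for d in deletes:
                d.exceptions |= set(arg)
        elif kind == "DELETE":    # step b
            deletes.append(DeleteExpr(frozenset(arg)))
        elif kind == "CHANGE":    # step c
            changed = {old: new for old, new in arg.items() if old in buffer}
            buffer -= set(changed)
            buffer |= set(changed.values())
            for d in deletes:
                d.exceptions |= set(changed.values())
            deletes.append(DeleteExpr(frozenset(arg)))
        else:
            raise ValueError(f"unknown operation: {kind}")
    return buffer, deletes


def apply_translation(relation, buffer, deletes):
    """Final step: insert B into R, then apply the deletes with exceptions."""
    result = set(relation) | buffer
    for d in deletes:
        result -= d.tuples - d.exceptions
    return result


def run_sequentially(ops, relation):
    """Reference: execute the original operations one after another."""
    r = set(relation)
    for kind, arg in ops:
        if kind == "INSERT":
            r |= set(arg)
        elif kind == "DELETE":
            r -= set(arg)
        else:
            changed = {old: new for old, new in arg.items() if old in r}
            r -= set(changed)
            r |= set(changed.values())
    return r


relation = {"a", "b"}  # strings stand in for tuples
ops = [("INSERT", {"c"}), ("DELETE", {"b"}), ("CHANGE", {"a": "d"})]
buf, dels = translate(ops, relation)
print(apply_translation(relation, buf, dels) == run_sequentially(ops, relation))
```

The delete expressions produced by `translate` do not depend on each other, which is the property that allows the final step to be parallelized.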

A modification reduces the size of the buffer by determining the conflicting and nonconflicting parts of the operations in the sequence. For example, CHANGE R(a,b,x) to (a,c,x) and DELETE R(w,b,d) conflict obviously only at the tuple (a,b,d). This can be determined by a simple linear matching algorithm for each pair of operations in the sequence [HL95B]. Each Oi is split into two parts - the part Oi' that does not conflict with any other operation in the sequence and the part Oi'' that potentially conflicts with at least one other operation. The sequence of Oi' operations can be performed concurrently. The algorithm above is then applied to the sequence Oi'', whose sets of tuples are generally much smaller than for the original sequence of operations.

The translation methods do not incur a large performance degradation while providing a significant possibility of concurrency. However, detailed comparisons of the performance are needed to support any claims of significant improvement. These comparisons are discussed in the following two sections.

3. Timing Performance Comparison

In this section, we compare the timing performance of our approach with that of the 2PL approach [BHG87][Ull88][LKS91][AAJ92]. We first state some of the basic facts related to the discussion or calculation. We then calculate the possible decrease in mean flow time and increase in throughput based on some generalization.

3.1 Basic facts for comparison

It is well known that I/O operations take much more time than main memory operations. For example, according to manufacturer specifications a SUN SPARCstation 5 has an instruction operation rate of 100.3 million instructions per second, while a (Seagate) disk drive in the SUN has an average seek time of 8.0 ms for READ and 9.0 ms for WRITE, an average latency time of 4.17 ms, and a max burst transfer rate of 10-20 Mbytes/s. Using these parameters and assuming that a memory access instruction takes three times the average time of a machine instruction, and assuming single channel I/O, the disk random access rate can be calculated by the general formula in [Wie83]. The result of the calculation tells us that the disk access time is 100,000 times the main memory access time!
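As a rough order-of-magnitude check, the quoted parameters can be combined directly. The sketch below is an assumption-laden estimate, not the [Wie83] formula: it takes one random READ as seek time plus latency, ignores transfer time and channel effects, and uses the three-instruction memory access stated above. Under those assumptions the ratio comes out at a few hundred thousand, that is, the same order of magnitude (10^5) as the figure quoted in the text.

```python
# Parameters quoted in the text (manufacturer specifications).
INSTRUCTIONS_PER_SECOND = 100.3e6
SEEK_READ_SECONDS = 8.0e-3
LATENCY_SECONDS = 4.17e-3
# Stated assumption in the text: a memory access costs three instructions.
MEMORY_ACCESS_INSTRUCTIONS = 3

instruction_time = 1.0 / INSTRUCTIONS_PER_SECOND
memory_access_time = MEMORY_ACCESS_INSTRUCTIONS * instruction_time

# Assumption of this sketch: one random READ = seek + latency,
# with transfer time ignored.
disk_access_time = SEEK_READ_SECONDS + LATENCY_SECONDS

ratio = disk_access_time / memory_access_time
print(f"disk/memory access time ratio: {ratio:.3g}")
```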

Let’s look at another important factor for increasing performance.

We all know that, for the uni-processor computer, all the machine instructions are executed sequentially despite the fact that many processes can run on the same processor. Concurrency still can increase performance (in terms of throughput) because when one process is doing I/O, other processes can use the CPU, so during the I/O time of one process a large number of other processes can utilize the CPU. For this reason, concurrency (or multiprogramming) is important for a uni-processor computer [CD73] [PS83]. The general objective of the CPU time scheduling of operating systems of computers is to increase throughput and decrease mean flow time for the tasks involved [CD73].

For multiprocessor machines, many processes can run on one of the processors, and at the same time one task (or program) can be divided into many parallelized parts running on different processors of the same machine. In addition to the conventional tasks of a uni-processor operating system, the operating system of a multiprocessor machine needs to parallelize the programs in order to increase the total throughput and utilize the system resources; that is, the operating system needs to schedule the processors on top of scheduling the CPU time of each processor.

The application of these principles and/or facts to the buffered transaction management approach follows.

3.2 General examples

Locking is necessary for conventional transaction management systems to guarantee consistent resulting database states in a multitransaction system. However, locking can result in dead-lock. Some existing systems use a “dead-lock-resolving” mechanism to resolve dead-lock when it happens. This type of solution cannot eliminate the “blocking” encountered by transactions when they request a data-item lock that is currently held by another transaction. The dead-lock-resolving approach needs a utility to detect dead-lock, which complicates the system, and must choose a “victim” transaction to abort. When a transaction is aborted, CPU time and I/O time are needed for “clean up” of the system resources allocated to this transaction. This is a big performance drawback.

Two-phase locking is a transaction management model that adopts “dead-lock-avoiding” instead of “dead-lock-resolving” in a multitransaction environment. A system that adopts 2PL as its lock-management (or transaction management) model guarantees that deadlock will not occur [Ull88]. However, 2PL could result in a large decrease in performance in terms of mean flow time and throughput, both important factors for “long-duration” transactions.

Mean flow time and throughput are important performance parameters for a real multiprogramming environment. According to [CD73], the mean flow time is the average flow time of a set of tasks. According to [PS83], “throughput is the amount of work which is accomplished in a given time interval (for example, 17 jobs per hour).” Our model in many cases decreases the mean flow time and improves the throughput of the multitransaction environment over the 2PL approach. A general example illustrates this fact:

Assume that three transactions T1, T2, and T3 entered the system at about the same time. Say T1 contains l database operations, T2 contains m database operations, and T3 contains n database operations. The average execution time for each database operation is t. The total execution time (or “size”) of each transaction is:

T1 = l*t; T2 = m*t; T3 = n*t

Suppose that l>m and l>n. Suppose also that T1 starts slightly earlier, just enough to be granted the first lock, and that T2 happens to have one or more conflicting data items with T1 and T3 happens to share one or more locks with T2. Then in a 2PL model T2 would have to wait until all the locks acquired by T1 have been released, which means waiting until T1 finishes (the portion of T1 after releasing its locks can be ignored according to the facts stated in Section 3.1), and T3 needs to wait until T2 releases its locks. Assume also that each transaction consists of about half retrieval operations and half update operations (p = update/total = 0.5). Retrieval operations need one database access, and update operations need two database accesses.

The throughputs of the system for this case using 2PL approach are:

Time                                    Throughput
0 -- 1.5l*t                             0
1.5l*t -- 1.5(l+m)*t                    1
1.5(l+m)*t -- 1.5(l+m+n)*t              2
>= 1.5(l+m+n)*t                         3

Table 1: Throughput of the Three Transaction Example in 2PL Approach

In our approach (denoted as HL) the database operations for the transactions are modified. However, it is still fairly easy to identify the transaction to which a particular operation belongs (we may need to have a new field for the derived database operations, say a transaction number field). Therefore, we could schedule the parallelized operations after initial retrieval (READ operations) in such a way that the operations originally belonging to T3 execute first, the operations belonging to T2 execute next, and the operations originally belonging to T1 execute last. The throughput for this example in HL approach is summarized as:

Time                                    Throughput
0 -- (l+m+1.5n)*t                       0
(l+m+1.5n)*t -- (l+1.5m+1.5n)*t         1

Table 2: Throughput of the Three Transaction Example in HL Approach

Let l = 100, m = 10, and n = 8, respectively. The throughputs of this example in the two different approaches are depicted in the following figure.

One can see that at time (122*t) in the 2PL approach the throughput is 0, but in our approach the throughput is 1. At time (127*t) in the 2PL approach the throughput could still be 0, but in our approach the throughput is 2. The HL approach out-performs 2PL during the time from 122t to 165t. An obvious improvement in throughput has been achieved.

[Figure: Throughput over time for the l=100, m=10, n=8 case, HL approach versus 2PL approach]

According to [Kob81] the mean throughput is “the average number of customers completed per unit time.” In the above example the mean throughput during the time period of 0 to 1.5(m+n+l)*t is:

Table 3: Mean Throughput Comparison for the Example

Mean Throughput    General Formula            l=100, m=10, n=8, p=0.5 case
2PL                (2n+m)/(l+m+n)             0.22
HL                 (l+0.5m)/1.5(l+m+n)        0.59
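The Table 3 values can be reproduced by averaging the throughput step function over the period. The sketch below is illustrative only: times are in units of t, the function name `mean_throughput` is hypothetical, and the HL completion time of T1 is assumed to be 1.5(l+m+n), the end of the period given in the text. The other completion times are the boundaries listed in the throughput tables.

```python
def mean_throughput(completion_times, horizon):
    """Time-average of the number of completed transactions over [0, horizon].

    Each transaction contributes 1 to the throughput from its completion
    time until the horizon, so the area under the step function is the sum
    of (horizon - completion time).
    """
    area = sum(horizon - c for c in completion_times if c <= horizon)
    return area / horizon


l, m, n = 100, 10, 8           # transaction sizes from the example
horizon = 1.5 * (l + m + n)    # end of the period, in units of t

# 2PL: T1, T2, T3 complete one after another.
completions_2pl = [1.5 * l, 1.5 * (l + m), 1.5 * (l + m + n)]
# HL: T3, T2 complete first; T1's completion time is an assumption (see lead-in).
completions_hl = [l + m + 1.5 * n, l + 1.5 * m + 1.5 * n, 1.5 * (l + m + n)]

mtp_2pl = mean_throughput(completions_2pl, horizon)
mtp_hl = mean_throughput(completions_hl, horizon)
print(round(mtp_2pl, 2), round(mtp_hl, 2))
```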

For the same general example, the mean flow times under the different approaches are:

Table 4: Mean Flow Time Comparison for the Example

Mean Flow Time     General Formula            l=100, m=10, n=8, p=0.5 case
2PL                0.5(3l+2m+n)               164
HL                 (2m+2.5n+1.5l)/3           63.3

These examples are for illustration and intuition. The calculated results are given in the following sections.
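The two general formulas of Table 4 can be evaluated directly for the example sizes. This is a minimal sketch that only restates the table's formulas; the function names `mft_2pl` and `mft_hl` are hypothetical, and the results are taken to be in units of t. The 2PL value is also checked against the average of the three 2PL completion times 1.5l, 1.5(l+m), and 1.5(l+m+n).

```python
def mft_2pl(l, m, n):
    """Table 4 general formula for 2PL: 0.5(3l + 2m + n)."""
    return 0.5 * (3 * l + 2 * m + n)


def mft_hl(l, m, n):
    """Table 4 general formula for HL: (2m + 2.5n + 1.5l) / 3."""
    return (2 * m + 2.5 * n + 1.5 * l) / 3


l, m, n = 100, 10, 8
# Average of the 2PL completion times, for comparison with the formula.
mean_of_completions = (1.5 * l + 1.5 * (l + m) + 1.5 * (l + m + n)) / 3

print(mft_2pl(l, m, n), round(mft_hl(l, m, n), 1), mean_of_completions)
```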


3.3 Mean flow time

A more formal definition of the mean flow time according to [CD73] is:

$$ MFT = \frac{1}{n} \sum_{i=1}^{n} t_i \qquad (3.1) $$

where n is the number of completed tasks, and $t_i$ is the time for completing task i. The objective for increasing performance is to reduce the mean flow time.

Mean flow time takes more of a system point of view of performance than throughput does. When mean flow time performance is considered, our approach is optimal compared with the 2PL approach. We state the following theorem related to mean flow time without proof. Interested readers are referred to [LH96].

Theorem 3.1: When comparing the performance of the mean flow time, the conflict resolution approach will at least perform the same as the 2PL approach, and the 2PL approach will never out-perform the conflict resolution approach because

$$ MFT(2PL) - MFT(HL) = \frac{p\,[(m-1)(n_1 - n_{i1}) + (m-2)(n_2 - n_{i2}) + \cdots + (n_{m-1} - n_{i(m-1)})]}{m} \qquad (3.2) $$

is always a non-negative value, where

. p is the ratio of the average number of update operations to the total number of database operations;
. m is the total number of transactions involved;
. $n_1, n_2, \ldots, n_m$ are the sizes of the transactions;
. $n_{i1}, n_{i2}, \ldots, n_{im}$ are $n_1, n_2, \ldots, n_m$ ordered in increasing order; and
. TS is the sum of $n_1, n_2, \ldots, n_m$.

One can see from Theorem 3.1 that when comparing the mean flow time of the two approaches, the conflict resolution approach gives promise in all possible cases.
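The non-negativity of the right-hand side of (3.2) can be checked numerically for a small case. The following sketch is illustrative only: the function name `mft_gap` is hypothetical, and the sizes reuse the three-transaction example (100, 10, 8) with p = 0.5. It evaluates the expression for every arrival order and confirms that the gap vanishes when the transactions already arrive in increasing order of size.

```python
from itertools import permutations


def mft_gap(sizes, p):
    """Right-hand side of (3.2) for transaction sizes in arrival order."""
    m = len(sizes)
    ordered = sorted(sizes)  # n_i1 <= n_i2 <= ... <= n_im
    # Weights run from (m-1) for the first transaction down to 1 for
    # transaction m-1; the last transaction does not appear in (3.2).
    total = sum((m - k) * (sizes[k - 1] - ordered[k - 1]) for k in range(1, m))
    return p * total / m


gaps = {order: mft_gap(order, 0.5) for order in permutations((100, 10, 8))}
for order, gap in gaps.items():
    print(order, round(gap, 2))
```

The weights decrease while the sorted sizes increase, so pairing them gives the smallest possible weighted sum; this is why no arrival order makes the gap negative.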

3.4 Mean throughput

Let us define mean throughput more formally as:

(3.3)

For a more general result of comparing the mean throughput performance of our approach with the 2PL approach, let us state the following theorem without proof (refer to [LH96] for detailed proof and analysis).

Theorem 3.2: Assuming that update operations take twice as much database access time as retrieval operations, the conflict resolution approach (HL) (without considering parallelization) may possibly out-perform the 2PL approach under the following conditions associated with the arriving clustered transactions.

(1) Transactions arrive at the system about the same time, but the times when the system is notified of their arrival follow a certain order.

(2) There are large size transactions which arrive before the smaller transactions.

(3) Either the size difference is very big, or the number of transactions clustered is large and the smallest transaction is very small, or the number of update operations is large compared to the total number of database operations.

(4) The difference of the size of the largest transaction and the second largest transaction is a determinative factor.

The difference between MTP(HL) and MTP(2PL) can be expressed as the following equation:

$$ MTP(HL) - MTP(2PL) = \frac{p\,[0(n_{i1} - n_1) + 1(n_{i2} - n_2) + \cdots + (m-1)(n_{im} - n_m)]}{(1+p)\,TS} - \frac{0 \cdot n_1 + 1 \cdot n_2 + \cdots + (m-1)\,n_m}{(1+p)\,TS} $$

$$ = \frac{p\,[0(n_{i1} - n_1) + 1(n_{i2} - n_2) + \cdots + (m-1)(n_{im} - n_m)]}{(1+p)\,TS} - \frac{n_2 + n_3 + \cdots + n_m}{(1+p)\,TS} - \frac{0 \cdot n_2 + 1 \cdot n_3 + \cdots + (m-2)\,n_m}{(1+p)\,TS}, \qquad (3.4) $$

where p, $n_1, n_2, \ldots, n_m$, $n_{i1}, n_{i2}, \ldots, n_{im}$, and TS are defined in Section 3.3.

When comparing the mean throughput performances of the two approaches, our approach is not optimal, but in some cases, our approach out-performs the 2PL approach. These cases are the bottleneck cases for transaction management. More important, our approach allows parallelization, while the 2PL approach does not. Parallelization opens a much wider door for increasing performance.
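The first line of (3.4) can be evaluated for the three-transaction example of Section 3.2 and compared with the Table 3 entries. The sketch below is illustrative only; the function name `mtp_difference` is hypothetical, and sizes are given in arrival order. It also shows a case in which the difference is negative, in line with the remark that the approach is not optimal for mean throughput.

```python
def mtp_difference(sizes, p):
    """MTP(HL) - MTP(2PL) from the first line of (3.4); sizes in arrival order."""
    m = len(sizes)
    ordered = sorted(sizes)  # n_i1 <= ... <= n_im
    ts = sum(sizes)
    reordering_term = p * sum(k * (ordered[k] - sizes[k]) for k in range(m))
    arrival_term = sum(k * sizes[k] for k in range(m))
    return (reordering_term - arrival_term) / ((1 + p) * ts)


# Large transaction first: HL out-performs 2PL (0.59 - 0.22 in Table 3).
large_first = mtp_difference((100, 10, 8), 0.5)
# Small transactions first: the difference is negative.
small_first = mtp_difference((8, 10, 100), 0.5)
print(round(large_first, 2), round(small_first, 2))
```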

3.5 Multiprocessors and parallelization


As mentioned, a very important advantage of the conflict resolution approach is that it allows parallelization. The 2PL approach does not allow parallelization because of the possible conflicts. When parallelization can be applied to the transaction management system, the performance can be increased substantially.

Theorem 3.3: While the 2PL approach cannot be applied to parallel database access because of the possible conflicts, the conflict resolution (HL) approach can be applied for parallel database access. In the case of P parallel processors with independent database access capabilities, the mean throughput of the conflict resolution approach (HL-P) has a large improvement. It can be represented with the following form when compared to the 2PL approach and to the case without parallelization (or P=1) for the same set of transactions during the same time period:

$$ MTP(HL\text{-}P) - MTP(HL) = \left\{ \frac{p}{P} \cdot \frac{0 \cdot n_{i1} + 1 \cdot n_{i2} + 2 \cdot n_{i3} + \cdots + (m-1)\,n_{im}}{(1+p)\,TS} \right\} + \frac{m\,(P-1)}{P} - p \cdot \frac{0 \cdot n_{i1} + 1 \cdot n_{i2} + 2 \cdot n_{i3} + \cdots + (m-1)\,n_{im}}{(1+p)\,TS}. \qquad (3.5) $$

Theorem 3.4: When parallelization is applied to the conflict resolution approach, the mean flow time can be reduced by a factor of P, where P is the number of parallel processors available; that is, MFT(HL-P) = MFT(HL)/P.

Theorem 3.4 can be derived directly from definition.
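Equation (3.5) can be evaluated for a few values of P to see how the gain behaves. The sketch below is illustrative only: the function name `mtp_parallel_gain` is hypothetical, and the sizes reuse the example of Section 3.2. With P = 1 the expression reduces to zero, as expected for the case without parallelization.

```python
def mtp_parallel_gain(sizes, p, processors):
    """MTP(HL-P) - MTP(HL) as written in (3.5)."""
    m = len(sizes)
    ordered = sorted(sizes)  # n_i1 <= ... <= n_im
    ts = sum(sizes)
    weighted = sum(k * ordered[k] for k in range(m)) / ((1 + p) * ts)
    return ((p / processors) * weighted
            + m * (processors - 1) / processors
            - p * weighted)


sizes = (100, 10, 8)
gains = {P: mtp_parallel_gain(sizes, 0.5, P) for P in (1, 2, 4)}
for P, gain in gains.items():
    print(P, round(gain, 3))
```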

3.6 Time window for applying the approach

In the general example, we assumed that the three transactions arrived about the same time. In a more general case, the arrival of transactions could follow an arbitrary distribution. Therefore, the database management system must decide whether to let a single transaction go or to wait until other transaction(s) arrive to be grouped together. We propose a windowing scheme in which the first transaction waits for a predefined small “window” of time. If no other transaction comes within this window, the first transaction goes by itself. Otherwise, all the transactions arriving within the window are grouped together. This windowing technique also applies to long-lived transactions, like SAGAS [MS87].
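The windowing scheme can be sketched as a simple grouping of arrival times. This is a minimal illustration under stated assumptions: arrival times are plain numbers, a window opens when the first ungrouped transaction arrives, an arrival exactly at the end of the window is counted as inside it, and the function name `group_by_window` is hypothetical.

```python
def group_by_window(arrivals, window):
    """Group arrival times: each group starts with the first ungrouped
    transaction and collects every transaction arriving within `window`
    of that first arrival."""
    groups = []
    for t in sorted(arrivals):
        if groups and t - groups[-1][0] <= window:
            groups[-1].append(t)   # arrived inside the open window
        else:
            groups.append([t])     # opens a new window
    return groups


arrival_times = [0.0, 0.4, 0.9, 3.0, 3.2, 10.0]
groups = group_by_window(arrival_times, window=1.0)
print(groups)
```

A group of size one corresponds to a transaction that goes by itself; larger groups are the clustered transactions to which the conflict resolution method is applied.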

Denote the window size as W and the arrival rate of the transactions as AR. The total number of transactions arriving during W is then m = W*AR. Assume also that W << TS, where TS is as defined previously. Let us also assume that 1/AR << $n_i$ for all i in (1, 2, ..., m). We only consider the performance of the set of transactions within the given window. The comparison results of mean flow time and mean throughput for 2PL and our approach are the following theorems. (We do not list the resulting formulas because they are very complex. The following results apply for the uni-processor case only.)

Theorem 3.5: When using a time window scheme and assuming the initial transactions wait for later ones without performing any transaction task, the throughput performance of the conflict resolution approach decreases. The chances that the conflict resolution approach will out-perform the 2PL approach decrease accordingly.

Theorem 3.6: For a given size of time window and a given arrival rate for transactions, and assuming that the early arrived transactions wait for the later ones without performing any transaction task, the 2PL approach is expected to out-perform the conflict resolution approach with respect to mean flow time with a boundary of (m-1)W/2m. However, the conflict resolution approach can out-perform the 2PL approach without a boundary.

In a more realistic assumption the earlier arrival transaction can do the retrieval work while waiting for the window time to expire. In this case the mean throughput and the mean flow time values will be exactly the same as the case without considering the window since the window is overlapped by the execution of the transactions in conflict-resolution approaches. This leads to the following theorem.

Theorem 3.7: When using a time window scheme and assuming the initial transactions perform retrieval operations while waiting for later transactions, the throughput performance and the mean flow time performance for both the conflict resolution approach and the 2PL approach are the same as they are when the time window scheme is not used. Therefore, the performance difference is also the same.

If 1/A > ni, W is also larger than ni. Obviously, there is then no need for concurrency. Even when a transaction runs by itself, our approach can still improve performance because of the parallelization of the operations within that single transaction.
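As a rough numerical illustration of the bound in Theorem 3.6, the sketch below computes the mean idle wait of clustered transactions. It rests on assumptions that are ours, not the paper's: m is taken to be the number of transactions in one window of length W, transaction k arrives at time kW/m (k = 1, ..., m), every transaction sits idle until the window closes at time W, and the bound is read as (m-1)W/(2m). The function names are hypothetical.

```python
from fractions import Fraction


def mean_idle_wait(m, window):
    """Mean idle time per transaction before the window closes.

    Assumption (not from the paper): transaction k arrives at time
    k * window / m for k = 1..m and does no work until time `window`.
    """
    window = Fraction(window)
    waits = [window - Fraction(k) * window / m for k in range(1, m + 1)]
    return sum(waits) / m


def closed_form(m, window):
    """The expression (m-1)W/(2m), read with 2m as the denominator."""
    return Fraction(m - 1) * Fraction(window) / (2 * m)


if __name__ == "__main__":
    for m in (1, 2, 4, 10):
        print(m, mean_idle_wait(m, 8), closed_form(m, 8))
```

Under these assumptions the two functions agree for every m, which suggests one plausible reading of the bound: it is the average time a transaction loses by idling inside the window. When retrieval work overlaps the window, as in Theorem 3.7, this idle term disappears.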

4. Space Performance Comparison

The question of space performance is also of concern. We have introduced a space reduction method to improve the performance of the proposed model [HL95A][HL95B], but a comparison of the space performance of our approach with 2PL and the multiversion approach is still necessary to make a stronger argument for the possible applications of our approach.

One important fact that forms the basis for our argument is that all the data (or, more specifically, tuples) affected by transactions are retrieved from secondary memory (disks) into main memory before they are either used or updated by transactions. The updated tuples are written back to secondary memory by the database management system (DBMS) (or, more specifically, by the memory management of the operating system). Therefore, there are actually two issues concerning space performance: (1) the main memory cache required to store the retrieved data pages, and (2) the additional buffer space used for internal calculations.

In case 1, our approach performs at least as well as the 2PL approach under the same memory management system. The multiversion approach [KS88], however, has the worst space performance, since it must keep a version of updated tuples indefinitely. While we do not provide any detailed calculation or proof in this paper, a very high-level view with basic set theory shows this intuitively. Our approach requires that all the retrievals be performed at the beginning of the transaction executions. Therefore, overlapping (or conflicting) tuple sets can be retrieved at the same time, and this saves space because $|A \cup B| \le |A| + |B|$, where A and B are sets. (Note that the tuples retrieved at the beginning of the transaction do not need to stay in memory all at one time, since the CPU can still work on the tuples retrieved. After the tuples have been used, they can be swapped out when additional space is needed.)
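The set-theoretic argument can be made concrete with a small sketch. The tuple identifiers and the function name below are hypothetical, and the count treats each tuple as occupying one cache slot, which is a simplification of ours rather than a model taken from the paper.

```python
def cache_slots(*tuple_sets):
    """Return (slots if each set is cached on its own, slots if cached together).

    Caching each transaction's tuple set independently costs |A| + |B| + ...,
    while retrieving the overlapping sets in one stage costs |A u B u ...|.
    """
    separate = sum(len(s) for s in tuple_sets)
    together = len(set().union(*tuple_sets))
    return separate, together


if __name__ == "__main__":
    a = {1, 2, 3, 4}  # tuples touched by a first transaction (made-up ids)
    b = {3, 4, 5}     # tuples touched by a second, conflicting transaction
    print(cache_slots(a, b))  # (7, 5)
```

The saving is exactly the size of the overlap, so it is zero for non-conflicting transactions and grows with the degree of conflict.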

In case 2, if the cached data pages can be addressed directly by the transactions (programs) in the implementation, our approach will not need any additional buffer space beyond that used in a normal transaction program, plus some space to hold the exception expressions created for the new DELETE operations. In some cases, it may use less buffer space for the same reason (basic set theory), since we use one buffer for all the conflicting tuples.

The “buffer-saving” method introduced in [HL95A] and [HL95B] may need more buffer space to keep the exception expressions, but at the same time it substantially reduces the number of tuples that need to be loaded at the beginning of the process, since many of the nonconflicting tuples can be loaded as needed. This method also permits refining the buffer usage of the conflict resolution approach. This idea can be used in further study of other issues of the conflict resolution approach.

Retrieving the affected tuples at the same stage results not only in possible space savings but also in improved I/O performance, because entire data pages are brought into main memory by the disk I/O management (or page management) rather than just the tuples affected. For example, if tuple sets A and B both reside in data page 1002, two disk accesses are needed to bring data page 1002 in if set A and set B are retrieved separately. However, only one disk I/O is needed if they are retrieved at the same time.
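The page 1002 example can be sketched as a simple count of page reads. The sketch assumes, as the example implicitly does, that a page read in one retrieval stage is no longer cached when a later stage needs it. The tuple names, the `page_of` mapping, and the function name are hypothetical.

```python
def page_reads(stages, page_of):
    """Count disk page reads when each stage fetches its tuples independently.

    Assumption: within one stage every distinct page is read once, and no
    page survives in the cache from one stage to the next.
    """
    return sum(len({page_of[t] for t in tuples}) for tuples in stages)


if __name__ == "__main__":
    # Both tuple sets live on data page 1002, as in the text's example.
    page_of = {"a1": 1002, "a2": 1002, "b1": 1002}
    set_a = {"a1", "a2"}
    set_b = {"b1"}

    print(page_reads([set_a, set_b], page_of))  # retrieved separately: 2
    print(page_reads([set_a | set_b], page_of))  # retrieved together: 1
```

With a cache large enough to retain page 1002 between stages the two counts would coincide, so the benefit depends on how much memory pressure there is between the separate retrievals.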

5. Remarks

This paper has continued the discussion of some major results regarding the performance of the conflict resolution transaction management method. The conclusion is that if the involved transactions can provide their operation steps or program code at the beginning of their execution, our model (the conflict resolution approach) will outperform, or equal, the 2PL approach with respect to mean flow time under all the conditions stated in this paper. Our model also outperforms the 2PL approach with respect to mean throughput under certain more restricted conditions, namely the bottleneck conditions that make the 2PL approach undesirable: for example, when a large (or long-lived) transaction blocks small transactions, or when a large number of update operations are required while other smaller or retrieval-type transactions are waiting. More important, the conflict resolution approach provides the possibility of parallelization, while the 2PL approach cannot.

Our results do not consider the improvement in CPU time utilization because it is small compared to the I/O time. Moreover, our results do not include any “locking overhead” incurred by 2PL.

Cache space requirements are reduced in general compared to the 2PL and multiversion approaches. Additional internal program buffer space is needed to hold the modified DELETE-with-exceptions operations.

As we know, the classic 2PL provides almost no concurrency, while other approaches either put different restrictions on the database systems, the execution of the transactions, or the possible operation steps in the involved transactions, or put additional responsibilities on the users [Gar83][Lyn83][AK91][TSP92][WA92][SGS94]. We put only one restriction on the transactions (providing code at the beginning), which is acceptable in most applications. We do not restrict the order of operations or the possible operation steps in transactions. We do not put any additional load on the users (or transaction initiators). At the same time, we provide a large possibility of concurrency and/or parallelization. This could open large application areas in which our approach can be adopted.

We have discussed clustered transactions (with or without window) in this paper. The “pipelined” situation will be the subject of future study.

6. Acknowledgment

Many thanks to Dr. L. Russell for his valuable suggestions and comments.

References

[AAJ92] D. Agrawal, A. El Abbadi, R. Jeffers; Using Delayed Commitment in Locking Protocols for Real-time Databases; Proceedings of the 1992 ACM SIGMOD; San Diego, California, June 2-5, 1992; pages 104-113.

[AK91] D. Agrawal, V. Krishnaswamy; Using Multiversion Data for Non-interfering Execution of Write-Only Transactions; Proceedings of the 1991 ACM SIGMOD; Denver, Colorado, May 29-31, 1991; pages 98-107.

[BHG87] P.A. Bernstein, V. Hadzilacos, N. Goodman; Serializability Theory; Concurrency Control and Recovery in Database Systems; Addison-Wesley Publishing Company, pages 1-45; 1987.

[BR92] B.R. Badrinath and Krithi Ramamritham; Semantics-based Concurrency Control beyond Commutativity; ACM Transactions on Database Systems; Vol. 17, No. 1, March 1992; pages 163-199.

[CD73] E.G. Coffman, Jr. and P.J. Denning; Chapters 1 and 3; Operating Systems Theory; Prentice-Hall Series in Automatic Computation; Prentice-Hall; pages 1-30 and 83-143; 1973.

[Gar83] Hector Garcia-Molina; Using Semantic Knowledge for Transaction Processing in a Distributed Database; ACM Transactions on Database Systems; ACM TODS, June 1983; pages 186-213.

[HL94] Lawrence J. Henschen and Julia C. Lee; A Highly Concurrent Transaction Management Model; Proceedings of ICCI'94 International Conference on Computing & Information; pages 1426-1441; 1994.

[HL95A] Lawrence J. Henschen and Julia C. Lee; Buffer Reduction in an Attribute-Based Concurrent Transaction Processing System; Proceedings of ICCI'95 International Conference on Computing & Information; pages 546-559; 1995.

[HL95B] Lawrence J. Henschen and Julia C. Lee; Resolving Conflict - A New Approach to Transaction Management Model; Manuscript; 1995.

[Kob81] H. Kobayashi; Modeling and Analysis: An Introduction to System Performance Evaluation Methodology; Addison-Wesley Publishing Company; pages 1 16; Oct. 1981.

[KS88] Henry F. Korth and Gregory D. Speegle; Formal Model of Correctness Without Serializability; Proceedings of SIGMOD International Conference on Management of Data; ACM, Chicago, IL, June 1-3; pages 379-386; 1988.

[LKS91] Eliezer Levy, Henry F. Korth, and Abraham Silberschatz; An Optimistic Commit Protocol for Distributed Transaction Management; Proceedings of the 1991 ACM SIGMOD; Denver, Colorado, May 29-31, 1991; pages 88-97.

[Lee92] Julia C. Lee; A Transaction Management Model with Relational Database Operation Semantics; Ph.D. Dissertation, Northwestern University; Evanston, Ill.; Dec. 1992.

[LH96] Julia C. Lee and Lawrence J. Henschen; Performance Comparison of Transaction Management Models; Manuscript; 1996.

[Lyn83] Nancy A. Lynch; Multilevel Atomicity - A New Correctness Criterion for Database Concurrency Control; ACM Transactions on Database Systems; ACM TODS, Dec. 1983; pages 485-502.

[MS87] Hector Garcia-Molina and Kenneth Salem; SAGAS; Proceedings of SIGMOD; ACM, San Francisco, CA, May 27, 1987; page: 24

[PS83] J. Peterson and A. Silberschatz; Chapters 1-5 and Chapter 9; Operating System Concepts; Addison-Wesley; pages 1-188; 1983.

[SGS94] Kenneth Salem, Hector Garcia-Molina, and Jeannie Shands; Altruistic Locking; ACM Transactions on Database Systems; Vol. 19, No. 1, March 1994; pages 117-165.

[TSP92] John Turek, Dennis Shasha, and Sundeep Prakash; Locking without Blocking - Making Lock Based Concurrent Data Structure Algorithms Nonblocking; Proceedings of Principles of Database Systems; ACM, San Diego, CA, June; pages 212-222; 1992.

[Ull88] Jeffrey D. Ullman; Chapter 9 Transaction Management; Database and Knowledge-base Systems; Computer Science Press; pages 467-542; 1988.

[WA92] M.H. Wong and D. Agrawal; Tolerating Bounded Inconsistency for Increasing Concurrency in Database Systems; Proceedings of ACM 11th Principles of Database Systems; San Diego, CA; pages 236-245; 1992.

[Wie83] Gio Wiederhold; Chapter 2, Hardware and Its Parameters; Database Design; Computer Science Series, McGraw-Hill; pages 27-72; 1983.