database system applications

Upload: aniket-mitra

Post on 26-Feb-2018


  • 7/25/2019 Database System Applications

    1/43

    Advanced DBMS Notes

    Ananya Banerjee

    2015

    D. A. I. T. M.

    07/11/2015


    Table of Contents

    A. Relational Database Management

    1. Database System Applications

    2. Database Systems versus File Systems

    3. View of Data

    4. Data Abstraction

    5. Instances and Schemas

    6. Database Languages

    7. Data-Definition Language

    8. Data-Manipulation Language

    9. Database Users and Administrators

    10. Database Administrator

    11. Keys

    12. Triggers

    13. Need for Triggers

    14. Triggers in SQL

    B. Entity-Relationship Diagram

    1. Extended E-R Features

    2. Specialization

    3. Generalization

    C. The Relational Algebra

    1. The Select Operation

    2. The Project Operation

    3. Composition of Relational Operations

    4. The Union Operation

    5. The Set Difference Operation

    6. The Cartesian-Product Operation

    7. The Rename Operation

    8. The Natural-Join Operation

    9. The Division Operation

    10. Outer Join

    11. The Tuple Relational Calculus


    12. Example Queries

    13. Referential Integrity

    D. Normalization

    1. Functional Dependency

    2. First Normal Form

    3. Second Normal Form

    4. Third Normal Form

    5. Boyce-Codd Normal Form

    6. Lossless-Join Decomposition

    7. Multivalued Dependencies and Fourth Normal Form

    8. Join Dependencies and Fifth Normal Form

    E. Transaction Concept

    1. ACID Property

    2. Transaction State

    3. Serializability

    4. Conflict Serializability

    5. View Serializability

    6. Testing for Serializability

    7. Lock-Based Protocols

    8. Locks

    9. The Two-Phase Locking Protocol

    10. Deadlock Handling

    11. Deadlock Prevention

    12. Deadlock Detection and Recovery

    F. Database Tuning

    G. Database Security and Authorization

    1. Types of Security

    2. Threats to Databases

    3. Control Measures

    4. Database Security and the DBA

    H. Multimedia Databases

    1. The Nature of Multimedia Data and Applications

    2. Data Management Issues


    3. Multimedia Database Applications

    I. Object-Oriented Databases

    1. Motivation

    2. Concept & Features

    3. Mandatory Features of Object-Oriented Systems

    4. Mandatory Features of Database Systems

    5. Making OOPL a Database

    6. Comparisons of OODBS & RDBS


    A. Relational Database Management

    1. Database System Applications

    Databases are widely used. Here are some representative applications:

    Banking: For customer information, accounts, loans, and banking transactions.

    Airlines: For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner: terminals situated around the world accessed the central database system through phone lines and other data networks.

    Universities: For student information, course registrations, and grades.

    Credit card transactions: For purchases on credit cards and generation of monthly statements.

    Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.

    Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds.

    Sales: For customer, product, and purchase information.

    Manufacturing: For management of the supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items.

    Human resources: For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

    2. Database Systems versus File Systems

    Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts. One way to keep the information on a computer is to store it in operating system files. To allow users to manipulate the information, the system has a number of application programs that manipulate the files, including:

    A program to debit or credit an account

    A program to add a new account

    A program to find the balance of an account

    A program to generate monthly statements

    System programmers wrote these application programs to meet the needs of the bank.

    New application programs are added to the system as the need arises. For example, suppose that the savings bank decides to offer checking accounts. As a result, the bank creates new permanent files that contain information about all the checking accounts maintained in the bank, and it may have to write new application programs to deal with situations that do not arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires more files and more application programs.

    This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually stored information in such systems. Keeping organizational information in a file-processing system has a number of major disadvantages:

    Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in savings-account records but not elsewhere in the system.

    Difficulty in accessing data. Suppose that one of the bank officers needs to find out the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer now has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of $10,000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory. The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for general use.
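    For contrast, here is a minimal sketch of how a database query language handles the two requests above without a new application program. It uses Python's built-in sqlite3 module; the customer table, its column names, and the sample rows are assumptions made up for illustration and do not come from the notes.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id TEXT PRIMARY KEY, customer_name TEXT, "
    "postal_code TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("C1", "Jones", "700001", 12000),
        ("C2", "Smith", "700001", 800),
        ("C3", "Hayes", "700091", 15000),
    ],
)

# First request: names of customers in one postal-code area.
in_area = [
    row[0]
    for row in conn.execute(
        "SELECT customer_name FROM customer "
        "WHERE postal_code = ? ORDER BY customer_name",
        ("700001",),
    )
]

# Follow-up request: the same area, but only balances of $10,000 or more.
# Only the WHERE clause changes; no new program has to be written.
wealthy_in_area = [
    row[0]
    for row in conn.execute(
        "SELECT customer_name FROM customer "
        "WHERE postal_code = ? AND balance >= ? ORDER BY customer_name",
        ("700001", 10000),
    )
]

print(in_area)
print(wealthy_in_area)
```

    The second request differs from the first by a single condition, which is the kind of convenience the file-processing approach lacks.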

    Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

    Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.

    Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic: it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
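    The following is a minimal sketch of an atomic transfer in a database system, using Python's built-in sqlite3 module. The table layout, the starting balances, and the fail_midway switch that simulates a crash between the debit and the credit are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move `amount` from source to target as a single transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back


def balances(conn):
    """Return a {account_number: balance} snapshot."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))


transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # neither the debit nor the credit survives

transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both the debit and the credit are applied

print(after_failure, after_success)
```

    After the simulated failure the balances are unchanged, so the transfer happened either in its entirety or not at all.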

    Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously.
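    The arithmetic of this anomaly can be replayed with a small simulation. This is only a sketch in plain Python: the two function names are hypothetical, and the "interleaving" is modeled by letting every withdrawal read the balance before any of them writes.

```python
def run_interleaved(balance, amounts):
    """Every withdrawal reads the old balance before any of them writes back."""
    reads = [balance for _ in amounts]      # all programs read the same value
    for old_balance, amount in zip(reads, amounts):
        balance = old_balance - amount      # each write overwrites the previous one
    return balance


def run_serial(balance, amounts):
    """Each withdrawal reads the balance left by the one before it."""
    for amount in amounts:
        balance = balance - amount
    return balance


last_writer_100 = run_interleaved(500, [50, 100])   # the $100 withdrawal writes last
last_writer_50 = run_interleaved(500, [100, 50])    # the $50 withdrawal writes last
correct = run_serial(500, [50, 100])

print(last_writer_100, last_writer_50, correct)
```

    The interleaved runs lose one of the two updates, while the serial run produces the balance that both withdrawals together should leave.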

    Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.

    3. View of Data

    A database system is a collection of interrelated files and a set of programs that allow users to access and modify these files. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.

    4. Data Abstraction

    For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system:

    Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

    Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.

    View level. The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.
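    As a rough illustration of the logical and view levels, the sketch below defines a table (logical level) and a view over it that exposes only part of the data (view level). It uses Python's built-in sqlite3 module; the account table, the account_holder view, and the sample rows are hypothetical names and values chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and how they relate.
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, customer_name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Jones", 500), ("A-215", "Smith", 700)],
)

# View level: a hypothetical view for users who should not see balances.
conn.execute(
    "CREATE VIEW account_holder AS "
    "SELECT account_number, customer_name FROM account"
)

cursor = conn.execute("SELECT * FROM account_holder ORDER BY account_number")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

    How SQLite lays the rows out on disk (the physical level) never appears in the code, which is the point of the abstraction.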


    5. Instances and Schemas

    Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently.

    Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level. A database may also have several schemas at the view level, sometimes called subschemas, that describe different views of the database.

    6. Database Languages

    A database system provides a data-definition language to specify the database schema and a data-manipulation language to express database queries and updates. In practice, the data-definition and data-manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.

    7. Data-Definition Language

    We specify a database schema by a set of definitions expressed in a special language called a data-definition language (DDL). For instance, the following statement in the SQL language defines the account table:

        create table account
            (account-number char(10),
             balance integer)

    Execution of the above DDL statement creates the account table. In addition, it updates a special set of tables called the data dictionary or data directory.

    A data dictionary contains metadata, that is, data about data. The schema of a table is an example of metadata. A database system consults the data dictionary before reading or modifying actual data.
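    A minimal sketch of this behavior in one concrete system, assuming SQLite via Python's built-in sqlite3 module: executing a DDL statement creates the table and also records it in SQLite's own catalog table, sqlite_master, which plays the role of the data dictionary here. The column is spelled account_number because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: defines the account table.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# The catalog (SQLite's data dictionary) now describes the new table.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# Metadata about the table's columns: (name, declared type).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")
]

print(catalog)
print(columns)
```

    Other database systems expose their dictionaries differently, so the catalog name and the PRAGMA used here apply to SQLite only.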


    We specify the storage structure and access methods used by the database system by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schemas, which are usually hidden from the users.

    The data values stored in the database must satisfy certain consistency constraints. For example, suppose the balance on an account should not fall below $100. The DDL provides facilities to specify such constraints. The database system checks these constraints every time the database is updated.
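    The $100 rule above can be sketched as a CHECK constraint declared in the DDL. This is one possible formulation, run through Python's built-in sqlite3 module; the table layout and the sample account are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is part of the schema, not of any application program.
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()

# An update that would leave a balance of $50 violates the constraint.
try:
    conn.execute(
        "UPDATE account SET balance = balance - 450 WHERE account_number = 'A-101'"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'"
).fetchone()[0]

print(rejected, balance)
```

    Because the database system checks the constraint on every update, no individual application program has to repeat the test.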

    8. Data-Manipulation Language

    Data manipulation is:

    The retrieval of information stored in the database

    The insertion of new information into the database

    The deletion of information from the database

    The modification of information stored in the database

    A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:

    Procedural DMLs require a user to specify what data are needed and how to get those data.

    Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data.

    Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. The DML component of the SQL language is nonprocedural. For example, the following query finds the name of the customer whose customer-id is 192-83-7465:

        select customer.customer-name
        from customer
        where customer.customer-id = '192-83-7465'
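    To make the procedural/declarative distinction concrete, the sketch below answers the same question both ways using Python's built-in sqlite3 module. The underscore column names replace the hyphenated textbook names so that the SQL runs, and the customer names in the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("192-83-7465", "Johnson"), ("019-28-3746", "Smith")],  # hypothetical rows
)

# Declarative: state what is wanted; the system decides how to find it.
declarative = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = ?",
    ("192-83-7465",),
).fetchone()[0]

# Procedural style: the program spells out how to find it, row by row.
procedural = None
for customer_id, customer_name in conn.execute(
    "SELECT customer_id, customer_name FROM customer"
):
    if customer_id == "192-83-7465":
        procedural = customer_name
        break

print(declarative, procedural)
```

    Both approaches return the same name, but only the first leaves the choice of access path to the database system.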

    9. Database Users and Administrators

    A primary goal of a database system is to retrieve information from and store new information in the database. People who work with a database can be categorized as database users or database administrators.

    Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer. This program asks the teller for the amount of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred.

    Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program.


    Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall in this category.

    Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design systems, knowledge base and expert systems, systems that store data with complex data types (for example, graphics data and audio data), and environment-modeling systems.

    10. Database Administrator

    One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include:

    Schema definition. The DBA creates the original database schema by executing a set of data-definition statements in the DDL.

    Storage structure and access-method definition.

    Schema and physical-organization modification. The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance.

    Granting of authorization for data access. By granting different types of authorization, the database administrator can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system.

    Routine maintenance. Examples of the database administrator's routine maintenance activities are:

        Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding.

        Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.

        Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.

    11. Keys

    We must have a way to specify how entities within a given entity set are distinguished. Conceptually, individual entities are distinct; from a database perspective, however, the difference among them must be expressed in terms of their attributes.

    Therefore, the attribute values of an entity must be such that they can uniquely identify the entity. In other words, no two entities in an entity set are allowed to have exactly the same value for all attributes.

    A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other.

    A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set. For example, the customer-id attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus, customer-id is a superkey. Similarly, the combination of customer-name and customer-id is a superkey for the entity set customer. The customer-name attribute of customer is not a superkey, because several people might have the same name.

    The concept of a superkey is not sufficient for our purposes, since, as we saw, a superkey may contain

    extraneous attributes. If K is a superkey, then so is any superset of K. We are often interested in superkeys

    for which no proper subset is a superkey.

    Such minimal superkeys are called candidate keys.

    It is possible that several distinct sets of attributes could serve as a candidate key.

    Suppose that a combination of customer-name and customer-street is sufficient to distinguish among

    members of the customer entity set. Then, both {customer-id} and {customer-name, customer-street} are

    candidate keys. Although the attributes customerid and customer-name together can distinguish customer

    entities, their combination does not form a candidate key, since the attribute customer-id alone is a

    candidate key. We shall use the term primary key to denote a candidate key that is chosen by the database

    designer as the principal means of identifying entities within an entity set. A key (primary, candidate, and

    super) is a property of the entity set, rather than of the individual entities. Any two individual entities in

    the set are prohibited from having the same value on the key attributes at the same time. The designation

    of a key represents a constraint in the real-world enterprise being modeled. Candidate keys must be

    chosen with care. As we noted, the name of a person is obviously not sufficient, because there may be

    many people with the same name.
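
    The distinction between superkeys and candidate keys can be sketched in code. The following is a minimal illustration, assuming a relation is held as a list of dictionaries; the function names and the sample rows are hypothetical. Note that a key is a constraint on every legal instance of the entity set, so a check against sample data can only refute a proposed key, never prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that act as superkeys for this sample."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical sample of the customer entity set.
customers = [
    {"customer-id": "C1", "customer-name": "Jones", "customer-street": "Main"},
    {"customer-id": "C2", "customer-name": "Jones", "customer-street": "North"},
    {"customer-id": "C3", "customer-name": "Smith", "customer-street": "Main"},
]

keys = candidate_keys(
    customers, ["customer-id", "customer-name", "customer-street"]
)
```

    On this sample, {customer-id, customer-name} is a superkey but is skipped as a candidate key because {customer-id} alone already suffices.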

    12. Triggers

    A trigger is a statement that the system executes automatically as a side effect of a modification to the

    database. To design a trigger mechanism, we must meet two requirements:

    1. Specify when a trigger is to be executed. This is broken up into an event that causes the trigger to be checked and a condition that must be satisfied for trigger execution to proceed.

    2. Specify the actions to be taken when the trigger executes. The above model of triggers is referred to as

    the event-condition-action model for triggers.

    The database stores triggers just as if they were regular data, so that they are persistent and are accessible

    to all database operations. Once we enter a trigger into the database, the database system takes on the

    responsibility of executing it whenever the specified event occurs and the corresponding condition is

    satisfied.

    13. Need for Triggers

    Triggers are useful mechanisms for alerting humans or for starting certain tasks automatically when certain conditions are met. As an illustration, suppose that, instead of allowing negative account balances, the bank deals with overdrafts by setting the account balance to zero, and creating a loan in the amount of the overdraft. The bank gives this loan a loan number identical to the account number of the overdrawn account.

    For this example, the condition for executing the trigger is an update to the account relation that results in a negative balance value. Suppose that Jones' withdrawal of some money from an account made the account balance negative. Let t denote the account tuple with a negative balance value. The actions to be taken are:

    • Insert a new tuple s in the loan relation with
        s[branch-name] = t[branch-name]
        s[loan-number] = t[account-number]
        s[amount] = −t[balance]
      (Note that, since t[balance] is negative, we negate t[balance] to get the loan amount, a positive number.)

    • Insert a new tuple u in the borrower relation with
        u[customer-name] = "Jones"
        u[loan-number] = t[account-number]

    • Set t[balance] to 0.

    Note that trigger systems cannot usually perform updates outside the database, and hence in the inventory replenishment example, we cannot use a trigger to directly place an order in the external world. Instead, we add an order to the orders relation as in the inventory example. We must create a separate permanently running system process that periodically scans the orders relation and places orders. This system process would also note which tuples in the orders relation have been processed and when each order was placed. The process would also track deliveries of orders, and alert managers in case of exceptional conditions such as delays in deliveries.

    14. Triggers in SQL
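
    The overdraft example from the previous section can be tried out with a small script. This is only a sketch in SQLite's trigger dialect, run through Python's sqlite3 module; it is not the SQL:1999 syntax and trigger syntax differs between database systems. The table layouts, the underscore column names (hyphens are not valid in SQL identifiers), the depositor lookup used to find the borrower, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account   (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER);
CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
CREATE TABLE loan      (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);

-- Event: an update of account.balance. Condition: the new balance is negative.
CREATE TRIGGER overdraft AFTER UPDATE OF balance ON account
WHEN NEW.balance < 0
BEGIN
    -- Action 1: create a loan numbered like the account, for the overdrawn amount.
    INSERT INTO loan VALUES (NEW.account_number, NEW.branch_name, -NEW.balance);
    -- Action 2: record the account holders as borrowers.
    INSERT INTO borrower
        SELECT customer_name, NEW.account_number
        FROM depositor WHERE account_number = NEW.account_number;
    -- Action 3: reset the balance to zero.
    UPDATE account SET balance = 0 WHERE account_number = NEW.account_number;
END;

INSERT INTO account   VALUES ('A-102', 'Perryridge', 400);
INSERT INTO depositor VALUES ('Jones', 'A-102');
""")

# Jones withdraws 500 from an account that holds 400.
conn.execute("UPDATE account SET balance = balance - 500 WHERE account_number = 'A-102'")

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
loans = conn.execute("SELECT * FROM loan").fetchall()
borrowers = conn.execute("SELECT * FROM borrower").fetchall()
```

    After the withdrawal, the account balance is 0, and the loan and borrower relations each contain one new tuple for loan number A-102.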

    B. Entity-Relationship Diagram

    An E-R diagram can express the overall logical structure of a database graphically. E-R diagrams are simple and clear, qualities that may well account in large part for the widespread use of the E-R model. Such a diagram consists of the following major components:

    • Rectangles, which represent entity sets
    • Ellipses, which represent attributes
    • Diamonds, which represent relationship sets
    • Lines, which link attributes to entity sets and entity sets to relationship sets
    • Double ellipses, which represent multivalued attributes
    • Dashed ellipses, which denote derived attributes
    • Double lines, which indicate total participation of an entity in a relationship set
    • Double rectangles, which represent weak entity sets

    1. Extended E-R Features

    Although the basic E-R concepts can model most database features, some aspects of a database may be more aptly expressed by certain extensions to the basic E-R model. In this section, we discuss the extended E-R features of specialization, generalization, higher- and lower-level entity sets, attribute inheritance, and aggregation.

    2. Specialization

    An entity set may include subgroupings of entities that are distinct in some way from other entities in the set. For instance, a subset of entities within an entity set may have attributes that are not shared by all the entities in the entity set. The E-R model provides a means for representing these distinctive entity groupings.

    Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following:

    • Customer
    • Employee

    Each of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes. For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary. The process of designating subgroupings within an entity set is called specialization. The specialization of person allows us to distinguish among persons according to whether they are employees or customers.

    3. Generalization

    The refinement from an initial entity set into successive levels of entity subgroupings represents a top-down design process in which distinctions are made explicit. The design process may also proceed in a bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the basis of common features. The database designer may have first identified a customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary.

    C. The Relational Algebra

    The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and rename. In addition to the fundamental operations, there are several other operations, namely, set intersection, natural join, division, and assignment. We will define these operations in terms of the fundamental operations.

    1. The Select Operation

    The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek letter sigma (σ) to denote selection. The predicate appears as a subscript to σ. The argument relation is in parentheses after the σ. Thus, to select those tuples of the loan relation where the branch is Perryridge, we write

        σ_{branch-name = "Perryridge"}(loan)

    2. The Project Operation

    Suppose we want to list all loan numbers and the amount of the loans, but do not care about the branch name. The project operation allows us to produce this relation. The project operation is a unary operation that returns its argument relation, with certain attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the uppercase Greek letter pi (Π). We list those attributes that we wish to appear in the result as a subscript to Π. The argument relation follows in parentheses. Thus, we write the query to list all loan numbers and the amount of the loan as

        Π_{loan-number, amount}(loan)

    3. Composition of Relational Operations

    The fact that the result of a relational operation is itself a relation is important. Consider the more complicated query "Find those customers who live in Harrison." We write:

        Π_{customer-name}(σ_{customer-city = "Harrison"}(customer))
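
    Because each operation returns a relation, the operations nest naturally in code as well. The sketch below is a minimal illustration, assuming a relation is a list of dictionaries; the helper names select and project and the sample customer rows are hypothetical.

```python
def select(relation, predicate):
    """Sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """Pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

# Hypothetical instance of the customer relation.
customer = [
    {"customer-name": "Hayes", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Jones", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Smith", "customer-street": "North", "customer-city": "Rye"},
]

# Composition: the output of select is the input of project.
harrison = project(
    select(customer, lambda t: t["customer-city"] == "Harrison"),
    ["customer-name"],
)

# Projection removes duplicates: "Main" appears only once.
streets = project(customer, ["customer-street"])
```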

    4. The Union Operation

    Consider a query to find the names of all bank customers who have either an account or a loan or both. Note that the customer relation does not contain the information, since a customer does not need to have either an account or a loan at the bank. To answer this query, we need the information in the depositor relation (Figure 3.5) and in the borrower relation (Figure 3.7). We know how to find the names of all customers with a loan in the bank:

        Π_{customer-name}(borrower)

    We also know how to find the names of all customers with an account in the bank:

        Π_{customer-name}(depositor)

    To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations. We find these data by the binary operation union, denoted, as in set theory, by ∪. So the expression needed is

        Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)

    5. The Set Difference Operation

    The set-difference operation, denoted by −, allows us to find tuples that are in one relation but are not in another. The expression r − s produces a relation containing those tuples in r but not in s. We can find all customers of the bank who have an account but not a loan by writing

        Π_{customer-name}(depositor) − Π_{customer-name}(borrower)

    6. The Cartesian-Product Operation

    The Cartesian-product operation, denoted by a cross (×), allows us to combine information from any two relations. We write the Cartesian product of relations r1 and r2 as r1 × r2.

    7. The Rename Operation

    Unlike relations in the database, the results of relational-algebra expressions do not have a name that we can use to refer to them. It is useful to be able to give them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us do this. Given a relational-algebra expression E, the expression

        ρ_x(E)

    returns the result of expression E under the name x.

    8. The Natural-Join Operation

    It is often desirable to simplify certain queries that require a Cartesian product. Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product. Consider the query "Find the names of all customers who have a loan at the bank, along with the loan number and the loan amount." We first form the Cartesian product of the borrower and loan relations. Then, we select those tuples that pertain to only the same loan-number, followed by the projection of the resulting customer-name, loan-number, and amount:

        Π_{customer-name, loan.loan-number, amount}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))

    The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the join symbol ⋈. The natural-join operation forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
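
    The three steps of the natural join (Cartesian product, equality selection on shared attributes, removal of duplicate attributes) can be sketched as follows. This is an illustration only, assuming relations are lists of dictionaries keyed by attribute name; the function name and the sample rows are hypothetical.

```python
def natural_join(r, s):
    """Pair every tuple of r with every tuple of s, keep the pairs that
    agree on all shared attribute names, and merge each pair into one
    tuple so that shared attributes appear only once."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result

# Hypothetical instances of the borrower and loan relations.
borrower = [
    {"customer-name": "Jones", "loan-number": "L-17"},
    {"customer-name": "Smith", "loan-number": "L-23"},
]
loan = [
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
]

joined = natural_join(borrower, loan)
```

    Loan L-15 has no borrower in this sample, so it does not appear in the result.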

    9. The Division Operation

    The division operation, denoted by ÷, is suited to queries that include the phrase "for all." Suppose that we wish to find all customers who have an account at all the branches located in Brooklyn. We can obtain all branches in Brooklyn by the expression

        r1 = Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
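
    The "for all" meaning of division can be sketched with sets. The example assumes the dividend has already been reduced to (customer-name, branch-name) pairs and the divisor to a set of Brooklyn branch names; the function name and all sample values are hypothetical.

```python
def divide(r, s):
    """r ÷ s, where r is a set of (x, y) pairs and s is a set of y values.
    Returns every x that is paired in r with all of the y values in s."""
    xs = {x for x, _ in r}
    return {x for x in xs if all((x, y) in r for y in s)}

# Hypothetical (customer-name, branch-name) pairs for customers with accounts.
has_account_at = {
    ("Johnson", "Brighton"),
    ("Johnson", "Downtown"),
    ("Smith", "Brighton"),
    ("Hayes", "Downtown"),
}
# Hypothetical set of branches located in Brooklyn.
brooklyn_branches = {"Brighton", "Downtown"}

customers_at_all = divide(has_account_at, brooklyn_branches)
```

    Only Johnson has an account at every Brooklyn branch in this sample.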

    10. Outer Join

    The outer-join operation is an extension of the join operation to deal with missing information. Suppose that we have the relations with the following schemas, which contain data on full-time employees:

        employee (employee-name, street, city)
        ft-works (employee-name, branch-name, salary)

    Consider the employee and ft-works relations. Suppose that we want to generate a single relation with all the information (street, city, branch name, and salary) about full-time employees. A possible approach would be to use the natural join operation as follows:

        employee ⋈ ft-works

    The result of this expression appears in Figure 3.32. Notice that we have lost the street and city information about Smith, since the tuple describing Smith is absent from the ft-works relation; similarly, we have lost the branch name and salary information about Gates, since the tuple describing Gates is absent from the employee relation. We can use the outer-join operation to avoid this loss of information. There are actually three forms of the operation: left outer join, denoted ⟕; right outer join, denoted ⟖; and full outer join, denoted ⟗. All three forms of outer join compute the join, and add extra tuples to the result of the join.
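
    The "extra tuples" are the unmatched tuples of one or both arguments, padded with nulls. The sketch below illustrates this, assuming relations are lists of dictionaries and using None for null; the function names and the sample rows are hypothetical.

```python
def natural_join(r, s):
    """Merge the pairs of tuples that agree on all shared attributes."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]

def left_outer_join(r, s, s_attrs):
    """Natural join plus every unmatched tuple of r, padded with nulls."""
    result = natural_join(r, s)
    for t in r:
        if not natural_join([t], s):
            result.append({**dict.fromkeys(s_attrs), **t})
    return result

def full_outer_join(r, s, r_attrs, s_attrs):
    """Left outer join plus every unmatched tuple of s, padded with nulls."""
    result = left_outer_join(r, s, s_attrs)
    for u in s:
        if not natural_join(r, [u]):
            result.append({**dict.fromkeys(r_attrs), **u})
    return result

# Hypothetical instances of the two relations.
employee = [
    {"employee-name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"employee-name": "Smith", "street": "Revolver", "city": "Death Valley"},
]
ft_works = [
    {"employee-name": "Coyote", "branch-name": "Mesa", "salary": 1500},
    {"employee-name": "Gates", "branch-name": "Redmond", "salary": 5300},
]
e_attrs = ["employee-name", "street", "city"]
w_attrs = ["employee-name", "branch-name", "salary"]

left = left_outer_join(employee, ft_works, w_attrs)
full = full_outer_join(employee, ft_works, e_attrs, w_attrs)
```

    A right outer join is the same computation with the arguments exchanged, for example left_outer_join(ft_works, employee, e_attrs).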

    11. The Tuple Relational Calculus

    When we write a relational-algebra expression, we provide a sequence of procedures that generates the answer to our query. The tuple relational calculus, by contrast, is a nonprocedural query language. It describes the desired information without giving a specific procedure for obtaining that information. A query in the tuple relational calculus is expressed as

        {t | P(t)}

    That is, it is the set of all tuples t such that predicate P is true for t. Following our earlier notation, we use t[A] to denote the value of tuple t on attribute A, and we use t ∈ r to denote that tuple t is in relation r.

    12. Example Queries

    Say that we want to find the branch-name, loan-number, and amount for loans of over $1200:

        {t | t ∈ loan ∧ t[amount] > 1200}

    Suppose that we want only the loan-number attribute, rather than all attributes of the loan relation. To write this query in the tuple relational calculus, we need to write an expression for a relation on the schema (loan-number). We need those tuples on (loan-number) such that there is a tuple in loan with the amount attribute > 1200. To express this request, we need the construct "there exists" from mathematical logic. The notation

        ∃ t ∈ r (Q(t))

    means "there exists a tuple t in relation r such that predicate Q(t) is true." Using this notation, we can write the query "Find the loan number for each loan of an amount greater than $1200" as

        {t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
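
    Set-builder notation of this kind maps closely onto comprehensions, which also state what is wanted rather than how to loop. The following sketch assumes the loan relation is a list of dictionaries; the sample rows are hypothetical.

```python
# Hypothetical instance of the loan relation.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
]

# All attributes of loans over $1200: tuples t in loan with t[amount] > 1200.
over_1200 = [t for t in loan if t["amount"] > 1200]

# Only the loan numbers: a value is included if there exists some
# tuple s in loan with that loan number and s[amount] > 1200.
loan_numbers = {s["loan-number"] for s in loan if s["amount"] > 1200}
```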

    13. Referential Integrity

    Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called referential integrity.

    Foreign keys can be specified as part of the SQL create table statement by using the foreign key clause. We illustrate foreign-key declarations by using the SQL DDL definition of part of our bank database.

    By default, a foreign key references the primary key attributes of the referenced table. SQL also supports a version of the references clause where a list of attributes of the referenced relation can be specified explicitly. The specified list of attributes must be declared as a candidate key of the referenced relation.

    We can use the following short form as part of an attribute definition to declare that the attribute forms a

    foreign key:

    branch-name char(15) references branch

    When a referential-integrity constraint is violated, the normal procedure is to reject the action that caused the violation. However, a foreign key clause can specify that if a delete or update action on the referenced relation violates the constraint, then, instead of rejecting the action, the system must take steps to change the tuple in the referencing relation to restore the constraint. Consider this definition of an integrity constraint on the relation account:

        create table account
        ( . . .
          foreign key (branch-name) references branch
              on delete cascade
              on update cascade,
          . . . )
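
    The effect of the cascade clauses can be observed with a small experiment. This is a sketch using SQLite through Python's sqlite3 module, not the notes' own bank schema: the column lists are assumed, identifiers use underscores because hyphens are not valid in SQL names, the sample rows are hypothetical, and SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE branch  (branch_name CHAR(15) PRIMARY KEY, branch_city CHAR(30));
CREATE TABLE account (
    account_number CHAR(10) PRIMARY KEY,
    branch_name    CHAR(15),
    balance        INTEGER,
    FOREIGN KEY (branch_name) REFERENCES branch
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
INSERT INTO branch  VALUES ('Perryridge', 'Horseneck');
INSERT INTO account VALUES ('A-102', 'Perryridge', 400);
""")

# An insert that references a nonexistent branch is rejected.
try:
    conn.execute("INSERT INTO account VALUES ('A-999', 'Nowhere', 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming the branch cascades to the referencing account tuple.
conn.execute("UPDATE branch SET branch_name = 'Perry' WHERE branch_name = 'Perryridge'")
renamed = conn.execute("SELECT branch_name FROM account").fetchall()

# Deleting the branch cascades and removes its accounts.
conn.execute("DELETE FROM branch WHERE branch_name = 'Perry'")
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```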

    D. Normalization

    1. Functional dependency

    A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
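
    The definition translates directly into a check over a relation state. The sketch below assumes a relation state is a list of dictionaries; the function name and the sample rows are hypothetical. A functional dependency is a constraint on all legal states, so a single state can show that a dependency is violated but cannot prove that it holds in general.

```python
def holds(rows, X, Y):
    """True if no two tuples agree on X while differing on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Remember the first Y value seen for each X value.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical relation state recording hours worked per employee and project.
emp_proj = [
    {"SSN": "1", "PNUMBER": 1, "HOURS": 32, "ENAME": "Smith"},
    {"SSN": "1", "PNUMBER": 2, "HOURS": 7, "ENAME": "Smith"},
    {"SSN": "2", "PNUMBER": 1, "HOURS": 20, "ENAME": "Wong"},
]

ssn_determines_name = holds(emp_proj, ["SSN"], ["ENAME"])
ssn_determines_hours = holds(emp_proj, ["SSN"], ["HOURS"])
pair_determines_hours = holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"])
```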

    2. First Normal Form

    The only attribute values permitted by 1NF are single atomic (or indivisible) values.

    3. Second Normal Form

    Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X − {A}) does not functionally determine Y. A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y. In Figure 10.3b, {SSN, PNUMBER} → HOURS is a full dependency (neither SSN → HOURS nor PNUMBER → HOURS holds). However, the dependency {SSN, PNUMBER} → ENAME is partial because SSN → ENAME holds.

    Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.

    4. Third Normal Form

    Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold. The dependency SSN → DMGRSSN is transitive through DNUMBER in EMP_DEPT because both the dependencies SSN → DNUMBER and DNUMBER → DMGRSSN hold and DNUMBER is neither a key itself nor a subset of the key of EMP_DEPT. Intuitively, we can see that the dependency of DMGRSSN on DNUMBER is undesirable in EMP_DEPT since DNUMBER is not a key of EMP_DEPT.

    Definition. According to Codd's original definition, a relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.

    5. Boyce-Codd Normal Form

    Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X → A holds in R, then X is a superkey of R.

    In practice, most relation schemas that are in 3NF are also in BCNF. Only if X → A holds in a relation schema R with X not being a superkey and A being a prime attribute will R be in 3NF but not in BCNF. The relation schema R shown in Figure 10.12b illustrates the general case of such a relation. Ideally, relational database design should strive to achieve BCNF or 3NF for every relation schema. Achieving the normalization status of just 1NF or 2NF is not considered adequate, since they were developed historically as stepping stones to 3NF and BCNF. As another example, consider Figure 10.13, which shows a relation TEACH with the following dependencies:

    FD1: {STUDENT, COURSE} → INSTRUCTOR
    FD2: INSTRUCTOR → COURSE

    Note that {STUDENT, COURSE} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12b, with STUDENT as A, COURSE as B, and INSTRUCTOR as C. Hence this relation is in 3NF but not BCNF. Decomposition of this relation schema into two schemas is not straightforward because it may be decomposed into one of the three following possible pairs:

    1. {STUDENT, INSTRUCTOR} and {STUDENT, COURSE}.
    2. {COURSE, INSTRUCTOR} and {COURSE, STUDENT}.
    3. {INSTRUCTOR, COURSE} and {INSTRUCTOR, STUDENT}.

    6. Lossless-Join Decomposition

    When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless. We claim that the decomposition is indeed lossless. To demonstrate our claim, we must first present a criterion for determining whether a decomposition is lossy.

    Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:

        R1 ∩ R2 → R1
        R1 ∩ R2 → R2

    In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join decomposition. We can use attribute closure to efficiently test for superkeys, as we have seen earlier.

    We now demonstrate that our decomposition of Lending-schema is a lossless-join decomposition by showing a sequence of steps that generate the decomposition. We begin by decomposing Lending-schema into two schemas:

        Branch-schema = (branch-name, branch-city, assets)
        Loan-info-schema = (branch-name, customer-name, loan-number, amount)

    Since branch-name → branch-city assets, the augmentation rule for functional dependencies (Section 7.3.2) implies that

        branch-name → branch-name branch-city assets

    Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial decomposition is a lossless-join decomposition. Next, we decompose Loan-info-schema into

        Loan-schema = (loan-number, branch-name, amount)
        Borrower-schema = (customer-name, loan-number)

    This step results in a lossless-join decomposition, since loan-number is a common attribute and loan-number → amount branch-name.
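
    The superkey test on R1 ∩ R2 can be sketched with attribute closure. This is a minimal illustration, assuming schemas are Python sets of attribute names and functional dependencies are (left side, right side) pairs of sets; the function names are hypothetical, and the schemas below are assumed reconstructions of the Lending-schema example using the two dependencies quoted in the text.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def passes_lossless_test(r1, r2, fds):
    """True if R1 ∩ R2 functionally determines all of R1 or all of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

fds = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]

branch_schema = {"branch-name", "branch-city", "assets"}
loan_info_schema = {"branch-name", "customer-name", "loan-number", "amount"}
loan_schema = {"loan-number", "branch-name", "amount"}
borrower_schema = {"customer-name", "loan-number"}

step1 = passes_lossless_test(branch_schema, loan_info_schema, fds)
step2 = passes_lossless_test(loan_schema, borrower_schema, fds)

# A split whose only common attribute is customer-name does not pass the test.
bad = passes_lossless_test(
    {"branch-name", "branch-city", "assets", "customer-name"},
    {"customer-name", "loan-number", "amount"},
    fds,
)
```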

    7. Multivalued Dependencies and Fourth Normal Form

    8. Join Dependencies and Fifth Normal Form

    E. Transaction Concept

    1. ACID Property

    • Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.

    • Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.

    • Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.

    • Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

    2. Transaction State

    • Active, the initial state; the transaction stays in this state while it is executing
    • Partially committed, after the final statement has been executed
    • Failed, after the discovery that normal execution can no longer proceed
    • Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction
    • Committed, after successful completion

    3. Serializability

    The database system must control concurrent execution of transactions, to ensure that the database state remains consistent. Before we examine how the database system can carry out this task, we must first understand which schedules will ensure consistency, and which schedules will not.

    Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact. For this reason, we shall not interpret the type of operations that a transaction can perform on a data item. Instead, we consider only two operations: read and write. We thus assume that, between a read(Q) instruction and a write(Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction. Thus, the only significant operations of a transaction, from a scheduling point of view, are its read and write instructions. We shall therefore usually show only read and write instructions in schedules, as we do in schedule 3 in Figure 15.7. In this section, we discuss different forms of schedule equivalence; they lead to the notions of conflict serializability and view serializability.

    4. Conflict Serializability

    Consider a schedule S in which Ii and Ij are consecutive instructions of transactions Ti and Tj, respectively (i ≠ j), and both refer to the same data item Q. There are four cases to consider:

    1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj, regardless of the order.
    2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij matters.
    3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the previous case.
    4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since the result of only the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.

    Thus, only in the case where both Ii and Ij are read instructions does the relative order of their execution not matter.

    We say that Ii and Ij conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation.

    To illustrate the concept of conflicting instructions, we consider schedule 3. The write(A) instruction of T1 conflicts with the read(A) instruction of T2.

    However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items.

    Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of different transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to produce a new schedule S′. We expect S to be equivalent to S′, since all instructions appear in the same order in both schedules except for Ii and Ij, whose order does not matter.

    Since the write(A) instruction of T2 in schedule 3 of Figure 15.7 does not conflict with the read(B) instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5. Regardless of the initial system state, schedules 3 and 5 both produce the same final system state. We continue to swap nonconflicting instructions:

    • Swap the read(B) instruction of T1 with the read(A) instruction of T2.
    • Swap the write(B) instruction of T1 with the write(A) instruction of T2.
    • Swap the write(B) instruction of T1 with the read(A) instruction of T2.

    The final result of these swaps, schedule 6, is a serial schedule. Thus, we have shown that schedule 3 is

    equivalent to a serial schedule. This equivalence implies that, regardless of the initial system state, schedule 3 will produce the same final state as will some serial schedule.

    If a schedule S can be transformed into a schedule S′ by a series of swaps of nonconflicting instructions, we say that S and S′ are conflict equivalent.

    In our previous examples, schedule 1 is not conflict equivalent to schedule 2. However, schedule 1 is conflict equivalent to schedule 3, because the read(B) and write(B) instructions of T1 can be swapped with the read(A) and write(A) instructions of T2.

    The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule

    S is conflict serializable if it is conflict equivalent to a serial schedule. Thus, schedule 3 is conflict

    serializable, since it is conflict equivalent to the serial schedule 1.

    5. View Serializability

    In this section, we consider a form of equivalence that is less stringent than conflict equivalence, but that, like conflict equivalence, is based on only the read and write operations of transactions.

    Consider two schedules S and S′, where the same set of transactions participates in both schedules. The schedules S and S′ are said to be view equivalent if three conditions are met:

    1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S′, also read the initial value of Q.
    2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S′, also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
    3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S′.

    6. Testing for Serializability

    When designing concurrency control schemes, we must show that schedules generated by the scheme are serializable. To do that, we must first understand how to determine, given a particular schedule S, whether the schedule is serializable. We now present a simple and efficient method for determining conflict serializability of a schedule.

    Consider a schedule S. We construct a directed graph, called a precedence graph, from S. This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which one of three conditions holds:

    1. Ti executes write(Q) before Tj executes read(Q).
    2. Ti executes read(Q) before Tj executes write(Q).
    3. Ti executes write(Q) before Tj executes write(Q).

    If the precedence graph for S has no cycle, then S is conflict serializable; if it contains a cycle, S is not conflict serializable.
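
    The test can be sketched as follows: build the precedence graph from the three edge conditions, then look for a cycle. The code assumes a schedule is a list of (transaction, operation, data item) triples; the function names are hypothetical, and schedule_3 is an assumed reconstruction that is consistent with the conflicts and swaps described for schedule 3 in these notes.

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj where an operation of Ti precedes a conflicting
    operation of Tj (same item, different transactions, at least one write)."""
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        if any(visit(n) for n in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in list(graph))

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

schedule_3 = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
# A hypothetical schedule in which T2 writes A between T1's read and write.
interleaved = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
```

    For schedule_3 the only edge is T1 → T2, so the graph is acyclic; the second schedule produces edges in both directions and therefore a cycle.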

    7. Lock-Based Protocols

    One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item. The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.

    8. Locks

    There are various modes in which a data item may be locked. In this section, we restrict our attention to two modes:

    1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then Ti can read, but cannot write, Q.
    2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q, then Ti can both read and write Q.

    9. The Two-Phase Locking Protocol

    One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases:

    1. Growing phase. A transaction may obtain locks, but may not release any lock.
    2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.

    Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. For example, transactions T3 and T4 are two phase. On the other hand, transactions T1 and T2 are not two phase. Note that the unlock instructions do not need to appear at the end of the transaction. For example, in the case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction, and still retain the two-phase locking property.

    We can show that the two-phase locking protocol ensures conflict serializability.

    Consider any transaction. The point in the schedule where the transaction has obtained its final lock (the end of its growing phase) is called the lock point of the transaction. Now, transactions can be ordered according to their lock points; this ordering is, in fact, a serializability ordering for the transactions.
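
    Whether a transaction obeys the two phases, and where its lock point falls, can be checked from its sequence of lock and unlock requests alone. The sketch below assumes a transaction is given as a list of (action, item) pairs; the function names are hypothetical, and t3 and t1 are assumed stand-ins for the transactions named in the text, which these notes do not reproduce.

```python
def is_two_phase(ops):
    """True if no lock request follows the first unlock."""
    released = False
    for action, _item in ops:
        if action == "unlock":
            released = True   # the shrinking phase has begun
        elif released:
            return False      # a lock request during the shrinking phase
    return True

def lock_point(ops):
    """Index of the final lock request, or None if there is none."""
    locks = [i for i, (action, _item) in enumerate(ops) if action != "unlock"]
    return locks[-1] if locks else None

# Assumed lock sequences (reads and writes omitted).
t3 = [("lock-X", "B"), ("lock-X", "A"), ("unlock", "B"), ("unlock", "A")]
t1 = [("lock-X", "B"), ("unlock", "B"), ("lock-X", "A"), ("unlock", "A")]
```

    Here t3 acquires both locks before releasing either, while t1 requests a lock on A after it has already unlocked B.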

    Two-phase locking does not ensure freedom from deadlock. Observe that transactions T3 and T4 are two phase, but they are deadlocked.

    In addition to being serializable, schedules should be cascadeless. Cascading rollback may occur under two-phase locking. As an illustration, consider the partial schedule. Each transaction observes the two-phase locking protocol, but the failure of T5 after the read(A) step of T7 leads to cascading rollback of T6 and T7.

    Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking protocol. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken by a transaction be held until that transaction commits. This requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other transaction from reading the data.
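    The extra condition imposed by strict two-phase locking can be expressed as a small check. This is a sketch under assumptions: the (action, item) representation, the explicit "commit" action, and the function name are all hypothetical choices made for the example.

```python
def is_strict_two_phase(ops):
    """Check that `ops` is two phase and that every exclusive lock is
    released only after the transaction's commit."""
    exclusive = set()     # items currently held in exclusive mode
    released_any = False  # True once the shrinking phase has begun
    committed = False
    for action, item in ops:
        if action == "commit":
            committed = True
        elif action in ("lock-S", "lock-X"):
            if released_any:
                return False  # lock request after an unlock: not two phase
            if action == "lock-X":
                exclusive.add(item)
        elif action == "unlock":
            if item in exclusive and not committed:
                return False  # exclusive lock released before commit
            exclusive.discard(item)
            released_any = True
    return True


# Hypothetical sequences used only for illustration.
strict = [("lock-S", "A"), ("lock-X", "B"), ("unlock", "A"),
          ("commit", None), ("unlock", "B")]
not_strict = [("lock-X", "B"), ("unlock", "B"), ("commit", None)]

print(is_strict_two_phase(strict))      # True
print(is_strict_two_phase(not_strict))  # False
```

    Note that a shared lock may still be released before commit in this sketch; only exclusive locks are required to be held until the commit.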

    10. Deadlock Handling

    A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. More precisely, there exists a set of waiting transactions {T0, T1, . . ., Tn} such that T0 is waiting for a data item that T1 holds, and T1 is waiting for a data item that T2 holds, and . . ., and Tn-1 is waiting for a data item that Tn holds, and Tn is waiting for a data item that T0 holds. None of the transactions can make progress in such a situation.

    The only remedy to this undesirable situation is for the system to invoke some drastic action, such as rolling back some of the transactions involved in the deadlock.

    Rollback of a transaction may be partial: That is, a transaction may be rolled back to the point where it obtained a lock whose release resolves the deadlock.

    There are two principal methods for dealing with the deadlock problem. We can use a deadlock prevention protocol to ensure that the system will never enter a deadlock state. Alternatively, we can allow the system to enter a deadlock state, and then try to recover by using a deadlock detection and deadlock recovery scheme. As we shall see, both methods may result in transaction rollback. Prevention is commonly used if the probability that the system would enter a deadlock state is relatively high; otherwise, detection and recovery are more efficient. Note that a detection and recovery scheme requires overhead that includes not only the run-time cost of maintaining the necessary information and of executing the detection algorithm, but also the potential losses inherent in recovery from a deadlock.

    11. Deadlock Prevention

    1. The wait-die scheme is a nonpreemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is rolled back (dies).

    For example, suppose that transactions T22, T23, and T24 have timestamps 5, 10, and 15, respectively. If T22 requests a data item held by T23, then T22 will wait. If T24 requests a data item held by T23, then T24 will be rolled back.

    2. The wound-wait scheme is a preemptive technique. It is a counterpart to the wait-die scheme. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).

    Returning to our example, with transactions T22, T23, and T24, if T22 requests a data item held by T23, then the data item will be preempted from T23, and T23 will be rolled back. If T24 requests a data item held by T23, then T24 will wait.

    Whenever the system rolls back transactions, it is important to ensure that there is no starvation; that is, no transaction gets rolled back repeatedly and is never allowed to make progress.
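    The two prevention schemes differ only in what happens for each timestamp ordering. The sketch below replays the T22, T23, T24 example from the text; the function names and the returned strings are hypothetical and chosen only for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Nonpreemptive: an older requester waits, a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"


# Timestamps from the example in the text (smaller means older).
ts = {"T22": 5, "T23": 10, "T24": 15}

print(wait_die(ts["T22"], ts["T23"]))    # wait
print(wait_die(ts["T24"], ts["T23"]))    # rollback requester
print(wound_wait(ts["T22"], ts["T23"]))  # rollback holder
print(wound_wait(ts["T24"], ts["T23"]))  # wait
```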

    12. Deadlock Detection and Recovery

    If a system does not employ some protocol that ensures deadlock freedom, then a detection and recovery scheme must be used. An algorithm that examines the state of the system is invoked periodically to determine whether a deadlock has occurred. If one has, then the system must attempt to recover from the deadlock. To do so, the system must:

    • Maintain information about the current allocation of data items to transactions, as well as any outstanding data item requests.
    • Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.
    • Recover from the deadlock when the detection algorithm determines that a deadlock exists.
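    One common way to organize the allocation and request information is a wait-for graph, in which an edge from Ti to Tj means that Ti is waiting for an item held by Tj; a deadlock then corresponds to a cycle. The notes do not prescribe this representation, so the sketch below is an assumption, and the name find_deadlock is hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle in the wait-for
    graph, or None if the graph has no cycle.

    `waits_for` maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:              # back edge: u is on the current path
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


print(find_deadlock({"T0": {"T1"}, "T1": {"T2"}, "T2": {"T0"}}))  # ['T0', 'T1', 'T2']
print(find_deadlock({"T0": {"T1"}, "T1": {"T2"}}))                # None
```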

    Recovery from Deadlock

    When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The most common solution is to roll back one or more transactions to break the deadlock. Three actions need to be taken:

    1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction (or transactions) to roll back to break the deadlock. We should roll back those transactions that will incur the minimum cost. Unfortunately, the term minimum cost is not a precise one. Many factors may determine the cost of a rollback, including

    a. How long the transaction has computed, and how much longer the transaction will compute before it completes its designated task.
    b. How many data items the transaction has used.
    c. How many more data items the transaction needs for it to complete.
    d. How many transactions will be involved in the rollback.

    2. Rollback. Once we have decided that a particular transaction must be rolled back, we must determine how far this transaction should be rolled back.

    The simplest solution is a total rollback: Abort the transaction and then restart it. However, it is more effective to roll back the transaction only as far as necessary to break the deadlock. Such partial rollback requires the system to maintain additional information about the state of all the running transactions.

    Specifically, the sequence of lock requests/grants and updates performed by the transaction needs to be recorded. The deadlock detection mechanism should decide which locks the selected transaction needs to release in order to break the deadlock. The selected transaction must be rolled back to the point where it obtained the first of these locks, undoing all actions it took after that point. The recovery mechanism must be capable of performing such partial rollbacks. Furthermore, the transactions must be capable of resuming execution after a partial rollback. See the bibliographical notes for relevant references.

    3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may happen that the same transaction is always picked as a victim. As a result, this transaction never completes its designated task, and thus there is starvation. We must ensure that a transaction can be picked as a victim only a (small) finite number of times. The most common solution is to include the number of rollbacks in the cost factor.
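    Victim selection with a rollback count in the cost can be sketched as follows. Everything here is illustrative: the TxnInfo fields, the weight of 10.0, and the simple additive cost are assumptions, since the text notes that "minimum cost" has no precise definition.

```python
from dataclasses import dataclass


@dataclass
class TxnInfo:
    name: str
    work_done: float    # how long the transaction has computed so far
    items_used: int     # how many data items it has used
    rollbacks: int = 0  # how many times it was already picked as victim


def rollback_cost(t, rollback_weight=10.0):
    """Hypothetical cost: work that would be lost, plus a penalty that
    grows each time the transaction has already been rolled back."""
    return t.work_done + t.items_used + rollback_weight * t.rollbacks


def choose_victim(deadlocked):
    """Pick the transaction whose rollback is cheapest."""
    return min(deadlocked, key=rollback_cost)


ta = TxnInfo("Ta", work_done=2.0, items_used=1)
tb = TxnInfo("Tb", work_done=8.0, items_used=3)

first = choose_victim([ta, tb])
print(first.name)   # Ta (cost 3.0 against 11.0)
first.rollbacks += 1

second = choose_victim([ta, tb])
print(second.name)  # Tb (Ta now costs 13.0)
```

    Because the penalty grows with each rollback, the same transaction cannot remain the cheapest victim indefinitely, which is the point of including the rollback count.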

    F. Database Tuning

    After a database is deployed and is in operation, actual use of the applications, transactions, queries, and views reveals factors and problem areas that may not have been accounted for during the initial physical design.

    Resource utilization as well as internal DBMS processing (such as query optimization) can be monitored to reveal bottlenecks, such as contention for the same data or devices. Volumes of activity and sizes of data can be better estimated. It is therefore necessary to monitor and revise the physical database design constantly. The goals of tuning are as follows:

    • To make applications run faster.
    • To lower the response time of queries/transactions.
    • To improve the overall throughput of transactions.

    The dividing line between physical design and tuning is very thin. The same design decisions that we discussed in Section 16.1.3 are revisited during the tuning phase, which is a continued adjustment of design. We give only a brief overview of the tuning process below. The inputs to the tuning process include statistics related to the factors. In particular, DBMSs can internally collect the following statistics:

    • Sizes of individual tables.
    • Number of distinct values in a column.
    • The number of times a particular query or transaction is submitted/executed in an interval of time.
    • The times required for different phases of query and transaction processing (for a given set of queries or transactions).

    These and other statistics create a profile of the contents and use of the database. Other information obtained from monitoring the database system activities and processes includes the following:

    • Storage statistics: Data about allocation of storage into tablespaces, indexspaces, and buffer pools.
    • I/O and device performance statistics: Total read/write activity (paging) on disk extents and disk hotspots.
    • Query/transaction processing statistics: Execution times of queries and transactions, optimization times during query optimization.
    • Locking/logging related statistics: Rates of issuing different types of locks, transaction throughput rates, and log records activity.
    • Index statistics: Number of levels in an index, number of noncontiguous leaf pages, etc.

    Tuning a database involves dealing with the following types of problems:

    • How to avoid excessive lock contention, thereby increasing concurrency among transactions.
    • How to minimize overhead of logging and unnecessary dumping of data.
    • How to optimize buffer size and scheduling of processes.
    • How to allocate resources such as disks, RAM, and processes for most efficient utilization.

    Most of the previously mentioned problems can be solved by setting appropriate physical DBMS parameters, changing configurations of devices, changing operating system parameters, and other similar activities. The solutions tend to be closely tied to specific systems. The DBAs are typically trained to handle these problems of tuning for the specific DBMS. We briefly discuss the tuning of various physical database design decisions below.

    G. Database Security and Authorization

    1. Types of Security

    • Legal and ethical issues regarding the right to access certain information. Some information may be deemed to be private and cannot be accessed legally by unauthorized persons. In the United States, there are numerous laws governing privacy of information.
    • Policy issues at the governmental, institutional, or corporate level as to what kinds of information should not be made publicly available, for example, credit ratings and personal medical records.
    • System-related issues such as the system levels at which various security functions should be enforced, for example, whether a security function should be handled at the physical hardware level, the operating system level, or the DBMS level.
    • The need in some organizations to identify multiple security levels and to categorize the data and users based on these classifications, for example, top secret, secret, confidential, and unclassified. The security policy of the organization with respect to permitting access to various classifications of data must be enforced.

    2. Threats to Databases

    Threats to databases result in the loss or degradation of some or all of the following security goals: integrity, availability, and confidentiality.

    • Loss of integrity: Database integrity refers to the requirement that information be protected from improper modification. Modification of data includes creation, insertion, modification, changing the status of data, and deletion. Integrity is lost if unauthorized changes are made to the data by either intentional or accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated system or corrupted data could result in inaccuracy, fraud, or erroneous decisions.
    • Loss of availability: Database availability refers to making objects available to a human user or a program to which they have a legitimate right.
    • Loss of confidentiality: Database confidentiality refers to the protection of data from unauthorized disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the Data Privacy Act to the jeopardization of national security. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public confidence, embarrassment, or legal action against the organization.

    To protect databases against these types of threats four kinds of countermeasures can be implemented: access control, inference control, flow control, and encryption.

    In a multiuser database system, the DBMS must provide techniques to enable certain users or user groups to access selected portions of a database without gaining access to the rest of the database. This is particularly important when a large integrated database is to be used by many different users within the same organization. For example, sensitive information such as employee salaries or performance reviews should be kept confidential from most of the database system's users. A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access. It is now customary to refer to two types of database security mechanisms:

    • Discretionary security mechanisms: These are used to grant privileges to users, including the capability to access specific data files, records, or fields in a specified mode (such as read, insert, delete, or update).
    • Mandatory security mechanisms: These are used to enforce multilevel security by classifying the data and users into various security classes (or levels) and then implementing the appropriate security policy of the organization. For example, a typical security policy is to permit users at a certain classification level to see only the data items classified at the user's own (or lower) classification level. An extension of this is role-based security, which enforces policies and privileges based on the concept of roles.
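    The typical mandatory policy described above (a user sees only items at the user's own or a lower classification level) reduces to a comparison of ranks. The sketch below uses the four levels named earlier in these notes; the names LEVELS, RANK, and can_read are hypothetical.

```python
# Classification levels from the notes, ordered from lowest to highest.
LEVELS = ["unclassified", "confidential", "secret", "top secret"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def can_read(user_level, item_level):
    """A user may see an item classified at the user's own or a lower level."""
    return RANK[user_level] >= RANK[item_level]


print(can_read("secret", "confidential"))  # True
print(can_read("secret", "secret"))        # True
print(can_read("secret", "top secret"))    # False
```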

    3. Control Measures

    There are four main control measures that are used to provide security of data in databases. They are as follows:

    • Access Control
    • Inference Control
    • Flow Control
    • Data Encryption

    4. Database Security and the DBA

    1. Account creation: This action creates a new account and password for a user or a group of users to enable access to the DBMS.
    2. Privilege granting: This action permits the DBA to grant certain privileges to certain accounts.
    3. Privilege revocation: This action permits the DBA to revoke (cancel) certain privileges that were previously given to certain accounts.
    4. Security level assignment: This action consists of assigning user accounts to the appropriate security classification level.
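    The first three DBA actions can be pictured with a small in-memory privilege table. This is only a toy model under assumptions: the class PrivilegeTable, its method names, and the sample account and table names are hypothetical, and a real DBMS would expose these actions through its own authorization commands.

```python
class PrivilegeTable:
    """Toy model of account creation, privilege granting and revocation."""

    def __init__(self):
        self._grants = {}  # account -> set of (object, mode) pairs

    def create_account(self, account):
        self._grants.setdefault(account, set())

    def grant(self, account, obj, mode):
        self._grants[account].add((obj, mode))

    def revoke(self, account, obj, mode):
        self._grants[account].discard((obj, mode))

    def allowed(self, account, obj, mode):
        return (obj, mode) in self._grants.get(account, set())


dba = PrivilegeTable()
dba.create_account("clerk")                    # hypothetical account name
dba.grant("clerk", "EMPLOYEE", "read")         # hypothetical table name
print(dba.allowed("clerk", "EMPLOYEE", "read"))    # True
print(dba.allowed("clerk", "EMPLOYEE", "update"))  # False
dba.revoke("clerk", "EMPLOYEE", "read")
print(dba.allowed("clerk", "EMPLOYEE", "read"))    # False
```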

    H. Multimedia Databases

    1. The Nature of Multimedia Data and Applications

    Today the following types of multimedia data are available in current systems:

    • Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.
    • Graphics: Examples include drawings and illustrations that are encoded using some descriptive standards (e.g., CGM, PICT, PostScript).
    • Images: Includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These images are not subdivided into components. Hence querying them by content (e.g., find all images containing circles) is nontrivial.
    • Animations: Temporal sequences of image or graphic data.
    • Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.
    • Structured audio: A sequence of audio components comprising note, tone, duration, and so forth.
    • Audio: Sample data generated from aural recordings in a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.
    • Composite or mixed multimedia data: A combination of multimedia data types such as audio and video which may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats. Composite data also contains additional control information describing how the information should be rendered.

    Nature of Multimedia Applications

    Multimedia data may be stored, delivered, and utilized in many different ways. Applications may be categorized based on their data management characteristics as follows:

    • Repository applications: A large amount of multimedia data as well as metadata is stored for retrieval purposes. A central repository containing multimedia data may be maintained by a DBMS and may be organized into a hierarchy of storage levels: local disks, tertiary disks and tapes, optical disks, and so on. Examples include repositories of satellite images, engineering drawings and designs, space photographs, and radiology scanned pictures.
    • Presentation applications: A large number of applications involve delivery of multimedia data subject to temporal constraints. Audio and video data are delivered this way; in these applications optimal viewing or listening conditions require the DBMS to deliver data at certain rates offering "quality of service" above a certain threshold. Data is consumed as it is delivered, unlike in repository applications, where it may be processed later (e.g., multimedia electronic mail). Simple multimedia viewing of video data, for example, requires a system to simulate VCR-like functionality. Complex and interactive multimedia presentations involve orchestration directions to control the retrieval order of components in a series or in parallel. Interactive environments must support capabilities such as real-time editing, analysis, or annotating of video and audio data.
    • Collaborative work using multimedia information: This is a new category of applications in which engineers may execute a complex design task by merging drawings, fitting subjects to design constraints, and generating new documentation, change notifications, and so forth. Intelligent healthcare networks as well as telemedicine will involve doctors collaborating among themselves, analyzing multimedia patient data and information in real time as it is generated.

    All of these application areas present major challenges for the design of multimedia database systems.

    2. Data Management Issues

    Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on appropriate modeling of the structure and content of data and then designing appropriate database schemas for storing and retrieving multimedia information. Multimedia information systems are very complex and embrace a large set of issues, including the following:

    • Modeling: This area has the potential for applying database versus information retrieval techniques to the problem. There are problems of dealing with complex objects made up of a wide range of types of data: numeric, text, graphic (computer-generated image), animated graphic image, audio stream, and video sequence. Documents constitute a specialized area and deserve special consideration.
    • Design: The conceptual, logical, and physical design of multimedia databases has not been addressed fully, and it remains an area of active research. The design process can be based on the general methodology, but the performance and tuning issues at each level are far more complex.
    • Storage: Storage of multimedia data on standard disklike devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during the input/output operation. Adhering to standards such as JPEG or MPEG is one way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. Standardized software will be required to deal with synchronization and compression/decompression, and will be coupled with indexing problems, which are still in the research domain.
    • Queries and retrieval: The "database" way of retrieving information is based on query languages and internal index structures. The "information retrieval" way relies strictly on keywords or predefined index terms. For images, video data, and audio data, this opens up many issues, among them efficient query formulation, query execution, and optimization. The standard optimization techniques need to be modified to work with multimedia data types.
    • Performance: For multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate. For instance, video must be delivered at a steady rate of 30 frames per second. Techniques for query optimization may compute expected response time before evaluating the query. The use of parallel processing of data may alleviate some problems, but such efforts are currently subject to further experimentation.
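    A rough calculation shows why physical limitations dominate for video playback. The sketch below computes the sustained transfer rate for uncompressed frames; the 640 x 480 frame size and 3 bytes per pixel are assumptions chosen only for illustration, the frame rate is a parameter (30 frames per second is the presentation rate quoted for video earlier in these notes), and the function name is hypothetical.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to deliver uncompressed frames at a steady rate."""
    return width * height * bytes_per_pixel * fps


# Assumed frame geometry: 640 x 480 pixels, 3 bytes per pixel.
rate = raw_video_rate(640, 480, 3, 30)
print(rate)              # 27648000 bytes per second
print(rate / 1_000_000)  # 27.648 megabytes per second
```

    A rate of this size, sustained for the whole length of a clip, is one reason the storage item above points to compression standards.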

    3. Multimedia Database Applications

    Large-scale applications of multimedia databases can be expected to encompass a large number of disciplines and enhance existing capabilities. Some important applications are the following:

    • Documents and records management: A large number of industries and businesses keep very detailed records and a variety of documents. The data may include engineering design and manufacturing data, medical records of patients, publishing material, and insurance claim records.
    • Knowledge dissemination: The multimedia mode, a very effective means of knowledge dissemination, will encompass a phenomenal growth in electronic books, catalogs, manuals, encyclopedias and repositories of information on many topics.
    • Education and training: Teaching materials for different audiences, from kindergarten students to equipment operators to professionals, can be designed from multimedia sources. Digital libraries are expected to have a major influence on the way future students and researchers as well as other users will access vast repositories of educational material.
    • Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals, aliens, and special effects. The use of predesigned stored objects in multimedia databases will expand the range of these applications.
    • Real-time control and monitoring: Coupled with active database technology, multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks such as manufacturing operations, nuclear power plants, patients in intensive care units, and transportation systems.

    I. Object-Oriented Databases

    1. Motivation

    The relational model is the basis of many commercial relational DBMS products (e.g., DB2, Informix, Oracle, Sybase) and the structured query language (SQL) is a widely accepted standard for both retrieving and updating data.

    The basic relational model is simple and mainly views data as tables of rows and columns. The types of data that can be stored in a table are basic types such as integer, string and decimal. Relational DBMSs have been extremely successful in the market. However, the traditional RDBMSs are not suitable for applications with complex data structures or new data types for large, unstructured objects