advance concept in data bases unit-3 by arun pratap singh

Upload: arunpratapsingh

Post on 08-Feb-2018


  • 7/22/2019 Advance Concept in Data Bases Unit-3 by Arun Pratap Singh

    1/81

    PREPARED BY ARUN PRATAP SINGH, MTECH 2nd SEMESTER


    DISTRIBUTED DATABASES INTRODUCTION :

    o A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

    o A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

    A distributed database is a database in which storage devices are not all attached to a common processing unit such as the CPU, controlled by a distributed database management system (together sometimes called a distributed database system). It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

    System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Because they store data across multiple computers, distributed databases can improve performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.[2]

    Two processes ensure that the distributed databases remain up-to-date and current: replication and duplication.

    UNIT : III


    1. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and can require a lot of time and computer resources.

    2. Duplication, on the other hand, is less complex. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, users may change only the master database. This ensures that local data will not be overwritten.
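
    The difference between the two processes can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each site is modelled as a dictionary, and the names duplicate, replicate, sites and change_log are hypothetical and do not come from any particular DBMS.

```python
import copy

def duplicate(master, sites):
    """Duplication: overwrite every site with a full copy of the master."""
    for name in sites:
        sites[name] = copy.deepcopy(master)

def replicate(sites, change_log):
    """Replication: apply each detected change (key, value) to every site."""
    for key, value in change_log:
        for data in sites.values():
            data[key] = value

# Hypothetical master database and two distributed locations.
master = {"p1": "bolt", "p2": "nut"}
sites = {"hq": {}, "sales": {}}
duplicate(master, sites)

# A change detected at one site is propagated so all sites look the same.
replicate(sites, [("p3", "washer")])
```

    Duplication copies everything from one master at a fixed time, while replication has to find individual changes and push them everywhere, which is why it tends to cost more.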

    A database user accesses the distributed database through:

    Local applications - applications which do not require data from other sites.

    Global applications - applications which do require data from other sites.

    A homogeneous distributed database has identical software and hardware running all database instances, and may appear through a single interface as if it were a single database. A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.

    A DDBMS is mainly classified into two types:

    Homogeneous Distributed database management systems

    Heterogeneous Distributed database management systems

    Homogeneous DDBMS :-

    In a homogeneous distributed database, all sites have identical software, are aware of each other, and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of the right to change schema or software. A homogeneous DDBMS appears to the user as a single system. The homogeneous system is much easier to design and manage. The following conditions must be satisfied for a homogeneous database:

    The operating system used at each location must be the same or compatible.

    The data structures used at each location must be the same or compatible.

    The database application (or DBMS) used at each location must be the same or compatible.

    Heterogeneous DDBMS :-

    In a heterogeneous distributed database, different sites may use different schema and software. Difference in schema is a major problem for query processing and transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. Different nodes may have different hardware and software, and the data structures at the various nodes or locations may also be incompatible. Different computers and operating systems, database applications or data models may be used at each of the locations. For example, one location may have the latest relational database management technology, while another location may store data using conventional files or an old version of a database management system. Similarly, one location may have the Windows NT operating system, while another may have UNIX.

    Heterogeneous systems are usually used when individual sites use their own hardware and software. On a heterogeneous system, translations are required to allow communication between different sites (or DBMSs). In this system, the users must be able to make requests in a database language at their local sites. Usually the SQL database language is used for this purpose. If the hardware is different, then the translation is straightforward: computer codes and word lengths are changed. The heterogeneous system is often not technically or economically feasible. In this system, a user at one location may be able to read but not update the data at another location.

    Advantages :

    Increased reliability and availability

    Easier expansion

    Reliable transactions - due to replication of the database

    Hardware, operating-system, network, fragmentation, DBMS, replication and location independence

    Economics - it may cost less to create a network of smaller computers with the power of a single large computer

    Disadvantages :

    Additional software is required

    Operating system should support distributed environment


    Concurrency control poses a major issue. It can be solved by locking and timestamping.

    Distributed access to data

    Analysis of distributed data

    DISTRIBUTED DATABASE ARCHITECTURE :

    A distributed database system allows applications to access data from local and remote databases. In a homogenous distributed database system, each database is an Oracle Database. In a heterogeneous distributed database system, at least one of the databases is not an Oracle Database. Distributed databases use a client/server architecture to process information requests.


    Homogenous Distributed Database Systems :-

    A homogenous distributed database system is a network of two or more Oracle Databases that reside on one or more machines. Figure 29-1 illustrates a distributed system that connects three databases: hq, mfg, and sales. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.


    Heterogeneous Distributed Database Systems :-

    In a heterogeneous distributed database system, at least one of the databases is a non-Oracle Database system. To the application, the heterogeneous distributed database system appears as a single, local, Oracle Database. The local Oracle Database server hides the distribution and heterogeneity of the data.

    The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle Database data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle Database distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle Database in the system can communicate with it.

    Client/Server Database Architecture :-

    A database server is the Oracle software managing a database, and a client is an application that requests information from a server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system can act as a client, a server, or both, depending on the situation.

    In Figure 29-2, the host for the hq database is acting as a database server when a statement is issued against its local data (for example, the second statement in each transaction issues a statement against the local dept table), but is acting as a client when it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote table emp in the sales database).


    DISTRIBUTED DATABASE SYSTEM DESIGN :

    In a distributed system, data are physically distributed among several sites, but the system provides a view of a single logical database to its users. Each node of a distributed database system may follow the three-tier architecture like the centralized database management system (DBMS). Thus, the design of a distributed database system involves the design of a global conceptual schema, in addition to the local schemas, which conform to the three-tier architecture of the DBMS in each site. The design of the computer network across the sites of a distributed system adds extra complexity to the design issue. The crucial design issue involves the distribution of data among the sites of the distributed system. Therefore, the design and implementation of the distributed database system is a very complicated task, and it involves the three important factors listed in the following.

    Fragmentation - A global relation may be divided into several non-overlapping subrelations called fragments, which are then distributed among sites.

    Allocation - Allocation involves the issue of allocating fragments among sites in a distributed system. Each fragment is stored at the site with optimal distribution.

    Replication - The distributed database system may maintain several copies of a fragment at different sites.
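
    The three factors can be illustrated with a small Python sketch. It assumes a hypothetical global relation EMP held as a list of rows; the fragment names, predicates and site names (borrowed from the hq, mfg and sales example) are illustrative only, and no real allocation optimisation is attempted.

```python
# Hypothetical global relation EMP(emp_id, name, dept).
EMP = [
    {"emp_id": 1, "name": "Asha", "dept": "sales"},
    {"emp_id": 2, "name": "Ravi", "dept": "mfg"},
    {"emp_id": 3, "name": "Meena", "dept": "sales"},
]

def fragment(relation, predicates):
    """Fragmentation: build one horizontal fragment per predicate."""
    return {name: [row for row in relation if pred(row)]
            for name, pred in predicates.items()}

fragments = fragment(EMP, {
    "EMP_sales": lambda r: r["dept"] == "sales",
    "EMP_mfg": lambda r: r["dept"] == "mfg",
})

# Allocation: map each fragment to the site(s) that store it.
# Listing more than one site for a fragment is replication.
allocation = {"EMP_sales": ["sales", "hq"], "EMP_mfg": ["mfg"]}

def reconstruct(frags):
    """The global relation is the union of its non-overlapping fragments."""
    rows = [row for frag in frags.values() for row in frag]
    return sorted(rows, key=lambda r: r["emp_id"])
```

    Because the fragments do not overlap and together cover every row, reconstruct(fragments) returns the original relation.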


    Design Strategies:-

    In the top-down design process, the database design starts from the global schema design and proceeds by designing the fragmentation of the database, and then by allocating the fragments to the different sites, creating the physical images. The process is completed by performing the physical design of the data allocated to each site. The global schema design involves designing both the global conceptual schema and the global external schemas (view design). In the global conceptual schema design step, the user needs to specify the data entities and to determine the applications that will run on the database, as well as statistical information about these applications. At this stage, the design of local conceptual schemas is considered. The objective of this step is to design local conceptual schemas by distributing the entities over the sites of the distributed system. Rather than distributing relations, it is quite common to partition relations into subrelations, which are then distributed to different sites. Thus, in a top-down approach, the distributed database design involves two phases, namely, fragmentation and allocation.


    The fragmentation phase is the process of clustering information in fragments that can be accessed simultaneously by different applications, whereas the allocation phase is the process of distributing the generated fragments among the sites of a distributed database system. In the top-down design process, the last step is the physical database design, which maps the local conceptual schemas onto the physical storage devices available at the corresponding sites. The top-down design process is best suited to distributed systems that are developed from scratch.

    In the bottom-up design process, the issue of integration of several existing local schemas into a global conceptual schema is considered to develop a distributed system. When several existing databases are aggregated to develop a distributed system, the bottom-up design process is followed. This process is based on the integration of several existing schemas into a single global schema. It is also possible to aggregate several existing heterogeneous systems for constructing a distributed database system using the bottom-up approach. Thus, the bottom-up design process requires the following steps:

    The selection of a common database model for describing the global schema of the database.

    The translation of each local schema into the common data model.

    The integration of the local schemas into a common global schema.

    Any one of the above design strategies is followed to develop a distributed database system.


    DISTRIBUTED QUERY PROCESSING :

    Query Processing Basics

    centralized query processing

    distributed query processing

    The retrieval of data from different sites in a network is known as distributed query processing.


    Step 1 Query Decomposition :-

    o Normalization

    o Analysis

    o Simplification

    o Restructuring

    Step 2 Data Localization

    Step 3 Global Query Optimization

    Step 4 Local Optimization
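
    As a rough illustration of Step 2, data localization can be thought of as replacing a global relation by its fragments and dropping the fragments that cannot contribute to the answer. The Python sketch below assumes a hypothetical relation EMP that is horizontally fragmented by department; the FRAGMENTS table and the localize function are illustrative names, not part of any real query processor.

```python
# Hypothetical fragmentation schema: each horizontal fragment of EMP is
# described by the dept value it holds and the site where it is stored.
FRAGMENTS = {
    "EMP_sales": {"dept": "sales", "site": "sales"},
    "EMP_mfg": {"dept": "mfg", "site": "mfg"},
    "EMP_hq": {"dept": "hq", "site": "hq"},
}

def localize(dept_filter=None):
    """Return the names of the fragments a query on EMP must access.

    Without a filter, the global relation is the union of all fragments.
    With a selection dept = <value>, only the matching fragment can
    contribute rows, so the others are pruned.
    """
    return sorted(name for name, info in FRAGMENTS.items()
                  if dept_filter is None or info["dept"] == dept_filter)
```

    Under these assumptions, localize("mfg") needs only the fragment stored at the mfg site, while localize() has to visit all three sites.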


    PHF: PRIMARY HORIZONTAL FRAGMENTATION


    VF: VERTICAL FRAGMENTATION

    DHF: DERIVED HORIZONTAL FRAGMENTATION


    CONCURRENCY CONTROL IN DISTRIBUTED DATABASE :

    Concurrency Control: In distributed database systems, a database is typically used by many users. These systems usually allow multiple transactions to run concurrently, i.e. at the same time. Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multi-programmed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent database updates performed by one user from interfering with database retrievals and updates performed by another. When transactions update data concurrently, this may lead to several problems with the consistency of the data.

    Distributed Concurrency Control Algorithms:

    Here, we consider four distributed concurrency control algorithms and summarize their salient aspects. In order to do this, we must first explain the structure that we have assumed for distributed transactions.

    Distributed Transaction:

    A distributed transaction is a transaction that runs in multiple processes, usually on several machines. Each process works for the transaction. Distributed transaction processing systems are designed to facilitate transactions that span heterogeneous, transaction-aware resource managers in a distributed environment. The execution of a distributed transaction requires coordination between a global transaction management system and all the local resource managers of all the involved systems. The resource manager and the transaction processing monitor are the two primary elements of any distributed transactional system. Distributed transactions, like local transactions, must observe the ACID properties. However, maintenance of these properties is very complicated for distributed transactions because a failure can occur in any process. If such a failure occurs, each process must undo any work that has already been done on behalf of the transaction. A distributed transaction processing system maintains the ACID properties in distributed transactions by using two features:

    Recoverable processes - recoverable processes log their actions and therefore can restore earlier states if a failure occurs.

    A commit protocol - a commit protocol coordinates the committing or aborting of a transaction. The most common commit protocol is the two-phase commit protocol.

    Distributed Two-Phase Locking (2PL):

    To ensure serializability of transactions executed in parallel, different methods of concurrency control have been developed. One of these methods is locking, which takes different forms. The two-phase locking protocol is one of the basic concurrency control protocols in distributed database systems. The main approach of this protocol is "read any, write all". Transactions set read locks on items that they read, and they convert their read locks to write locks on items that need to be updated. To read an item, it suffices to set a read lock on any copy of the item, so the local copy is locked; to update an item, write locks are required on all copies. Write locks are obtained as the transaction executes, with the transaction blocking on a write request until all of the copies of the item to be updated have been successfully locked. All locks are held until the transaction has successfully committed or aborted [2]. The 2PL protocol oversees locks by determining when transactions can acquire and release locks. The 2PL protocol forces each transaction to make its lock and unlock requests in two steps:


    The transaction first enters the growing phase, in which it requests the locks it requires, and then enters the shrinking phase, in which it releases all locks and cannot make any more lock requests. A transaction in the 2PL protocol should therefore acquire all needed locks before entering the unlock phase. While the 2PL protocol guarantees serializability, it does not ensure that deadlocks do not happen.
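
    The two-phase rule itself is easy to state in code. The following Python sketch only tracks the phases of a single transaction and rejects a lock request made after the first unlock; it is a simplified illustration with a hypothetical class name, and it does not model lock conflicts between transactions or locking of replicated copies.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of one transaction."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # False while still in the growing phase

    def lock(self, item):
        """Acquire a lock; only allowed during the growing phase."""
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after unlock")
        self.locks.add(item)

    def unlock(self, item):
        """Release a lock; the first release starts the shrinking phase."""
        self.shrinking = True
        self.locks.discard(item)
```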

    Deadlock is therefore a possibility in this algorithm. Local deadlocks are checked for any time a transaction blocks, and are resolved when necessary by restarting the transaction with the most recent initial startup time among those involved in the deadlock cycle. Global deadlock detection is handled by a "Snoop" process, which periodically requests waits-for information from all sites and then checks for and resolves any global deadlocks.
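
    Deadlock detection on waits-for information can be sketched as cycle detection in a directed graph. The Python code below is a minimal illustration under stated assumptions: a waits-for graph is a dictionary from a transaction to the set of transactions it waits on, and the function names merge_graphs, find_cycle and choose_victim are hypothetical. Merging the per-site graphs stands in for what the Snoop process does with the information it collects.

```python
def merge_graphs(site_graphs):
    """Combine the waits-for graphs reported by all sites into one graph."""
    merged = {}
    for graph in site_graphs:
        for txn, blockers in graph.items():
            merged.setdefault(txn, set()).update(blockers)
    return merged

def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None."""
    visiting, done = [], set()

    def visit(txn):
        if txn in visiting:                      # back edge: cycle found
            return visiting[visiting.index(txn):]
        if txn in done:
            return None
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None

def choose_victim(cycle, start_time):
    """Pick the transaction with the most recent startup time to restart."""
    return max(cycle, key=lambda txn: start_time[txn])
```

    A global deadlock may be invisible at every single site: if one site reports that T1 waits for T2 and another reports that T2 waits for T1, neither local graph has a cycle, but the merged graph does.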

    Wound-Wait (WW):

    The second algorithm is the distributed wound-wait locking algorithm. It follows the same locking approach as the 2PL protocol but differs in its handling of the deadlock problem: rather than maintaining waits-for information and then checking for local and global deadlocks, it prevents deadlocks via the use of timestamps. Each transaction is numbered according to its initial startup time, and younger transactions are prevented from making older ones wait. If an older transaction requests a lock, and if the request would lead to the older transaction waiting for a younger transaction, the younger transaction is "wounded": it is restarted unless it is already in the second phase of its commit protocol. Younger transactions can wait for older transactions, so the possibility of deadlocks is eliminated [2].

    Let t(T1) be the timestamp of the requesting transaction T1 and t(T2) the timestamp of the transaction T2 that holds the lock on the requested data item.

    t(T1) > t(T2): the requesting transaction T1 is younger than the lock holder T2, so T1 has to wait.

    t(T1) < t(T2): the requesting transaction T1 is older than the lock holder T2, so T2 is wounded, i.e. it has to abort (roll back) and restart.
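
    The wound-wait decision can be written as a single function. The sketch below assumes unique startup timestamps in which a smaller value means an older transaction; the function name and the string results are illustrative choices, not part of any specific system.

```python
def wound_wait(requester_ts, holder_ts, holder_in_second_commit_phase=False):
    """Decide the outcome when a requester asks for a lock the holder owns.

    A smaller timestamp means an earlier startup time, i.e. an older
    transaction. Timestamps are assumed to be unique.
    """
    if requester_ts > holder_ts:
        return "requester waits"   # a younger requester waits for an older holder
    if holder_in_second_commit_phase:
        return "requester waits"   # the holder can no longer be restarted
    return "holder wounded"        # an older requester restarts the younger holder
```

    Because a transaction only ever waits for an older one (or for one that is about to finish committing), no cycle of waiting transactions can form.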

    Basic Timestamp Ordering (BTO):

    A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time. The third algorithm is the basic timestamp ordering algorithm. The idea for this scheme is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the equivalent serial schedule has the transactions in order of their timestamp values. This is called timestamp ordering (TO). Like wound-wait, it employs transaction startup timestamps, but it uses them differently. BTO associates timestamps with all recently accessed data items and requires that conflicting data accesses by transactions be performed in timestamp order instead of using a locking approach. Transactions that attempt to perform out-of-order accesses are restarted. When a read request is received for an item, it is permitted if the timestamp of the requester exceeds the item's write timestamp. When a write request is received, it is permitted if the requester's timestamp exceeds the read timestamp of the item; in the event that the timestamp of the requester is less than the write timestamp of the item, the update is simply ignored [2]. For replicated data, the "read any, write all" approach is used, so a read request may be sent to any copy while a write request must be sent to all copies. Integration of the algorithm with two-phase commit is accomplished as follows: writers keep their updates in a private workspace until commit time.
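
    The read and write rules of BTO can be sketched for a single copy of a data item as follows. This is an illustration only: the Item class and the result strings are hypothetical, timestamps are assumed to be unique positive integers, and the private workspace used for two-phase commit is left out. Ignoring an out-of-date write instead of restarting the writer is commonly known as the Thomas write rule.

```python
class Item:
    """One copy of a data item with its read and write timestamps."""

    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader so far
        self.write_ts = 0   # timestamp of the youngest writer so far
        self.value = None

def read(item, ts):
    """Permit the read only if it does not arrive after a younger write."""
    if ts < item.write_ts:
        return "restart"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts, value):
    """Permit, ignore or reject a write according to the item timestamps."""
    if ts < item.read_ts:
        return "restart"    # a younger transaction has already read the item
    if ts < item.write_ts:
        return "ignored"    # a younger write is already in place
    item.write_ts = ts
    item.value = value
    return "ok"
```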

    Distributed Optimistic (OPT):

    The fourth algorithm is the distributed, timestamp-based, optimistic concurrency control algorithm, which operates by exchanging certification information during the commit protocol. For each data item, a read timestamp and a write timestamp are maintained. Transactions may read and update data items freely, storing any updates into a local workspace until commit time. For each read, the transaction must remember the version identifier (i.e., write timestamp) associated with the item when it was read. Then, when all of the transaction's cohorts have completed their work and have reported back to the master, the transaction is assigned a globally unique timestamp. This timestamp is sent to each cohort in the "prepare to commit" message, and it is used to locally certify all of its reads and writes as follows [2]:

    A read request is certified if:

    (i) The version that was read is still the current version of the item, and

    (ii) No write with a newer timestamp has already been locally certified.

    A write request is certified if:

    (i) No later reads have been certified and subsequently committed, and

    (ii) No later reads have been locally certified already [2].
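
    The certification conditions above can be expressed as two small checks on the state a cohort keeps for each item. The Python sketch below is one possible reading of the rules, not the algorithm's reference implementation: the ItemState class and its fields are assumptions made for illustration, and committing or discarding certified operations after the vote is omitted.

```python
class ItemState:
    """Per-item bookkeeping at one cohort (hypothetical layout)."""

    def __init__(self):
        self.write_ts = 0              # timestamp of the current committed version
        self.read_ts = 0               # newest read that was certified and committed
        self.certified_reads = set()   # timestamps of locally certified reads
        self.certified_writes = set()  # timestamps of locally certified writes

def certify_read(item, txn_ts, version_read):
    """Read rule: version still current and no newer write certified."""
    still_current = item.write_ts == version_read
    no_newer_write = all(ts <= txn_ts for ts in item.certified_writes)
    if still_current and no_newer_write:
        item.certified_reads.add(txn_ts)
        return True
    return False

def certify_write(item, txn_ts):
    """Write rule: no later read committed or locally certified."""
    no_later_committed_read = item.read_ts <= txn_ts
    no_later_certified_read = all(ts <= txn_ts for ts in item.certified_reads)
    if no_later_committed_read and no_later_certified_read:
        item.certified_writes.add(txn_ts)
        return True
    return False
```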

    Concurrency control is the activity of coordinating concurrent accesses to a database in a multi-user database management system (DBMS).

    Without it, several problems can arise:

    1. The lost update problem

    2. The temporary update problem

    3. The incorrect summary problem

    As an example, consider an on-line airline reservation system. Suppose two customers, Customer A and Customer B, simultaneously try to reserve a seat on the same flight. In the absence of concurrency control, these two activities could interfere as illustrated in Figure 1. Let seat No 18 be the first available seat. Both transactions could read the reservation information at approximately the same time, reserve seat No 18 for Customer A and for Customer B respectively, and store the result back into the database. The net effect is incorrect: although two customers reserved a seat, the database reflects only one activity, and the other reservation is lost by the system.
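
    The airline example can be replayed in a few lines of Python. The first part writes out the harmful interleaving by hand, so the result is deterministic; the second part shows one simple remedy, a lock around the read-check-write step. The seats dictionary and the reserve function are hypothetical names used only for this sketch.

```python
import threading

seats = {18: None}   # seat 18 is the first available seat

# Without concurrency control: both transactions read before either writes.
seen_by_a = seats[18]
seen_by_b = seats[18]
if seen_by_a is None:
    seats[18] = "Customer A"
if seen_by_b is None:
    seats[18] = "Customer B"   # overwrites A's reservation: a lost update
lost_update_result = seats[18]

# With a lock, reading, checking and writing form one indivisible step.
lock = threading.Lock()

def reserve(seat_map, seat_no, customer):
    """Reserve the seat if it is free; return True on success."""
    with lock:
        if seat_map[seat_no] is None:
            seat_map[seat_no] = customer
            return True
        return False
```

    With reserve, the second customer is told that the seat is taken instead of silently replacing the first reservation.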


    RECOVERY CONTROL IN DISTRIBUTED DATABASES :

    As with local recovery, distributed database recovery aims to maintain the atomicity and durability of distributed transactions. A database must guarantee that all statements in a transaction, distributed or non-distributed, either commit or roll back as a unit. The effects of an ongoing transaction should be invisible to all other transactions at all sites. This transparency should be true for transactions that include any type of operations, including queries, updates or remote procedure calls. In a distributed database environment, too, the database management system must coordinate transaction control with these characteristics over a communication network and maintain data consistency, even if a network or system failure occurs.

    In a DDBMS, a given transaction is submitted at some one site, but it can access data at other sites as well. When a transaction is submitted at some one site, the transaction manager at that site breaks it up into a collection of one or more sub-transactions that execute at different sites. The transaction manager then submits these sub-transactions to the transaction managers at the other sites and coordinates their activities. To ensure the atomicity of the global transaction, the DDBMS must ensure that sub-transactions of the global transaction either all commit or all abort.

    Recovery control in distributed databases is based on the two-phase commit protocol. The two-phase commit protocol is the transaction protocol by which all nodes and databases agree with each other on whether to commit a transaction. This protocol is required in an environment where a single transaction can interact with multiple independent resource managers, as in the case of distributed databases. It also supports data integrity by ensuring that modifications made by a transaction are either committed by all the databases involved in the distributed system or rolled back by all of them.


    The two-phase commit protocol works in two phases. The first phase is called the prepare phase, during which the updates are recorded in a transaction log file and each resource, through its resource manager, indicates whether it is ready to make the changes. Resources can vote either to commit or to roll back. The outcome of the second phase, the commit phase, depends on the votes of the resources. If all resources vote to commit, then all the resources participating in the transaction are updated, whereas if one or more of the resources vote to roll back, then all the resources are rolled back to their previous state.

    Consider an example in which a coordinator at a local site interacts with a participant at a remote site, and a transaction has requested the commit operation. In the first phase, the coordinator instructs the participants to get ready by sending them the get-ready message. Each participant makes an entry in its log and sends the ok message as an acknowledgement to the coordinator. The coordinator then writes an entry in its log, takes a final decision, and sends it to the participants.

    Prepare Phase

    o Coordinator receives a commit request.

    o Coordinator instructs all resource managers to get ready to go either way on the transaction. Each resource manager writes all updates from that transaction to its own physical log.

    o Coordinator receives replies from all resource managers. If all are ok, it writes commit to its own log; if not, it writes rollback to its log.

    Commit Phase

    o Coordinator then informs each resource manager of its decision and broadcasts a message to either commit or roll back (abort). If the message is commit, then each resource manager transfers the update from its log to its database.

    o A failure during the commit phase puts a transaction in limbo. This has to be tested for and handled with timeouts or polling.
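
The prepare and commit phases can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `ResourceManager` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, everything runs in one process, and the timeout and polling logic needed to handle in-limbo transactions is left out.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceManager:
    """Hypothetical participant: a log plus a tiny key-value 'database'."""
    name: str
    can_commit: bool = True            # assumption: the vote is a fixed flag
    log: list = field(default_factory=list)
    database: dict = field(default_factory=dict)

    def prepare(self, updates):
        # Prepare phase: record the updates in the local log, then vote.
        self.log.append(("prepare", dict(updates)))
        return self.can_commit

    def commit(self):
        # Commit phase: transfer the logged updates into the database.
        _, updates = self.log[-1]
        self.database.update(updates)
        self.log.append(("commit", None))

    def rollback(self):
        # Discard the prepared updates; the database is left unchanged.
        self.log.append(("rollback", None))


def two_phase_commit(managers, updates, coordinator_log):
    """Run both phases and return the coordinator's decision."""
    votes = [rm.prepare(updates) for rm in managers]      # phase 1
    decision = "commit" if all(votes) else "rollback"
    coordinator_log.append(decision)                      # log before broadcasting
    for rm in managers:                                   # phase 2
        if decision == "commit":
            rm.commit()
        else:
            rm.rollback()
    return decision


coordinator_log = []
site_a, site_b = ResourceManager("site_a"), ResourceManager("site_b")
first = two_phase_commit([site_a, site_b], {"seat_18": "Customer A"}, coordinator_log)

site_c = ResourceManager("site_c", can_commit=False)      # this site votes to roll back
second = two_phase_commit([site_a, site_c], {"seat_19": "Customer B"}, coordinator_log)
print(first, second, site_a.database)
```

In the second run a single negative vote is enough to roll back every participant, so `site_a` keeps only the update from the first transaction.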

    WEB DATABASES :

    The World Wide Web (WWW), popularly known as "the Web", was originally developed in Switzerland at CERN (Note 1) in early 1990 as a large-scale hypermedia information service system for scientists to share information (Note 2). Today this technology allows universal access to this shared information to anyone having access to the Internet, and the Web contains hundreds of millions of Web pages within the reach of millions of users.

    In Web technology, a basic client-server architecture underlies all activities. Information is stored on computers designated as Web servers in publicly accessible shared files encoded using HyperText Markup Language (HTML). A number of tools enable users to create Web pages formatted with HTML tags, freely mixed with multimedia content, from graphics to audio and even to video. A page has many interspersed hyperlinks, literally links that enable a user to "browse" or move from one page to another across the Internet. This ability has given tremendous power to end users in searching and navigating related information, often across different continents.

    Information on the Web is organized according to a Uniform Resource Locator (URL), something similar to an address that provides the complete pathname of a file. The pathname consists of a string of machine and directory names separated by slashes and ends in a filename. For example, the table of contents of this book is currently at the following URL:

    http://cseng.aw.com/book/0,,0805317554,00.html

    A URL for a Web page begins with the name of a protocol, typically the Hypertext Transfer Protocol (http), which is the protocol used by Web browsers and Web servers to communicate with each other. Web browsers interpret and present HTML documents to users. Popular Web browsers include the Internet Explorer of Microsoft and the Netscape Navigator. A collection of HTML documents and other files accessible via a URL on a Web server is called a Web site. In the above URL, "cseng.aw.com" may be called the Web site of Addison Wesley Publishing.

    Providing Access to Databases on the World Wide Web

    Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware, the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.

    As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but to databases and DBMSs to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:

    1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is achieved by any grouping of multiple users' requests; moreover, the developer must keep the scripts in the CGI-bin subdirectories only, which opens it to a possible breach of security. The fact that CGI has no language associated with it but requires database developers to learn PERL or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.


    2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base Connectivity as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query using the Java classes Connection, Statement, and ResultSet, respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.

    Besides CGI, Web server vendors are launching their own middleware products for providing multiple database connectivity. These include Internet Server API (ISAPI) from Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web access option provided by Informix. Other DBMS vendors already have, or will have, similar provisions to support database access on the Web.


    THE WEB INTEGRATION OPTION OF INFORMIX :

    Informix has addressed the limitations of CGI and the incompatibilities of CGI, NSAPI, and ISAPI by creating the Web Integration Option (WIO). WIO eliminates the need for scripts. Developers use tools to create intelligent HTML pages called Application Pages (or App Pages) directly within the database. App Pages execute SQL statements dynamically, format the results inside HTML, and return the resulting Web page to the end users. The schematic architecture is shown in Figure 27.02. WIO uses the Web Driver, a lightweight CGI process that is invoked when a URL request is received by the Web server. A unique session identifier is generated for each request, but the WIO application is persistent and does not terminate after each request.

    When the WIO application receives a request from the Web Driver, it connects to the database and executes Web Explode, a function that executes queries within Web pages and formats the results as a Web page that goes back to the browser via the Web Driver.

    Informix HTML tag extensions allow Web authors to create applications that can dynamically construct Web page templates from the Informix Dynamic Server and present them to the end users. WIO also lets users create their own customized tags to perform specialized tasks. Thus, without resorting to any programming or script development, powerful applications can be designed. Another feature of WIO helps transaction-oriented applications by providing an application programming interface (API) that offers a collection of basic services, such as connection and session management, that can be incorporated into Web applications.


    WIO supports applications developed in C, C++, and Java. This flexibility lets developers port existing applications to the Web or develop new applications in these languages. The WIO is integrated with Web server software and utilizes the native security mechanism of the Informix Dynamic Server. The open architecture of WIO allows the use of various Web browsers and servers.

    THE ORACLE WEBSERVER :

    ORACLE supports Web access to databases using the components shown in Figure 27.03. The client requests files that are called "static" or "dynamic" files from the Web server. Static files have a fixed content, whereas dynamic files may have content that includes results of queries to the database. There is an HTTP daemon (a process that runs continuously) called Web Listener running on the server that listens for the requests originating in the clients. A static file (document) is retrieved from the file system of the server and displayed on the Web browser at the client. A request for a dynamic page is passed by the listener to a Web request broker (WRB), which is a multi-threaded dispatcher that hands requests to cartridges. Cartridges are software modules (mentioned earlier in Section 13.2.6) that perform specific functions on specific types of data; they can communicate among themselves. Currently cartridges are provided for PL/SQL, Java, and Live HTML; customized cartridges may be provided as well.


    OPEN PROBLEMS WITH WEB DATABASES :

    The Web is an important factor in planning for enterprise-wide computing environments, both for providing external access to the enterprise's systems and information for customers and suppliers and for marketing and advertising purposes. At the same time, due to security requirements, employees of some organizations are restricted to operate within intranets, subnetworks that cannot be accessed freely from the outside world. Among the prominent applications of the intranet and the WWW are databases to support electronic storefronts, parts and product catalogs, directories and schedules, newsstands, and bookstores. Electronic commerce, the purchasing of products and services electronically on the Internet, is likely to become a major application supported by such databases.

    The future challenges of managing databases on the Web will be many, among them the following:

    o Web technology needs to be integrated with object technology. Currently, the Web can be viewed as a distributed object system, with HTML pages functioning as objects identified by the URL.

    o HTML functionality is too simple to support complex application requirements. As we saw, the Web Integration Option of Informix adds further tags to HTML. In general, additional facilities will be needed to (1) make Web clients function as application front ends, integrating data from multiple heterogeneous databases; (2) make Web clients present different views of the same data to different users; and (3) make Web clients "intelligent" by providing additional data mining functionality (see Section 26.2).

    o Web page content can be made more dynamic by adding more "behavior" to it as an object (see Chapter 11 for a discussion of object modeling). In this respect (1) client and server objects (HTML pages) can be made to interact; (2) Web pages can be treated as collections of programmable objects; and (3) client-side code can access these objects and manipulate them dynamically.

    o The support for a large number of clients coupled with reasonable response times for queries against very large (several tens of gigabytes in size) databases will be a major challenge for Web databases. It will have to be addressed both by Web servers and by the underlying DBMSs.

    Efforts are underway to address the limitations of the current data structuring technology, particularly by the World Wide Web Consortium (W3C). The W3C is designing a Web Object Model. W3C is also proposing an Extensible Markup Language (XML) for structured document interchange on the Web. XML defines a subset of SGML (the Standard Generalized Markup Language), allowing customization of markup languages with application-specific tags. XML is rapidly gaining ground due to its extensibility in defining new tags. W3C's Document Object Model (DOM) defines an object-oriented API for HTML or XML documents presented by a Web client. W3C is also defining metadata modeling standards for describing Internet resources.
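
As a small illustration of application-specific tags and the DOM style of access, the sketch below parses an XML fragment with Python's standard `xml.dom.minidom` module. The `catalog`, `part`, `name`, and `price` tags are made up for this example and do not come from any standard vocabulary.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog document using application-specific tags.
document = """<catalog>
  <part id="p1"><name>bolt</name><price currency="USD">0.25</price></part>
  <part id="p2"><name>gear</name><price currency="USD">4.50</price></part>
</catalog>"""

dom = parseString(document)

def text_of(element, tag):
    """Return the text content of the first descendant element named `tag`."""
    return element.getElementsByTagName(tag)[0].firstChild.data

# Walk the object tree: one dictionary entry per <part> element.
parts = {
    part.getAttribute("id"): (text_of(part, "name"), float(text_of(part, "price")))
    for part in dom.getElementsByTagName("part")
}
print(parts)
```

The program navigates the document as a tree of objects (elements, attributes, text nodes) instead of scanning raw markup, which is the kind of object-oriented access the DOM is meant to provide.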

    MULTIMEDIA DATABASES :

    A multimedia system is a computer-controlled integration of media information objects of different types (text, images, audio, video, and so on). The integration refers to data modeling, storage, presentation, and time synchronization.

    A premise is that the media must be digitally represented, or at least digitally controllable.

    In the years ahead multimedia information systems are expected to dominate our daily lives. Our houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including digital libraries that will distribute vast amounts of multisource multimedia content.

    The Nature of Multimedia Data and Applications

    Nature of Multimedia Data

    In Section 23.3 we discussed the advanced modeling issues related to multimedia data. We also examined the processing of multiple types of data in Chapter 13 in the context of object relational DBMSs (ORDBMSs). DBMSs have been constantly adding to the types of data they support. Today the following types of multimedia data are available in current systems:

    o Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.

    o Graphics: Examples include drawings and illustrations that are encoded using some descriptive standards (e.g., CGM, PICT, PostScript).

    o Images: Includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These images are not subdivided into components. Hence querying them by content (e.g., find all images containing circles) is nontrivial.

    o Animations: Temporal sequences of image or graphic data.

    o Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.

    o Structured audio: A sequence of audio components comprising note, tone, duration, and so forth.

    o Audio: Sample data generated from aural recordings in a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.

    o Composite or mixed multimedia data: A combination of multimedia data types such as audio and video, which may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats. Composite data also contains additional control information describing how the information should be rendered.

    Nature of Multimedia Applications

    Multimedia data may be stored, delivered, and utilized in many different ways. Applications may be categorized based on their data management characteristics as follows:

    o Repository applications: A large amount of multimedia data as well as metadata is stored for retrieval purposes. A central repository containing multimedia data may be maintained by a DBMS and may be organized into a hierarchy of storage levels: local disks, tertiary disks and tapes, optical disks, and so on. Examples include repositories of satellite images, engineering drawings and designs, space photographs, and radiology scanned pictures.


    o Presentation applications: A large number of applications involve delivery of multimedia data subject to temporal constraints. Audio and video data are delivered this way; in these applications optimal viewing or listening conditions require the DBMS to deliver data at certain rates offering "quality of service" above a certain threshold. Data is consumed as it is delivered, unlike in repository applications, where it may be processed later (e.g., multimedia electronic mail). Simple multimedia viewing of video data, for example, requires a system to simulate VCR-like functionality. Complex and interactive multimedia presentations involve orchestration directions to control the retrieval order of components in a series or in parallel. Interactive environments must support capabilities such as real-time editing, analysis, or annotating of video and audio data.

    o Collaborative work using multimedia information: This is a new category of applications in which engineers may execute a complex design task by merging drawings, fitting subjects to design constraints, and generating new documentation, change notifications, and so forth. Intelligent healthcare networks as well as telemedicine will involve doctors collaborating among themselves, analyzing multimedia patient data and information in real time as it is generated.

    All of these application areas present major challenges for the design of multimedia database systems.

    DATA MANAGEMENT ISSUES :

    Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on appropriate modeling of the structure and content of data and then designing appropriate database schemas for storing and retrieving multimedia information. Multimedia information systems are very complex and embrace a large set of issues, including the following:

    o Modeling: This area has the potential for applying database versus information retrieval techniques to the problem. There are problems of dealing with complex objects (see Chapter 11) made up of a wide range of types of data: numeric, text, graphic (computer-generated image), animated graphic image, audio stream, and video sequence. Documents constitute a specialized area and deserve special consideration.

    o Design: The conceptual, logical, and physical design of multimedia databases has not been addressed fully, and it remains an area of active research. The design process can be based on the general methodology described in Chapter 16, but the performance and tuning issues at each level are far more complex.

    o Storage: Storage of multimedia data on standard disk-like devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during the input/output operation. Adhering to standards such as JPEG or MPEG is one way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. Standardized software will be required to deal with synchronization and compression/decompression, and will be coupled with indexing problems, which are still in the research domain.

    o Queries and retrieval: The "database" way of retrieving information is based on query languages and internal index structures. The "information retrieval" way relies strictly on keywords or predefined index terms. For images, video data, and audio data, this opens up many issues, among them efficient query formulation, query execution, and optimization. The standard optimization techniques we discussed in Chapter 18 need to be modified to work with multimedia data types.

    o Performance: For multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate. For instance, video must be delivered at a steady rate of 60 frames per second. Techniques for query optimization may compute expected response time before evaluating the query. The use of parallel processing of data may alleviate some problems, but such efforts are currently subject to further experimentation.

    Such issues have given rise to a variety of open research problems. We look at a few

    representative problems now.

    MULTIMEDIA DATABASE APPLICATIONS :

    Large-scale applications of multimedia databases can be expected to encompass a large number of disciplines and enhance existing capabilities. Some important applications are the following:

    o Documents and records management: A large number of industries and businesses keep very detailed records and a variety of documents. The data may include engineering design and manufacturing data, medical records of patients, publishing material, and insurance claim records.

    o Knowledge dissemination: The multimedia mode, a very effective means of knowledge dissemination, will encompass a phenomenal growth in electronic books, catalogs, manuals, encyclopedias, and repositories of information on many topics.

    o Education and training: Teaching materials for different audiences, from kindergarten students to equipment operators to professionals, can be designed from multimedia sources. Digital libraries are expected to have a major influence on the way future students and researchers as well as other users will access vast repositories of educational material. (See Section 27.6 on digital libraries.)

    o Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals and aliens. The use of predesigned stored objects in multimedia databases will expand the range of these applications.

    o Real-time control and monitoring: Coupled with active database technology, multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks such as manufacturing operations, nuclear power plants, patients in intensive care units, and transportation systems.

    MOBILE DATABASES :

    Recent advances in wireless technology have led to mobile computing, a new dimension in data communication and processing. The mobile computing environment will provide database applications with useful aspects of wireless technology. The mobile computing platform allows users to establish communication with other users and to manage their work while they are mobile. This feature is especially useful to geographically dispersed organizations. Typical examples might include traffic police, taxi dispatchers, and weather reporting services, as well as financial market reporting and information brokering applications. However, there are a number of hardware as well as software problems that must be resolved before the capabilities of mobile computing can be fully utilized. Some of the software problems, which may involve data management, transaction management, and database recovery, have their origin in distributed database systems. In mobile computing, however, these problems become more difficult to solve, mainly because of the narrow bandwidth of the wireless communication channels, the relatively short active life of the power supply (battery) of mobile units, and the changing locations of required information (sometimes in cache, sometimes in the air, sometimes at the server). In addition, mobile computing has its own unique architectural challenges.

    The general architecture of a mobile platform is illustrated in Figure 27.04. It is a distributed architecture where a number of computers, generally referred to as Fixed Hosts (FH) and Base Stations (BS), are interconnected through a high-speed wired network. Fixed hosts are general-purpose computers that are not equipped to manage mobile units but can be configured to do so. Base stations are equipped with wireless interfaces and can communicate with mobile units to support data access.

    Mobile Units (MU) (or hosts) and base stations communicate through wireless channels having bandwidths significantly lower than those of a wired network. A downlink channel is used for sending data from a BS to an MU, and an uplink channel is used for sending data from an MU to its BS. Recent products for portable wireless have an upper limit of 1 Mbps (megabits per second) for infrared communication, 2 Mbps for radio communication, and 9.14 Kbps (kilobits per second) for cellular telephony. Ethernet, by comparison, provides 10 Mbps, fast Ethernet and FDDI provide 100 Mbps, and ATM (asynchronous transfer mode) provides 155 Mbps.
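
To get a feel for what these figures mean for data access, the short Python sketch below computes the ideal time needed to move a payload over each link. The 1 MB payload is an assumption chosen for illustration, and the calculation ignores protocol overhead, latency, and retransmissions.

```python
# Bandwidths quoted in the paragraph above, in bits per second.
LINKS_BPS = {
    "cellular telephony": 9.14e3,
    "infrared": 1e6,
    "radio": 2e6,
    "Ethernet": 10e6,
    "fast Ethernet / FDDI": 100e6,
    "ATM": 155e6,
}

def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / bits_per_second

payload = 1_000_000  # assumption: a 1 MB query result
for name, rate in LINKS_BPS.items():
    print(f"{name:22s} {transfer_seconds(payload, rate):10.2f} s")
```

Under these assumptions the same result that crosses a 10 Mbps Ethernet in under a second needs several seconds over infrared or radio and roughly a quarter of an hour over cellular telephony, which is why caching and careful choice of what to ship to a mobile unit matter so much.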

    Mobile units are battery-powered portable computers that move freely in a geographic mobility domain, an area that is restricted by the limited bandwidth of wireless communication channels. To manage the mobility of units, the entire geographic mobility domain is divided into smaller domains called cells. The mobile discipline requires that the movement of mobile units be unrestricted within the geographic mobility domain (intercell movement), while information access contiguity during movement guarantees that the movement of a mobile unit across cell boundaries will have no effect on the data retrieval process.


    Types of Data in Mobile Applications

    Applications that run on mobile hosts have different data requirements. Users either engage in personal communications or office activities, or they simply receive updates on frequently changing information. Mobile applications can be categorized in two ways: (1) vertical applications and (2) horizontal applications (Note 3). In vertical applications users access data within a specific cell, and access is denied to users outside of that cell. For example, users can obtain information on the location of doctors or emergency centers within a cell or parking availability data at an airport cell. In horizontal applications, users cooperate on accomplishing a task, and they can handle data distributed throughout the system. The horizontal application market is massive; the two types of applications most often mentioned are mail-enabled applications and information services to mobile users.

    Data may be classified into three categories:

    1. Private data: A single user owns this data and manages it. No other user may access it.

    2. Public data: This data can be used by anyone who can read it. Only one source updates it. Examples include weather bulletins or stock prices.

    3. Shared data: This data is accessed both in read and write modes by groups of users. Examples include inventory data for products in a company.

    Public data is primarily managed by vertical applications, while shared data is used by horizontal applications, possibly with some replication. Copies of shared data may be stored both in base and mobile stations. This presents a variety of difficult problems in transaction management and consistency, as well as in the integrity and scalability of the architecture.


    SPATIAL DATABASE :

    Spatial databases provide concepts for databases that keep track of objects in a multi-dimensional space. For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects, from countries and states to rivers, cities, roads, seas, and so on. These databases are used in many applications, such as environmental, emergency, and battle management. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points. In general, a spatial database stores objects that have spatial characteristics that describe them. The spatial relationships among the objects are important, and they are often needed when querying the database. Although a spatial database can in general refer to an n-dimensional space for any n, we will limit our discussion to two dimensions as an illustration.

    The main extensions that are needed for spatial databases are models that can interpret spatial characteristics. In addition, special indexing and storage structures are often needed to improve performance. Let us first discuss some of the model extensions for two-dimensional spatial databases. The basic extensions needed are to include two-dimensional geometric concepts, such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial characteristics of objects. In addition, spatial operations are needed to operate on the objects' spatial characteristics (for example, to compute the distance between two objects), as well as spatial Boolean conditions (for example, to check whether two objects spatially overlap). To illustrate, consider a database that is used for emergency management applications. A description of the spatial positions of many types of objects would be needed. Some of these objects generally have static spatial characteristics, such as streets and highways, water pumps (for fire control), police stations, fire stations, and hospitals. Other objects have dynamic spatial characteristics that change over time, such as police vehicles, ambulances, or fire trucks.
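
    The two kinds of spatial operation described here, a distance computation and a Boolean overlap condition, can be sketched in a few lines. This is only an illustration: the Point and Rect types and the function names are assumptions made for the example, and objects are simplified to points and axis-aligned rectangles.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def distance(a: Point, b: Point) -> float:
    """Spatial operation: Euclidean distance between two point objects."""
    return math.hypot(a.x - b.x, a.y - b.y)


def overlaps(r: Rect, s: Rect) -> bool:
    """Spatial Boolean condition: True if the two rectangles share any area or edge."""
    return (r.xmin <= s.xmax and s.xmin <= r.xmax and
            r.ymin <= s.ymax and s.ymin <= r.ymax)
```

    Real spatial models also cover lines, circles, polygons, and arcs; rectangles are used here because the overlap test is short and easy to verify.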

    The following categories illustrate three typical types of spatial queries:

    Range query: Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, finds all hospitals within the Dallas city area, or finds all ambulances within five miles of an accident location.)

    Nearest neighbor query: Finds an object of a particular type that is closest to a given location. (For example, finds the police car that is closest to a particular location.)

    Spatial joins or overlays: Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, finds all cities that fall on a major highway, or finds all homes that are within two miles of a lake.)
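
    The three query types can be written as plain linear scans, which is roughly what a system would do without a spatial index. The sketch below assumes objects are (name, x, y) tuples on a flat plane measured in miles; all names and coordinates are made up for illustration.

```python
import math

# Hypothetical objects: (name, x, y), coordinates in miles on a flat plane.
hospitals = [("H1", 1.0, 1.0), ("H2", 4.0, 5.0), ("H3", 9.0, 9.0)]
police_cars = [("P1", 0.0, 6.0), ("P2", 3.0, 3.0)]
homes = [("home_a", 2.0, 2.0), ("home_b", 8.0, 1.0)]
lakes = [("lake_x", 2.5, 3.0)]
accident = ("accident", 3.0, 4.0)


def dist(a, b):
    """Euclidean distance between two (name, x, y) objects."""
    return math.hypot(a[1] - b[1], a[2] - b[2])


def range_query(objects, center, radius):
    """Names of all objects within `radius` of `center`."""
    return [o[0] for o in objects if dist(o, center) <= radius]


def nearest_neighbor(objects, location):
    """Name of the single object closest to `location`."""
    return min(objects, key=lambda o: dist(o, location))[0]


def spatial_join(left, right, max_distance):
    """Pairs (left name, right name) that lie within `max_distance` of each other."""
    return [(l[0], r[0]) for l in left for r in right if dist(l, r) <= max_distance]
```

    Each function examines every object, so the cost grows with the size of the data set (and with the product of both sets for the join). That cost is the motivation for spatial indexing.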

    For these and other types of spatial queries to be answered efficiently, special techniques for spatial indexing are needed. One of the best known techniques is the use of R-trees and their variations. R-trees group together objects that are in close spatial physical proximity on the same leaf nodes of a tree-structured index. Since a leaf node can point to only a certain number of objects, algorithms for dividing the space into rectangular subspaces that include the objects are needed. Typical criteria for dividing the space include minimizing the rectangle areas, since this would lead to a quicker narrowing of the search space. Problems such as having objects with overlapping spatial areas are handled in different ways by the many different variations of R-trees. The internal nodes of R-trees are associated with rectangles whose area covers all the rectangles in their subtrees. Hence, R-trees can easily answer queries such as "find all objects in a given area" by limiting the tree search to those subtrees whose rectangles intersect with the area given in the query.
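
    The search step of an R-tree, descending only into subtrees whose rectangle intersects the query area, can be sketched as follows. This is a hedged illustration rather than a full R-tree: the tree is built by hand, node splitting and insertion are omitted, and the Node layout and helper names are assumptions made for the example.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); True if they overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bounding(rects):
    """Smallest rectangle that covers every rectangle in `rects`."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    mbr: tuple                                     # rectangle covering the whole subtree
    children: list = field(default_factory=list)   # internal node: child Nodes
    entries: list = field(default_factory=list)    # leaf node: (name, rect) pairs


def make_leaf(entries):
    return Node(mbr=bounding(rect for _, rect in entries), entries=list(entries))


def make_internal(children):
    return Node(mbr=bounding(c.mbr for c in children), children=list(children))


def search(node, query, stats):
    """Names of objects intersecting `query`; stats['visited'] counts nodes examined."""
    stats["visited"] += 1
    if node.entries:  # leaf: test each stored object
        return [name for name, rect in node.entries if intersects(rect, query)]
    hits = []
    for child in node.children:
        if intersects(child.mbr, query):  # prune subtrees that cannot contain a match
            hits.extend(search(child, query, stats))
    return hits


# Hand-built two-level tree: nearby objects share a leaf.
west = make_leaf([("fire_station", (1, 1, 2, 2)), ("hospital", (3, 2, 4, 3))])
east = make_leaf([("school", (10, 10, 11, 11)), ("pump", (12, 9, 13, 10))])
root = make_internal([west, east])
```

    For a query rectangle in the lower-left corner, only the root and the west leaf are examined; the east leaf is skipped because its bounding rectangle does not intersect the query.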

    Other spatial storage structures include quadtrees and their variations. Quadtrees generally divide each space or subspace into equally sized areas, and proceed with the subdivisions of each subspace to identify the positions of various objects. Recently, many newer spatial access structures have been proposed, and this remains an active research area.
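
    The subdivision idea behind quadtrees can be illustrated with a small point quadtree. This is a minimal sketch under stated assumptions: the region is a square, each leaf holds at most `capacity` points before it splits into four equal quadrants, and `MAX_LEVEL` is an arbitrary cap added so that duplicate points cannot cause endless splitting. The class and method names are made up for the example.

```python
class QuadTree:
    """Point quadtree over a square region; a sketch, not a tuned index."""

    MAX_LEVEL = 16  # assumption: stop subdividing at this depth

    def __init__(self, x, y, size, capacity=1, level=0):
        self.x, self.y, self.size = x, y, size  # lower-left corner and side length
        self.capacity = capacity
        self.level = level
        self.points = []
        self.children = None  # None while this node is a leaf

    def contains(self, px, py):
        return (self.x <= px < self.x + self.size and
                self.y <= py < self.y + self.size)

    def insert(self, px, py):
        """Insert a point; returns False if it lies outside the region."""
        if not self.contains(px, py):
            return False
        self._insert(px, py)
        return True

    def _child_for(self, px, py):
        # Pick the quadrant by comparing against the midpoint of this square.
        half = self.size / 2
        ix = 1 if px >= self.x + half else 0
        iy = 1 if py >= self.y + half else 0
        return self.children[ix * 2 + iy]

    def _insert(self, px, py):
        if self.children is None:
            if len(self.points) < self.capacity or self.level >= self.MAX_LEVEL:
                self.points.append((px, py))
                return
            self._split()
        self._child_for(px, py)._insert(px, py)

    def _split(self):
        # Divide this square into four equally sized quadrants and push points down.
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.level + 1)
            for dx in (0, 1) for dy in (0, 1)
        ]
        old_points, self.points = self.points, []
        for px, py in old_points:
            self._child_for(px, py)._insert(px, py)

    def depth(self):
        if self.children is None:
            return 1
        return 1 + max(child.depth() for child in self.children)

    def count(self):
        if self.children is None:
            return len(self.points)
        return sum(child.count() for child in self.children)
```

    Inserting two points that are close together forces repeated subdivision of the quadrant they share, so the tree becomes deeper where objects are dense and stays shallow elsewhere.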

    CLUSTERING BASED DISASTER PROOF DATABASES :

    If downtime is not an option, and the Web never closes for business, how do you keep your company's doors open 24/7? The answer lies in high-availability (HA) systems that approach 100 percent uptime.

    The principles of high availability define a level of backup and recovery. Until recently, high availability simply meant hardware or software recovery via RAID (Redundant Array of Independent Disks). RAID addressed the need for fault tolerance in data but didn't solve the problem of a complete DBMS failure.


    For even more uptime, database administrators are turning to clustering as the best way to achieve high availability. Recent moves by Oracle, with its Real Application Clusters, and Microsoft, with MSCS (Microsoft Cluster Service), have made multinode clusters for HA in production environments mainstream.

    In a high-availability setup, a cluster functions by associating servers that have the ability to share a disk group. As illustrated here, each node has a fail-over node within its cluster. If a failure occurs in Node 1, Node 2 picks up the slack by assuming the resources and the unique logic and transaction functions of the failed DBMS.

    Clustering can have the added benefit of not being bound by node colocation. Fiber-optic connections, which can be cabled for miles between the nodes in a cluster, ensure continued operation even in the face of a complete meltdown of your primary system.

    When a hot-standby model is in place, downtimes may be less than a minute. This is especially important if your service-level agreement requires higher than 99.9 percent uptime, which translates to only about 8.76 hours of downtime per year.
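
    The downtime budget implied by an uptime percentage is simple arithmetic, sketched below. The function name is an assumption for the example, and a year is taken as 365 days (8,760 hours), ignoring leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent):
    """Hours per year a system may be down while still meeting `uptime_percent`."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR
```

    With this formula, 99.9 percent uptime allows roughly 8.76 hours of downtime per year, and each additional "nine" divides the allowance by ten.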

    Clustering technologies are pricey, however. The enterprise software and hardware must be uniform and compatible with the clustering technology to work properly. There's also the associated overhead in the design and maintenance of redundant systems.

    One cost-effective solution is log shipping, in which a database can synchronize physically distinct databases by sending transaction logs from one server to another. In the event of a failure, the logs can be used to restore the database up to the point of the failure. Other methods include snapshot databases and replication technologies such as Sybase's Replication Server, which has been around for decades.
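
    The idea of log shipping can be sketched with a toy key-value store: the primary appends every write to a log, and a shipping step copies the records the standby has not yet seen and replays them in order. This is an illustration only; the Database class, the log record layout (sequence number, key, value), and ship_logs are assumptions for the example and do not reflect any particular product.

```python
class Database:
    """Toy key-value database that records every write in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []         # list of (lsn, key, value) records
        self.applied_lsn = 0  # highest log sequence number applied so far

    def write(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        self.data[key] = value
        self.applied_lsn = lsn


def ship_logs(primary, standby):
    """Copy log records the standby has not seen and replay them in order.

    Returns the number of records shipped.
    """
    pending = [rec for rec in primary.log if rec[0] > standby.applied_lsn]
    for lsn, key, value in pending:
        standby.log.append((lsn, key, value))
        standby.data[key] = value
        standby.applied_lsn = lsn
    return len(pending)
```

    Writes made on the primary after the last shipment are not yet on the standby, so the interval between shipments bounds how much recent work could be lost if the primary fails.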


    High-availability add-ons to databases are useful but should be understood in the context of a complete HA methodology. This requires a concerted effort toward standardization on each of your mission-critical infrastructures. Fault-tolerant application design with hands-off exception handling, self-healing and redundant networks, and a stable operating system are all prerequisites for high availability.

    When you adhere to these standards, enforceable, database-specific HA technologies are sure to lead your enterprise on the path to minimum downtime.

    SOME QUESTIONS

    Q. 1 How can a distributed database be recovered in case of failure?

    Ans :


    In a distributed setting, the server must log a write operation not only to the local log file, but also to 1, 2 or more remote logs. The issue is close to replication methods, the main choice being to adopt either a synchronous or asynchronous protocol.

    Synchronous protocol. The server acknowledges the Client only when all the remote nodes have sent a confirmation of the successful completion of their write() operation. In practice, the Client waits until the slowest of all the writers sends its acknowledgment. This may severely hinder the efficiency of updates, but the obvious advantage is that all the replicas are consistent.


    Asynchronous protocol. The Client application waits only until one of the copies (the fastest) has been effectively written. Clearly, this puts data consistency at risk, as a subsequent read operation may access an older version that does not yet reflect the update.
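
    The trade-off between the two protocols can be made concrete with a small model. The sketch below does not perform real I/O: each replica is described by an assumed write latency in milliseconds, and the function reports when the Client is acknowledged and what each replica holds at that moment. All names and latencies are hypothetical.

```python
def write(replicas, new_value, protocol):
    """Model one replicated write.

    replicas: dict mapping replica name -> (latency_ms, current_value).
    Returns (ack_time_ms, state_at_ack), where state_at_ack maps each replica
    name to the value it holds at the moment the Client is acknowledged.
    """
    if protocol not in ("synchronous", "asynchronous"):
        raise ValueError("protocol must be 'synchronous' or 'asynchronous'")
    latencies = [latency for latency, _ in replicas.values()]
    # Synchronous: wait for the slowest writer. Asynchronous: wait for the fastest.
    ack_time = max(latencies) if protocol == "synchronous" else min(latencies)
    state = {
        name: (new_value if latency <= ack_time else old_value)
        for name, (latency, old_value) in replicas.items()
    }
    return ack_time, state


# Hypothetical replicas: one local log and two remote logs.
replicas = {"local": (2, "v1"), "remote_a": (40, "v1"), "remote_b": (120, "v1")}
```

    Under the synchronous protocol the Client waits 120 ms and every replica holds the new value at acknowledgment. Under the asynchronous protocol the Client waits only 2 ms, but a read served by either remote replica at that moment still returns the old value.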

    Q. 2 What is a multimedia database? Explain the methods of mining multimedia databases.

    Ans : Multimedia database : Explained above.


    The methods of mining multimedia database :


    Q. 3 Write short notes on any four of the following :

    (1) Web database

    (2) Mobile databases

    Ans : Explained above.

    Q. 4 Design issues of distributed databases.

    Ans :


    Q. 5 What is a commit protocol and why is it required in a distributed database? Describe and compare two-phase and three-phase commit. What is blocking and how does the three-phase protocol prevent it? Explain distributed transactions.

    Ans :


    Distributed transaction :


    Commit Protocol :

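    Why a commit protocol is required in a distributed database: because sites and communication links can fail independently, and a transaction must remain atomic across all sites (either every participating site commits it or none does).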
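
    As an illustration of atomicity across sites, the two-phase commit named in the question is commonly described as a voting phase followed by a decision phase. The sketch below shows only that happy-path control flow under strong assumptions: no messages are lost, nothing is logged, and there are no timeouts or recovery steps. The Participant class and its state names are hypothetical.

```python
class Participant:
    """One site taking part in a distributed transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Phase 1: vote yes (and become READY) or vote no (and abort locally)."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every site.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"
    # Phase 2 (decision): send the same outcome to every site.
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision
```

    A single "no" vote aborts the transaction at every site, which is how the all-or-nothing outcome is enforced. A participant that has voted yes and is in the READY state cannot decide on its own, which is where the blocking problem arises if the coordinator fails.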


    BLOCKING PROBLEM :


    Q. 6 What are web databases? How are databases accessed through the Web?

    Ans : Web databases : Explained above.

    Providing Access to Databases on the World Wide Web

    Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware: the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.

    As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but to databases and DBMSs to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:

    1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is achieved by any grouping of multiple users' requests; moreover, the developer must keep the scripts in the CGI-bin subdirectories only, which opens it to a possible breach of security. The fact that CGI has no language associated with it but requires database developers to learn PERL or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.

    2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases, without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base Connectivity as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query using the Java classes Connection, Statement, and ResultSet respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.
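
    The connect, send SQL, and retrieve results sequence described for JDBC has a close analogue in other languages. The sketch below is not JDBC: it uses Python's standard sqlite3 module (a DB-API driver) against an in-memory database to show the same three steps, followed by the kind of HTML rendering a CGI-style script would hand back to the Web server. The product table, its columns, and the function names are assumptions made for the example.

```python
import html
import sqlite3


def list_products(conn, min_stock):
    """Send a parameterized SQL statement and retrieve the result rows."""
    cur = conn.cursor()  # plays roughly the role of a JDBC Statement
    cur.execute(
        "SELECT name, stock FROM product WHERE stock >= ? ORDER BY name",
        (min_stock,),
    )
    return cur.fetchall()  # plays roughly the role of iterating a ResultSet


def to_html(rows):
    """Render rows as an HTML list, as a script would before replying to the server."""
    items = "".join(f"<li>{html.escape(name)}: {stock}</li>" for name, stock in rows)
    return f"<ul>{items}</ul>"


# Step 1: connect (an in-memory database stands in for a real DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, stock INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("bolt", 120), ("nut", 0), ("washer", 45)],
)
conn.commit()

# Steps 2 and 3: send the query and collect the results.
rows = list_products(conn, 1)
page = to_html(rows)
```

    In the CGI approach a fresh process would run all of these steps, including the connection, for every request. Keeping one connection open across requests is the saving that the JDBC approach is described as offering.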


    Besides CGI, other Web server vendors are launching their own middleware products for providing multiple database connectivity. These include Internet Server API (ISAPI) from Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web access option provided by Informix. Other DBMS vendors already have, or will have, similar provisions to support database access on the Web.

    Q. 7 Compare the relative merits of centralized and hierarchical deadlock detection in a distributed DBMS.

    Ans :

    A centralized deadlock detection scheme is a reasonable choice if the concurrency control algorithm is also centralized. It is better for access patterns that are distributed across sites, since deadlocks occurring between any sites can be immediately identified. However, this benefit comes at the expense of communication between the central location and every other site.

    A hierarchical deadlock detection scheme relieves a single site of the full burden of deadlock detection and lets more sites get involved. When access patterns are more localized, perhaps by geographic area, deadlocks are more likely to occur among certain sites that communicate frequently. The hierarchical approach is more efficient in that case because it checks for deadlocks where they are most likely to happen and splits the detection effort across sites.
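
    The centralized scheme can be sketched as follows: every site sends its local wait-for graph to one detector, which merges them and looks for a cycle. This is a minimal illustration; the graph representation (a dict from a transaction to the set of transactions it waits for), the transaction names, and the function names are assumptions for the example. A hierarchical detector would apply the same check to groups of nearby sites first and merge upward only when needed.

```python
def merge_wait_for(local_graphs):
    """Union the local wait-for graphs sent by each site into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_on in graph.items():
            merged.setdefault(txn, set()).update(waits_on)
    return merged


def find_cycle(graph):
    """Return one cycle as a list of transactions, or None if the graph has none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color.get(nxt, WHITE) == GREY:   # back edge: a cycle closes here
                return path[path.index(nxt):]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Hypothetical local graphs: no single site sees a cycle on its own.
site_a = {"T1": {"T2"}}
site_b = {"T2": {"T3"}}
site_c = {"T3": {"T1"}}
```

    Here each local graph is acyclic, yet the merged graph contains the cycle T1, T2, T3. Only a detector that sees edges from all three sites can find it, which is the benefit of the centralized scheme and also the source of its communication cost.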