advance concept in data bases unit-3 by arun pratap singh

7/22/2019 Advance Concept in Data Bases Unit-3 by Arun Pratap Singh


PREPARED BY ARUN PRATAP SINGH, MTECH 2nd SEMESTER



DISTRIBUTED DATABASES INTRODUCTION :

o A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

o A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

A distributed database is a database in which the storage devices are not all attached to a common processing unit such as the CPU; it is controlled by a distributed database management system (together sometimes called a distributed database system). It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Because they store data across multiple computers, distributed databases can improve performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.[2]

Two processes ensure that the distributed databases remain up-to-date and current: replication and duplication.

UNIT : III

1. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be complex and time-consuming, depending on the size and number of the distributed databases, and it can require a lot of time and computer resources.

2. Duplication, on the other hand, is less complex. It identifies one database as the master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, users may change only the master database. This ensures that local data will not be overwritten.
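The difference between the two processes can be sketched in Python with dictionaries standing in for the sites. This is a minimal illustration only: the function names `duplicate` and `replicate` and the `change_log` list are hypothetical, and real replication software detects changes itself (for example from logs) instead of receiving them as a ready-made list.

```python
import copy


def duplicate(master: dict, sites: list) -> None:
    """Duplication: overwrite every site with a full copy of the master."""
    for site in sites:
        site.clear()
        site.update(copy.deepcopy(master))


def replicate(change_log: list, sites: list) -> None:
    """Replication: apply only the identified changes to every copy."""
    for key, value in change_log:
        for site in sites:
            site[key] = value


master = {"seat_18": "free", "seat_19": "free"}
sites = [{}, {}]
duplicate(master, sites)  # scheduled, after-hours full copy of the master
replicate([("seat_18", "Customer A")], [master] + sites)  # push one detected change to all copies
```

Duplication copies everything regardless of what changed, which is simple but wasteful; replication moves only the changes, which is cheaper per run but needs the extra machinery for finding them.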

A database user accesses the distributed database through:

Local applications: applications that do not require data from other sites.

Global applications: applications that do require data from other sites.

A homogeneous distributed database has identical software and hardware running all database instances, and may appear through a single interface as if it were a single database. A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.

A DDBMS is mainly classified into two types:

Homogeneous distributed database management systems

Heterogeneous distributed database management systems

Homogeneous DDBMS :-


In a homogeneous distributed database, all sites have identical software, are aware of each other and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of the right to change its schema or software. A homogeneous DDBMS appears to the user as a single system. The homogeneous system is much easier to design and manage. The following conditions must be satisfied for a homogeneous database:

The operating system used at each location must be the same or compatible.

The data structures used at each location must be the same or compatible.

The database application (or DBMS) used at each location must be the same or compatible.

Heterogeneous DDBMS :-


In a heterogeneous distributed database, different sites may use different schemas and software. Differences in schema are a major problem for query processing and transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. In heterogeneous systems, different nodes may have different hardware and software, and the data structures at the various nodes or locations may also be incompatible. Different computers and operating systems, database applications or data models may be used at each of the locations. For example, one location may have the latest relational database management technology, while another location may store data using conventional files or an old version of a database management system. Similarly, one location may have the Windows NT operating system, while another may have UNIX.

Heterogeneous systems are usually used when individual sites use their own hardware and software. In a heterogeneous system, translations are required to allow communication between different sites (or DBMSs). In this system, the users must be able to make requests in a database language at their local sites; usually the SQL database language is used for this purpose. If only the hardware is different (and the DBMS products are the same), the translation is straightforward: computer codes and word lengths are changed. A heterogeneous system is often not technically or economically feasible. In such a system, a user at one location may be able to read but not update the data at another location.

Advantages :

Increased reliability and availability

Easier expansion

Reliable transactions, due to replication of the database

Hardware, operating-system, network, fragmentation, DBMS, replication and location independence

Economics: it may cost less to create a network of smaller computers with the power of a single large computer

Disadvantages :

Additional software is required

Operating system should support distributed environment


Concurrency control poses a major issue. It can be solved by locking and timestamping.

Distributed access to data

Analysis of distributed data

DISTRIBUTED DATABASE ARCHITECTURE :

A distributed database system allows applications to access data from local and remote databases. In the Oracle setting described in this section, each database in a homogeneous distributed database system is an Oracle Database, while in a heterogeneous distributed database system at least one of the databases is not an Oracle Database. Distributed databases use a client/server architecture to process information requests.

Homogeneous Distributed Database Systems :-

A homogeneous distributed database system is a network of two or more Oracle Databases that reside on one or more machines. Figure 29-1 illustrates a distributed system that connects three databases: hq, mfg, and sales. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on the local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.
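The idea of one query joining a local table with a table in another database can be tried out with SQLite, whose ATTACH DATABASE command makes a second database addressable by a schema name inside one connection. This is only an analogy under stated assumptions: both databases live in the same process here, so nothing travels over a network as it would with Oracle database links, and the column names (`prod_id`, `deptno`, `dname`) are made up for the example.

```python
import sqlite3

# The main database plays the local "mfg" database; a second in-memory
# database is attached under the schema name "hq".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hq")

conn.execute("CREATE TABLE products (prod_id INTEGER, name TEXT, deptno INTEGER)")
conn.execute("CREATE TABLE hq.dept (deptno INTEGER, dname TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "bolt", 10), (2, "nut", 20)])
conn.executemany("INSERT INTO hq.dept VALUES (?, ?)",
                 [(10, "MFG"), (20, "SALES")])

# One statement joins the local table with the table in the other database.
rows = conn.execute(
    "SELECT p.name, d.dname FROM products p "
    "JOIN hq.dept d ON p.deptno = d.deptno ORDER BY p.prod_id"
).fetchall()
print(rows)  # [('bolt', 'MFG'), ('nut', 'SALES')]
```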



Heterogeneous Distributed Database Systems :-

In a heterogeneous distributed database system, at least one of the databases is a non-Oracle Database system. To the application, the heterogeneous distributed database system appears as a single, local Oracle Database. The local Oracle Database server hides the distribution and heterogeneity of the data.

The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle Database data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle Database distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle Database in the system can communicate with it.

Client/Server Database Architecture :-

A database server is the Oracle software managing a database, and a client is an application that requests information from a server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system can act as a client, a server, or both, depending on the situation.

In Figure 29-2, the host for the hq database acts as a database server when a statement is issued against its local data (for example, the second statement in each transaction issues a statement against the local dept table), but acts as a client when it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote table emp in the sales database).

DISTRIBUTED DATABASE SYSTEM DESIGN :

In a distributed system, data are physically distributed among several sites, but the system provides a view of a single logical database to its users. Each node of a distributed database system may follow the three-level (external, conceptual and internal) schema architecture of a centralized database management system (DBMS). Thus, the design of a distributed database system involves the design of a global conceptual schema, in addition to the local schemas, which conform to the three-level architecture of the DBMS at each site. The design of the computer network across the sites of a distributed system adds extra complexity to the design issue. The crucial design issue involves the distribution of data among the sites of the distributed system. Therefore, the design and implementation of a distributed database system is a very complicated task, and it involves three important factors, as listed below.

Fragmentation: A global relation may be divided into several non-overlapping subrelations called fragments, which are then distributed among sites.

Allocation: Allocation involves the issue of allocating fragments among the sites of a distributed system. Each fragment is stored at the site that gives an optimal distribution.

Replication: The distributed database system may maintain several copies of a fragment at different sites.

Design Strategies:-

In the top-down design process, the database design starts from the global schema design and proceeds by designing the fragmentation of the database, and then by allocating the fragments to the different sites, creating the physical images. The process is completed by performing the physical design of the data allocated to each site. The global schema design involves designing both the global conceptual schema and the global external schemas (view design). In the global conceptual schema design step, the user needs to specify the data entities and to determine the applications that will run on the database, as well as statistical information about these applications. At this stage, the design of local conceptual schemas is considered. The objective of this step is to design local conceptual schemas by distributing the entities over the sites of the distributed system. Rather than distributing whole relations, it is quite common to partition relations into subrelations, which are then distributed to different sites. Thus, in a top-down approach, the distributed database design involves two phases, namely, fragmentation and allocation.

The fragmentation phase is the process of clustering information in fragments that can be accessed simultaneously by different applications, whereas the allocation phase is the process of distributing the generated fragments among the sites of a distributed database system. In the top-down design process, the last step is the physical database design, which maps the local conceptual schemas onto the physical storage devices available at the corresponding sites. The top-down design process is best suited to distributed systems that are developed from scratch.

In the bottom-up design process, a distributed system is developed by integrating several existing local schemas into a global conceptual schema. When several existing databases are aggregated to develop a distributed system, the bottom-up design process is followed. This process is based on the integration of several existing schemas into a single global schema. It is also possible to aggregate several existing heterogeneous systems to construct a distributed database system using the bottom-up approach. Thus, the bottom-up design process requires the following steps:

The selection of a common database model for describing the global schema of the database

The translation of each local schema into the common data model

The integration of the local schemas into a common global schema.

Either of the above design strategies can be followed to develop a distributed database system.



DISTRIBUTED QUERY PROCESSING :

Query Processing Basics

centralized query processing

distributed query processing

The retrieval of data from different sites in a network is known as distributed query processing.


Step 1 Query Decomposition :-

o Normalization

o Analysis

o Simplification

o Restructuring

Step 2 Data Localization

Step 3 Global Query Optimization

Step 4 Local Optimization
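Step 2 (data localization) replaces each global relation in the decomposed query by its fragments and then drops fragments that cannot contribute to the answer. A minimal sketch, assuming horizontal fragments of a hypothetical EMP relation defined by half-open ranges on a single attribute (SALARY); real systems reason about arbitrary selection predicates, not just ranges:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    lo: float  # the fragment holds tuples with lo <= SALARY < hi
    hi: float


def localize(fragments, q_lo, q_hi):
    """Keep only fragments whose range overlaps the query range q_lo <= SALARY < q_hi.

    A fragment whose defining range is disjoint from the query range would
    return an empty result, so it is pruned from the localized query.
    """
    return [f for f in fragments if f.lo < q_hi and q_lo < f.hi]


# Hypothetical fragmentation of a global EMP relation on SALARY.
emp_fragments = [
    Fragment("EMP1", "site1", 0, 20000),
    Fragment("EMP2", "site2", 20000, 50000),
    Fragment("EMP3", "site3", 50000, float("inf")),
]

# Global query: SELECT * FROM EMP WHERE SALARY >= 30000 AND SALARY < 40000
needed = localize(emp_fragments, 30000, 40000)
print([(f.name, f.site) for f in needed])  # [('EMP2', 'site2')]
```

Only site2 has to be contacted for this query; the later optimization steps then work on this reduced, fragment-level query.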



PHF (PRIMARY HORIZONTAL FRAGMENTATION)
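Primary horizontal fragmentation splits a relation into subsets of tuples, each defined by a selection predicate on the relation's own attributes. The sketch below uses a hypothetical PROJ relation fragmented by location and checks the usual correctness rules: completeness (every tuple is in some fragment), disjointness (no tuple is in two fragments) and reconstruction (the union of the fragments gives back the relation).

```python
# A tiny hypothetical PROJ relation; each tuple is a dict.
PROJ = [
    {"pno": "P1", "budget": 150000, "loc": "Montreal"},
    {"pno": "P2", "budget": 135000, "loc": "New York"},
    {"pno": "P3", "budget": 250000, "loc": "New York"},
    {"pno": "P4", "budget": 310000, "loc": "Paris"},
]

# Each fragment is defined by a selection predicate on PROJ's own attributes.
predicates = {
    "PROJ1": lambda t: t["loc"] == "Montreal",
    "PROJ2": lambda t: t["loc"] == "New York",
    "PROJ3": lambda t: t["loc"] == "Paris",
}


def fragment(relation, preds):
    """Primary horizontal fragmentation: one selection per predicate."""
    return {name: [t for t in relation if p(t)] for name, p in preds.items()}


frags = fragment(PROJ, predicates)

# Completeness and disjointness: every tuple satisfies exactly one predicate.
counts = [sum(p(t) for p in predicates.values()) for t in PROJ]
assert all(c == 1 for c in counts)

# Reconstruction: the union of the fragments gives back the relation.
rebuilt = [t for name in predicates for t in frags[name]]
```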



VF (VERTICAL FRAGMENTATION)
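Vertical fragmentation splits a relation by columns: each fragment is a projection, and every fragment keeps the primary key so that the original relation can be rebuilt with a join. A minimal sketch on a hypothetical EMP relation, assuming `eno` is the key:

```python
# Hypothetical EMP relation; "eno" is the primary key.
EMP = [
    {"eno": "E1", "ename": "J. Doe", "title": "Elect. Eng.", "sal": 40000},
    {"eno": "E2", "ename": "M. Smith", "title": "Syst. Anal.", "sal": 34000},
]


def project(relation, attrs):
    """Relational projection onto the listed attributes."""
    return [{a: t[a] for a in attrs} for t in relation]


# Vertical fragments: both keep the key so they can be joined back.
EMP1 = project(EMP, ["eno", "ename", "title"])
EMP2 = project(EMP, ["eno", "sal"])


def join_on_key(r, s, key):
    """Reconstruct the relation by joining the fragments on the shared key."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


rebuilt = join_on_key(EMP1, EMP2, "eno")
assert rebuilt == EMP  # lossless reconstruction
```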

DHF (DERIVED HORIZONTAL FRAGMENTATION)
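Derived horizontal fragmentation fragments a member relation according to the fragments of an owner relation it references, using a semijoin on the linking attribute. A minimal sketch, assuming hypothetical relations PAY (owner, fragmented on salary) and EMP (member, linked through `title`):

```python
# Hypothetical owner relation PAY and member relation EMP, linked by "title".
PAY = [{"title": "Elect. Eng.", "sal": 40000},
       {"title": "Syst. Anal.", "sal": 34000},
       {"title": "Programmer", "sal": 24000}]
EMP = [{"eno": "E1", "title": "Elect. Eng."},
       {"eno": "E2", "title": "Syst. Anal."},
       {"eno": "E3", "title": "Programmer"}]

# Primary horizontal fragments of the owner.
PAY1 = [t for t in PAY if t["sal"] > 30000]
PAY2 = [t for t in PAY if t["sal"] <= 30000]


def semijoin(r, s, attr):
    """r semijoin s: the tuples of r that have a matching attr value in s."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


# The member's fragments are derived from the owner's fragments.
EMP1 = semijoin(EMP, PAY1, "title")
EMP2 = semijoin(EMP, PAY2, "title")
```

Each EMP fragment can then be stored at the same site as the matching PAY fragment, so the join between them stays local.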



CONCURRENCY CONTROL IN DISTRIBUTED DATABASE :

Concurrency Control: In distributed database systems, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently, i.e. at the same time. Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multi-programmed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent database updates performed by one user from interfering with database retrievals and updates performed by another. When transactions update data concurrently, this may lead to several problems with the consistency of the data.

Distributed Concurrency Control Algorithms: In this section, we consider four distributed concurrency control algorithms and summarize their salient aspects. Before discussing the algorithms, we must first explain the structure assumed for distributed transactions.

Distributed Transaction: A distributed transaction is a transaction that runs in multiple processes, usually on several machines, with each process working on behalf of the transaction. Distributed transaction processing systems are designed to facilitate transactions that span heterogeneous, transaction-aware resource managers in a distributed environment. The execution of a distributed transaction requires coordination between a global transaction management system and all the local resource managers of all the involved systems. The resource manager and the transaction processing monitor are the two primary elements of any distributed transactional system. Distributed transactions, like local transactions, must observe the ACID properties. However, maintaining these properties is very complicated for distributed transactions, because a failure can occur in any process. If such a failure occurs, each process must undo any work that has already been done on behalf of the transaction. A distributed transaction processing system maintains the ACID properties in distributed transactions by using two features:

Recoverable processes: Recoverable processes log their actions and therefore can restore earlier states if a failure occurs.

A commit protocol: A commit protocol allows multiple processes to coordinate the committing or aborting of a transaction. The most common commit protocol is the two-phase commit protocol.

Distributed Two-Phase Locking (2PL):

Different methods of concurrency control have been developed to ensure serializability of concurrently executed transactions. One of these methods is locking, which exists in several forms. The two-phase locking protocol is one of the basic concurrency control protocols in distributed database systems. For replicated data, it follows the read any, write all approach. Transactions set read locks on items that they read, and they convert their read locks to write locks on items that need to be updated. To read an item, it suffices to set a read lock on any copy of the item, so the local copy is locked; to update an item, write locks are required on all copies. Write locks are obtained as the transaction executes, with the transaction blocking on a write request until all of the copies of the item to be updated have been successfully locked. All locks are held until the transaction has successfully committed or aborted [2].

The 2PL protocol oversees locks by determining when transactions can acquire and release locks. It forces each transaction to make its lock and unlock requests in two phases. The transaction first enters the growing phase, in which it requests the locks it needs; it then enters the shrinking phase, in which it releases its locks and cannot make any more lock requests. A transaction under the 2PL protocol must therefore obtain all the locks it needs before entering the unlock phase.

While the 2PL protocol guarantees serializability, it does not prevent deadlocks, so deadlock is a possibility in this algorithm. Local deadlocks are checked for any time a transaction blocks, and are resolved when necessary by restarting the transaction with the most recent initial startup time among those involved in the deadlock cycle. Global deadlock detection is handled by a Snoop process, which periodically requests waits-for information from all sites and then checks for and resolves any global deadlocks.
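The following Python sketch shows the lock-handling rules described above: shared (read) locks on the local copy only, exclusive (write) locks on every copy, and no new lock once the transaction has started releasing. It is a simplified model under assumptions: the class names are hypothetical, a `False` return stands for "the transaction would block", locks are released all at once at commit or abort, and deadlock detection (the Snoop process) is not modeled.

```python
class LockManager:
    """Lock table for one site: item -> (mode, set of holder transaction ids)."""

    def __init__(self):
        self.table = {}

    def acquire(self, tid, item, mode):
        """Lock item in mode "S" (read) or "X" (write); False means the caller would block."""
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {tid})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(tid)  # shared locks are compatible
            return True
        if holders == {tid}:  # the sole holder may keep or upgrade its lock
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.table[item] = (new_mode, holders)
            return True
        return False  # conflicting lock held by another transaction

    def release_all(self, tid):
        for item in list(self.table):
            _, holders = self.table[item]
            holders.discard(tid)
            if not holders:
                del self.table[item]


class Transaction:
    """Enforces the two phases: no lock may be acquired after any release."""

    def __init__(self, tid, home_site, sites):
        self.tid, self.home, self.sites = tid, home_site, sites
        self.shrinking = False

    def _lock(self, site, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested in the shrinking phase")
        return self.sites[site].acquire(self.tid, item, mode)

    def read(self, item):
        # Read any: a shared lock on the local copy is enough.
        return self._lock(self.home, item, "S")

    def write(self, item):
        # Write all: an exclusive lock is needed on every copy.
        return all(self._lock(site, item, "X") for site in self.sites)

    def finish(self):
        # Shrinking phase: all locks are released together at commit or abort.
        self.shrinking = True
        for manager in self.sites.values():
            manager.release_all(self.tid)
```

For example, if T2 holds a read lock on its own local copy of x, T1's write of x cannot proceed, because it needs an exclusive lock on every copy, including T2's.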

Wound-Wait (WW): The second algorithm is the distributed wound-wait locking algorithm. It follows the same approach as the 2PL protocol, but differs from 2PL in its handling of the deadlock problem: rather than maintaining waits-for information and then checking for local and global deadlocks, it prevents deadlocks through the use of timestamps. Each transaction is numbered according to its initial startup time, and younger transactions are prevented from making older ones wait. If an older transaction requests a lock, and the request would lead to the older transaction waiting for a younger transaction, the younger transaction is wounded: it is restarted unless it is already in the second phase of its commit protocol. Younger transactions can wait for older transactions, so the possibility of deadlocks is eliminated [2].

t(T1) > t(T2): If the requesting transaction T1 is younger than the transaction T2 that holds the lock on the requested data item, then the requesting transaction T1 has to wait.

t(T1) < t(T2): If the requesting transaction T1 is older than the transaction T2 that holds the lock on the requested data item, then T2 is wounded: T2 is aborted (rolled back) and restarted, unless it is already in the second phase of its commit protocol.
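The wound-wait decision can be written as a small function. In this sketch (the function name and return strings are hypothetical), a smaller timestamp means an older transaction, and the `holder_in_commit` flag models the exception for a holder that is already in the second phase of its commit protocol.

```python
def wound_wait(requester_ts: int, holder_ts: int, holder_in_commit: bool = False) -> str:
    """Resolve a lock conflict under wound-wait (smaller timestamp = older)."""
    if requester_ts < holder_ts and not holder_in_commit:
        # Older requester: the younger holder is wounded (aborted and restarted).
        return "wound holder"
    # Younger requester, or a holder already committing: the requester waits.
    return "requester waits"
```

Because a transaction only ever waits for an older one, every edge in the waits-for graph points from younger to older, so no cycle (and hence no deadlock) can form.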

Basic Timestamp Ordering (BTO): A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time. The third algorithm is the basic timestamp ordering algorithm. The idea of this scheme is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the equivalent serial schedule has the transactions in the order of their timestamp values. This is called timestamp ordering (TO).

Like wound-wait, BTO employs transaction startup timestamps, but it uses them differently. Instead of using locking, BTO associates timestamps with all recently accessed data items and requires that conflicting data accesses by transactions be performed in timestamp order. Transactions that attempt to perform out-of-order accesses are restarted. When a read request is received for an item, it is permitted if the timestamp of the requester exceeds the item's write timestamp. When a write request is received, it is permitted if the requester's timestamp exceeds the read timestamp of the item; in the event that the timestamp of the requester is less than the write timestamp of the item, the update is simply ignored [2]. For replicated data, the read any, write all approach is used, so a read request may be sent to any copy while a write request must be sent to all copies. Integration of the algorithm with two-phase commit is accomplished as follows: writers keep their updates in a private workspace until commit time.
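The BTO read and write rules for a single copy of an item can be sketched as follows. Assumptions: the names `Item`, `read`, `write` and `Restart` are hypothetical; a transaction is allowed to read or overwrite a value carrying its own timestamp (equal timestamps are accepted); replication and the private workspace used for two-phase commit are left out. The ignored obsolete write is the rule described above, commonly known as the Thomas write rule.

```python
class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # timestamp of the transaction that wrote the current value


class Restart(Exception):
    """Raised when an access arrives out of timestamp order."""


def read(item, ts):
    if ts < item.write_ts:  # the value was written by a younger transaction
        raise Restart(f"transaction {ts} must restart")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    if ts < item.read_ts:  # a younger transaction has already read the old value
        raise Restart(f"transaction {ts} must restart")
    if ts < item.write_ts:  # obsolete write: simply ignored
        return False
    item.value, item.write_ts = value, ts
    return True
```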

Distributed Optimistic (OPT):

The fourth algorithm is the distributed, timestamp-based, optimistic concurrency control algorithm, which operates by exchanging certification information during the commit protocol. For each data item, a read timestamp and a write timestamp are maintained. Transactions may read and update data items freely, storing any updates in a local workspace until commit time. For each read, the transaction must remember the version identifier (i.e., the write timestamp) associated with the item when it was read. Then, when all of the transaction's cohorts have completed their work and have reported back to the master, the transaction is assigned a globally unique timestamp. This timestamp is sent to each cohort in the "prepare to commit" message, and it is used to locally certify all of its reads and writes as follows [2]:

A read request is certified if:

(i) the version that was read is still the current version of the item, and

(ii) no write with a newer timestamp has already been locally certified.

A write request is certified if:

(i) no later reads have been certified and subsequently committed, and

(ii) no later reads have been locally certified already [2].
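The local certification step at one cohort can be sketched for a single data item as below. This is a simplified model with hypothetical names: timestamps are plain integers, the globally unique timestamp is assumed to be already assigned, and the messages exchanged with the master are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ItemState:
    version: int = 0                                     # write timestamp of the current value
    certified_reads: set = field(default_factory=set)   # certified, not yet committed
    committed_reads: set = field(default_factory=set)   # certified and committed
    certified_writes: set = field(default_factory=set)  # certified, not yet installed


def certify_read(item, ts, version_read):
    """Rule: the version read is still current and no newer write is certified."""
    ok = (version_read == item.version
          and not any(w > ts for w in item.certified_writes))
    if ok:
        item.certified_reads.add(ts)
    return ok


def certify_write(item, ts):
    """Rule: no later read has been certified, committed or not."""
    ok = not any(r > ts for r in item.committed_reads | item.certified_reads)
    if ok:
        item.certified_writes.add(ts)
    return ok


def commit(item, ts, wrote):
    """Move a certified read to committed and install a certified write."""
    if ts in item.certified_reads:
        item.certified_reads.discard(ts)
        item.committed_reads.add(ts)
    if wrote:
        item.certified_writes.discard(ts)
        item.version = max(item.version, ts)
```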

When transactions execute concurrently without adequate control, several problems can arise:

1. The lost update problem.

2. The temporary update problem.

3. The incorrect summary problem.

As an example, consider an online airline reservation system. Suppose two customers, Customer A and Customer B, simultaneously try to reserve a seat on the same flight. In the absence of concurrency control, these two activities could interfere as illustrated in Figure 1. Let seat No. 18 be the first available seat. Both transactions could read the reservation information at approximately the same time, reserve seat No. 18 for Customer A and Customer B respectively, and store the result back into the database. The net effect is incorrect: although two customers reserved a seat, the database reflects only one activity; the other reservation is lost by the system.
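The lost update in the seat example can be reproduced with a deterministic interleaving, and avoided by making the read and the write one critical section. A minimal single-machine sketch: the seat dictionary and function names are hypothetical, and a Python `threading.Lock` stands in for the database's concurrency control, which in a distributed DBMS would be one of the algorithms described above.

```python
import threading


def first_free(state):
    """Return the lowest-numbered seat that has no customer yet."""
    return min(seat for seat, who in state.items() if who is None)


def lost_update_demo():
    state = {18: None, 19: None}
    seat_a = first_free(state)    # T_A reads: seat 18 looks free
    seat_b = first_free(state)    # T_B reads: seat 18 still looks free
    state[seat_a] = "Customer A"  # T_A writes its reservation
    state[seat_b] = "Customer B"  # T_B overwrites it: A's booking is lost
    return state


_lock = threading.Lock()


def reserve(state, customer):
    """Controlled version: the read and the write form one critical section."""
    with _lock:
        seat = first_free(state)
        state[seat] = customer
        return seat


print(lost_update_demo())  # {18: 'Customer B', 19: None}

safe = {18: None, 19: None}
threads = [threading.Thread(target=reserve, args=(safe, c))
           for c in ("Customer A", "Customer B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(safe)  # each customer holds a different seat
```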



RECOVERY CONTROL IN DISTRIBUTED DATABASES :

As with local recovery, distributed database recovery aims to maintain the atomicity and durability of distributed transactions. A database must guarantee that all statements in a transaction, distributed or non-distributed, either commit or roll back as a unit. The effects of an ongoing transaction should be invisible to all other transactions at all sites. This transparency should hold for transactions that include any type of operation, including queries, updates or remote procedure calls. In a distributed database environment, the database management system must also coordinate transaction control with these characteristics over a communication network and maintain data consistency, even if a network or system failure occurs.

In a DDBMS, a given transaction is submitted at one site, but it can access data at other sites as well. When a transaction is submitted at a site, the transaction manager at that site breaks it up into a collection of one or more sub-transactions that execute at different sites. The transaction manager then submits these sub-transactions to the transaction managers at the other sites and coordinates their activities. To ensure the atomicity of the global transaction, the DDBMS must ensure that the sub-transactions of the global transaction either all commit or all abort.

Recovery control in a distributed database is based on the two-phase commit protocol. The two-phase commit protocol is the transaction protocol by which all nodes and databases agree with each other to commit a transaction. This protocol is required in an environment where a single transaction can interact with multiple independent resource managers, as in the case of distributed databases. It also supports data integrity by ensuring that modifications made by transactions are either committed by all the databases involved in the distributed system or rolled back by all the databases.

The two-phase commit protocol works in two phases. The first phase is called the prepare phase, during which the updates are recorded in a transaction log file, and each resource, through its resource manager, indicates that it is ready to make the changes. Resources can vote either to commit the updates or to roll them back. What happens in the second, commit phase depends on the votes of the resources. If all resources vote to commit, then all the resources participating in the transaction are updated, whereas if one or more of the resources vote to roll back, then all the resources are rolled back to their previous state.

Consider an example in which an interaction between a coordinator at a local site and a participant at a remote site takes place, and a transaction has requested the commit operation. In the first phase, the coordinator instructs the participants to get ready and sends them the "get ready" message. Participants make an entry in their logs and send the "ok" message as an acknowledgement to the coordinator. The coordinator then writes an entry in its log, takes a final decision and sends it to the participants.

Prepare Phase

Coordinator receives a commit request.

Coordinator instructs all resource managers to get ready to go either way on the transaction. Each resource manager writes all updates from that transaction to its own physical log.

Coordinator receives replies from all resource managers. If all are ok, it writes commit to its own log; if not, it writes rollback to its log.

Commit Phase

Coordinator then informs each resource manager of its decision and broadcasts a message to either commit or roll back (abort). If the message is commit, then each resource manager transfers the update from its log to its database.

A failure during the commit phase puts a transaction in limbo. This has to be tested for and handled with timeouts or polling.
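The message flow of the two phases can be sketched in Python as follows. Assumptions: the `Participant` class and its `can_commit` flag are hypothetical, logs are plain lists, and failures, timeouts and polling are not modeled.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class Participant:
    """Hypothetical resource manager with its own log."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.log, self.state = [], "active"

    def prepare(self):
        # Phase 1: record the updates in the local log, then vote.
        if self.can_commit:
            self.log.append("ready")
            return Vote.COMMIT
        self.log.append("abort")
        return Vote.ROLLBACK

    def decide(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.log.append(decision.value)
        self.state = "committed" if decision is Vote.COMMIT else "rolled back"


def two_phase_commit(coordinator_log, participants):
    votes = [p.prepare() for p in participants]  # prepare phase
    decision = (Vote.COMMIT if all(v is Vote.COMMIT for v in votes)
                else Vote.ROLLBACK)
    coordinator_log.append(decision.value)  # decision logged before it is sent
    for p in participants:  # commit phase
        p.decide(decision)
    return decision
```

A single rollback vote is enough to roll back every participant, which is what makes the outcome atomic across sites.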

WEB DATABASES :

The World Wide Web (WWW), popularly known as "the Web", was originally developed in Switzerland at CERN (Note 1) in early 1990 as a large-scale hypermedia information service system for physicists to share information (Note 2). Today this technology allows universal access to this shared information to anyone having access to the Internet, and the Web contains hundreds of millions of Web pages within the reach of millions of users.

In Web technology, a basic client-server architecture underlies all activities. Information is stored on computers designated as Web servers in publicly accessible shared files encoded using HyperText Markup Language (HTML). A number of tools enable users to create Web pages formatted with HTML tags, freely mixed with multimedia content, from graphics to audio and even to video. A page has many interspersed hyperlinks, literally links that enable a user to "browse" or move from one page to another across the Internet. This ability has given tremendous power to end users in searching and navigating related information, often across different continents.

Information on the Web is organized according to a Uniform Resource Locator (URL), something similar to an address that provides the complete pathname of a file. The pathname consists of a string of machine and directory names separated by slashes and ends in a filename. For example, the table of contents of this book is currently at the following URL:

http://cseng.aw.com/book/0,,0805317554,00.html
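Python's standard `urllib.parse.urlparse` function splits such a URL into its protocol, machine name and pathname; the example URL comes from the text above and may no longer resolve.

```python
from urllib.parse import urlparse

url = "http://cseng.aw.com/book/0,,0805317554,00.html"
parts = urlparse(url)

print(parts.scheme)                   # http
print(parts.netloc)                   # cseng.aw.com
print(parts.path)                     # /book/0,,0805317554,00.html
print(parts.path.rsplit("/", 1)[-1])  # 0,,0805317554,00.html (the filename)
```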

A Web URL typically begins with http, the HyperText Transfer Protocol, which is the protocol used between Web browsers and Web servers. A Web browser is the program that communicates with the Web server; browsers interpret and present HTML documents to users. Popular Web browsers include Microsoft's Internet Explorer and Netscape Navigator. A collection of HTML documents and other files accessible via URLs on a Web server is called a Web site. In the above URL, "cseng.aw.com" identifies a Web site of Addison Wesley Publishing.

Providing Access to Databases on the World Wide Web

Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware, the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.

As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but also to databases and DBMSs, to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:

1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is gained by grouping multiple users' requests; moreover, the developer must keep the scripts only in the CGI-bin subdirectories, which exposes them to possible security breaches. The fact that CGI has no language of its own, so that database developers must learn PERL or Tcl, is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.

2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base Connectivity, as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query, using the Java interfaces Connection, Statement, and ResultSet respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS-transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.
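JDBC itself is a Java API. To keep this document's examples in one language, the sketch below uses Python's DB-API (PEP 249) through the standard `sqlite3` module, whose connection, cursor, and fetched rows play roughly the roles of JDBC's Connection, Statement, and ResultSet. It is an analogy, not JDBC; the `product` table and its rows are hypothetical.

```python
import sqlite3

# Connection: a role similar to a JDBC Connection; opened once and reused,
# rather than a new process and connection per request as with CGI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("disk", 80.0), ("monitor", 150.0), ("cable", 5.0)],
)

# Cursor: plays roughly the role of a JDBC Statement, sending SQL to the DBMS.
cur = conn.cursor()
cur.execute("SELECT name, price FROM product WHERE price > ? ORDER BY price", (50,))

# The fetched rows play the role of a JDBC ResultSet.
results = cur.fetchall()
for name, price in results:
    print(name, price)
conn.close()
```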

Besides CGI, other Web server vendors are launching their own middleware products for providing multiple database connectivity. These include Internet Server API (ISAPI) from Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web access option provided by Informix. Other DBMS vendors already have, or will have, similar provisions to support database access on the Web.



THE WEB INTEGRATION OPTION OF INFORMIX :

Informix has addressed the limitations of CGI and the incompatibilities of CGI, NSAPI, and ISAPI by creating the Web Integration Option (WIO). WIO eliminates the need for scripts. Developers use tools to create intelligent HTML pages called Application Pages (or App Pages) directly within the database. They execute SQL statements dynamically, format the results inside HTML, and return the resulting Web page to the end users. The schematic architecture is shown in Figure 27.02. WIO uses the Web Driver, a lightweight CGI process that is invoked when a URL request is received by the Web server. A unique session identifier is generated for each request, but the WIO application is persistent and does not terminate after each request.

When the WIO application receives a request from the Web Driver, it connects to the database and executes Web Explode, a function that executes queries within Web pages and formats the results as a Web page that goes back to the browser via the Web Driver.
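The idea of an App Page, HTML with embedded SQL that the server expands into a finished page, can be sketched as a small template expander. This is only an illustration under assumptions: the `<?sql ... ?>` placeholder syntax and the function name `explode_page` are hypothetical and are not Informix's actual tags or the Web Explode interface. Because the expander executes whatever SQL a template contains, templates must be trusted, author-controlled pages.

```python
import html
import re
import sqlite3

# Hypothetical placeholder syntax for an embedded query; not Informix's actual tags.
SQL_TAG = re.compile(r"<\?sql (.*?)\?>", re.DOTALL)


def explode_page(template, conn):
    """Replace each embedded SQL tag with an HTML table holding its results."""
    def run_query(match):
        cur = conn.execute(match.group(1))
        header = "".join(f"<th>{html.escape(col[0])}</th>" for col in cur.description)
        rows = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in cur.fetchall()
        )
        return f"<table><tr>{header}</tr>{rows}</table>"

    return SQL_TAG.sub(run_query, template)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (name TEXT, qty INTEGER)")
conn.execute("INSERT INTO part VALUES ('bolt', 100)")
page = "<html><body><h1>Parts</h1><?sql SELECT name, qty FROM part ?></body></html>"
out = explode_page(page, conn)
print(out)
```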

Informix HTML tag extensions allow Web authors to create applications that can dynamically construct Web page templates from the Informix Dynamic Server and present them to the end users. WIO also lets users create their own customized tags to perform specialized tasks. Thus, without resorting to any programming or script development, powerful applications can be designed. Another feature of WIO helps transaction-oriented applications by providing an application programming interface (API) that offers a collection of basic services, such as connection and session management, that can be incorporated into Web applications.



WIO supports applications developed in C, C++, and Java. This flexibility lets developers port existing applications to the Web or develop new applications in these languages. The WIO is integrated with Web server software and utilizes the native security mechanism of the Informix Dynamic Server. The open architecture of WIO allows the use of various Web browsers and servers.

THE ORACLE WEBSERVER :

ORACLE supports Web access to databases using the components shown in Figure 27.03. The client requests files that are called "static" or "dynamic" files from the Web server. Static files have a fixed content, whereas dynamic files may have content that includes results of queries to the database. An HTTP daemon (a process that runs continuously) called the Web Listener runs on the server and listens for requests originating from the clients. A static file (document) is retrieved from the file system of the server and displayed on the Web browser at the client. A request for a dynamic page is passed by the listener to a Web Request Broker (WRB), which is a multi-threaded dispatcher that hands requests to cartridges. Cartridges are software modules (mentioned earlier in Section 13.2.6) that perform specific functions on specific types of data; they can communicate among themselves. Currently cartridges are provided for PL/SQL, Java, and Live HTML; customized cartridges may be provided as well.
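The static/dynamic split described above can be pictured as a two-level dispatcher. In this minimal sketch, `web_listener`, `request_broker`, the URL prefixes, and the cartridge functions are all hypothetical, and the broker is single-threaded, unlike the multi-threaded WRB; it only shows the routing decision.

```python
from typing import Callable, Dict

# Hypothetical stand-in for files on the server's file system.
STATIC_FILES: Dict[str, str] = {"/index.html": "<html>Welcome</html>"}


def plsql_cartridge(path: str) -> str:
    """Hypothetical cartridge; a real one would run a PL/SQL procedure."""
    return f"<html>PL/SQL result for {path}</html>"


def java_cartridge(path: str) -> str:
    """Hypothetical cartridge; a real one would run Java code."""
    return f"<html>Java result for {path}</html>"


# URL prefix -> cartridge that handles dynamic requests under it (assumed layout).
CARTRIDGES: Dict[str, Callable[[str], str]] = {
    "/plsql/": plsql_cartridge,
    "/java/": java_cartridge,
}


def request_broker(path: str) -> str:
    """Stand-in for the WRB: pick the cartridge responsible for a dynamic request."""
    for prefix, cartridge in CARTRIDGES.items():
        if path.startswith(prefix):
            return cartridge(path)
    return "404 Not Found"


def web_listener(path: str) -> str:
    """Serve static files directly; pass every other request to the broker."""
    if path in STATIC_FILES:
        return STATIC_FILES[path]
    return request_broker(path)


print(web_listener("/index.html"))
print(web_listener("/plsql/orders"))
```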



OPEN PROBLEMS WITH WEB DATABASES :

The Web is an important factor in planning for enterprise-wide computing environments, both for providing external access to the enterprise's systems and information for customers and suppliers and for marketing and advertising purposes. At the same time, due to security requirements, employees of some organizations are restricted to operate within intranets: subnetworks that cannot be accessed freely from the outside world. Among the prominent applications of the intranet and the WWW are databases to support electronic storefronts, parts and product catalogs, directories and schedules, newsstands, and bookstores. Electronic commerce, the purchasing of products and services electronically on the Internet, is likely to become a major application supported by such databases.

The future challenges of managing databases on the Web will be many, among them the following:

Web technology needs to be integrated with object technology. Currently, the Web can be viewed as a distributed object system, with HTML pages functioning as objects identified by the URL.

HTML functionality is too simple to support complex application requirements. As we saw, the Web Integration Option of Informix adds further tags to HTML. In general, additional facilities will be needed to (1) make Web clients function as application front ends, integrating data from multiple heterogeneous databases; (2) make Web clients present different views of the same data to different users; and (3) make Web clients "intelligent" by providing additional data mining functionality (see Section 26.2).

Web page content can be made more dynamic by adding more "behavior" to it as an object (see Chapter 11 for a discussion of object modeling). In this respect (1) client and server objects (HTML pages) can be made to interact; (2) Web pages can be treated as collections of programmable objects; and (3) client-side code can access these objects and manipulate them dynamically.

The support for a large number of clients, coupled with reasonable response times for queries against very large databases (several tens of gigabytes in size), will be a major challenge for Web databases. It will have to be addressed both by Web servers and by the underlying DBMSs.

Efforts are underway to address the limitations of the current data structuring technology, particularly by the World Wide Web Consortium (W3C). The W3C is designing a Web Object Model. W3C is also proposing an Extensible Markup Language (XML) for structured document interchange on the Web. XML defines a subset of SGML (the Standard Generalized Markup Language), allowing customization of markup languages with application-specific tags. XML is rapidly gaining ground due to its extensibility in defining new tags. W3C's Document Object Model (DOM) defines an object-oriented API for HTML or XML documents presented by a Web client. W3C is also defining metadata modeling standards for describing Internet resources.

MULTIMEDIA DATABASES :

A multimedia system is a computer-controlled integration of media information objects of different types (text, images, audio, video, and so on). The integration refers to data modeling, storage, presentation, and time synchronization.

A premise is that the media must be digitally represented, or at least digitally controllable.

In the years ahead multimedia information systems are expected to dominate our daily lives. Our houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including digital libraries that will distribute vast amounts of multisource multimedia content.

The Nature of Multimedia Data and Applications

Nature of Multimedia Data

In Section 23.3 we discussed the advanced modeling issues related to multimedia data. We also examined the processing of multiple types of data in Chapter 13 in the context of object-relational DBMSs (ORDBMSs). DBMSs have been constantly adding to the types of data they support. Today the following types of multimedia data are available in current systems:

Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.

Graphics: Examples include drawings and illustrations that are encoded using some descriptive standards (e.g., CGM, PICT, PostScript).

Images: Includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These images are not subdivided into components. Hence querying them by content (e.g., find all images containing circles) is nontrivial.

Animations: Temporal sequences of image or graphic data.

Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.

Structured audio: A sequence of audio components comprising note, tone, duration, and so forth.

Audio: Sample data generated from aural recordings in a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.

Composite or mixed multimedia data: A combination of multimedia data types such as audio and video, which may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats. Composite data also contains additional control information describing how the information should be rendered.

Nature of Multimedia Applications

Multimedia data may be stored, delivered, and utilized in many different ways. Applications may be categorized based on their data management characteristics as follows:

Repository applications: A large amount of multimedia data as well as metadata is stored for retrieval purposes. A central repository containing multimedia data may be maintained by a DBMS and may be organized into a hierarchy of storage levels: local disks, tertiary disks and tapes, optical disks, and so on. Examples include repositories of satellite images, engineering drawings and designs, space photographs, and radiology scanned pictures.

Presentation applications: A large number of applications involve delivery of multimedia data subject to temporal constraints. Audio and video data are delivered this way; in these applications optimal viewing or listening conditions require the DBMS to deliver data at certain rates, offering "quality of service" above a certain threshold. Data is consumed as it is delivered, unlike in repository applications, where it may be processed later (e.g., multimedia electronic mail). Simple multimedia viewing of video data, for example, requires a system to simulate VCR-like functionality. Complex and interactive multimedia presentations involve orchestration directions to control the retrieval order of components in a series or in parallel. Interactive environments must support capabilities such as real-time editing, analysis, or annotating of video and audio data.

Collaborative work using multimedia information: This is a new category of applications in which engineers may execute a complex design task by merging drawings, fitting subjects to design constraints, and generating new documentation, change notifications, and so forth. Intelligent healthcare networks as well as telemedicine will involve doctors collaborating among themselves, analyzing multimedia patient data and information in real time as it is generated.

All of these application areas present major challenges for the design of multimedia database systems.

DATA MANAGEMENT ISSUES :

Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on appropriate modeling of the structure and content of data and then designing appropriate database schemas for storing and retrieving multimedia information. Multimedia information systems are very complex and embrace a large set of issues, including the following:

Modeling: This area has the potential for applying database versus information retrieval techniques to the problem. There are problems of dealing with complex objects (see Chapter 11) made up of a wide range of types of data: numeric, text, graphic (computer-generated image), animated graphic image, audio stream, and video sequence. Documents constitute a specialized area and deserve special consideration.

Design: The conceptual, logical, and physical design of multimedia databases has not been addressed fully, and it remains an area of active research. The design process can be based on the general methodology described in Chapter 16, but the performance and tuning issues at each level are far more complex.

Storage: Storage of multimedia data on standard disk-like devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during the input/output operation. Adhering to standards such as JPEG or MPEG is one way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. Standardized software will be required to deal with synchronization and compression/decompression, and will be coupled with indexing problems, which are still in the research domain.

Queries and retrieval: The "database" way of retrieving information is based on query languages and internal index structures. The "information retrieval" way relies strictly on keywords or predefined index terms. For images, video data, and audio data, this opens up many issues, among them efficient query formulation, query execution, and optimization. The standard optimization techniques we discussed in Chapter 18 need to be modified to work with multimedia data types.

Performance: For multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate. For instance, video must be delivered at a steady rate of 60 frames per second. Techniques for query optimization may compute expected response time before evaluating the query. The use of parallel processing of data may alleviate some problems, but such efforts are currently subject to further experimentation.
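To make the "steady rate" requirement concrete: at 60 frames per second each frame has a budget of 1/60 s, about 16.7 ms. The sketch below checks a list of frame arrival times against those deadlines. It assumes, hypothetically, that playback starts after a fixed initial buffering delay `start_ms` and that frame i is due at `start_ms + i * budget`; real players use more elaborate buffering.

```python
from typing import List


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def late_frames(arrival_ms: List[float], fps: float, start_ms: float) -> List[int]:
    """Indices of frames that arrive after their playback deadline.

    Frame i is assumed due at start_ms + i * budget (playback begins at start_ms).
    """
    budget = frame_budget_ms(fps)
    return [i for i, t in enumerate(arrival_ms) if t > start_ms + i * budget]


print(round(frame_budget_ms(60), 2))                        # 16.67
print(late_frames([10, 60, 90, 120], fps=60, start_ms=50))  # [2, 3]
```

Frames 2 and 3 miss their deadlines (about 83.3 ms and 100 ms), which a viewer would see as stutter.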

Such issues have given rise to a variety of open research problems. We look at a few representative problems now.

MULTIMEDIA DATABASE APPLICATIONS :

Large-scale applications of multimedia databases can be expected to encompass a large number of disciplines and enhance existing capabilities. Some important applications are:

Documents and records management: A large number of industries and businesses keep very detailed records and a variety of documents. The data may include engineering design and manufacturing data, medical records of patients, publishing material, and insurance claim records.

Knowledge dissemination: The multimedia mode, a very effective means of knowledge dissemination, will encompass a phenomenal growth in electronic books, catalogs, manuals, encyclopedias, and repositories of information on many topics.

Education and training: Teaching materials for different audiences, from kindergarten students to equipment operators to professionals, can be designed from multimedia sources. Digital libraries are expected to have a major influence on the way future students and researchers as well as other users will access vast repositories of educational material. (See Section 27.6 on digital libraries.)

Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals and aliens. The use of predesigned stored objects in multimedia databases will expand the range of these applications.

Real-time control and monitoring: Coupled with active database technology, multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks such as manufacturing operations, nuclear power plants, patients in intensive care units, and transportation systems.

MOBILE DATABASES :

Recent advances in wireless technology have led to mobile computing, a new dimension in data communication and processing. The mobile computing environment will provide database applications with useful aspects of wireless technology. The mobile computing platform allows users to establish communication with other users and to manage their work while they are mobile. This feature is especially useful to geographically dispersed organizations. Typical examples might include traffic police, taxi dispatchers, and weather reporting services, as well as financial market reporting and information brokering applications. However, there are a number of hardware as well as software problems that must be resolved before the capabilities of mobile computing can be fully utilized. Some of the software problems, which may involve data management, transaction management, and database recovery, have their origin in distributed database systems. In mobile computing, however, these problems become more difficult to solve, mainly because of the narrow bandwidth of the wireless communication channels, the relatively short active life of the power supply (battery) of mobile units, and the changing locations of required information (sometimes in cache, sometimes in the air, sometimes at the server). In addition, mobile computing has its own unique architectural challenges.

The general architecture of a mobile platform is illustrated in Figure 27.04. It is a distributed architecture where a number of computers, generally referred to as Fixed Hosts (FH) and Base Stations (BS), are interconnected through a high-speed wired network. Fixed hosts are general-purpose computers that are not equipped to manage mobile units but can be configured to do so. Base stations are equipped with wireless interfaces and can communicate with mobile units to support data access.

Mobile Units (MU) (or hosts) and base stations communicate through wireless channels having bandwidths significantly lower than those of a wired network. A downlink channel is used for sending data from a BS to an MU, and an uplink channel is used for sending data from an MU to its BS. Recent products for portable wireless have an upper limit of 1 Mbps (megabits per second) for infrared communication, 2 Mbps for radio communication, and 9.14 Kbps (kilobits per second) for cellular telephony. By comparison, Ethernet provides 10 Mbps, Fast Ethernet and FDDI provide 100 Mbps, and ATM (asynchronous transfer mode) provides 155 Mbps.
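A quick calculation shows how large these differences are in practice. The sketch below computes the ideal time to move a 1 MB file (10^6 bytes, a size chosen only for illustration) at each rate quoted above, ignoring protocol overhead, latency, and transmission errors, so real transfers would be slower.

```python
# Link rates quoted in the text, in bits per second.
RATES_BPS = {
    "infrared": 1_000_000,
    "radio": 2_000_000,
    "cellular": 9_140,
    "Ethernet": 10_000_000,
    "Fast Ethernet / FDDI": 100_000_000,
    "ATM": 155_000_000,
}


def transfer_seconds(size_bytes: int, rate_bps: float) -> float:
    """Ideal transfer time: no protocol overhead, latency, or retransmissions."""
    return size_bytes * 8 / rate_bps


size = 1_000_000  # 1 MB = 10**6 bytes, chosen only for illustration
for link, rate in RATES_BPS.items():
    print(f"{link:>20}: {transfer_seconds(size, rate):9.2f} s")
```

The cellular link needs roughly 875 seconds (about 14.6 minutes) for the same file that 10 Mbps Ethernet moves in 0.8 seconds, which is why mobile database techniques try hard to minimize wireless traffic.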

Mobile units are battery-powered portable computers that move freely in a geographic mobility domain, an area that is restricted by the limited bandwidth of wireless communication channels. To manage the mobility of units, the entire geographic mobility domain is divided into smaller domains called cells. The mobile discipline requires that the movement of mobile units be unrestricted within the geographic mobility domain (intercell movement), while information access contiguity during movement guarantees that the movement of a mobile unit across cell boundaries will have no effect on the data retrieval process.



Types of Data in Mobile Applications

Applications that run on mobile hosts have different data requirements. Users either engage in personal communications or office activities, or they simply receive updates on frequently changing information. Mobile applications can be categorized in two ways: (1) vertical applications and (2) horizontal applications (Note 3). In vertical applications users access data within a specific cell, and access is denied to users outside of that cell. For example, users can obtain information on the location of doctors or emergency centers within a cell, or parking availability data at an airport cell. In horizontal applications, users cooperate on accomplishing a task, and they can handle data distributed throughout the system. The horizontal application market is massive; two types of applications most mentioned are mail-enabled applications and information services to mobile users.

Data may be classified into three categories:

1. Private data: A single user owns this data and manages it. No other user may access it.

2. Public data: This data can be used by anyone who can read it. Only one source updates it. Examples include weather bulletins or stock prices.

3. Shared data: This data is accessed both in read and write modes by groups of users. Examples include inventory data for products in a company.

Public data is primarily managed by vertical applications, while shared data is used by horizontal applications, possibly with some replication. Copies of shared data may be stored both in base and mobile stations. This presents a variety of difficult problems in transaction management and consistency, as well as integrity and scalability of the architecture.
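The three data categories above differ mainly in who may read and who may write. The sketch below encodes those rules as a single check; the `DataItem` fields and the `may_access` function are hypothetical names, and a real mobile system would enforce these rules through its DBMS security and replication machinery rather than through a function like this.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DataItem:
    category: str                                  # "private", "public", or "shared"
    owner: str = ""                                # private: the one owning user
    source: str = ""                               # public: the one updating source
    group: Set[str] = field(default_factory=set)   # shared: users allowed to read/write


def may_access(item: DataItem, user: str, write: bool) -> bool:
    """Apply the read/write rules of the three mobile data categories."""
    if item.category == "private":
        return user == item.owner                  # nobody but the owner, for any access
    if item.category == "public":
        return not write or user == item.source    # anyone reads; only the source updates
    if item.category == "shared":
        return user in item.group                  # group members read and write
    raise ValueError(f"unknown category: {item.category!r}")


weather = DataItem("public", source="weather_service")
print(may_access(weather, "alice", write=False))   # True
print(may_access(weather, "alice", write=True))    # False
```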



SPATIAL DATABASE :

Spatial databases provide concepts for databases that keep track of objects in a multi-dimensional space. For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects, from countries and states to rivers, cities, roads, seas, and so on. These databases are used in many applications, such as environmental, emergency, and battle management. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points. In general, a spatial database stores objects that have spatial characteristics that describe them. The spatial relationships among the objects are important, and they are often needed when querying the database. Although a spatial database can in general refer to an n-dimensional space for any n, we will limit our discussion to two dimensions as an illustration.

The main extensions that are needed for spatial databases are models that can interpret spatial characteristics. In addition, special indexing and storage structures are often needed to improve performance. Let us first discuss some of the model extensions for two-dimensional spatial databases. The basic extensions needed are to include two-dimensional geometric concepts, such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial characteristics of objects. In addition, spatial operations are needed to operate on the objects' spatial characteristics (for example, to compute the distance between two objects), as well as spatial Boolean conditions (for example, to check whether two objects spatially overlap). To illustrate, consider a database that is used for emergency management applications. A description of the spatial positions of many types of objects would be needed. Some of these objects generally have static spatial characteristics, such as streets and highways, water pumps (for fire control), police stations, fire stations, and hospitals. Other objects have dynamic spatial characteristics that change over time, such as police vehicles, ambulances, or fire trucks.

The following categories illustrate three typical types of spatial queries:

Range query: Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, find all hospitals within the Dallas city area, or find all ambulances within five miles of an accident location.)

Nearest neighbor query: Finds an object of a particular type that is closest to a given location. (For example, find the police car that is closest to a particular location.)

Spatial joins or overlays: Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, find all cities that fall on a major highway, or find all homes that are within two miles of a lake.)
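The three query types can be stated precisely for point objects with Euclidean distance. The sketch below answers each by brute force, examining every object, which is exactly the cost the spatial indexes discussed next are meant to avoid. The function names, object names, and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def range_query(objs: Dict[str, Point], center: Point, radius: float) -> List[str]:
    """Objects within `radius` of `center` (e.g. ambulances within five miles)."""
    return sorted(name for name, p in objs.items() if dist(p, center) <= radius)


def nearest_neighbor(objs: Dict[str, Point], q: Point) -> str:
    """The object closest to `q` (e.g. the closest police car)."""
    return min(objs, key=lambda name: dist(objs[name], q))


def distance_join(a: Dict[str, Point], b: Dict[str, Point], d: float) -> List[Tuple[str, str]]:
    """Pairs (x, y), x from `a` and y from `b`, at most `d` apart (nested loops)."""
    return sorted((x, y) for x, px in a.items() for y, py in b.items() if dist(px, py) <= d)


ambulances = {"A1": (1.0, 1.0), "A2": (4.0, 5.0), "A3": (10.0, 0.0)}
accident = (2.0, 2.0)
print(range_query(ambulances, accident, 5.0))   # ['A1', 'A2']
print(nearest_neighbor(ambulances, accident))   # A1
```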

For these and other types of spatial queries to be answered efficiently, special techniques for spatial indexing are needed. One of the best known techniques is the use of R-trees and their variations. R-trees group together objects that are in close spatial physical proximity on the same leaf nodes of a tree-structured index. Since a leaf node can point to only a certain number of objects, algorithms for dividing the space into rectangular subspaces that include the objects are needed. Typical criteria for dividing the space include minimizing the rectangle areas, since this would lead to a quicker narrowing of the search space. Problems such as having objects with overlapping spatial areas are handled in different ways by the many different variations of R-trees. The internal nodes of R-trees are associated with rectangles whose area covers all the rectangles in their subtree. Hence, R-trees can easily answer queries such as "find all objects in a given area" by limiting the tree search to those subtrees whose rectangles intersect with the area given in the query.
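The search rule in the last sentence can be sketched as follows. The tree is built by hand, so insertion and the node-splitting heuristics mentioned above (such as minimizing rectangle area) are deliberately omitted; `Node`, `search`, and the sample rectangles are hypothetical. Rectangles are axis-aligned and written as (xmin, ymin, xmax, ymax).

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle that covers all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    leaf: bool
    entries: list  # leaf: (rect, object id); internal: (covering rect, child Node)


def search(node: Node, query: Rect) -> List[str]:
    """Ids of objects whose rectangles intersect `query`.

    A child subtree is visited only if its covering rectangle meets the query.
    """
    found: List[str] = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.leaf:
                found.append(item)
            else:
                found.extend(search(item, query))
    return found


# Hand-built two-level tree: nearby hospitals share a leaf.
west = [((1, 1, 2, 2), "H1"), ((2, 3, 3, 4), "H2")]
east = [((8, 8, 9, 9), "H3"), ((9, 7, 10, 8), "H4")]
root = Node(leaf=False, entries=[
    (mbr([r for r, _ in west]), Node(leaf=True, entries=west)),
    (mbr([r for r, _ in east]), Node(leaf=True, entries=east)),
])
print(search(root, (0, 0, 2.5, 2.5)))  # ['H1']; the east leaf is never visited
```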

Other spatial storage structures include quadtrees and their variations. Quadtrees generally divide each space or subspace into equally sized areas, and proceed with the subdivisions of each subspace to identify the positions of various objects. Recently, many newer spatial access structures have been proposed, and this area is still an active research area.
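A point quadtree of the kind just described can be sketched in a few dozen lines: each node covers a square region, and when it holds more than a fixed number of points it splits into four equally sized quadrants. The `capacity` of 2 and the class layout are illustrative assumptions. The sketch also assumes distinct points, since inserting more than `capacity` identical points would make it subdivide forever.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadTree:
    x0: float  # region covered: [x0, x1) x [y0, y1)
    y0: float
    x1: float
    y1: float
    capacity: int = 2  # points a node holds before it subdivides
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point) -> bool:
        if not self.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._subdivide()
        # Exactly one child quadrant accepts the point, since quadrants are disjoint.
        return any(child.insert(p) for child in self.children)

    def _subdivide(self) -> None:
        """Split into four equal quadrants and move this node's points into them."""
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        c = self.capacity
        self.children = [QuadTree(self.x0, self.y0, mx, my, c),
                         QuadTree(mx, self.y0, self.x1, my, c),
                         QuadTree(self.x0, my, mx, self.y1, c),
                         QuadTree(mx, my, self.x1, self.y1, c)]
        for q in self.points:
            for child in self.children:
                if child.insert(q):
                    break
        self.points = []

    def query(self, x0: float, y0: float, x1: float, y1: float) -> List[Point]:
        """Points inside the window [x0, x1) x [y0, y1)."""
        if x1 <= self.x0 or x0 >= self.x1 or y1 <= self.y0 or y0 >= self.y1:
            return []  # window misses this region, so skip the whole subtree
        found = [p for p in self.points if x0 <= p[0] < x1 and y0 <= p[1] < y1]
        for child in self.children or []:
            found += child.query(x0, y0, x1, y1)
        return found


qt = QuadTree(0, 0, 100, 100)
for p in [(10, 10), (20, 20), (80, 80), (60, 10)]:
    qt.insert(p)
print(sorted(qt.query(0, 0, 30, 30)))  # [(10, 10), (20, 20)]
```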

CLUSTERING BASED DISASTER PROOF DATABASES :

If downtime is not an option, and the Web never closes for business, how do you keep your company's doors open 24/7? The answer lies in high-availability (HA) systems that approach 100 percent uptime.

The principles of high availability define a level of backup and recovery. Until recently, high availability simply meant hardware or software recovery via RAID (Redundant Array of Independent Disks). RAID addressed the need for fault tolerance in data but didn't solve the problem of a complete DBMS failure.

For even more uptime, database administrators are turning to clustering as the best way to achieve high availability. Recent moves by Oracle, with its Real Application Clusters, and Microsoft, with MSCS (Microsoft Cluster Service), have made multinode clusters for HA in production environments mainstream.

In a high-availability setup, a cluster functions by associating servers that have the ability to share a disk group. As illustrated here, each node has a fail-over node within its cluster. If a failure occurs in Node 1, Node 2 picks up the slack by assuming the resources and the unique logic and transaction functions of the failed DBMS.

Clustering can have the added benefit of not being bound by node colocation. Fiber-optic connections, which can be cabled for miles between the nodes in a cluster, ensure continued operation even in the face of a complete meltdown of your primary system.

When a hot-standby model is in place, downtime may be less than a minute. This is especially important if your service-level agreement requires more than 99.9 percent uptime, since 99.9 percent uptime still permits about 8.76 hours of downtime per year.
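The downtime figure follows from simple arithmetic: a year has 365 * 24 = 8,760 hours, and an uptime of p percent permits 8,760 * (1 - p/100) hours of downtime. The short sketch below tabulates this for a few common service levels (leap years ignored).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Maximum yearly downtime, in hours, permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for sla in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% uptime -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Each extra "nine" cuts the allowance by a factor of ten: 99.9 percent allows 8.76 hours a year, while 99.99 percent allows only about 52.6 minutes.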

Clustering technologies are pricey, however. The enterprise software and hardware must be uniform and compatible with the clustering technology to work properly. There's also the associated overhead in the design and maintenance of redundant systems.

One cost-effective solution is log shipping, in which physically distinct databases are kept synchronized by sending transaction logs from one server to another. In the event of a failure, the shipped logs can be replayed on the standby to bring its data up to date, as far as the last log received before the failure. Other methods include snapshot databases and replication technologies such as Sybase's Replication Server, which has been around for decades.
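Log shipping can be sketched as a primary that appends every change to a log and a standby that periodically receives the new log records and replays them in sequence order. The classes `Primary` and `Standby` and the function `ship_logs` below are hypothetical. The sketch also shows the main caveat: changes made after the last shipping cycle are not yet on the standby when the primary fails.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, str]  # (log sequence number, key, new value)


class Primary:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[LogRecord] = []

    def write(self, key: str, value: str) -> None:
        self.log.append((len(self.log) + 1, key, value))  # log the change, then apply it
        self.data[key] = value


class Standby:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.applied_lsn = 0  # highest log sequence number replayed so far

    def apply_shipped(self, records: List[LogRecord]) -> None:
        """Replay shipped log records in order, skipping any already applied."""
        for lsn, key, value in records:
            if lsn > self.applied_lsn:
                self.data[key] = value
                self.applied_lsn = lsn


def ship_logs(primary: Primary, standby: Standby) -> None:
    """One shipping cycle: send the log records the standby has not seen yet."""
    standby.apply_shipped(primary.log[standby.applied_lsn:])


primary, standby = Primary(), Standby()
primary.write("balance:42", "100")
ship_logs(primary, standby)
primary.write("balance:42", "75")  # not shipped yet
# If the primary failed now, the standby would still hold the older value.
print(standby.data)  # {'balance:42': '100'}
```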



High-availability add-ons to databases are useful but should be understood in the context of a complete HA methodology. This requires a concerted effort toward standardization on each of your mission-critical infrastructures. Fault-tolerant application design with hands-off exception handling, self-healing and redundant networks, and a stable operating system are all prerequisites for high availability.

When you adhere to these standards, enforceable, database-specific HA technologies can lead your enterprise on the path to minimum downtime.

SOME QUESTIONS

Q. 1 How can a distributed database be recovered in case of failure?

Ans :



In a distributed setting, the server must log a write operation not only to the local log file, but also to one, two, or more remote logs. The issue is close to replication methods, the main choice being to adopt either a synchronous or an asynchronous protocol.

Synchronous protocol. The server acknowledges the client only when all the remote nodes have sent a confirmation of the successful completion of their write() operation. In practice, the client waits until the slowest of all the writers sends its acknowledgment. This may severely hinder the efficiency of updates, but the obvious advantage is that all the replicas are consistent.


Asynchronous protocol. The Client application waits only until one of the copies (the fastest) has actually been written. Clearly, this puts data consistency at risk, as a subsequent read operation may access an older version that does not yet reflect the update.
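The difference between the two protocols can be seen in a small simulation. This is a sketch under assumed conditions: each hypothetical `Replica` has a fixed write latency in milliseconds, the client's waiting time is taken as the latency of the replicas it waits for, and `propagate` stands in for whatever background mechanism eventually updates the lagging copies.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    latency_ms: int                              # hypothetical time to persist one write
    log: list = field(default_factory=list)


def synchronous_write(replicas, record):
    """Acknowledge only after every replica has written the record.

    Returns the time the client waits: the latency of the slowest replica.
    """
    for r in replicas:
        r.log.append(record)
    return max(r.latency_ms for r in replicas)


def asynchronous_write(replicas, record):
    """Acknowledge as soon as the fastest replica has written the record.

    Returns the client's wait time and the replicas still to be updated later.
    """
    fastest = min(replicas, key=lambda r: r.latency_ms)
    fastest.log.append(record)
    pending = [r for r in replicas if r is not fastest]
    return fastest.latency_ms, pending


def propagate(pending, record):
    """Background step that eventually brings the lagging replicas up to date."""
    for r in pending:
        r.log.append(record)


replicas = [Replica("A", 5), Replica("B", 20), Replica("C", 40)]
print(synchronous_write(replicas, "w1"))         # 40: the client waits for C
wait, pending = asynchronous_write(replicas, "w2")
print(wait, "w2" in replicas[2].log)             # 5 False: C is stale for now
propagate(pending, "w2")
```

A read served by replica C between the asynchronous acknowledgment and the call to `propagate` returns the older version, which is exactly the consistency risk described above.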

Q. 2 What is a multimedia database? Explain the methods of mining multimedia databases.

Ans : Multimedia database : Explained above.


Methods of mining multimedia databases :


Q. 3 Write short notes on any four of the following: (1) Web databases (2) Mobile databases

Ans : Explained above.

Q. 4 Design issues of distributed databases.

Ans :


Q. 5 What is a commit protocol, and why is it required in a distributed database? Describe and compare two-phase and three-phase commit. What is blocking, and how does the three-phase protocol prevent it? Explain distributed transactions.

Ans :


Distributed transaction :


Commit Protocol :

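Why a commit protocol is required in a distributed database : to guarantee atomicity across sites. A transaction that updates data at several sites must either commit at all of them or abort at all of them, even if a site or a communication link fails while the transaction is being completed.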
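The basic two-phase commit (2PC) protocol, which provides this atomicity, can be sketched as a single-process simulation. Simplifying assumptions: no messages are lost, no site fails during the run, and forced log writes are only mentioned in comments. `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # assumption: whether the local subtransaction succeeded
        self.state = "active"

    def prepare(self):
        """Phase 1: vote. A real site would force <ready T> or <abort T> to its log first."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = decision


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator sends PREPARE and collects every vote.
    votes = [p.prepare() for p in participants]
    # The coordinator commits only if every participant voted yes.
    decision = "committed" if all(votes) else "aborted"
    # Phase 2 (decision): the outcome goes to all sites, so all end in the same state.
    for p in participants:
        p.finish(decision)
    return decision


sites = [Participant("S1"), Participant("S2"), Participant("S3", can_commit=False)]
print(two_phase_commit(sites))                 # aborted: S3 voted no
print({p.name: p.state for p in sites})        # every site ends up aborted
```

A single "no" vote is enough to abort the transaction everywhere, which is how the protocol keeps all sites in agreement.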


BLOCKING PROBLEM :
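Blocking can be illustrated by the decision that the surviving participants face when the coordinator fails. Under 2PC, if every survivor is in the READY state, none of them knows whether the coordinator decided to commit or abort, so they must wait until it recovers. Three-phase commit inserts a PRECOMMIT phase, and no site commits until all sites have acknowledged PRECOMMIT, so the survivors can always reach a safe decision. The sketch below is a simplified form of these termination rules, assuming site failures only (no network partitions); the function names and state strings are hypothetical.

```python
def terminate_2pc(survivor_states):
    """What the operational participants can decide after the 2PC coordinator fails."""
    if "committed" in survivor_states:
        return "commit"      # the coordinator's decision is known from a survivor
    if "aborted" in survivor_states or "active" in survivor_states:
        return "abort"       # a survivor voted no, already aborted, or never voted
    return "blocked"         # all survivors are READY: the decision is unknown


def terminate_3pc(survivor_states):
    """Termination rule for 3PC, assuming site failures only (no network partitions)."""
    if "committed" in survivor_states:
        return "commit"
    if "aborted" in survivor_states:
        return "abort"
    if "precommitted" in survivor_states:
        return "commit"      # PRECOMMIT is only sent once the coordinator has decided commit
    return "abort"           # nobody is precommitted, so no site can have committed yet


print(terminate_2pc(["ready", "ready"]))          # blocked
print(terminate_3pc(["ready", "ready"]))          # abort
print(terminate_3pc(["ready", "precommitted"]))   # commit
```

The same surviving states that block 2PC lead to a definite outcome under 3PC; the price is an extra round of messages on every commit.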


Q. 6 What are Web databases? How are databases accessed through the Web?

Ans : Web databases : Explained above.

Providing Access to Databases on the World Wide Web

Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware: the additional software layer between the user-interface front end and the DBMS back end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.
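A CGI program of this kind can be sketched in Python. Assumptions: the Web server passes the request's query string in the `QUERY_STRING` environment variable (as CGI specifies), an in-memory SQLite database stands in for the back-end DBMS, and the `employee` table and its rows are hypothetical. The script writes a header block followed by HTML to standard output, which the server relays to the browser.

```python
import os
import sqlite3
from html import escape
from urllib.parse import parse_qs


def open_database():
    """Hypothetical back-end DBMS: an in-memory SQLite table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Alice", "sales"), ("Bob", "sales"), ("Carol", "research")],
    )
    return conn


def handle_request(environ, conn):
    """Turn one CGI request into a complete response (header block + HTML)."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    dept = params.get("dept", [""])[0]
    # Parameterized query: the user's input is never pasted into the SQL text.
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name", (dept,)
    ).fetchall()
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return (
        "Content-Type: text/html\n\n"     # CGI header block, ended by a blank line
        f"<html><body><h1>{escape(dept)}</h1><ul>{items}</ul></body></html>"
    )


if __name__ == "__main__":
    # The Web server starts a fresh process for every request and passes the
    # query string in an environment variable; the script opens its own connection.
    print(handle_request(os.environ, open_database()), end="")
```

Because the server launches the script once per request, every request also pays for a new process and a new database connection, which is the main overhead of the CGI approach.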

As the Web undergoes its latest transformations, it has become necessary to give users access not only to file systems but also to databases and DBMSs, to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:

1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like Perl, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is gained by grouping multiple users' requests; moreover, the developer must keep the scripts in the cgi-bin subdirectories only, which exposes the server to possible security breaches. The fact that CGI has no language of its own, so database developers must learn Perl or Tcl, is also a drawback. Manageability is another problem if the scripts are scattered everywhere.

2. Access using JDBC: JDBC is a Java API developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases without any additional processes for each client request. Note that JDBC is a trademarked name; Sun originally maintained that it was not an acronym, although it is commonly expanded as Java Database Connectivity. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query, using the Java interfaces Connection, Statement, and ResultSet respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it in the client's browser. The Java code is DBMS-transparent; the JDBC drivers for individual DBMSs on the server end carry out the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines, with their inherent inefficiency. The JDBC-ODBC bridge (ODBC: Open Database Connectivity) provided another way of getting to the RDBMSs.


Besides CGI, Web server vendors have launched their own middleware products for providing connectivity to multiple databases. These include the Internet Server API (ISAPI) from Microsoft and the Netscape Server API (NSAPI) from Netscape. DBMS vendors such as Informix also provide their own options to support database access on the Web.

Q. 7 Compare the relative merits of centralized and hierarchical deadlock detection in a distributed DBMS.

Ans :

A centralized deadlock detection scheme is a reasonable choice if the concurrency control algorithm is also centralized. It is better suited to access patterns that are spread across many sites, since a deadlock between any sites can be identified immediately at the central site. However, this benefit comes at the expense of communication between the central site and every other site.
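A centralized detector can be sketched as follows, under stated assumptions: each site reports its local wait-for graph as a list of `(waiting, holding)` transaction pairs, the central site takes their union, and a depth-first search looks for a cycle. The names `build_global_wfg`, `find_cycle`, and `central_detector` are hypothetical, and the message count is a simplification (one report per site per detection round).

```python
from collections import defaultdict


def build_global_wfg(local_graphs):
    """Union the wait-for edges reported by every site.

    local_graphs maps a site name to its (waiting_txn, holding_txn) edges.
    """
    graph = defaultdict(set)
    for edges in local_graphs.values():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Depth-first search; returns one cycle as a list of transactions, or None."""
    state = {}                        # txn -> "on_stack" or "done"
    path = []

    def dfs(txn):
        state[txn] = "on_stack"
        path.append(txn)
        for nxt in sorted(graph.get(txn, ())):
            if state.get(nxt) == "on_stack":          # back edge closes a cycle
                return path[path.index(nxt):]
            if nxt not in state:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = "done"
        return None

    for txn in sorted(graph):
        if txn not in state:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None


def central_detector(local_graphs):
    """Every site sends its local graph to the central site: one message per site."""
    messages = len(local_graphs)
    return find_cycle(build_global_wfg(local_graphs)), messages


# No single site sees a cycle, but the global wait-for graph has one.
reports = {"S1": [("T1", "T2")], "S2": [("T2", "T3")], "S3": [("T3", "T1")]}
print(central_detector(reports))      # (['T1', 'T2', 'T3'], 3)
```

The central site finds a deadlock that no individual site could see, but it needs a report from every site to do so, even from sites whose transactions are not involved.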

A hierarchical deadlock detection scheme relieves a single site of the whole burden of deadlock detection by involving more sites. When access patterns are more localized, perhaps by geographic area, deadlocks are likely to occur among certain sites that communicate frequently. The hierarchical approach is more efficient because it checks for deadlocks where they are most likely to happen and splits the detection effort among several controllers.
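The hierarchical scheme can be sketched as a tree of controllers, assuming that sites are the leaves and that each internal controller (for example, one per region) receives the wait-for edges of its subtree. A deadlock is then detected by the lowest controller whose subtree covers all the sites involved, so a regional deadlock never has to reach the root. `Controller`, `detect`, `region_tree`, and the East/West layout are hypothetical, and the sketch ignores message costs and concurrent updates.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Node in a hypothetical tree of deadlock detectors; leaves are sites."""
    name: str
    edges: list = field(default_factory=list)      # local (waiter, holder) edges at a leaf
    children: list = field(default_factory=list)


def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    on_stack, done = set(), set()

    def dfs(txn):
        on_stack.add(txn)
        for nxt in graph.get(txn, ()):
            if nxt in on_stack or (nxt not in done and dfs(nxt)):
                return True
        on_stack.discard(txn)
        done.add(txn)
        return False

    return any(dfs(t) for t in list(graph) if t not in done)


def detect(node, detectors):
    """Post-order walk: a controller checks the union of its subtree's edges.

    Returns (edges, found). Simplification: once a cycle is found in a subtree,
    ancestors do not search the merged graph for further cycles.
    """
    if node.children:
        results = [detect(child, detectors) for child in node.children]
        edges = [e for child_edges, _ in results for e in child_edges]
        found_below = any(found for _, found in results)
    else:
        edges, found_below = list(node.edges), False
    if not found_below and has_cycle(edges):
        detectors.append(node.name)    # the lowest controller that can see this cycle
        return edges, True
    return edges, found_below


def region_tree(s1, s2, s3, s4):
    east = Controller("East", children=[Controller("S1", s1), Controller("S2", s2)])
    west = Controller("West", children=[Controller("S3", s3), Controller("S4", s4)])
    return Controller("Root", children=[east, west])


detectors = []
detect(region_tree([("T1", "T2")], [("T2", "T1")], [], []), detectors)
print(detectors)     # ['East']: a deadlock among nearby sites is caught regionally
```

If the two conflicting edges came from S1 and S3 instead, only the root would see both, and the root would be the detector; localized access patterns keep most detection work in the lower controllers.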