advance concept in data bases unit-3 by arun pratap singh
Post on 08-Feb-2018
7/22/2019 Advance Concept in Data Bases Unit-3 by Arun Pratap Singh
1/81
PREPARED BY ARUN PRATAP SINGH MTECH2nd SEMESTER
DISTRIBUTED DATABASES INTRODUCTION :
o A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
o A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU, controlled by a distributed database management system (together
sometimes called a distributed database system). It may be stored in multiple computers located in
the same physical location, or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database system,
a distributed database system consists of loosely coupled sites that share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple physical
locations. A distributed database can reside on network servers on the Internet, on corporate intranets
or extranets, or on other company networks. Because they store data across
multiple computers, distributed databases can improve performance at end-user worksites by allowing
transactions to be processed on many machines, instead of being limited to one.[2]
Two processes ensure that the distributed databases remain up-to-date and
current: replication and duplication.
UNIT : III
1. Replication involves using specialized software that looks for changes in the distributed
database. Once the changes have been identified, the replication process makes all the
databases look the same. Depending on the size and number of the distributed databases, the
replication process can be complex and can require a lot of time and computer resources.
2. Duplication, on the other hand, is less complex. It identifies one database as
a master and then duplicates that database. The duplication process is normally done at a set
time after hours, to ensure that each distributed location has the same data. In the
duplication process, users may change only the master database. This ensures that local data
will not be overwritten.
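The contrast between the two processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each site is modelled as a plain dictionary, the function names replicate and duplicate and the site names are hypothetical, and conflicting changes are resolved by letting the later site win, which a real replication tool would handle far more carefully.

```python
import copy

def replicate(sites):
    """Replication sketch: collect changes from every site and apply them everywhere."""
    merged = {}
    for site in sites:
        merged.update(site)      # assumption: on a conflict, the later site wins
    for site in sites:
        site.clear()
        site.update(merged)

def duplicate(master, copies):
    """Duplication sketch: overwrite every copy with a snapshot of the master."""
    for site in copies:
        site.clear()
        site.update(copy.deepcopy(master))

# Replication: any site may change, and the changes are reconciled across all sites.
delhi = {"acct-1": 100}
mumbai = {"acct-2": 250}
replicate([delhi, mumbai])

# Duplication: only the master is changed; the copies are refreshed at a set time.
master = {"acct-1": 100, "acct-2": 300}
branch = {"acct-1": 90}
duplicate(master, [branch])
```

After the calls, both replicated sites hold the same two accounts, and the branch copy is an exact but separate copy of the master.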
A database user accesses the distributed database through:
Local applications
-applications which do not require data from other sites.
Global applications
-applications which do require data from other sites.
A homogeneous distributed database has identical software and hardware running all
database instances, and may appear through a single interface as if it were a single
database. A heterogeneous distributed database may have different hardware, operating
systems, database management systems, and even data models for different databases.
A DDBMS is mainly classified into two types:
Homogeneous Distributed database management systems
Heterogeneous Distributed database management systems
Homogeneous DDBMS :-
In a homogeneous distributed database, all sites have identical software, are aware of each other,
and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms
of the right to change schema or software. A homogeneous DDBMS appears to the user as a single
system. The homogeneous system is much easier to design and manage. The following conditions
must be satisfied for a homogeneous database:
The operating system used at each location must be the same or compatible.
The data structures used at each location must be the same or compatible.
The database application (or DBMS) used at each location must be the same or compatible.
Heterogeneous DDBMS :-
In a heterogeneous distributed database, different sites may use different schema and software.
Difference in schema is a major problem for query processing and transaction processing. Sites may
not be aware of each other and may provide only limited facilities for cooperation in transaction
processing. In heterogeneous systems, different nodes may have different hardware and software, and
the data structures at various nodes or locations may also be incompatible. Different computers and operating
systems, database applications or data models may be used at each of the locations. For example,
one location may have the latest relational database management technology, while another location
may store data using conventional files or an older version of a database management system. Similarly, one
location may have the Windows NT operating system, while another may have UNIX. Heterogeneous
systems are usually used when individual sites use their own hardware and software. In a
heterogeneous system, translations are required to allow communication between different sites (or
DBMSs). In this system, the users must be able to make requests in a database language at their local
sites. Usually the SQL database language is used for this purpose. If the hardware is different, then
the translation is straightforward: computer codes and word lengths are changed. The
heterogeneous system is often not technically or economically feasible. In this system, a user at one
location may be able to read but not update the data at another location.
Advantages :
Increase reliability and availability
Easier expansion
Reliable transactions - due to replication of the database
Hardware, operating-system, network, fragmentation, DBMS, replication and location independence
Economics: it may cost less to create a network of smaller computers with the power of a
single large computer
Disadvantages :
Additional software is required
Operating system should support distributed environment
Concurrency control poses a major issue. It can be solved by locking and timestamping.
Distributed access to data
Analysis of distributed data
DISTRIBUTED DATABASE ARCHITECTURE :
A distributed database system allows applications to access data from local and remote databases. In a homogenous distributed database system, each database is an Oracle Database. In a heterogeneous distributed database system, at least one of the databases is not an Oracle Database. Distributed databases use a client/server architecture to process information requests.
Homogenous Distributed Database Systems :-
A homogenous distributed database system is a network of two or more Oracle Databases that reside on one or more machines. Figure 29-1 illustrates a distributed system that connects three databases: hq, mfg, and sales. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.
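The idea of one query joining a local table with a table in another database can be tried out with Python's built-in sqlite3 module. This is a stand-in, not Oracle: SQLite's ATTACH DATABASE is used here in place of an Oracle database link, both databases live in memory on one machine, and the column names (name, dept_id, dept_name) and sample rows are hypothetical.

```python
import sqlite3

# The main connection plays the local "mfg" database;
# an attached in-memory database plays the remote "hq" database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hq")

conn.execute("CREATE TABLE products (name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE hq.dept (dept_id INTEGER, dept_name TEXT)")
conn.execute("INSERT INTO products VALUES ('widget', 10)")
conn.execute("INSERT INTO hq.dept VALUES (10, 'Assembly')")

# A single query joins the local products table with the dept table in hq.
rows = conn.execute(
    "SELECT p.name, d.dept_name "
    "FROM products AS p JOIN hq.dept AS d ON p.dept_id = d.dept_id"
).fetchall()
print(rows)
```

The application issues one statement and does not need to know that the two tables are stored in different databases.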
Heterogeneous Distributed Database Systems :-
In a heterogeneous distributed database system, at least one of the databases is a non-Oracle Database system. To the application, the heterogeneous distributed database system appears as a single, local, Oracle Database. The local Oracle Database server hides the distribution and heterogeneity of the data.
The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle Database data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle Database distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle Database in the system can communicate with it.
Client/Server Database Architecture :-
A database server is the Oracle software managing a database, and a client is an application that requests information from a server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system can act as a client, a server, or both, depending on the situation.
In Figure 29-2, the host for the hq database is acting as a database server when a statement is issued against its local data (for example, the second statement in each transaction issues a statement against the local dept table), but is acting as a client when it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote table emp in the sales database).
DISTRIBUTED DATABASE SYSTEM DESIGN :
In a distributed system, data are physically distributed among several sites, but the system provides a view of a single logical database to its users. Each node of a distributed database system may follow the three-tier architecture like the centralized database management system (DBMS). Thus, the design of a distributed database system involves the design of a global conceptual schema, in addition to the local schemas, which conform to the three-tier architecture of the DBMS in each site. The design of the computer network across the sites of a distributed system adds extra complexity to the design issue. The crucial design issue involves the distribution of data among the sites of the distributed system. Therefore, the design and implementation of the distributed database system is a very complicated task and it involves three important factors, as listed in the following.
Fragmentation
A global relation may be divided into several non-overlapping subrelations called fragments, which are then distributed among sites.
Allocation
Allocation involves the issue of allocating fragments among sites in a distributed system. Each fragment is stored at the site with optimal distribution.
Replication
The distributed database system may maintain several copies of a fragment at different sites.
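The three factors above can be illustrated with a small Python sketch. Everything concrete in it is an assumption made for illustration: the EMP relation and its rows, the fragment names, the site names, and the choice to fragment by city are all hypothetical.

```python
# Global relation (hypothetical EMP rows: id, name, city).
EMP = [
    (1, "Asha", "Delhi"),
    (2, "Ravi", "Mumbai"),
    (3, "Meena", "Delhi"),
]

# Fragmentation: non-overlapping subrelations selected by a predicate on city.
fragments = {
    "EMP_delhi": [row for row in EMP if row[2] == "Delhi"],
    "EMP_mumbai": [row for row in EMP if row[2] == "Mumbai"],
}

# Allocation: each fragment is assigned to one or more sites.
# Listing more than one site for a fragment is replication.
allocation = {
    "EMP_delhi": ["site_delhi"],
    "EMP_mumbai": ["site_mumbai", "site_delhi"],   # replicated fragment
}

# Materialize what each site actually stores.
sites = {}
for name, site_list in allocation.items():
    for site in site_list:
        sites.setdefault(site, {})[name] = list(fragments[name])

# Reconstruction check: the union of the fragments gives back the global relation.
reconstructed = sorted(row for rows in fragments.values() for row in rows)
```

Here site_delhi ends up holding its own fragment plus a replica of EMP_mumbai, while site_mumbai holds only EMP_mumbai.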
Design Strategies:-
In the top-down design process, the database design starts from the global schema design and proceeds by designing the fragmentation of the database, and then by allocating the fragments to the different sites, creating the physical images. The process is completed by performing the physical design of the data allocated to each site. The global schema design involves designing both the global conceptual schema and the global external schemas (view design). In the global conceptual schema design step, the user needs to specify the data entities and to determine the applications that will run on the database, as well as statistical information about these applications. At this stage, the design of local conceptual schemas is considered. The objective of this step is to design local conceptual schemas by distributing the entities over the sites of the distributed system. Rather than distributing relations, it is quite common to partition relations into subrelations, which are then distributed to different sites. Thus, in a top-down approach, the distributed database design involves two phases, namely, fragmentation and allocation.
The fragmentation phase is the process of clustering information in fragments that can be accessed simultaneously by different applications, whereas the allocation phase is the process of distributing the generated fragments among the sites of a distributed database system. In the top-down design process, the last step is the physical database design, which maps the local conceptual schemas into physical storage devices available at corresponding sites. The top-down design process is best suited to distributed systems that are developed from scratch.
In the bottom-up design process, the issue of integration of several existing local schemas into a global conceptual schema is considered to develop a distributed system. When several existing databases are aggregated to develop a distributed system, the bottom-up design process is followed. This process is based on the integration of several existing schemas into a single global schema. It is also possible to aggregate several existing heterogeneous systems for constructing a distributed database system using the bottom-up approach. Thus, the bottom-up design process requires the following steps:
The selection of a common database model for describing the global schema of the database
The translation of each local schema into the common data model
The integration of the local schemas into a common global schema.
Any one of the above design strategies is followed to develop a distributed database system.
DISTRIBUTED QUERY PROCESSING :
Query Processing Basics
centralized query processing
distributed query processing
The retrieval of data from different sites in a network is known as distributed query processing.
Step 1 Query Decomposition :-
o Normalization
o Analysis
o Simplification
o Restructuring
Step 2 Data Localization
Step 3 Global Query Optimization
Step 4 Local Optimization
PHF: PRIMARY HORIZONTAL FRAGMENTATION
VF: VERTICAL FRAGMENTATION
DHF: DERIVED HORIZONTAL FRAGMENTATION
CONCURRENCY CONTROL IN DISTRIBUTED DATABASE :
Concurrency Control: In distributed database systems, the database is typically used by many
users. These systems usually allow multiple transactions to run concurrently, i.e. at the same time.
Concurrency control is the activity of coordinating concurrent accesses to a database in a
multiuser database management system (DBMS). Concurrency control permits users to access
a database in a multi-programmed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent
database updates performed by one user from interfering with database retrievals and updates
performed by another. When transactions update data concurrently, several problems with
the consistency of the data may arise.
Distributed Concurrency Control Algorithms:
This section summarizes the salient aspects of four distributed concurrency control algorithms. To do so, we must first explain the structure assumed for distributed transactions.
Distributed Transaction:
A distributed transaction is a transaction that runs in multiple processes, usually on several machines. Each process works for the transaction. Distributed transaction processing systems are designed to facilitate transactions that span heterogeneous, transaction-aware resource managers in a distributed environment. The execution of a distributed transaction requires coordination between a global transaction management system and all the local resource managers of all the involved systems. The resource manager and transaction processing monitor are the two primary elements of any distributed transactional system. Distributed transactions, like local transactions, must observe the ACID properties. However, maintenance of these properties is very complicated for distributed transactions because a failure can occur in any process. If such a failure occurs, each process must undo any work that has already been done on behalf of the transaction. A distributed transaction processing system maintains the ACID properties in distributed transactions by using two features:
Recoverable processes: recoverable processes log their actions and therefore can restore earlier states if a failure occurs.
A commit protocol: a commit protocol coordinates the committing or aborting of a transaction. The most common commit protocol is the two-phase commit protocol.
Distributed Two-Phase Locking (2PL):
Several concurrency control methods have been developed to ensure the serializability of
transactions executed in parallel. One of these methods is locking, which takes several
forms. The two-phase locking protocol is one of the basic concurrency control protocols in
distributed database systems. The main approach of this protocol is "read any, write all".
Transactions set read locks on items that they read, and they convert their read locks to write
locks on items that need to be updated. To read an item, it suffices to set a read lock on any copy of the item, so the local copy is locked; to update an item, write locks are required on all copies.
Write locks are obtained as the transaction executes, with the transaction blocking on a write
request until all of the copies of the item to be updated have been successfully locked. All locks
are held until the transaction has successfully committed or aborted [2]. The 2PL protocol
oversees locks by determining when transactions can acquire and release locks. The 2PL
protocol forces each transaction to make its lock and unlock requests in two steps:
The transaction first enters the Growing Phase, in which it requests the locks it requires, and then
enters the Shrinking Phase, in which it releases all locks and cannot make any more requests.
Transactions in the 2PL protocol should get all needed locks before entering the unlock phase.
While the 2PL protocol guarantees serializability, it does not ensure that deadlocks do not happen,
so deadlock is a possibility in this algorithm. Local deadlocks are checked for any time a
transaction blocks, and are resolved when necessary by restarting the transaction with the most
recent initial startup time among those involved in the deadlock cycle. Global deadlock detection
is handled by a Snoop process, which periodically requests waits-for information from all sites
and then checks for and resolves any global deadlocks.
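The two-phase rule itself (no new lock once any lock has been released) can be sketched as follows. This is a minimal illustration, not a lock manager: the class name TwoPhaseTransaction and its methods are hypothetical, and the sketch checks only the phase rule for a single transaction. It does not model conflicts between transactions, read versus write locks, replicated copies, or deadlock detection.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of one transaction."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False      # becomes True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item!r} after releasing a lock"
            )
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True       # the shrinking phase starts here
        self.locks.discard(item)

    def release_all(self):
        """Called at commit or abort: drop every lock at once."""
        self.shrinking = True
        self.locks.clear()


t = TwoPhaseTransaction("T1")
t.lock("x")
t.lock("y")          # growing phase: still allowed
t.unlock("x")        # shrinking phase begins
try:
    t.lock("z")      # violates the two-phase rule
    violated = False
except RuntimeError:
    violated = True
```

Holding all locks until commit or abort, as the text describes, corresponds to calling release_all only once at the end of the transaction.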
Wound-Wait (WW):
The second algorithm is the distributed wound-wait locking algorithm. It follows the same
approach as the 2PL protocol but differs in its handling
of the deadlock problem: rather than maintaining waits-for information and
then checking for local and global deadlocks, it prevents deadlocks via the use of timestamps. Each transaction is numbered according to its initial startup time, and younger
transactions are prevented from making older ones wait. If an older transaction requests a lock,
and if the request would lead to the older transaction waiting for a younger transaction, the
younger transaction is wounded: it is restarted unless it is already in the second phase of its
commit protocol. Younger transactions can wait for older transactions, so the possibility of
deadlocks is eliminated [2].
t(T1) > t(T2): If the requesting transaction T1 is younger than the transaction T2 that
holds the lock on the requested data item, then T1 has to wait.
t(T1) < t(T2): If the requesting transaction T1 is older than the transaction T2 that holds the lock on
the requested data item, then T2 is wounded, i.e. it has to abort (roll back) and restart.
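The wound-wait decision, in which an older requester wounds a younger lock holder and a younger requester waits, fits in one small function. This is a sketch under assumptions: the function name wound_wait and its return strings are hypothetical, a smaller timestamp means an older transaction, and the exception for a holder already in the second phase of its commit protocol is left out.

```python
def wound_wait(requester_ts, holder_ts):
    """Decide what happens when a requester asks for a lock that a holder owns.

    Smaller timestamp = earlier startup = older transaction.
    """
    if requester_ts < holder_ts:
        return "wound holder"       # older requester: the younger holder is restarted
    return "requester waits"        # younger requester waits for the older holder


older_asks = wound_wait(requester_ts=1, holder_ts=2)
younger_asks = wound_wait(requester_ts=2, holder_ts=1)
```

Because a transaction only ever waits for an older one, a cycle of waiting transactions cannot form.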
Basic Timestamp Ordering (BTO):
A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time. The third algorithm is the basic timestamp ordering algorithm. The idea for this scheme is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the equivalent serial schedule has the transactions in order of their timestamp values. This is called
timestamp ordering (TO). Like wound-wait, it employs transaction startup timestamps, but it uses them differently. BTO associates timestamps with all recently accessed data items and requires that conflicting data accesses by transactions be performed in timestamp order instead of using a locking approach. Transactions that attempt to perform out-of-order accesses are restarted. When a read request is received for an item, it is permitted if the timestamp of the requester exceeds the item's write timestamp. When a write request is received, it is permitted if the requester's timestamp exceeds the read timestamp of the item; in the event that the timestamp of the requester is less than the write timestamp of the item, the update is simply ignored [2]. For replicated data, the "read any, write all" approach is used, so a read request may be sent to any copy while a write request must be sent to all copies. Integration of the algorithm with two-phase commit is accomplished as follows: writers keep their updates in a private workspace until commit time.
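The read and write checks of basic timestamp ordering can be sketched for a single copy of an item. The names Item, bto_read and bto_write are hypothetical, timestamps are plain integers, and a requester whose timestamp equals the stored one is allowed through on the assumption that it is the same transaction. Replication, the private workspace and two-phase commit are not modelled.

```python
class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0     # timestamp of the youngest transaction that read the item
        self.write_ts = 0    # timestamp of the transaction that wrote the current value

def bto_read(item, ts):
    if ts < item.write_ts:
        return "restart"             # a younger transaction already overwrote the item
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def bto_write(item, ts, value):
    if ts < item.read_ts:
        return "restart"             # a younger transaction already read the old value
    if ts < item.write_ts:
        return "ignored"             # outdated update is simply skipped
    item.value, item.write_ts = value, ts
    return "ok"


x = Item(10)
r1 = bto_write(x, 5, 20)     # accepted, write_ts becomes 5
r2 = bto_read(x, 3)          # too old: the item was written at 5
r3 = bto_read(x, 8)          # accepted, read_ts becomes 8
r4 = bto_write(x, 6, 30)     # too old: the item was read at 8

y = Item()
r5 = bto_write(y, 7, "a")    # accepted
r6 = bto_write(y, 4, "b")    # older than the current write: ignored
```

A result of "restart" corresponds to the out-of-order access that forces the transaction to be restarted.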
Distributed Optimistic (OPT):
The fourth algorithm is the distributed, timestamp-based, optimistic concurrency control algorithm,
which operates by exchanging certification information during the commit protocol. For each data
item, a read timestamp and a write timestamp are maintained. Transactions may read and update
data items freely, storing any updates into a local workspace until commit time. For each read,
the transaction must remember the version identifier (i.e., write timestamp) associated with the
item when it was read. Then, when all of the transaction's cohorts have completed their work and
have reported back to the master, the transaction is assigned a globally unique timestamp. This
timestamp is sent to each cohort in the "prepare to commit" message, and it is used to locally
certify all of its reads and writes as follows [2]:
A read request is certified if-:
(i) The version that was read is still the current version of the item, and
(ii) No write with a newer timestamp has already been locally certified.
A write request is certified if-:
(i) No later reads have been certified and subsequently committed, and
(ii) No later reads have been locally certified already [2].
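The certification conditions listed above can be written down directly as two checks at one cohort. This sketch encodes only those four conditions as stated; the names ItemState, certify_read and certify_write, and the use of lists for pending certifications, are assumptions made for illustration, and commit, abort and cleanup of pending entries are left out.

```python
class ItemState:
    def __init__(self):
        self.write_ts = 0            # version identifier of the current committed value
        self.read_ts = 0             # latest timestamp among committed reads
        self.certified_reads = []    # reads certified locally but not yet committed
        self.certified_writes = []   # writes certified locally but not yet committed

def certify_read(item, version_read, txn_ts):
    still_current = version_read == item.write_ts                      # condition (i)
    no_newer_write = all(w <= txn_ts for w in item.certified_writes)   # condition (ii)
    if still_current and no_newer_write:
        item.certified_reads.append(txn_ts)
        return True
    return False

def certify_write(item, txn_ts):
    no_later_committed_read = item.read_ts <= txn_ts                   # condition (i)
    no_later_certified_read = all(r <= txn_ts for r in item.certified_reads)  # (ii)
    if no_later_committed_read and no_later_certified_read:
        item.certified_writes.append(txn_ts)
        return True
    return False


x = ItemState()
w10 = certify_write(x, 10)       # nothing conflicts yet
r5 = certify_read(x, 0, 5)       # fails: a write with newer timestamp 10 is certified
r12 = certify_read(x, 0, 12)     # version 0 is still current and no newer write exists
w11 = certify_write(x, 11)       # fails: a later read (12) is already certified
```

A transaction whose reads or writes fail certification at any cohort would not be allowed to commit.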
Concurrency control is the activity of coordinating concurrent accesses to a database in a multi-user database management system (DBMS).
Several problems:
1. The lost update problem
2. The temporary update problem
3. The incorrect summary problem
As an example, consider an on-line airline reservation system. Suppose two customers, Customer
A and Customer B, simultaneously try to reserve a seat for the same flight. In the absence of
concurrency control, these two activities could interfere as illustrated in Figure 1. Let seat No 18
be the first available seat. Both transactions could read the reservation information at approximately
the same time, reserve seat No 18 for Customer A and for Customer B respectively, and store the result
back into the database. The net effect is incorrect: although two customers reserved a seat, the
database reflects only one activity; the other reservation is lost by the system.
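The interleaving in the airline example can be replayed step by step in Python. The seat table, the helper first_free and the snapshot variables are hypothetical; the sketch simply forces both reads to happen before either write, which is exactly what concurrency control is meant to prevent.

```python
seats = {18: None}                      # seat 18 is the first available seat

def first_free(snapshot):
    """Return the lowest-numbered seat that has no owner in this snapshot."""
    return min(no for no, owner in snapshot.items() if owner is None)

# Both transactions read the reservation information before either one writes.
seen_by_a = dict(seats)
seen_by_b = dict(seats)

seat_a = first_free(seen_by_a)          # Customer A picks seat 18
seat_b = first_free(seen_by_b)          # Customer B also picks 18: A's booking is not visible

seats[seat_a] = "Customer A"            # A stores its result
seats[seat_b] = "Customer B"            # B overwrites it; A's reservation is lost
```

Two reservations were made, yet the table records only Customer B.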
RECOVERY CONTROL IN DISTRIBUTED DATABASES :
As with local recovery, distributed database recovery aims to maintain the atomicity and durability of distributed transactions. A database must guarantee that all statements in a transaction, distributed or non-distributed, either commit or roll back as a unit. The effects of an ongoing transaction should be invisible to all other transactions at all sites. This transparency should be true for transactions that include any type of operation, including queries, updates or remote procedure calls. In a distributed database environment, the database management system must also coordinate transaction control with these characteristics over a communication network and maintain data consistency, even if a network or system failure occurs.
In a DDBMS, a given transaction is submitted at some one site, but it can access data at other sites as well. When a transaction is submitted at a site, the transaction manager at that site breaks it up into a collection of one or more sub-transactions that execute at different sites. The transaction manager then submits these sub-transactions to the transaction managers at the other sites and coordinates their activities. To ensure the atomicity of the global transaction, the DDBMS must ensure that sub-transactions of the global transaction either all commit or all abort.
Recovery control in distributed databases is based on the two-phase commit protocol. The two-phase
commit protocol is the transaction protocol through which all nodes and databases agree
with each other to commit a transaction. This protocol is required in an environment where a single
transaction can interact with multiple independent resource managers, as in the case of distributed
databases. It also supports data integrity by ensuring that modifications made by a transaction are
either committed by all the databases involved in the distributed system or rolled back by all the
databases.
The two-phase commit protocol works in two phases. The first phase is called the prepare phase,
during which the updates are recorded in a transaction log file, and the resource, through a
resource manager, indicates that it is ready to make the changes. Resources can vote either to
commit or to roll back. The actions in the second phase, the commit phase, depend on the votes of the resources. If all resources vote to commit, then all the
resources participating in the transaction are updated, whereas if one or more of the resources
vote to roll back, then all the resources are rolled back to their previous state.
Consider an example in which an interaction between a coordinator at a local site and a
participant at a remote site takes place and a transaction has requested the commit operation. In
the first phase, the coordinator instructs the participants to get ready by sending them the "get ready"
message. Each participant makes an entry in its log and sends the "ok" message as
acknowledgement to the coordinator. The coordinator then writes an entry in its log, takes a final
decision and sends it to the participants.
Prepare Phase
Coordinator receives a commit request
Coordinator instructs all resource managers to get ready to go either way on the
transaction. Each resource manager writes all updates from that transaction to its
own physical log
Coordinator receives replies from all resource managers. If all are ok, it writes
commit to its own log; if not then it writes rollback to its log
Commit Phase
Coordinator then informs each resource manager of its decision and broadcasts a
message to either commit or rollback (abort). If the message is commit, then each resource manager transfers the update from its log to its database
A failure during the commit phase puts a transaction in limbo. This has to be
tested for and handled with timeouts or polling
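The prepare and commit phases listed above can be sketched in a few lines of Python. This is only an illustrative model, not a production protocol: the class ResourceManager, the function two_phase_commit, and the can_commit flag used to simulate a "no" vote are all hypothetical names introduced here, and failures, timeouts, and the "limbo" state are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical participant that logs updates before applying them."""
    name: str
    can_commit: bool = True                      # simulated vote for this demo
    log: list = field(default_factory=list)
    database: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, updates):
        # Prepare phase: record the updates in the local log, then vote.
        self.pending = dict(updates)
        self.log.append(("prepare", dict(updates)))
        return self.can_commit

    def commit(self):
        # Commit phase: transfer the logged updates into the database.
        self.database.update(self.pending)
        self.pending = {}
        self.log.append(("commit", None))

    def rollback(self):
        # Discard the prepared updates; the database keeps its previous state.
        self.pending = {}
        self.log.append(("rollback", None))


def two_phase_commit(coordinator_log, participants, updates):
    # Phase 1: ask every participant to get ready and collect the votes.
    votes = [rm.prepare(updates) for rm in participants]
    decision = "commit" if all(votes) else "rollback"
    # The coordinator logs its decision before broadcasting it.
    coordinator_log.append(decision)
    # Phase 2: broadcast the decision to every participant.
    for rm in participants:
        if decision == "commit":
            rm.commit()
        else:
            rm.rollback()
    return decision


coordinator_log = []
site_a, site_b = ResourceManager("A"), ResourceManager("B")
print(two_phase_commit(coordinator_log, [site_a, site_b], {"balance": 100}))
```

A single participant voting "no" is enough to roll back every site, which is the all-or-nothing behaviour the protocol is meant to guarantee.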
WEB DATABASES :
The World Wide Web (WWW), popularly known as "the Web", was originally developed in Switzerland at CERN (Note 1) in early 1990 as a large-scale hypermedia information service system for scientists to share information (Note 2). Today this technology allows universal access to this shared information to anyone having access to the Internet, and the Web contains hundreds of millions of Web pages within the reach of millions of users.
In Web technology, a basic client-server architecture underlies all activities. Information is stored on computers designated as Web servers in publicly accessible shared files encoded using HyperText Markup Language (HTML). A number of tools enable users to create Web pages formatted with HTML tags, freely mixed with multimedia content, from graphics to audio and even to video. A page has many interspersed hyperlinks, literally links that enable a user to "browse" or move from one page to another across the Internet. This ability has given a
tremendous power to end users in searching and navigating related information, often across different continents.
Information on the Web is organized according to a Uniform Resource Locator (URL), something similar to an address that provides the complete pathname of a file. The pathname consists of a string of machine and directory names separated by slashes and ends in a filename. For example, the table of contents of this book is currently at the following URL:
http://cseng.aw.com/book/0,,0805317554,00.html
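As a small illustration of this structure, the example URL can be split into its protocol, machine name, directories, and filename with Python's standard urllib.parse module. The variable names below are chosen for this sketch only.

```python
from urllib.parse import urlparse

url = "http://cseng.aw.com/book/0,,0805317554,00.html"
parts = urlparse(url)

protocol = parts.scheme            # the part before "://"
machine = parts.netloc             # the machine (host) name
# Split the path on slashes; the last segment is the filename.
segments = [s for s in parts.path.split("/") if s]
directories, filename = segments[:-1], segments[-1]

print(protocol, machine, directories, filename)
```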
A Web URL typically begins with http, which stands for HyperText Transfer Protocol, the protocol used by Web browsers to communicate with Web servers, and vice versa. A Web browser is a program that interprets and presents HTML documents to users. Popular Web browsers include the Internet Explorer of Microsoft and the Netscape Navigator. A collection of HTML documents and other files accessible via the URL on a Web server is called a Web site. In the above URL, "cseng.aw.com" names a Web site of Addison Wesley Publishing.
Providing Access to Databases on the World Wide Web
Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware, the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.
As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but to databases and DBMSs to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:
1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is achieved by any grouping of multiple users' requests; moreover, the developer must keep the scripts in the CGI-bin subdirectories only, which opens it to a possible breach of security. The fact that CGI has no language associated with it but requires database developers to learn PERL or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.
2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base Connectivity as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query using the Java classes Connection, Statement, and ResultSet, respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.
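The CGI approach in item 1 can be pictured with a minimal sketch. It is written in Python rather than PERL or Tcl, it uses SQLite as a stand-in for the DBMS, and the table books and the functions make_demo_db and render_cgi_response are hypothetical names invented for this illustration. Note how every request opens its own database connection, which is the cost the text describes.

```python
import html
import os
import sqlite3
import tempfile
from urllib.parse import parse_qs


def make_demo_db():
    """Create a small throwaway database file for the demonstration."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Database Systems", "Navathe"), ("Compilers", "Aho")],
    )
    conn.commit()
    conn.close()
    return path


def render_cgi_response(query_string, db_path):
    """Build the CGI response (header plus HTML body) for one request."""
    params = parse_qs(query_string)
    author = params.get("author", [""])[0]
    # A fresh DBMS connection is opened for every single request.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT title FROM books WHERE author = ? ORDER BY title",
            (author,),
        ).fetchall()
    finally:
        conn.close()
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    body = f"<html><body><ul>{items}</ul></body></html>"
    # A CGI program writes a header block, a blank line, then the page.
    return "Content-Type: text/html\r\n\r\n" + body


demo_db = make_demo_db()
print(render_cgi_response("author=Navathe", demo_db))
```

In a real deployment the Web server would start this script as a separate process and pass the query string through the QUERY_STRING environment variable; here the function is simply called directly.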
Besides CGI, other Web server vendors are launching their own middleware products for
providing multiple database connectivity. These include Internet Server API (ISAPI) from
Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web
access option provided by Informix. Other DBMS vendors already have, or will have similar
provisions to support database access on the Web.
THE WEB INTEGRATION OPTION OF INFORMIX :
Informix has addressed the limitations of CGI and the incompatibilities of CGI, NSAPI, and ISAPI
by creating the Web Integration Option (WIO). WIO eliminates the need for scripts. Developers
use tools to create intelligent HTML pages called Application Pages (or App Pages) directly within
the database. They execute SQL statements dynamically, format the results inside HTML, and
return the resulting Web page to the end users. The schematic architecture is shown in Figure 27.02. WIO uses the Web Driver, a lightweight CGI process that is invoked when a URL request
is received by the Web server. A unique session identifier is generated for each request but the
WIO application is persistent and does not terminate after each request.
When the WIO application receives a request from the Web driver, it connects to the database
and executes Web Explode, a function that executes queries within Web pages and formats
results as a Web page that goes back to the browser via the Web driver.
Informix HTML tag extensions allow Web authors to create applications that can dynamically construct Web page templates from the Informix Dynamic Server and present them to the end users. WIO also lets users create their own customized tags to perform specialized tasks. Thus, without resorting to any programming or script development, powerful applications can be designed. Another feature of WIO helps transaction-oriented applications by providing an application programming interface (API) that offers a collection of basic services such as connection and session management that can be incorporated into Web applications.
WIO supports applications developed in C, C++, and Java. This flexibility lets developers port
existing applications to the Web or develop new applications in these languages. The WIO is
integrated with Web server software and utilizes the native security mechanism of the Informix
Dynamic Server. The open architecture of WIO allows the use of various Web browsers and
servers.
THE ORACLE WEBSERVER :
ORACLE supports Web access to databases using the components shown in Figure 27.03. The
client requests files that are called "static" or "dynamic" files from the Web server. Static files have
a fixed content whereas dynamic files may have content that includes results of queries to the
database. There is an HTTP daemon (a process that runs continuously) called Web Listener
running on the server that listens for the requests originating in the clients. A static file (document)
is retrieved from the file system of the server and displayed on the Web browser at the client.
A request for a dynamic page is passed by the listener to a Web request broker (WRB), which is a
multi-threaded dispatcher that dispatches requests to cartridges. Cartridges are software modules
(mentioned earlier in Section 13.2.6) that perform specific functions on specific types of data; they can communicate among themselves. Currently cartridges are provided for PL/SQL, Java, and
Live HTML; customized cartridges may be provided as well.
OPEN PROBLEMS WITH WEB DATABASES :
The Web is an important factor in planning for enterprise-wide computing environments, both for providing external access to the enterprise's systems and information for customers and suppliers and for marketing and advertising purposes. At the same time, due to security requirements, employees of some organizations are restricted to operate within intranets, subnetworks that cannot be accessed freely from the outside world. Among the prominent applications of the intranet and the WWW are databases to support electronic storefronts, parts and product catalogs, directories and schedules, newsstands, and bookstores. Electronic commerce, the purchasing of products and services electronically on the Internet, is likely to become a major application supported by such databases.
The future challenges of managing databases on the Web will be many, among them the following:
Web technology needs to be integrated with the object technology. Currently, the Web can be viewed as a distributed object system, with HTML pages functioning as objects identified by the URL.
HTML functionality is too simple to support complex application requirements. As we saw, the Web Integration Option of Informix adds further tags to HTML. In general, additional facilities will be needed to (1) make Web clients function as application front ends, integrating data from multiple heterogeneous databases; (2) make Web clients present different views of the same data to different users; and (3) make Web clients "intelligent" by providing additional data mining functionality (see Section 26.2).
Web page content can be made more dynamic by adding more "behavior" to it as an object (see Chapter 11 for a discussion of object modeling). In this respect (1) client and server objects (HTML pages) can be made to interact; (2) Web pages can be treated as collections of programmable objects; and (3) client-side code can access these objects and manipulate them dynamically.
The support for a large number of clients coupled with reasonable response times for queries against very large (several tens of gigabytes in size) databases will be major challenges for Web databases. They will have to be addressed both by Web servers and by the underlying DBMSs.
Efforts are underway to address the limitations of the current data structuring technology, particularly by the World Wide Web Consortium (W3C). The W3C is designing a Web Object Model. W3C is also proposing an Extensible Markup Language (XML) for structured document interchange on the Web. XML defines a subset of SGML (the Standard Generalized Markup Language), allowing customization of markup languages with application-specific tags. XML is rapidly gaining ground due to its extensibility in defining new tags. W3C's Document Object Model (DOM) defines an object-oriented API for HTML or XML documents presented by a Web client. W3C is also defining metadata modeling standards for describing Internet resources.
MULTIMEDIA DATABASES :
A multimedia system is a computer-controlled integration of media information objects of different types (text, images, audio, video, and so on). The integration refers to data modeling, storage, presentation, and time synchronization.
A premise is that the media must be digitally represented, or at least digitally controllable.
In the years ahead multimedia information systems are expected to dominate our daily lives. Our
houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including
digital libraries that will distribute vast amounts of multisource multimedia content.
The Nature of Multimedia Data and Applications
Nature of Multimedia Data
In Section 23.3 we discussed the advanced modeling issues related to multimedia data. We also
examined the processing of multiple types of data in Chapter 13 in the context of object relational
DBMSs (ORDBMSs). DBMSs have been constantly adding to the types of data they support.
Today the following types of multimedia data are available in current systems:
Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.
Graphics: Examples include drawings and illustrations that are encoded using some descriptive standards (e.g., CGM, PICT, PostScript).
Images: Includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These images are not subdivided into components. Hence querying them by content (e.g., find all images containing circles) is nontrivial.
Animations: Temporal sequences of image or graphic data.
Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.
Structured audio: A sequence of audio components comprising note, tone, duration, and so forth.
Audio: Sample data generated from aural recordings in a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.
Composite or mixed multimedia data: A combination of multimedia data types such as audio and video which may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats. Composite data also contains additional control information describing how the information should be rendered.
Nature of Multimedia Applications
Multimedia data may be stored, delivered, and utilized in many different ways. Applications may be categorized based on their data management characteristics as follows:
Repository applications: A large amount of multimedia data as well as metadata is stored for retrieval purposes. A central repository containing multimedia data may be maintained by a DBMS and may be organized into a hierarchy of storage levels: local disks, tertiary disks and tapes, optical disks, and so on. Examples include repositories of satellite images, engineering drawings and designs, space photographs, and radiology scanned pictures.
Presentation applications: A large number of applications involve delivery of multimedia data subject to temporal constraints. Audio and video data are delivered this way; in these applications optimal viewing or listening conditions require the DBMS to deliver data at certain rates offering "quality of service" above a certain threshold. Data is consumed as it is delivered, unlike in repository applications, where it may be processed later (e.g., multimedia electronic mail). Simple multimedia viewing of video data, for example, requires a system to simulate VCR-like functionality. Complex and interactive multimedia presentations involve orchestration directions to control the retrieval order of components in a series or in parallel. Interactive environments must support capabilities such as real-time editing, analysis, or annotating of video and audio data.
Collaborative work using multimedia information: This is a new category of applications in which engineers may execute a complex design task by merging drawings, fitting subjects to design constraints, and generating new documentation, change notifications, and so forth. Intelligent healthcare networks as well as telemedicine will involve doctors collaborating among themselves, analyzing multimedia patient data and information in real time as it is generated.
All of these application areas present major challenges for the design of multimedia database systems.
DATA MANAGEMENT ISSUES :
Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on appropriate modeling of the structure and content of data and then designing appropriate database schemas for storing and retrieving multimedia information. Multimedia information systems are very complex and embrace a large set of issues, including the following:
Modeling: This area has the potential for applying database versus information retrieval techniques to the problem. There are problems of dealing with complex objects (see Chapter 11) made up of a wide range of types of data: numeric, text, graphic (computer-generated image), animated graphic image, audio stream, and video sequence. Documents constitute a specialized area and deserve special consideration.
Design: The conceptual, logical, and physical design of multimedia databases has not been addressed fully, and it remains an area of active research. The design process can be based on the general methodology described in Chapter 16, but the performance and tuning issues at each level are far more complex.
Storage: Storage of multimedia data on standard disklike devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during the input/output operation. Adhering to standards such as JPEG or MPEG is one way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. Standardized software will be required to deal with synchronization and compression/decompression, and will be coupled with indexing problems, which are still in the research domain.
Queries and retrieval: The "database" way of retrieving information is based on query languages and internal index structures. The "information retrieval" way relies strictly on keywords or predefined index terms. For images, video data, and audio data, this opens up many issues, among them efficient query formulation, query execution, and
optimization. The standard optimization techniques we discussed in Chapter 18 need to be modified to work with multimedia data types.
Performance: For multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate. For instance, video must be delivered at a steady frame rate (for example, 30 frames per second). Techniques for query optimization may compute expected response time before evaluating the query. The use of parallel processing of data may alleviate some problems, but such efforts are currently subject to further experimentation.
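The BLOB facility and the keyword-based style of retrieval mentioned in the list above can be illustrated with a short sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the table media, its columns, and the fake image bytes are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media ("
    " id INTEGER PRIMARY KEY,"
    " kind TEXT,"          # e.g. image, audio, video
    " keywords TEXT,"      # predefined index terms used for retrieval
    " content BLOB)"       # the untyped binary object itself
)

# Stand-in for real JPEG data: the DBMS treats it as an opaque byte string.
image_bytes = bytes(range(256)) * 4

conn.execute(
    "INSERT INTO media (kind, keywords, content) VALUES (?, ?, ?)",
    ("image", "satellite coast", image_bytes),
)

# Retrieval goes through the keywords, not through the image content.
row = conn.execute(
    "SELECT content FROM media WHERE keywords LIKE ?", ("%satellite%",)
).fetchone()
retrieved = row[0]
print(len(retrieved), retrieved == image_bytes)
```

The query can only match on the keyword column; asking for "all images containing circles" would require content analysis that a plain BLOB column does not provide.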
Such issues have given rise to a variety of open research problems. We look at a few
representative problems now.
MULTIMEDIA DATABASE APPLICATIONS :
Large-scale applications of multimedia databases can be expected to encompass a large number of disciplines and enhance existing capabilities. Some important application areas are:
Documents and records management: A large number of industries and businesses keep very detailed records and a variety of documents. The data may include engineering design and manufacturing data, medical records of patients, publishing material, and insurance claim records.
Knowledge dissemination: The multimedia mode, a very effective means of knowledge dissemination, will encompass a phenomenal growth in electronic books, catalogs, manuals, encyclopedias, and repositories of information on many topics.
Education and training: Teaching materials for different audiences, from kindergarten students to equipment operators to professionals, can be designed from multimedia sources. Digital libraries are expected to have a major influence on the way future students and researchers as well as other users will access vast repositories of educational material. (See Section 27.6 on digital libraries.)
Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals and aliens. The use of predesigned stored objects in multimedia databases will expand the range of these applications.
Real-time control and monitoring: Coupled with active database technology, multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks such as manufacturing operations, nuclear power plants, patients in intensive care units, and transportation systems.
MOBILE DATABASES :
Recent advances in wireless technology have led to mobile computing, a new dimension in data
communication and processing. The mobile computing environment will provide database
applications with useful aspects of wireless technology. The mobile computing platform allows
users to establish communication with other users and to manage their work while they are
mobile. This feature is especially useful to geographically dispersed organizations. Typical
examples might include traffic police, taxi dispatchers, and weather reporting services, as well as
financial market reporting and information brokering applications. However, there are a number
of hardware as well as software problems that must be resolved before the capabilities of mobile
computing can be fully utilized. Some of the software problems, which may involve data
management, transaction management, and database recovery, have their origin in distributed
database systems. In mobile computing, however, these problems become more difficult to solve,
mainly because of the narrow bandwidth of the wireless communication channels, the relatively
short active life of the power supply (battery) of mobile units, and the changing locations of
required information (sometimes in cache, sometimes in the air, sometimes at the server). In
addition, mobile computing has its own unique architectural challenges.
The general architecture of a mobile platform is illustrated in Figure 27.04. It is a distributed
architecture where a number of computers, generally referred to as Fixed Hosts (FS) and Base
Stations (BS), are interconnected through a high-speed wired network. Fixed hosts are general
purpose computers that are not equipped to manage mobile units but can be configured to do so.
Base stations are equipped with wireless interfaces and can communicate with mobile units to
support data access.
Mobile Units (MU) (or hosts) and base stations communicate through wireless channels having bandwidths significantly lower than those of a wired network. A downlink channel is used for sending data from a BS to an MU, and an uplink channel is used for sending data from an MU to its BS. Recent products for portable wireless have an upper limit of 1 Mbps (megabits per second) for infrared communication, 2 Mbps for radio communication, and 9.14 Kbps (kilobits per second) for cellular telephony. By comparison, Ethernet provides 10 Mbps, fast Ethernet and FDDI provide 100 Mbps, and ATM (asynchronous transfer mode) provides 155 Mbps.
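To make the gap between these channels concrete, the following sketch computes how long a 1 MB object (taken here as 1,000,000 bytes, an assumption of this example) would take to transfer at each quoted peak rate. Protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Peak channel rates in bits per second, as quoted in the text above.
RATES_BPS = {
    "cellular telephony": 9.14e3,
    "infrared": 1e6,
    "radio": 2e6,
    "Ethernet": 10e6,
    "fast Ethernet / FDDI": 100e6,
    "ATM": 155e6,
}


def transfer_seconds(size_bytes, rate_bps):
    """Ideal transfer time: payload bits divided by the channel rate."""
    return size_bytes * 8 / rate_bps


for name, rate in RATES_BPS.items():
    print(f"{name:22s} {transfer_seconds(1_000_000, rate):10.2f} s")
```

Under these figures the object needs 8 seconds over infrared but roughly 875 seconds over cellular telephony, which is why caching and broadcast become important design questions for mobile databases.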
Mobile units are battery-powered portable computers that move freely in a geographic mobility
domain, an area that is restricted by the limited bandwidth of wireless communication channels.
To manage the mobility of units, the entire geographic mobility domain is divided into smaller
domains called cells. The mobile discipline requires that the movement of mobile units be
unrestricted within the geographic mobility domain (intercell movement), while information
access contiguity guarantees that the movement of a mobile unit across cell
boundaries has no effect on the data retrieval process.
Types of Data in Mobile Applications
Applications that run on mobile hosts have different data requirements. Users either engage in personal communications or office activities, or they simply receive updates on frequently changing information. Mobile applications can be categorized in two ways: (1) vertical applications and (2) horizontal applications (Note 3). In vertical applications users access data within a specific cell, and access is denied to users outside of that cell. For example, users can obtain information on the location of doctors or emergency centers within a cell or parking availability data at an airport cell. In horizontal applications, users cooperate on accomplishing a task, and they can handle data distributed throughout the system. The horizontal application market is massive; two types of applications most mentioned are mail-enabled applications and information services to mobile users.
Data may be classified into three categories:
1. Private data: A single user owns this data and manages it. No other user may access it.
2. Public data: This data can be used by anyone who can read it. Only one source updates it. Examples include weather bulletins or stock prices.
3. Shared data: This data is accessed both in read and write modes by groups of users. Examples include inventory data for products in a company.
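The access rules implied by these three categories can be written down as a small check. This is a simplified reading of the definitions above, not a real access-control mechanism; the function is_allowed and its parameters owner and group are hypothetical names introduced for the sketch.

```python
def is_allowed(category, user, operation, owner=None, group=()):
    """Return True if `user` may perform `operation` ("read" or "write").

    owner: the single user (private data) or single source (public data)
           that manages the item.
    group: the users who share the item (shared data).
    """
    if category == "private":
        # Only the owner may access private data at all.
        return user == owner
    if category == "public":
        # Anyone may read; only the one source may update.
        return operation == "read" or user == owner
    if category == "shared":
        # Group members may both read and write.
        return user in group
    raise ValueError(f"unknown category: {category}")


print(is_allowed("public", "alice", "read", owner="weather-service"))
print(is_allowed("public", "alice", "write", owner="weather-service"))
```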
Public data is primarily managed by vertical applications, while shared data is used by horizontal
applications, possibly with some replication. Copies of shared data may be stored both in base
and mobile stations. This presents a variety of difficult problems in transaction management
consistency as well as integrity and scalability of the architecture.
SPATIAL DATABASE :
Spatial databases provide concepts for databases that keep track of objects in a multi-dimensional space. For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects, from countries and states to rivers, cities, roads, seas, and so on. These databases are used in many applications, such as environmental,
emergency, and battle management. Other databases, such as meteorological databases for
weather information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points. In general, a spatial database stores
objects that have spatial characteristics that describe them. The spatial relationships among the
objects are important, and they are often needed when querying the database. Although a spatial
database can in general refer to an n-dimensional space for any n, we will limit our discussion to
two dimensions as an illustration.
The main extensions that are needed for spatial databases are models that can interpret spatial
characteristics. In addition, special indexing and storage structures are often needed to improve performance. Let us first discuss some of the model extensions for two-dimensional spatial databases. The basic extensions needed are to include two-dimensional geometric concepts, such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial characteristics of objects. In addition, spatial operations are needed to operate on the objects' spatial characteristics (for example, to compute the distance between two objects) as well as spatial Boolean conditions (for example, to check whether two objects spatially overlap). To illustrate, consider a database that is used for emergency management applications. A description of the spatial positions of many types of objects would be needed. Some of these objects generally
have static spatial characteristics, such as streets and highways, water pumps (for fire control), police stations, fire stations, and hospitals. Other objects have dynamic spatial characteristics that change over time, such as police vehicles, ambulances, or fire trucks.
The following categories illustrate three typical types of spatial queries:
Range query: Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, find all hospitals within the Dallas city area, or find all ambulances within five miles of an accident location.)
Nearest neighbor query: Finds an object of a particular type that is closest to a given location. (For example, find the police car that is closest to a particular location.)
Spatial joins or overlays: Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, find all cities that fall on a major highway or find all homes that are within two miles of a lake.)
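The three query types above can be sketched over point objects with a brute-force scan. The coordinates, object names, and function names below are made up for illustration, distances are plain Euclidean, and no spatial index is used, so every query inspects every object.

```python
import math

# Hypothetical objects with two-dimensional positions.
hospitals = {"H1": (1.0, 1.0), "H2": (4.0, 5.0), "H3": (9.0, 9.0)}
ambulances = {"A1": (0.0, 0.0), "A2": (5.0, 5.0)}


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(objects, center, radius):
    """Objects within `radius` of `center`."""
    return sorted(n for n, p in objects.items() if distance(p, center) <= radius)


def nearest_neighbor(objects, location):
    """The single object closest to `location`."""
    return min(objects, key=lambda n: distance(objects[n], location))


def spatial_join(left, right, max_dist):
    """Pairs (a, b) whose positions lie within `max_dist` of each other."""
    return sorted(
        (a, b)
        for a, p in left.items()
        for b, q in right.items()
        if distance(p, q) <= max_dist
    )


print(range_query(hospitals, (0.0, 0.0), 2.0))
print(nearest_neighbor(ambulances, (4.0, 4.0)))
print(spatial_join(ambulances, hospitals, 1.5))
```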
For these and other types of spatial queries to be answered efficiently, special techniques for
spatial indexing are needed. One of the best known techniques is the use of R-trees and their
variations. R-trees group together objects that are in close spatial physical proximity on the same
leaf nodes of a tree-structured index. Since a leaf node can point to only a certain number of
objects, algorithms for dividing the space into rectangular subspaces that include the objects are
needed. Typical criteria for dividing the space include minimizing the rectangle areas, since this
would lead to a quicker narrowing of the search space. Problems such as having objects with
overlapping spatial areas are handled in different ways by the many different variations of R-trees.
Each internal node of an R-tree is associated with a rectangle whose area covers all the rectangles
in its subtree. Hence, R-trees can easily answer queries such as "find all objects in a given area"
by limiting the tree search to those subtrees whose rectangles intersect with the area given in the
query.
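The pruning idea behind an R-tree search can be sketched as follows. This is a simplified illustration, not a full R-tree: the tree is built by hand (no insertion or node-splitting algorithm), and the names `Rect`, `Node`, `leaf`, `internal`, and `search` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        """True if the two rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def bounding(rects):
    """Smallest rectangle that covers every rectangle in `rects`."""
    rects = list(rects)
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Node:
    mbr: Rect                                      # covers everything below
    children: list = field(default_factory=list)   # child nodes (internal node)
    entries: list = field(default_factory=list)    # (Rect, name) pairs (leaf)


def leaf(entries):
    return Node(bounding(r for r, _ in entries), entries=entries)


def internal(children):
    return Node(bounding(c.mbr for c in children), children=children)


def search(node, query):
    """Names of all objects whose rectangle intersects `query`."""
    if not node.mbr.intersects(query):
        return []  # prune: nothing in this subtree can intersect the query
    hits = [name for rect, name in node.entries if rect.intersects(query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits


# Hand-built example: two leaves grouping spatially close objects.
west = leaf([(Rect(0, 0, 1, 1), "pump A"), (Rect(2, 2, 3, 3), "station B")])
east = leaf([(Rect(10, 10, 11, 11), "hospital C"), (Rect(12, 9, 13, 10), "station D")])
root = internal([west, east])
```

A query such as `search(root, Rect(0.5, 0.5, 2.5, 2.5))` descends only into the `west` leaf, because the bounding rectangle of `east` does not intersect the query area.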
Other spatial storage structures include quadtrees and their variations. Quadtrees generally
divide each space or subspace into equally sized areas, and proceed with the subdivisions of
each subspace to identify the positions of various objects. Recently, many newer spatial access
structures have been proposed, and this area is still an active research area.
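The subdivision into equally sized areas can be illustrated with a small point quadtree. This is a sketch under simplifying assumptions: the class name, the `capacity` parameter, and the `MIN_SIZE` guard are all choices made for this example, and only points (not extended objects) are stored.

```python
class QuadTree:
    """Quadtree over the square [x, x + size) x [y, y + size)."""

    MIN_SIZE = 1e-6  # stop subdividing below this side length

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size = x, y, size  # lower-left corner, side length
        self.capacity = capacity                # points held before splitting
        self.points = []
        self.children = None                    # four sub-quadrants once split

    def _contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def _insert_into_child(self, px, py):
        # Quadrants are disjoint, so at most one child accepts the point.
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        """Divide this square into four equal quadrants and push points down."""
        half = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, half, self.capacity)
                         for dx in (0, half) for dy in (0, half)]
        old, self.points = self.points, []
        for px, py in old:
            self._insert_into_child(px, py)

    def insert(self, px, py):
        if not self._contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.size <= self.MIN_SIZE:
                self.points.append((px, py))
                return True
            self._split()
        return self._insert_into_child(px, py)

    def query(self, xmin, ymin, xmax, ymax):
        """All stored points inside the closed rectangle given."""
        if (xmax < self.x or xmin >= self.x + self.size
                or ymax < self.y or ymin >= self.y + self.size):
            return []  # this quadrant cannot contain any matching point
        if self.children is None:
            return [(px, py) for px, py in self.points
                    if xmin <= px <= xmax and ymin <= py <= ymax]
        found = []
        for child in self.children:
            found.extend(child.query(xmin, ymin, xmax, ymax))
        return found
```

Unlike the R-tree, whose rectangles adapt to the data, the quadrants here are fixed by the geometry of the space; that makes splitting trivial but can produce deep trees where points cluster.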
CLUSTERING BASED DISASTER PROOF DATABASES :
If downtime is not an option, and the Web never closes for business, how do you keep
your company's doors open 24/7? The answer lies in high-availability (HA) systems that
approach 100 percent uptime.
The principles of high availability define a level of backup and recovery. Until recently, high availability simply meant hardware or software recovery via RAID (Redundant Array of
Independent Disks). RAID addressed the need for fault tolerance in data but didn't solve the
problem of a complete DBMS failure.
For even more uptime, database administrators are turning to clustering as the best way to
achieve high availability. Recent moves by Oracle, with its Real Application Clusters, and Microsoft,
with MCS (Microsoft Cluster Service), have made multinode clusters for HA in production
environments mainstream.
In a high-availability setup, a cluster functions by associating servers that have the ability to share
a disk group. As illustrated here, each node has a fail-over node within its cluster. If a failure occurs
in Node 1, Node 2 picks up the slack by assuming the resources and the unique logic and
transaction functions of the failed DBMS.
Clustering can have the added benefit of not being bound by node colocation. Fiber-optic
connections, which can be cabled for miles between the nodes in a cluster, ensure continued
operation even in the face of a complete meltdown of your primary system.
When a hot-standby model is in place, downtimes may be less than a minute. This is especially
important if your service-level agreement requires 99.9 percent uptime or better; 99.9 percent
translates to only about 8.76 hours of downtime per year.
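The downtime figure follows from simple arithmetic on the uptime percentage. A minimal sketch, assuming a 365-day year (the function name is hypothetical):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent):
    """Hours of downtime per year allowed by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows {downtime_hours_per_year(pct):.2f} hours/year")
```

For 99.9 percent this gives 8760 × 0.001, or about 8.76 hours per year; each additional "nine" cuts the allowance by a factor of ten.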
Clustering technologies are pricey, however. The enterprise software and hardware must be
uniform and compatible with the clustering technology to work properly. There's also the
associated overhead in the design and maintenance of redundant systems.
One cost-effective solution is log shipping, in which a database can synchronize physically distinct
databases by sending transaction logs from one server to another. In the event of a failure, the
logs can be used to restore the database up to the point of the failure. Other methods include
snapshot databases and replication technologies such as Sybase's Replication Server, which has
been around for decades.
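The idea of log shipping can be sketched with an in-memory model. This is a toy illustration only: the classes `Primary` and `Standby` and the function `ship_logs` are hypothetical names, and a real product ships log files or log records over the network and replays them with the DBMS's own recovery machinery.

```python
class Primary:
    """Primary database: every write is appended to a transaction log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered records: (sequence number, key, value)

    def write(self, key, value):
        self.log.append((len(self.log) + 1, key, value))
        self.data[key] = value


class Standby:
    """Standby database: rebuilt purely by replaying shipped log records."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # sequence number of the last record replayed

    def receive(self, records):
        for seq, key, value in records:
            if seq == self.applied + 1:  # replay strictly in log order
                self.data[key] = value
                self.applied = seq


def ship_logs(primary, standby):
    """Send the standby every log record it has not replayed yet."""
    standby.receive(primary.log[standby.applied:])
```

Between two shipments the standby lags behind the primary, so writes made after the last shipment are lost if the primary fails at that moment; the shipping interval therefore bounds how much work can be lost.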
High-availability add-ons to databases are useful but should be understood in the context of a
complete HA methodology. This requires a concerted effort toward standardization on each of
your mission-critical infrastructures. Fault-tolerant application design with hands-off exception
handling, self-healing and redundant networks, and a stable operating system are all prerequisites
for high availability.
When you adhere to these standards and enforce them, database-specific HA technologies are sure
to lead your enterprise on the path to minimum downtime.
SOME QUESTIONS
Q. 1 How can a distributed database be recovered in case of failure?
Ans :
In a distributed setting, the server must log a write operation not only to the local log file, but also to one, two, or more remote logs. The issue is close to replication methods, the main choice being to adopt either a synchronous or an asynchronous protocol.
Synchronous protocol. The server acknowledges the Client only when all the remote nodes have sent a confirmation of the successful completion of their write() operation. In practice, the Client waits until the slowest of all the writers sends its acknowledgment. This may severely hinder the efficiency of updates, but the obvious advantage is that all the replicas are consistent.
Asynchronous protocol. The Client application waits only until one of the copies (the fastest) has been effectively written. Clearly, this puts data consistency at risk, as a subsequent read operation may access an older version that does not yet reflect the update.
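The trade-off between the two protocols can be made concrete with a small model. In this sketch each replica is represented only by the time (in milliseconds, hypothetical values) at which it finishes writing the log record; the function names are likewise assumptions made for illustration.

```python
def ack_time_synchronous(write_times_ms):
    """The Client is acknowledged once the slowest replica has confirmed."""
    return max(write_times_ms)


def ack_time_asynchronous(write_times_ms):
    """The Client is acknowledged once the fastest replica has confirmed."""
    return min(write_times_ms)


def stale_replicas(write_times_ms, read_time_ms):
    """Indices of replicas that have not yet applied the write at read time."""
    return [i for i, t in enumerate(write_times_ms) if t > read_time_ms]


# Hypothetical completion times for a local log and two remote logs.
WRITE_TIMES_MS = [5, 40, 120]
```

With these numbers the synchronous Client waits 120 ms and no replica is stale afterwards, while the asynchronous Client waits only 5 ms but a read issued right after the acknowledgment could still hit either of the two remote replicas and see the old version.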
Q. 2 What is a multimedia database? Explain the methods of mining multimedia databases.
Ans : Multimedia database : Explained above.
The methods of mining multimedia database :
Q. 3 Write short notes on any four of the following:
(1) Web database
(2) Mobile databases
Ans : Explained above.
Q. 4 Design issues of distributed databases.
Ans :
Q. 5 What is a commit protocol and why is it required in a distributed database? Describe and compare two-phase and three-phase commit. What is blocking and how does the three-phase protocol prevent it? Explain distributed transaction.
Ans :
Distributed transaction :
Commit Protocol :
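Why a commit protocol is required in a distributed database: sites and communication links can
fail independently, so a protocol is needed to provide atomicity across sites, that is, to ensure
that a transaction either commits at every participating site or aborts at all of them.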
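The all-or-nothing decision can be sketched with the basic two-phase commit flow. This is a failure-free illustration only: the class and function names are hypothetical, and the logging, timeouts, and recovery steps that a real protocol needs (and that give rise to the blocking problem) are deliberately left out.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether the local work succeeded
        self.state = "INIT"

    def prepare(self):
        """Phase 1: vote yes (ready to commit) or no."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): broadcast the same outcome to every site.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"
```

A single "no" vote is enough to abort the transaction everywhere, which is what gives atomicity across sites. A participant in the `READY` state cannot decide on its own; it has to wait for the coordinator's decision.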
BLOCKING PROBLEM :
Q. 6 What are web databases? How are databases accessed through the web?
Ans : Web databases : Explained above.
Providing Access to Databases on the World Wide Web
Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware, that is, the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases. The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser.
As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but to databases and DBMSs to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:
1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is achieved by any grouping of multiple users' requests; moreover, the developer must keep the scripts in the CGI-bin subdirectories only, which opens it to a possible breach of security. The fact that CGI has no language associated with it but requires database developers to learn PERL or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.
2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases, without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base Connectivity as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query using the Java classes Connection, Statement, and ResultSet, respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.
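The CGI approach in item 1 can be sketched as a script body that opens a fresh database connection for every request. This is an illustration under stated assumptions: the text mentions PERL, Tcl, or C, whereas this sketch uses Python with its standard `sqlite3` module, and the `hospitals` table, the `city` parameter, and the function name `handle_request` are hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def handle_request(environ, db_path):
    """Build the CGI response (headers plus HTML) for one request.

    `environ` holds the CGI environment variables; the Web server passes
    the URL query string in QUERY_STRING.
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    city = params.get("city", [""])[0]

    # A new connection per request: the per-request cost described above.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM hospitals WHERE city = ? ORDER BY name",
            (city,),  # bound parameter, never string concatenation
        ).fetchall()
    finally:
        conn.close()

    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    # A CGI response is the headers, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + f"<ul>{items}</ul>"
```

In an actual CGI deployment the script would end with something like `print(handle_request(os.environ, "/path/to/database"))`, and the Web server would start a new process running it for every incoming request.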
Besides CGI, other Web server vendors are launching their own middleware products for
providing multiple database connectivity. These include Internet Server API (ISAPI) from
Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web
access option provided by Informix. Other DBMS vendors already have, or will have similar
provisions to support database access on the Web.
Q. 7 Compare the relative merits of centralized and hierarchical deadlock detection in a distributed DBMS.
Ans :
A centralized deadlock detection scheme is a reasonable choice if the concurrency control algorithm is also centralized. It is better for distributed access patterns across sites, since deadlocks occurring between any sites can be immediately identified. However, this benefit comes at the expense of communication between the central location and every other site.
A hierarchical deadlock detection scheme relieves a single site of the full burden of deadlock detection and lets more sites get involved. When access patterns are more localized, perhaps by geographic area, deadlocks are likely to occur among certain sites that communicate frequently. The hierarchical approach checks for deadlocks where they most likely happen and splits the detection effort across sites, resulting in greater efficiency.
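Both schemes can be illustrated with wait-for graphs, where an edge T1 -> T2 means transaction T1 waits for a lock held by T2 and a cycle means deadlock. The sketch below is a simplified model: the function names and the sample site graphs are hypothetical, and issues such as message delays and phantom deadlocks are ignored.

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:  # back edge: `other` is on the current path
                return True
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


def merge(*graphs):
    """Union of several wait-for graphs (what a detector site assembles)."""
    merged = {}
    for graph in graphs:
        for txn, waits in graph.items():
            merged.setdefault(txn, set()).update(waits)
    return merged


# Hypothetical local wait-for graphs at three sites.
SITE_1 = {"T1": {"T2"}}
SITE_2 = {"T2": {"T1"}}
SITE_3 = {"T3": {"T4"}}
```

Neither `SITE_1` nor `SITE_2` shows a cycle on its own. A centralized detector would test `merge(SITE_1, SITE_2, SITE_3)`, which requires every site to report to it. A hierarchical detector responsible only for sites 1 and 2 finds the same deadlock from `merge(SITE_1, SITE_2)` without any message from site 3.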