my slide distributed database management systems

Posted on 19-Jan-2015


Rushdi Shams, Dept of CSE, KUET

Database Systems

Distributed Database Systems

Version 1.0


Introduction

A distributed database system is a database system whose data is fragmented or replicated across several machines

These machines are usually located at different geographical locations of an organization

Fragmentation divides the original database into subsets (fragments)

Replication means storing a copy of the whole database, or of part of it, at more than one site


Idea of Distributed Database Systems

4 sites are connected by a communication network

Sites 1, 2 and 4 each run a database

Site 3 has no database. It accesses the other 3 sites for data manipulation


Fragmentation

There are 2 basic types of fragmentation:

1. Horizontal fragmentation

2. Vertical fragmentation


Horizontal Fragmentation

A horizontal fragment is a subset of the rows of a single table

Say we need to manipulate a table that contains information about British people

We have 3 sites:

The Edinburgh site holds the rows of the table about Scottish people

The Cardiff site holds the rows of the table about Welsh people

The London site holds the rows of the table about English people

The 3 sites work as distributed processors, so together they represent information about all British people


Horizontal Fragmentation (continued)

Horizontal fragmentation is done by restricting the table with a WHERE condition in the query language!

In the previous example, you can fragment the table like this:

1. WHERE LOCATION = 'EDINBURGH'

2. WHERE LOCATION = 'CARDIFF'

3. WHERE LOCATION = 'LONDON'

To get the original table back, you just take the union of all the fragments! Easy, huh?
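A minimal sketch of the WHERE-based fragmentation and the union that rebuilds the table, using Python's built-in sqlite3 module with one in-memory database standing in for all three sites. The table name `people`, its columns and the sample rows are hypothetical; a real distributed DBMS would store each fragment at its own site.

```python
import sqlite3

# Hypothetical site names, matching the WHERE conditions above.
SITES = ["EDINBURGH", "CARDIFF", "LONDON"]
SCHEMA = "(id INTEGER PRIMARY KEY, name TEXT, location TEXT)"


def make_people_table() -> sqlite3.Connection:
    """Create the original, unfragmented table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE people {SCHEMA}")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [(1, "Ailsa", "EDINBURGH"), (2, "Gareth", "CARDIFF"),
         (3, "Oliver", "LONDON"), (4, "Iona", "EDINBURGH")],
    )
    return conn


def fragment_horizontally(conn: sqlite3.Connection) -> None:
    """Create one fragment per site by restricting the table with a WHERE condition."""
    for site in SITES:
        fragment = f"people_{site.lower()}"  # built from trusted constants only
        conn.execute(f"CREATE TABLE {fragment} {SCHEMA}")
        conn.execute(f"INSERT INTO {fragment} SELECT * FROM people WHERE location = ?", (site,))


def reconstruct(conn: sqlite3.Connection) -> list:
    """Rebuild the original table as the UNION of all horizontal fragments."""
    union_sql = " UNION ".join(f"SELECT * FROM people_{site.lower()}" for site in SITES)
    return conn.execute(f"SELECT * FROM ({union_sql}) ORDER BY id").fetchall()


conn = make_people_table()
fragment_horizontally(conn)
original = conn.execute("SELECT * FROM people ORDER BY id").fetchall()
print(reconstruct(conn) == original)  # True
```

UNION removes duplicate rows; because the three WHERE conditions do not overlap, UNION ALL would return the same rows here.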


Horizontal Fragmentation (continued)

Consider the horizontal fragmentation of relation Proj according to its BUDGET value.

Tuples with BUDGET > 200000 go into Proj1 and the rest go into Proj2:

$$\mathit{Proj1} = \sigma_{\mathit{BUDGET} > 200000}(\mathit{Proj}) \qquad \mathit{Proj2} = \sigma_{\mathit{BUDGET} \le 200000}(\mathit{Proj})$$
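The two selections can be sketched in plain Python, treating a relation as a list of dictionaries. The sample tuples are made up; note that a project with a BUDGET of exactly 200000 lands in Proj2, because the two predicates are complementary.

```python
from typing import Callable

# Hypothetical Proj tuples; only PNO and BUDGET matter here.
PROJ = [
    {"PNO": "P1", "BUDGET": 150000},
    {"PNO": "P2", "BUDGET": 200000},
    {"PNO": "P3", "BUDGET": 250000},
    {"PNO": "P4", "BUDGET": 310000},
]


def select(relation: list[dict], predicate: Callable[[dict], bool]) -> list[dict]:
    """Relational selection (sigma): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


proj1 = select(PROJ, lambda t: t["BUDGET"] > 200000)
proj2 = select(PROJ, lambda t: t["BUDGET"] <= 200000)

print([t["PNO"] for t in proj1])  # ['P3', 'P4']
print([t["PNO"] for t in proj2])  # ['P1', 'P2']
```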


Vertical Fragmentation

Vertical fragmentation splits a table by projecting subsets of its columns, with the primary key included in every fragment

To get the original table back, you just need to join the newly created tables on the primary key!

Again, it's easy, huh?


Vertical Fragmentation (continued)

The table Proj is fragmented into 2 tables, Proj1 and Proj2

Both tables contain the primary key, PNO. Keep an eye on it, fellows!

If you join them on the PNO of both tables, what do you get? Answer: the Proj table again!
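A small Python sketch of the same idea: project Proj onto two column sets that both keep PNO, then join them on PNO. The attribute names PNAME, BUDGET and LOC and the sample values are assumptions for illustration only.

```python
# Hypothetical Proj tuples; PNO is the primary key, the other attribute names are assumed.
PROJ = [
    {"PNO": "P1", "PNAME": "Alpha", "BUDGET": 150000, "LOC": "Cardiff"},
    {"PNO": "P2", "PNAME": "Beta", "BUDGET": 250000, "LOC": "London"},
]


def project(relation: list[dict], attrs: list[str]) -> list[dict]:
    """Relational projection (pi): keep only the listed attributes of each tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on_key(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Join two vertical fragments on their shared primary key."""
    right_by_key = {t[key]: t for t in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Each vertical fragment repeats the primary key PNO so the join can match rows.
proj1 = project(PROJ, ["PNO", "BUDGET"])
proj2 = project(PROJ, ["PNO", "PNAME", "LOC"])

print(join_on_key(proj1, proj2, "PNO") == PROJ)  # True
```

Dropping PNO from either fragment would leave the join nothing to match on, which is why the primary key is repeated.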


[Figure: both fragmentation types at a glance]


Why Fragmentation?

Usage:

Applications work with views rather than entire relations

Efficiency:

Data is stored close to where it is most frequently used

Data that is not needed by local applications is not stored


Why Fragmentation? (continued)

Parallelism:

A transaction can be divided into several subqueries that operate on fragments

Security:

Data that is not needed by local applications is not stored at that site, so it is not exposed to unauthorized users there
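To illustrate the parallelism point, here is a sketch in which a query for the total salary bill is split into one subquery per fragment, the subqueries run concurrently, and the partial results are combined. Threads stand in for remote sites, and the fragment contents are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of an EMPLOYEE table, one per site (made-up salaries).
FRAGMENTS = {
    "London": [{"name": "A", "salary": 30000}, {"name": "B", "salary": 42000}],
    "Cardiff": [{"name": "C", "salary": 28000}],
    "Edinburgh": [{"name": "D", "salary": 35000}, {"name": "E", "salary": 31000}],
}


def local_subquery(rows: list[dict]) -> tuple[int, int]:
    """Subquery run at one site: partial SUM and COUNT over its own fragment."""
    return sum(r["salary"] for r in rows), len(rows)


def total_salary_and_headcount() -> tuple[int, int]:
    """Split the query into per-fragment subqueries, run them concurrently, combine."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_subquery, FRAGMENTS.values()))
    return sum(s for s, _ in partials), sum(c for _, c in partials)


print(total_salary_and_headcount())  # (166000, 5)
```

SUM and COUNT combine easily because each partial result can simply be added; an average would have to be computed from the combined sum and count, not by averaging the per-site averages.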


Disadvantage of Fragmentation

Performance:

Queries that need data from tables held at different sites take extra processing and communication time


Correctness of Fragmentation

Well, when I first heard "correctness", I was stumped! Actually it means nothing more than a set of properties of fragmentation

So, don't worry about it. It is called CORRECTNESS in database jargon, so don't call it a property, all right?


Correctness of Fragmentation (continued)

There are 3 correctness rules:

1. Completeness

2. Reconstruction

3. Disjointness


Correctness of Fragmentation (continued)

1. Completeness:

If relation R is fragmented into fragments R1, R2, R3, …, Rn, each data item that can be found in R must appear in at least one fragment

So, why not just say it this way: no data item of the original relation R goes missing!

Man, I hate theoretical definitions!


Correctness of Fragmentation (continued)

2. Reconstruction:

There must be a relational operation by which we can reconstruct R from the fragments

We already saw that taking the union (∪) of the horizontal fragments gives back the original R, and joining the vertical fragments on the primary key also gives back R!


Correctness of Fragmentation (continued)

3. Disjointness:

If data item Di appears in fragment Ri, then it should not appear in any other fragment

The exception is vertical fragmentation, where the primary key attributes must be repeated in every fragment to allow reconstruction
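The three rules can be checked mechanically. The sketch below models each fragment as a list of hashable tuples (made-up data) and tests completeness, reconstruction and disjointness for horizontal fragments; the hypothetical helper `vertical_disjoint` covers the exception, allowing vertical fragments to share only the primary-key attributes.

```python
# Fragments are modelled as lists of hashable tuples; all data is made up.
R = [("P1", 150000), ("P2", 250000), ("P3", 90000)]


def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in at least one fragment."""
    stored = set().union(*fragments)
    return all(t in stored for t in relation)


def reconstructs(relation, fragments):
    """Reconstruction (horizontal case): the union of the fragments is exactly R."""
    return set().union(*fragments) == set(relation)


def is_disjoint(fragments):
    """Disjointness: no tuple is stored in more than one fragment."""
    seen = set()
    for fragment in fragments:
        current = set(fragment)
        if seen & current:
            return False
        seen |= current
    return True


def vertical_disjoint(attribute_sets, primary_key):
    """Vertical case: fragments may share only the primary-key attributes."""
    non_key = [set(attrs) - set(primary_key) for attrs in attribute_sets]
    return all(
        not (non_key[i] & non_key[j])
        for i in range(len(non_key))
        for j in range(i + 1, len(non_key))
    )


good = [[("P2", 250000)], [("P1", 150000), ("P3", 90000)]]
print(is_complete(R, good), reconstructs(R, good), is_disjoint(good))  # True True True
print(vertical_disjoint([["PNO", "BUDGET"], ["PNO", "PNAME"]], ["PNO"]))  # True
```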


Transparency

You have just distributed one table across 3 sites. A user who requests data should not need to know this!

Hiding the fragmentation, and the distribution of the fragments to different sites, from the user is called transparency


Types of Transparency

1. Location transparency: the user should not be aware of the location of the data. This simplifies the user interface and the user programs that are used to query the table

2. Fragmentation transparency: the user must not need to know that the data have been fragmented, or how they have been fragmented

3. Replication transparency: replication is sometimes necessary because it makes processing faster, but the user should not be aware of it
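A hedged sketch of location and fragmentation transparency: the user calls `employees_in` or `all_employees` and never names a site or a fragment, while a hypothetical catalogue maps each region to the site that holds its fragment. Sites are simulated with in-process lists; the function names and data are illustrative, not part of any real DBMS API.

```python
# Simulated sites, each holding one horizontal fragment of a hypothetical EMPLOYEE table.
SITE_DATA = {
    "Edinburgh": [{"emp_id": 1, "name": "Ailsa", "region": "Scotland"}],
    "Cardiff": [{"emp_id": 2, "name": "Gareth", "region": "Wales"}],
    "London": [{"emp_id": 3, "name": "Oliver", "region": "England"}],
}

# Hypothetical catalogue: which site stores the fragment for each region.
CATALOGUE = {"Scotland": "Edinburgh", "Wales": "Cardiff", "England": "London"}


def employees_in(region: str) -> list[dict]:
    """User-facing call: the caller names a region, never a site (location transparency)."""
    site = CATALOGUE[region]  # location is looked up internally
    return [row for row in SITE_DATA[site] if row["region"] == region]


def all_employees() -> list[dict]:
    """Looks like one table to the user; internally it unions every fragment
    (fragmentation transparency)."""
    return [row for site in CATALOGUE.values() for row in SITE_DATA[site]]


print(employees_in("Wales"))  # [{'emp_id': 2, 'name': 'Gareth', 'region': 'Wales'}]
```

If a fragment later moves to another site, only the catalogue entry changes; code that calls `employees_in` stays the same.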


Need for Transparency

A manager wishing to find the total number of employees at the Scottish subsidiary need not be aware that he is querying a remote database

A manager running a query in London should not need to be aware that producing the aggregate salary bill for the company requires interrogating all three sites: London, Cardiff and Edinburgh

When data need to be updated periodically, the user need not know that three sites are effectively being updated


Foundation Rule

The foundation rule of distributed database systems states:

“Although the database system is distributed across several sites, it must look like a centralised database system to the user”

Then how do you make this foundation rule true?

Answer: by applying the 3 types of transparency


Advantages of Distributed Database Systems

Reflects organizational structure: database fragments are located in the departments they relate to.

Local autonomy: a department can control the data about itself (as it is the one most familiar with it).

Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.


Advantages of Distributed Database Systems (continued)

Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)

Economics: a network of smaller computers with the combined power of a single large computer can cost less to build than that large computer.

Modularity: systems can be modified, added to and removed from the distributed database without affecting other modules (systems).


Disadvantages of Distributed Database Systems

Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.

Economics: increased complexity and a more extensive infrastructure mean extra labour costs.


Disadvantages of Distributed Database Systems (continued)

Security: remote database fragments must be secured, and since they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).

Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too many network resources to be feasible.

Inexperience: distributed databases are difficult to work with, and as the field is young there is not much readily available experience on proper practice.


Types of Distributed Database Systems

1. Homogeneous Database Systems

2. Heterogeneous Database Systems

3. Federated Database Systems


Homogeneous Distributed Database Systems

Data is distributed across 2 or more systems

All the systems have to run the same DBMS (e.g. Oracle)

Moreover, the systems should run on the same hardware platform

And the systems should run on the same operating system

Hmm, pretty weird?


[Figure: homogeneous distributed database system]


Heterogeneous Distributed Database Systems

Data is distributed across 2 or more systems

The hardware and software configurations of those systems are diverse

One site might be running Oracle under Windows NT, another site Informix under UNIX, and yet another site Ingres under Windows NT

Pretty cool, huh?


[Figure: heterogeneous distributed database system, with sites running UNIX, Informix and Ingres]


Federated Distributed Database Systems

Switzerland is a country composed of several political federations

These federations are autonomous political units

National-level decisions are made by combining their own decisions

Similarly, a federated database system is made up of a number of relatively independent, autonomous databases


[Figure: federated distributed database system]


Centralized DBMS vs Distributed DBMS

The system catalogue of a distributed database has to be more complex. For instance, it has to store details about the location of fragments and replicas

Concurrency problems are multiplied in distributed systems. The problems of propagating updates to a series of different sites are very involved
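As a rough sketch of what the extra catalogue information might look like, the hypothetical dataclasses below record, for each fragment, the relation it belongs to, its defining condition, its primary site and its replica sites. The field names and site assignments are assumptions, not the layout of any particular DBMS catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentEntry:
    """One catalogue row: a fragment, how it is defined, and where its copies live."""
    relation: str
    fragment: str
    definition: str  # e.g. the selection predicate or the projected columns
    primary_site: str
    replica_sites: list[str] = field(default_factory=list)


@dataclass
class DistributedCatalogue:
    entries: list[FragmentEntry] = field(default_factory=list)

    def add(self, entry: FragmentEntry) -> None:
        self.entries.append(entry)

    def fragments_of(self, relation: str) -> list[FragmentEntry]:
        return [e for e in self.entries if e.relation == relation]

    def sites_holding(self, fragment: str) -> list[str]:
        """All sites that store a copy of the fragment, primary site first."""
        for e in self.entries:
            if e.fragment == fragment:
                return [e.primary_site, *e.replica_sites]
        raise KeyError(fragment)


catalogue = DistributedCatalogue()
catalogue.add(FragmentEntry("Proj", "Proj1", "BUDGET > 200000", "London", ["Cardiff"]))
catalogue.add(FragmentEntry("Proj", "Proj2", "BUDGET <= 200000", "Edinburgh"))
print(catalogue.sites_holding("Proj1"))  # ['London', 'Cardiff']
```

A centralized catalogue needs none of the site fields; every one of them is extra state that must itself be kept consistent as fragments move or replicas are added.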


Centralized DBMS vs Distributed DBMS (continued)

A query optimiser in a true distributed system should be able to use information about the structure of the network in deciding how best to satisfy a given query

To ensure a robust system, the distributed DBMS should not be located solely at one site. Software as well as data needs to be distributed
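One simple way a network-aware optimiser could use that information is to choose the site at which to run a join so that the total cost of shipping the operands there is as small as possible. The sketch assumes a made-up per-megabyte cost for each network link and ignores local processing cost and the cost of returning the result, both of which a real optimiser would also weigh.

```python
# Hypothetical network: cost per megabyte to send data between two sites.
LINK_COST = {
    ("London", "Cardiff"): 1.0,
    ("London", "Edinburgh"): 2.0,
    ("Cardiff", "Edinburgh"): 3.0,
}


def link_cost(a: str, b: str) -> float:
    """Symmetric per-MB transfer cost; zero when the data is already local."""
    if a == b:
        return 0.0
    return LINK_COST[(a, b)] if (a, b) in LINK_COST else LINK_COST[(b, a)]


def best_join_site(operands: list[tuple[str, float]], candidate_sites: list[str]) -> str:
    """Pick the site that minimises the cost of shipping every operand to it.

    operands: (site holding the operand, operand size in MB) pairs.
    """
    def shipping_cost(site: str) -> float:
        return sum(size * link_cost(src, site) for src, size in operands)

    return min(candidate_sites, key=shipping_cost)


sites = ["London", "Cardiff", "Edinburgh"]
print(best_join_site([("London", 500), ("Edinburgh", 20)], sites))  # London
```

Here the costs are 40 for London, 560 for Cardiff and 1000 for Edinburgh, so the small operand is shipped to the site that already holds the large one.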


Implementation Phases of Distributed DBMS

1. In the first phase, queries are distributed between sites, but updates go to a single site only

2. In the second phase, not only queries but also transactions are distributed between sites

The latter scenario is clearly the more technically challenging of the two

Most existing distributed database systems are in phase 1

Very few organisations seem to have solved all of the problems associated with phase 2 applications


