
Upload: julia-chandler

Post on 06-Jan-2018


Page 1: 1 Distributed Database Systems. 2 A Distributed Database on a Geographically Dispersed Network


Distributed Database Systems


A Distributed Database on a Geographically Dispersed Network


A Distributed Database on a Local Network


A Multi-Processor System


Types of Accesses to a Distributed Database


Distributed Access Plan

1) At site 1: Send sites 2 and 3 the supplier number SN.

2) At sites 2 and 3: Execute in parallel, upon receipt of the supplier number, the following program:
   Find all PARTS records having SUP# = SN;
   Send result to site 1.

3) At site 1: Merge results from sites 2 and 3; output the result.
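The three steps of the access plan can be sketched in Python as follows. This is only an illustration: the PARTS data, the field names `part` and `sup`, and the functions `find_parts` and `run_access_plan` are hypothetical, and a thread pool stands in for the network messages between sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments stored at sites 2 and 3 (illustrative data).
PARTS_AT_SITE = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}


def find_parts(site, sn):
    """Step 2, run at a remote site: select PARTS records with SUP# = SN."""
    return [rec for rec in PARTS_AT_SITE[site] if rec["sup"] == sn]


def run_access_plan(sn):
    """Steps 1 and 3, run at site 1."""
    sites = sorted(PARTS_AT_SITE)
    # Step 1: "send" SN to sites 2 and 3; both searches run in parallel.
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partial_results = list(pool.map(lambda s: find_parts(s, sn), sites))
    # Step 3: merge the partial results from the remote sites.
    return [rec for part in partial_results for rec in part]


if __name__ == "__main__":
    print(run_access_plan("S1"))
```

Only the supplier number and the matching records cross the simulated network; the PARTS fragments themselves stay at their sites.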


Components of a Commercial DDBMS


Data Distribution

Problem: Choose a unit of the logical database to use for assignment to data modules.

Possibilities:
Relations: Distribution issues will influence logical database design.
Columns: Distribution issues will influence logical database design.
Rows: Too many; directories become too large.
Data items: Too many; directories become too large.


Data Distribution

Fragments: Logically defined rectangular subsets of relations.

Figure: Relation 1 and Relation 2, each partitioned into fragments (labels shown: Fragment 1, Fragment 2, Fragment 3 and Fragment 1, Fragment 2).


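Data Distribution

Logical definition of fragments

Figure: A relation with columns Name, Age, $, Job-Title, Supervisor, Dept. and the sample row (Jones, 35, 32K, Salesman, Black, A), divided into Fragment 1, Fragment 2, and Fragment 3 using the predicates $ > 30K and $ < 30K.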
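A fragment defined as a "rectangular subset" can be read as a row predicate combined with a column list. The sketch below assumes that reading. The second row, the field name `Salary` (standing in for the `$` column), and the particular column split among the three fragments are illustrative assumptions, since the slide does not say which columns each fragment holds.

```python
# Hypothetical Personnel relation; "Salary" stands in for the "$" column.
PERSONNEL = [
    {"Name": "Jones", "Age": 35, "Salary": 32_000,
     "JobTitle": "Salesman", "Supervisor": "Black", "Dept": "A"},
    {"Name": "Smith", "Age": 41, "Salary": 28_000,
     "JobTitle": "Clerk", "Supervisor": "Black", "Dept": "A"},
]


def fragment(relation, predicate, columns):
    """Rows that satisfy `predicate`, projected onto `columns`."""
    return [{c: row[c] for c in columns} for row in relation if predicate(row)]


# Assumed split: fragment 1 takes some columns of every row; fragments 2 and 3
# take the remaining columns, separated by the salary predicates on the slide.
JOB_COLUMNS = ["Name", "JobTitle", "Supervisor", "Dept"]
fragment_1 = fragment(PERSONNEL, lambda r: True, ["Name", "Age", "Salary"])
fragment_2 = fragment(PERSONNEL, lambda r: r["Salary"] > 30_000, JOB_COLUMNS)
fragment_3 = fragment(PERSONNEL, lambda r: r["Salary"] < 30_000, JOB_COLUMNS)

if __name__ == "__main__":
    for name, frag in [("F1", fragment_1), ("F2", fragment_2), ("F3", fragment_3)]:
        print(name, frag)
```

With the two predicates exactly as written, a row whose salary is exactly 30K would fall into neither fragment 2 nor fragment 3.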


Data Distribution

Figure: Assignment of Fragments to Datamodules. Fragments F1, F2, F3 (Personnel) and F1, F2 (Inventory) are assigned to datamodules DM1, DM2, and DM3.


Data Distribution

Advantages of fragments as units of distribution.

Very flexible in size and definition.
Distribution choices are largely independent of logical design.


System Considerations

Reliable Network
Pipelining
Logical Data Items
Database Operations: Read, Write
Transactions: Read Set, Write Set
Atomic: "All or Nothing" Effect


System Considerations (cont'd)

Each site in the DDBMS has one or both of the following software modules:
Transaction Manager (TM)
Data Manager (DM)

TMs:
Read, parse, and optimize user queries
Handle all interface with the user

DMs:
Maintain the physical database
Perform actual reads and writes


System Considerations (cont'd)

Figure: Transactions enter the TMs; the TMs exchange messages with the DMs; each DM accesses its own Data.

TMs communicate only with DMs.
DMs communicate only with TMs.


Transaction Execution

Transaction    TM's Action
Begin          Set up temporary workspace.
Read (X)       Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
Read (X)       No action necessary; X is already in workspace.
Write (X)      Change the value of X.
Read (X)       No action necessary.
End            Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
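The TM actions above can be sketched as a small in-memory simulation. The classes `DataManager` and `TransactionManager` and their methods are hypothetical names, not an actual DDBMS interface. The sketch assumes that Write changes only the workspace copy and that every DM always acknowledges the pre-commit; failure handling is left out.

```python
class DataManager:
    """Hypothetical DM holding local copies of some data items."""

    def __init__(self, name, items):
        self.name = name
        self.items = dict(items)
        self.pending = {}

    def read(self, key):
        return self.items[key]

    def pre_commit(self, key, value):
        self.pending[key] = value  # stage the new value
        return True                # acknowledgement (always granted here)

    def commit(self, key):
        self.items[key] = self.pending.pop(key)


class TransactionManager:
    """Hypothetical TM following the Begin / Read / Write / End actions."""

    def __init__(self, dms):
        self.dms = dms
        self.workspace = {}
        self.written = set()

    def begin(self):
        # Begin: set up a temporary workspace.
        self.workspace = {}
        self.written = set()

    def read(self, key):
        # First read: pick a DM that stores the item and fetch it.
        # Later reads: no action, the item is already in the workspace.
        if key not in self.workspace:
            dm = next(d for d in self.dms if key in d.items)
            self.workspace[key] = dm.read(key)
        return self.workspace[key]

    def write(self, key, value):
        # Write: change the value in the workspace only (assumption).
        self.workspace[key] = value
        self.written.add(key)

    def end(self):
        # End: pre-commit to every DM storing a copy, await acks, then commit.
        for key in self.written:
            copies = [d for d in self.dms if key in d.items]
            acks = [d.pre_commit(key, self.workspace[key]) for d in copies]
            if all(acks):
                for d in copies:
                    d.commit(key)
        self.workspace = {}


if __name__ == "__main__":
    dm1, dm2 = DataManager("DM1", {"X": 10}), DataManager("DM2", {"X": 10})
    tm = TransactionManager([dm1, dm2])
    tm.begin()
    tm.write("X", tm.read("X") + 5)
    tm.end()
    print(dm1.items, dm2.items)
```

Both copies of X change only at End, so the replicas are not updated while the transaction is still running.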


Optimal File Allocation In A Distributed Database System

Given a number of computers that process common information files, how can we:

allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
meet access time requirements for each file?
not exceed the storage capacity of each computer?

Note: A File may be viewed as a segment.


System Parameters

n Computers
m Files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer
Storage capacity of each computer
Cost of storage per unit file length per computer
Cost of transmission per unit file length per second per pair of computers


Model

COSTS

Total Cost = Storage Costs + Transmission Costs
TC = CS + CT

Transmission Costs = Costs for Retrievals + Costs for Updates
CT = CTR + CTU

CONSTRAINTS

Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
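The cost structure TC = CS + CT with CT = CTR + CTU can be explored by brute force on a tiny instance. The sketch below is not the mathematical model from the slides, which is not reproduced in this transcript. It assumes that a retrieval is served by the cheapest copy, that an update is sent to every copy, and it omits the access-time constraint. All function and parameter names are hypothetical.

```python
from itertools import combinations, product


def nonempty_subsets(n):
    """All non-empty sets of computers 0..n-1 (each file needs >= 1 copy)."""
    return [set(c) for r in range(1, n + 1) for c in combinations(range(n), r)]


def total_cost(alloc, size, store, trans, reads, writes):
    """TC = CS + CTR + CTU under the assumptions stated in the lead-in."""
    n = len(store)
    cs = sum(store[i] * size[j] for j, sites in enumerate(alloc) for i in sites)
    # Retrievals go to the cheapest copy (cost 0 if a copy is local).
    ctr = sum(reads[i][j] * size[j] * min(trans[i][k] for k in sites)
              for j, sites in enumerate(alloc) for i in range(n))
    # Updates are propagated to every copy.
    ctu = sum(writes[i][j] * size[j] * sum(trans[i][k] for k in sites)
              for j, sites in enumerate(alloc) for i in range(n))
    return cs + ctr + ctu


def fits(alloc, size, capacity):
    """Storage-capacity constraint for every computer."""
    used = [0] * len(capacity)
    for j, sites in enumerate(alloc):
        for i in sites:
            used[i] += size[j]
    return all(u <= c for u, c in zip(used, capacity))


def best_allocation(size, store, capacity, trans, reads, writes):
    """Cheapest feasible allocation: one set of sites per file."""
    candidates = product(nonempty_subsets(len(store)), repeat=len(size))
    feasible = [a for a in candidates if fits(a, size, capacity)]
    return min(feasible,
               key=lambda a: total_cost(a, size, store, trans, reads, writes))


if __name__ == "__main__":
    trans = [[0, 5], [5, 0]]  # hypothetical per-unit transmission costs
    # Read-only workload: replicating the single file is cheapest.
    print(best_allocation([10], [1, 1], [100, 100], trans, [[10], [1]], [[0], [0]]))
    # Computer 1 also updates heavily: a single copy at computer 1 wins.
    print(best_allocation([10], [1, 1], [100, 100], trans, [[10], [1]], [[0], [10]]))
```

Exhaustive search is practical only for very small n and m, because each file has 2^n - 1 candidate placements.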


Mathematical Representation Model


Transmission Paths Between Each Pair of Computers


Reliability Constraint

Assuming processors and channels each have identical reliability:

$a_p$ = availability of the processor
$a_c$ = availability of the channel
$r_j$ = number of redundant copies of the jth file
$A_j$ = availability of the jth file

$$A_j = a_p \left[ 1 - (1 - a_c a_p)^{r_j} \right]$$

For example, with $a_p = 0.98$ and $a_c = 0.99$:
$A_j = 0.951$ for $r_j = 1$
$A_j = 0.979$ for $r_j = 2$
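The two example values can be checked numerically. The function name `file_availability` is hypothetical; the formula is the availability expression from the slide, with the closing bracket and the exponent on the number of copies restored.

```python
def file_availability(a_p, a_c, r_j):
    """A_j = a_p * [1 - (1 - a_c * a_p) ** r_j]."""
    return a_p * (1 - (1 - a_c * a_p) ** r_j)


if __name__ == "__main__":
    for copies in (1, 2):
        print(copies, round(file_availability(0.98, 0.99, copies), 3))
```

With one copy the result is 0.98 × 0.9702 ≈ 0.951. With two copies the bracketed term rises to about 0.9991, giving about 0.979. Further copies add little, because the leading factor caps the availability at 0.98.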


File Directory for Distributed Databases


Figure: Overview of the Directory Manager. Components shown inside the DDBMS: Transaction Manager, Directory Manager, Database Manager. Other labels: User Transaction, To Other Nodes, Database, Directory, Fragment. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.


Content of Directory

Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information


Content of a Directory System

Physical (Static)
Location (Site, Copy #, Disk, Page);
Creator;
Creation Date;
Version of the File Size;
Code Format;
Date of Last Update;

Logical (Dynamic)
File Status (R, W)
Number of Backlog Jobs;
Site Availability;
Resource Requirement;
Processing Cost;
Communication Cost;
Translation Cost;

Security (File, User, C);
C = Read/Write; Read Only; Write Only;

Operation
Compression ratio (Logical Operation Query Data Value);
Query Access Optimizer;
Statistical Data Gathering;
Protocols


The Functional Objectives of Integrated Dictionary/Directory

To support the control of data resources
   Maintaining data independence, security, and integrity

To support applications development
   Offering standardized data definitions and usage characteristics
   Established program entities, DDL

To provide independence of directory data elements
   Different hardware and software environments
   Changes in these environments


Possible Data Types in IDD

Data names, definitions, formats and sizes.
Integrity constraints, authorization tables, and usage statistics for transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.

The IDD must support the maintenance of relationships between various entities, such as associations between:
Authorization tables and data
Users and transactions
Reports

The IDD supplies version control.


Figure 1: Entity, Relationship, Entity, with attached Attributes.


Figure 2: The entities "Payroll Record" and "Social Security Number" linked by the relationship "Contains" (Relationship Created 820708). Attribute values shown in the figure: Entity Created 820114; Entity Created 820519; Maximum Length 400 Characters; Comments; Length 9 Characters.


Table 1

Schema Model Level: typical meta-entity-types
Schema Level: typical entity-types, relationship-types, and attribute-types
Dictionary Level: typical entities, relationships, and attributes

Schema Model Level | Schema Level | Dictionary Level
Entity-Type | Element | Social-Security-Number, Agency-Name
Entity-Type | Record | Employee Record, Payroll Record
Entity-Type | Document | Form 1040, FIPS Guideline
Relationship-Type | Record-Contains-Element | Payroll-Record-Contains-Employee-Name
Attribute-Type | Length | 9 Characters
Attribute-Type | Creator | ADP Division


Classes of Directory

Centralized Directory

Single Master Directory
Extended Centralized Directory
Multiple Master Directory

Local Directory

Distributed Directory


Causes for Directory Update

Changing the description or structure of the user database.
Moving user database entities from one node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node's status.


Specific Drawbacks with Globally Replicated Directories

1) Additional remote activity to maintain directory coherence.

2) Difficulty of posting directory changes to a down site.

3) Difficulty of integrating a new site.

4) Storage of directory entries where they are not referenced.

5) Blurred responsibility for maintaining the directory.


Performance Measure

Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)

Response Time


Operating Cost for the Centralized Directory System


Cost Trade-offs of Directory Systems

Assume:
Communication cost much greater than storage cost
No translation cost
All computers have same directory update rate

Then the cost trade-off point is at directory update rate:
P(C,EC) = 2/(N - 1)
P(C,D) = 2/(N - 1)
P(L,D) = 1


Directory Design Alternatives

Type: Centralized
Description: Single master directory
Advantages: Simplicity; ease of update
Disadvantages: Transmission costs and delays

Type: Extended Centralized
Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory
Advantages: Reduces transmission costs and delays
Disadvantages: Coordinating updates of local directories; knowledge of appended directories

Type: Multiple Master
Description: Variation of the centralized case in which redundant copies of the master directory exist
Advantages: Reduces transmission costs and delays; fall-soft characteristics
Disadvantages: Storage requirements; coordinating update of redundant copies

Type: Distributed Master
Description: Master at every node
Advantages: Fast response
Disadvantages: Storage costs; transmission costs for updates to the directory

Type: Localized
Description: Local directory at each node without replication
Advantages: Simple update procedure
Disadvantages: Transmission costs for non-local queries


The Distributed Ingres Dictionary/Directory Contains Four Types of Data:

Relation name and location

Information for parsing queries (domain names, formats, etc.)

Performance information (number of tuples, storage structures, etc.)

Consistency information (protection, integrity constraints, etc.; does not include control data for concurrency control and synchronization)


SDD-1 Dictionary/Directory

The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMSs.

A directory locator (a small, highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.

The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.
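The interplay of a directory locator and a local cache can be sketched as follows. This is a toy model and not SDD-1's actual implementation: the site names, the `LOCATOR` table, the `DIRECTORY_FRAGMENTS` contents, and the `TransactionModule` class are all hypothetical.

```python
# Hypothetical directory fragments, each stored at one site. An entry lists
# the sites that hold copies of the named relation.
DIRECTORY_FRAGMENTS = {
    "site1": {"PARTS": ["site2", "site3"]},
    "site2": {"SUPPLIERS": ["site1"]},
}

# Directory locator, replicated at every site: which site holds the
# directory entry for each relation.
LOCATOR = {"PARTS": "site1", "SUPPLIERS": "site2"}


class TransactionModule:
    """Hypothetical TM that caches remotely fetched directory entries."""

    def __init__(self, site):
        self.site = site
        self.cache = {}
        self.remote_fetches = 0

    def directory_entry(self, relation):
        home = LOCATOR[relation]
        if home == self.site:
            # The directory fragment is local: no network traffic.
            return DIRECTORY_FRAGMENTS[home][relation]
        if relation not in self.cache:
            # First remote access: fetch the entry and cache it.
            self.remote_fetches += 1
            self.cache[relation] = DIRECTORY_FRAGMENTS[home][relation]
        return self.cache[relation]


if __name__ == "__main__":
    tm = TransactionModule("site3")
    tm.directory_entry("PARTS")
    tm.directory_entry("PARTS")
    print(tm.remote_fetches)
```

Repeated lookups of the same relation cost one remote fetch, which is the benefit expected under locality of reference. The sketch does not invalidate cached entries when the directory changes.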


Vpatient : Patient Class (name, SSN, age, patID, {report})
PatientDB1 (name, SSN, age)
PatientDB2 (name, SSN, patID)
PatReportDB2 (patID, report)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 17: Pictorial diagram showing usefulness of keys.
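One way to read Figure 17 is that shared keys let records from the real collections be combined into one virtual record: SSN ties PatientDB1 to PatientDB2, and patID ties PatientDB2 to PatReportDB2. The sketch below assumes that reading. The sample rows and the function `build_vpatient` are hypothetical.

```python
# Hypothetical contents of the three real collections.
PATIENT_DB1 = [{"name": "Ann", "SSN": "111", "age": 40}]
PATIENT_DB2 = [{"name": "Ann", "SSN": "111", "patID": "P7"}]
PATREPORT_DB2 = [
    {"patID": "P7", "report": "x-ray"},
    {"patID": "P7", "report": "blood test"},
]


def build_vpatient():
    """Derive virtual Vpatient records by matching on SSN, then on patID."""
    by_ssn = {}
    for row in PATIENT_DB1 + PATIENT_DB2:
        # SSN is the key that identifies the same patient in both databases.
        by_ssn.setdefault(row["SSN"], {}).update(row)

    reports = {}
    for row in PATREPORT_DB2:
        # patID is the key that attaches reports to a patient.
        reports.setdefault(row["patID"], []).append(row["report"])

    return [
        {
            "name": p.get("name"),
            "SSN": ssn,
            "age": p.get("age"),
            "patID": p.get("patID"),
            "report": reports.get(p.get("patID"), []),
        }
        for ssn, p in by_ssn.items()
    ]


if __name__ == "__main__":
    print(build_vpatient())
```

A patient present in only one database still yields a virtual record; the attributes that cannot be filled in are left as `None` or an empty report list.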


Vperson : PersonClass (name, sex, age, ssn, job)
personDB1 (name, sex, age, ssn)
personDB2 (name, gender, ssn, job)
Virtual collection: People
Conversion functions shown: Character_to_String, Character_to_String, LargePositiveInteger_to_String

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 15: Pictorial diagram showing correspondence between virtual and real attributes.
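The correspondence between virtual and real attributes can be written down as a mapping table with a conversion function per attribute. The sketch below is illustrative only. It assumes that the character-to-string conversions apply to sex and gender and that the integer-to-string conversion applies to ssn in personDB1, which the transcript does not confirm. The Python names are hypothetical renderings of the labels in the figure.

```python
def character_to_string(value):
    """Stand-in for the figure's Character_to_String conversion."""
    return str(value)


def large_positive_integer_to_string(value):
    """Stand-in for the figure's LargePositiveInteger_to_String conversion."""
    return str(value)


def identity(value):
    return value


# virtual attribute -> (real attribute, conversion), per real collection.
MAPPING = {
    "personDB1": {
        "name": ("name", identity),
        "sex": ("sex", character_to_string),
        "age": ("age", identity),
        "ssn": ("ssn", large_positive_integer_to_string),
    },
    "personDB2": {
        "name": ("name", identity),
        "sex": ("gender", character_to_string),
        "ssn": ("ssn", identity),
        "job": ("job", identity),
    },
}


def to_virtual(source, row):
    """Translate one real record into the attributes of Vperson."""
    return {virtual: convert(row[real])
            for virtual, (real, convert) in MAPPING[source].items()}


if __name__ == "__main__":
    print(to_virtual("personDB1", {"name": "Al", "sex": "F", "age": 30, "ssn": 123456789}))
    print(to_virtual("personDB2", {"name": "Bo", "gender": "M", "ssn": "123", "job": "clerk"}))
```

Keeping the mapping as data means that adding another real collection only needs a new table entry, with no change to `to_virtual`.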


Vretiree : retireClass (name, income)
Vincome : incomeClass (stockAmount, pension)
financeDB1 (name, stockAmount)
financeDB2 (name, pension)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 18: Pictorial diagram for aggregation.


Vname : nameClass (first, middle, last)
personDB1 (name)
Functions shown: getfirst, getmiddle, getlast

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 19: Pictorial diagram of computed attribute.
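A computed attribute is derived from a stored value by a function, with no field of its own to copy from. The sketch below assumes the stored name is a whitespace-separated string of the form "first middle last". The functions `getfirst`, `getmiddle`, and `getlast` take their names from the figure, but their bodies are guesses.

```python
def getfirst(name):
    """First whitespace-separated token of the stored name."""
    return name.split()[0]


def getmiddle(name):
    """Everything between the first and last tokens (may be empty)."""
    return " ".join(name.split()[1:-1])


def getlast(name):
    """Last whitespace-separated token of the stored name."""
    return name.split()[-1]


def vname(row):
    """Compute the virtual Vname attributes from a personDB1 record."""
    return {
        "first": getfirst(row["name"]),
        "middle": getmiddle(row["name"]),
        "last": getlast(row["name"]),
    }


if __name__ == "__main__":
    print(vname({"name": "Mary Ann Smith"}))
```

Names that do not follow the assumed layout, such as a single-word name, would need their own handling.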


Vretiree : retireClass (name, income)
financeDB1 (name, stockAmount)
financeDB2 (name, pension)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 20: Pictorial diagram of computed attribute.


Vinsurance : insuranceClass (name, {insuranceAmounts})
carInsuranceDB1 (carOwner, amount)
houseInsuranceDB2 (houseOwner, amount)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 21: Pictorial diagram showing grouping.
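Grouping collects values from several real collections into one set-valued virtual attribute. The sketch below assumes the owner's name is the grouping key in both databases, and it uses `houseOwner` as the presumed spelling of the owner attribute in houseInsuranceDB2. The sample rows and the function `build_vinsurance` are hypothetical.

```python
from collections import defaultdict


def build_vinsurance(car_rows, house_rows):
    """Group all insurance amounts of one owner into {insuranceAmounts}."""
    amounts = defaultdict(list)
    for row in car_rows:
        amounts[row["carOwner"]].append(row["amount"])
    for row in house_rows:
        amounts[row["houseOwner"]].append(row["amount"])
    return [{"name": name, "insuranceAmounts": values}
            for name, values in amounts.items()]


if __name__ == "__main__":
    car = [{"carOwner": "Lee", "amount": 500}, {"carOwner": "Kim", "amount": 300}]
    house = [{"houseOwner": "Lee", "amount": 1200}]
    print(build_vinsurance(car, house))
```

Matching owners by name is fragile in practice; a shared key such as an SSN would be safer where one is available.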


Vpatient : patientClass (name, {doctors})
Vdoctors : doctorClass (name, docID, salary)
Real collections, as labeled in the figure: patientDB1 (name, salary); patientDB1 (name, docID); patientDB2 (name, physician); patientDB1 (name, docID)
Annotations: (key), (pointer), relationship

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 22: Pictorial diagram showing relationship.


VtreatedBy : treatedByClass (patient, doctor, amountOwed)
Vpatient : PatientClass ...
Vdoctor : DoctorClass ...
patientDB1 (name, docID, amountOwed)
Annotations: (key), (key)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 23: Pictorial diagram showing a named relationship.


VpersonPatient : personClass (name)
Vpatient : patientClass (patID, amount)
VpersonDoctor : personClass (name)
Vdoctor : DoctorClass (docID, salary)
patientDB1 (name, SSN, payment)
doctorDB2 (name, docID, salary)
Virtual collections (labels as shown): Vpatient, patient, Vdoctor, doctor, person, VpersonPatient, VpersonDoctor

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 24: Pictorial diagram showing relationship.


ConceptSemType (conceptID, semTypeID)
Concept (conceptID, termID, stringType, stringID, stringVal)
Vconcept (conceptID, semType, {termSet})
Vterm (termID, {stringSet})
Vstring (stringName, stringID, stringType)
Annotation: (key)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 30: Derivation of Virtual Entity Vconcept.


DsemType (ID, name, definition, {relatedTo})
DsemRelate (relName, semName, status)
SemTypeDef (ID, name, definition)
SemTypeRel (name1, rel, name2, status)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Figure 31: Derivation of Virtual Entity VsemType.