distirbuted database introduction
TRANSCRIPT
-
8/9/2019 Distirbuted DataBase Introduction
1/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.1
Outline Introduction
What is a distributed DBMS
ProblemsCurrent state-of-affairs
Background Distributed DBMS Architecture
Distributed Database Design Semantic Data Control Distributed Query Processing Distributed Transaction Management Parallel Database Systems Distributed Object DBMS Database Interoperability
Current Issues
-
8/9/2019 Distirbuted DataBase Introduction
2/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.2
File Systems
program 1
data description 1
program 2
data description 2
program 3
data description 3
File 1
File 2
File 3
-
8/9/2019 Distirbuted DataBase Introduction
3/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.3
Database Management
database
DBMS
Applicationprogram 1
(with datasemantics)
Applicationprogram 2(with datasemantics)
Applicationprogram 3(with datasemantics)
description
manipulationcontrol
-
8/9/2019 Distirbuted DataBase Introduction
4/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.4
Motivation
DatabaseTechnology
ComputerNetworks
integration distribution
integration
integration centralization
Distributed
Database
Systems
-
8/9/2019 Distirbuted DataBase Introduction
5/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.5
Distributed Computing
I A concept in search of a definition and a name.
I A number of autonomous processing elements(not necessarily homogeneous) that areinterconnected by a computer network and thatcooperate in performing their assigned tasks.
-
8/9/2019 Distirbuted DataBase Introduction
6/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.6
I Synonymous termsdistributed function
distributed data processing
multiprocessors/multicomputers
satellite processing backend processing
dedicated/special purpose computers
timeshared systems
functionally modular systems
Distributed Computing
-
8/9/2019 Distirbuted DataBase Introduction
7/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.7
I Processing logic
I Functions
I Data
I Control
What is distributed
-
8/9/2019 Distirbuted DataBase Introduction
8/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.8
What is a Distributed Database
System?
A distributed database (DDB) is a collection of multiple,
logically interrelated databases distributed over acomputer network.
A distributed database management system (DDBMS)is the software that manages the DDB and provides anaccess mechanism that makes this distributiontransparent to the users.
Distributed database system (DDBS) = DDB + DDBMS
-
8/9/2019 Distirbuted DataBase Introduction
9/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.9
I A timesharing computer system
I A loosely or tightly coupled multiprocessor
system
I A database system which resides at one of thenodes of a network of computers - this is a
centralized database on a network node
What is not a DDBS?
-
8/9/2019 Distirbuted DataBase Introduction
10/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.10
Centralized DBMS on a Network
CommunicationNetwork
Site 5
Site 1
Site 2
Site 3Site 4
-
8/9/2019 Distirbuted DataBase Introduction
11/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.11
Distributed DBMS Environment
CommunicationNetwork
Site 5
Site 1
Site 2
Site 3Site 4
-
8/9/2019 Distirbuted DataBase Introduction
12/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.12
Implicit Assumptions
I Data stored at a number of sites each site
logically consists of a single processor.I Processors at different sites are interconnected
by a computer network no multiprocessorsparallel database systems
I Distributed database is a database, not acollection of files data logically related asexhibited in the users access patterns relational data model
I D-DBMS is a full-fledged DBMSnot remote file system, not a TP system
-
8/9/2019 Distirbuted DataBase Introduction
13/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.13
Shared-Memory Architecture
Examples : symmetric multiprocessors (Sequent,Encore) and some mainframes(IBM3090, Bull's DPS8)
P1
Pn
M
D
-
8/9/2019 Distirbuted DataBase Introduction
14/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.14
Shared-Disk Architecture
Examples : DEC's VAXcluster, IBM's IMS/VSData Sharing
DP1
M1
Pn
Mn
-
8/9/2019 Distirbuted DataBase Introduction
15/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.15
Shared-Nothing Architecture
Examples : Teradata's DBC, Tandem, Intel'sParagon, NCR's 3600 and 3700
P1
M1
D1Pn
Mn
Dn
-
8/9/2019 Distirbuted DataBase Introduction
16/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.16
I Manufacturing - especially multi-plant
manufacturingI Military command and control
I EFT
I Corporate MISI Airlines
I Hotel chains
I Any organization which has adecentralized organization structure
Applications
-
8/9/2019 Distirbuted DataBase Introduction
17/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.17
Distributed DBMS Promises
Transparent management of distributed,fragmented, and replicated data
Improved reliability/availability through
distributed transactions
Improved performance
Easier and more economical system expansion
-
8/9/2019 Distirbuted DataBase Introduction
18/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.18
TransparencyI Transparency is the separation of the higher level
semantics of a system from the lower levelimplementation issues.
I Fundamental issue is to providedata independence
in the distributed environment
Network (distribution) transparency
Replication transparency
Fragmentation transparencyN horizontal fragmentation: selectionN vertical fragmentation: projectionN hybrid
-
8/9/2019 Distirbuted DataBase Introduction
19/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.19
Example
TITLE SAL
PAY
Elect. Eng. 40000Syst. Anal. 34000Mech. Eng. 27000Programmer 24000
PROJ
PNO PNAME BUDGET
ENO ENAME TITLE
E1 J. Doe Elect. Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elect. Eng.E7 R. Davis Mech. Eng.
E8 J. Jones Syst. Anal.
EMP
ENO PNO RESP
E1 P1 Manager 12
DUR
E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24
E6 P4 Manager 48E7 P3 Engineer 36
E8 P3 Manager 40
ASG
P1 Instrumentation 150000
P3 CAD/CAM 250000P2 Database Develop. 135000
P4 Maintenance 310000
E7 P5 Engineer 23
-
8/9/2019 Distirbuted DataBase Introduction
20/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.20
Transparent Access
SELECT ENAME,SAL
FROM EMP,ASG,PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE Paris projects
Paris employees
Paris assignments
Boston employees
Montreal projects
Paris projects
New York projects
with budget > 200000
Montreal employees
Montreal assignments
Boston
CommunicationNetwork
Montreal
Paris
NewYork
Boston projects
Boston employees
Boston assignments
Boston projects
New York employees
New York projects
New York assignments
Tokyo
-
8/9/2019 Distirbuted DataBase Introduction
21/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.21
Distributed Database - User View
Distributed Database
-
8/9/2019 Distirbuted DataBase Introduction
22/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.22
Distributed DBMS - Reality
CommunicationSubsystem
UserQuery
DBMS
Software
DBMSSoftware
UserApplication
DBMSSoftware
UserApplicationUser
Query
DBMS
Software
UserQuery
DBMSSoftware
-
8/9/2019 Distirbuted DataBase Introduction
23/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.23
Potentially Improved Performance
I Proximity of data to its points of useRequires some support for fragmentation and replication
I
Parallelism in execution Inter-query parallelism
Intra-query parallelism
-
8/9/2019 Distirbuted DataBase Introduction
24/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.24
Parallelism Requirements
I Have as much of the data required by eachapplication at the site where the applicationexecutes
Full replication
I How about updates?
Updates to replicated data requires implementation of
distributed concurrency control and commit protocols
-
8/9/2019 Distirbuted DataBase Introduction
25/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.25
System Expansion
I Issue is database scaling
I Emergence of microprocessor and workstationtechnologies
Demise of Grosh's law
Client-server model of computing
I Data communication cost vs telecommunicationcost
-
8/9/2019 Distirbuted DataBase Introduction
26/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.26
Distributed DBMS Issues
IDistributed Database Design
how to distribute the database replicated & non-replicated database distribution
a related problem in directory management
I Query Processing convert user transactions to data manipulation
instructions
optimization problem
min{cost = data transmission + local processing}
general formulation is NP-hard
-
8/9/2019 Distirbuted DataBase Introduction
27/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.27
Distributed DBMS Issues
I Concurrency Control synchronization of concurrent accesses
consistency and isolation of transactions' effects
deadlock management
I Reliability
how to make the system resilient to failures
atomicity and durability
-
8/9/2019 Distirbuted DataBase Introduction
28/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.28
DirectoryManagement
Relationship Between Issues
Reliability
DeadlockManagement
Query
Processing
Concurrency
Control
Distribution
Design
-
8/9/2019 Distirbuted DataBase Introduction
29/29
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.29
I
Operating System Support operating system with proper support for database
operations
dichotomy between general purpose processingrequirements and database processing requirements
I Open Systems and InteroperabilityDistributed Multidatabase Systems
More probable scenario
Parallel issues
Related Issues