distirbuted database introduction

Upload: gouse5815

Post on 29-May-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 Distirbuted DataBase Introduction

    1/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.1

    Outline Introduction

    What is a distributed DBMS

    ProblemsCurrent state-of-affairs

    Background Distributed DBMS Architecture

    Distributed Database Design Semantic Data Control Distributed Query Processing Distributed Transaction Management Parallel Database Systems Distributed Object DBMS Database Interoperability

    Current Issues

  • 8/9/2019 Distirbuted DataBase Introduction

    2/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.2

    File Systems

    program 1

    data description 1

    program 2

    data description 2

    program 3

    data description 3

    File 1

    File 2

    File 3

  • 8/9/2019 Distirbuted DataBase Introduction

    3/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.3

    Database Management

    database

    DBMS

    Applicationprogram 1

    (with datasemantics)

    Applicationprogram 2(with datasemantics)

    Applicationprogram 3(with datasemantics)

    description

    manipulationcontrol

  • 8/9/2019 Distirbuted DataBase Introduction

    4/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.4

    Motivation

    DatabaseTechnology

    ComputerNetworks

    integration distribution

    integration

    integration centralization

    Distributed

    Database

    Systems

  • 8/9/2019 Distirbuted DataBase Introduction

    5/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.5

    Distributed Computing

    I A concept in search of a definition and a name.

    I A number of autonomous processing elements(not necessarily homogeneous) that areinterconnected by a computer network and thatcooperate in performing their assigned tasks.

  • 8/9/2019 Distirbuted DataBase Introduction

    6/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.6

    I Synonymous termsdistributed function

    distributed data processing

    multiprocessors/multicomputers

    satellite processing backend processing

    dedicated/special purpose computers

    timeshared systems

    functionally modular systems

    Distributed Computing

  • 8/9/2019 Distirbuted DataBase Introduction

    7/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.7

    I Processing logic

    I Functions

    I Data

    I Control

    What is distributed

  • 8/9/2019 Distirbuted DataBase Introduction

    8/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.8

    What is a Distributed Database

    System?

    A distributed database (DDB) is a collection of multiple,

    logically interrelated databases distributed over acomputer network.

    A distributed database management system (DDBMS)is the software that manages the DDB and provides anaccess mechanism that makes this distributiontransparent to the users.

    Distributed database system (DDBS) = DDB + DDBMS

  • 8/9/2019 Distirbuted DataBase Introduction

    9/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.9

    I A timesharing computer system

    I A loosely or tightly coupled multiprocessor

    system

    I A database system which resides at one of thenodes of a network of computers - this is a

    centralized database on a network node

    What is not a DDBS?

  • 8/9/2019 Distirbuted DataBase Introduction

    10/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.10

    Centralized DBMS on a Network

    CommunicationNetwork

    Site 5

    Site 1

    Site 2

    Site 3Site 4

  • 8/9/2019 Distirbuted DataBase Introduction

    11/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.11

    Distributed DBMS Environment

    CommunicationNetwork

    Site 5

    Site 1

    Site 2

    Site 3Site 4

  • 8/9/2019 Distirbuted DataBase Introduction

    12/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.12

    Implicit Assumptions

    I Data stored at a number of sites each site

    logically consists of a single processor.I Processors at different sites are interconnected

    by a computer network no multiprocessorsparallel database systems

    I Distributed database is a database, not acollection of files data logically related asexhibited in the users access patterns relational data model

    I D-DBMS is a full-fledged DBMSnot remote file system, not a TP system

  • 8/9/2019 Distirbuted DataBase Introduction

    13/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.13

    Shared-Memory Architecture

    Examples : symmetric multiprocessors (Sequent,Encore) and some mainframes(IBM3090, Bull's DPS8)

    P1

    Pn

    M

    D

  • 8/9/2019 Distirbuted DataBase Introduction

    14/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.14

    Shared-Disk Architecture

    Examples : DEC's VAXcluster, IBM's IMS/VSData Sharing

    DP1

    M1

    Pn

    Mn

  • 8/9/2019 Distirbuted DataBase Introduction

    15/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.15

    Shared-Nothing Architecture

    Examples : Teradata's DBC, Tandem, Intel'sParagon, NCR's 3600 and 3700

    P1

    M1

    D1Pn

    Mn

    Dn

  • 8/9/2019 Distirbuted DataBase Introduction

    16/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.16

    I Manufacturing - especially multi-plant

    manufacturingI Military command and control

    I EFT

    I Corporate MISI Airlines

    I Hotel chains

    I Any organization which has adecentralized organization structure

    Applications

  • 8/9/2019 Distirbuted DataBase Introduction

    17/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.17

    Distributed DBMS Promises

    Transparent management of distributed,fragmented, and replicated data

    Improved reliability/availability through

    distributed transactions

    Improved performance

    Easier and more economical system expansion

  • 8/9/2019 Distirbuted DataBase Introduction

    18/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.18

    TransparencyI Transparency is the separation of the higher level

    semantics of a system from the lower levelimplementation issues.

    I Fundamental issue is to providedata independence

    in the distributed environment

    Network (distribution) transparency

    Replication transparency

    Fragmentation transparencyN horizontal fragmentation: selectionN vertical fragmentation: projectionN hybrid

  • 8/9/2019 Distirbuted DataBase Introduction

    19/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.19

    Example

    TITLE SAL

    PAY

    Elect. Eng. 40000Syst. Anal. 34000Mech. Eng. 27000Programmer 24000

    PROJ

    PNO PNAME BUDGET

    ENO ENAME TITLE

    E1 J. Doe Elect. Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elect. Eng.E7 R. Davis Mech. Eng.

    E8 J. Jones Syst. Anal.

    EMP

    ENO PNO RESP

    E1 P1 Manager 12

    DUR

    E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24

    E6 P4 Manager 48E7 P3 Engineer 36

    E8 P3 Manager 40

    ASG

    P1 Instrumentation 150000

    P3 CAD/CAM 250000P2 Database Develop. 135000

    P4 Maintenance 310000

    E7 P5 Engineer 23

  • 8/9/2019 Distirbuted DataBase Introduction

    20/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.20

    Transparent Access

    SELECT ENAME,SAL

    FROM EMP,ASG,PAY

    WHERE DUR > 12

    AND EMP.ENO = ASG.ENO

    AND PAY.TITLE = EMP.TITLE Paris projects

    Paris employees

    Paris assignments

    Boston employees

    Montreal projects

    Paris projects

    New York projects

    with budget > 200000

    Montreal employees

    Montreal assignments

    Boston

    CommunicationNetwork

    Montreal

    Paris

    NewYork

    Boston projects

    Boston employees

    Boston assignments

    Boston projects

    New York employees

    New York projects

    New York assignments

    Tokyo

  • 8/9/2019 Distirbuted DataBase Introduction

    21/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.21

    Distributed Database - User View

    Distributed Database

  • 8/9/2019 Distirbuted DataBase Introduction

    22/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.22

    Distributed DBMS - Reality

    CommunicationSubsystem

    UserQuery

    DBMS

    Software

    DBMSSoftware

    UserApplication

    DBMSSoftware

    UserApplicationUser

    Query

    DBMS

    Software

    UserQuery

    DBMSSoftware

  • 8/9/2019 Distirbuted DataBase Introduction

    23/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.23

    Potentially Improved Performance

    I Proximity of data to its points of useRequires some support for fragmentation and replication

    I

    Parallelism in execution Inter-query parallelism

    Intra-query parallelism

  • 8/9/2019 Distirbuted DataBase Introduction

    24/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.24

    Parallelism Requirements

    I Have as much of the data required by eachapplication at the site where the applicationexecutes

    Full replication

    I How about updates?

    Updates to replicated data requires implementation of

    distributed concurrency control and commit protocols

  • 8/9/2019 Distirbuted DataBase Introduction

    25/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.25

    System Expansion

    I Issue is database scaling

    I Emergence of microprocessor and workstationtechnologies

    Demise of Grosh's law

    Client-server model of computing

    I Data communication cost vs telecommunicationcost

  • 8/9/2019 Distirbuted DataBase Introduction

    26/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.26

    Distributed DBMS Issues

    IDistributed Database Design

    how to distribute the database replicated & non-replicated database distribution

    a related problem in directory management

    I Query Processing convert user transactions to data manipulation

    instructions

    optimization problem

    min{cost = data transmission + local processing}

    general formulation is NP-hard

  • 8/9/2019 Distirbuted DataBase Introduction

    27/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.27

    Distributed DBMS Issues

    I Concurrency Control synchronization of concurrent accesses

    consistency and isolation of transactions' effects

    deadlock management

    I Reliability

    how to make the system resilient to failures

    atomicity and durability

  • 8/9/2019 Distirbuted DataBase Introduction

    28/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.28

    DirectoryManagement

    Relationship Between Issues

    Reliability

    DeadlockManagement

    Query

    Processing

    Concurrency

    Control

    Distribution

    Design

  • 8/9/2019 Distirbuted DataBase Introduction

    29/29

    Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 1.29

    I

    Operating System Support operating system with proper support for database

    operations

    dichotomy between general purpose processingrequirements and database processing requirements

    I Open Systems and InteroperabilityDistributed Multidatabase Systems

    More probable scenario

    Parallel issues

    Related Issues