Multidatabase and Distributed Manipulations, Part 1. Witold Litwin
Post on 19-Dec-2015
1
Multidatabase and Distributed Manipulations
Part 1
Witold Litwin
2
Plan
Introduction
Technical problems
Origin of the concept
– Approach: Centralized DB (ANSI-SPARC)
– Approach: DDB (top-down)
– Approach: Global Schema (bottom-up)
Reference architectures
– Multidatabase architecture
– Federated architecture
Autonomy, semantic heterogeneity, common model
3
Multidatabase Model
Single database model (ANSI-SPARC):
– The real universe is modeled as one DB
Multidatabase model: the real universe is modeled as multiple DBs
– Autonomous
– Semantically heterogeneous
– Manipulated through a multidatabase language
Reference: Litwin, W., Abdellatif, A. "Multidatabase interoperability". In: Multidatabase Systems: An Advanced Solution for Global Information Sharing. Hurson, A. R., Bright, M. W., Pakzad, S. H. (Eds.). IEEE Press, 1993.
4
Multidatabase Model
[Figure: example databases. Labels: Paris 9, Private, Teletel; Courses & students, Library, Employees, Rest., My-friends, Folio, Cine, other DBs on the Internet]
5
Problems
Reference architecture
Semantic heterogeneity in the presence of local autonomy
Common data model
Functions of the MDB language
Transactions
Protocols & standards
Performance
6
Reference Architectures
Multidatabase architecture
– Generalization of the ANSI-SPARC architecture
Federated DBs architecture
– Generalization of the federated DB architecture
Others
7
ANSI-SPARC Architecture: Centralized Integrated DB (1960-70)
[Figure: three levels over the real universe: external level (ES, ES), conceptual level (CS), internal level (PS)]
ES - External Schema
CS - Conceptual Schema
PS - Physical or Internal Schema
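The three levels can be illustrated on a small single database. The following is a minimal sketch using Python's built-in sqlite3 module, under the assumption that views stand for external schemas, base tables for the conceptual schema, and an index for the internal schema. The table, view and column names are hypothetical and not from the slides.

```python
import sqlite3


def build_ansi_sparc_demo() -> sqlite3.Connection:
    """Build a tiny single-DB example showing the three ANSI-SPARC levels."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conceptual schema (CS): the one integrated description of the data.
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL
        );
        -- Internal schema (PS): a physical access path, invisible to users.
        CREATE INDEX employee_by_dept ON employee (dept);
        -- External schemas (ES): user-specific views derived from the CS.
        CREATE VIEW es_directory AS SELECT name, dept FROM employee;
        CREATE VIEW es_payroll AS
            SELECT dept, SUM(salary) AS total FROM employee GROUP BY dept;
        """
    )
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
        [("Ann", "R&D", 100.0), ("Bob", "R&D", 80.0), ("Eve", "Sales", 90.0)],
    )
    return conn
```

Each user works only against an external schema, for example `SELECT * FROM es_directory`, while the index can be dropped or changed without affecting either view.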
8
Distributed DB: Origin of the concept (1970s)
– WAN development (20 kb/s)
– Overload of centralized DBs
9
DDB
Idea: distribution of functions other than local communication ("top-down" approach)
Which ones?
– Distributed execution (OS)
– File access
– The DB
Then, what data model for the CS?
– Network (CODASYL)?
– Relational
10
Relation Fragmentation
[Figure: the relation Hotels (H#, Ville, Cat, #Chambres) fragmented across the Paris and Lyon sites, with fragments (H#, Ville) and (H#, Cat, #Chambres); label: 1 fragment]
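One possible reading of the fragmentation figure can be sketched with SQL SELECT statements run through Python's sqlite3 module. The sketch assumes a vertical split of Hotels into (H#, Ville) and (H#, Cat, #Chambres), followed by a horizontal split of the first fragment by city. The fragment table names and the sample rows are hypothetical.

```python
import sqlite3


def fragment_hotels(rows):
    """Create Hotels and derive its fragments with SELECT statements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Hotels ("H#" INTEGER PRIMARY KEY, Ville TEXT, '
        'Cat INTEGER, "#Chambres" INTEGER)'
    )
    conn.executemany("INSERT INTO Hotels VALUES (?, ?, ?, ?)", rows)
    # Vertical fragments: both keep the key H# so the relation can be rebuilt.
    conn.execute('CREATE TABLE HotelsLoc AS SELECT "H#", Ville FROM Hotels')
    conn.execute(
        'CREATE TABLE HotelsInfo AS SELECT "H#", Cat, "#Chambres" FROM Hotels'
    )
    # Horizontal fragments of the location fragment, one per site.
    conn.execute(
        "CREATE TABLE HotelsLocParis AS "
        "SELECT * FROM HotelsLoc WHERE Ville = 'Paris'"
    )
    conn.execute(
        "CREATE TABLE HotelsLocLyon AS "
        "SELECT * FROM HotelsLoc WHERE Ville = 'Lyon'"
    )
    return conn


def reconstruct(conn):
    """Rebuild Hotels: union the horizontal fragments, then join on the key."""
    return conn.execute(
        '''
        SELECT l."H#", l.Ville, i.Cat, i."#Chambres"
        FROM (SELECT * FROM HotelsLocParis
              UNION ALL
              SELECT * FROM HotelsLocLyon) AS l
        JOIN HotelsInfo AS i ON i."H#" = l."H#"
        ORDER BY l."H#"
        '''
    ).fetchall()
```

The reconstruction query is what a global schema would have to hide from the user. Updates are harder, since a change of Ville may move a tuple from one site's fragment to the other.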
11
Problems
GS scalability
GS utility for a local user
Query performance (bad case)
Data migration from LDBs
– IMS, IDMS, Socrate
Local applications
12
Problems With the "Bottom-up" Approach
Creation of GS
– Semantic heterogeneity
– Time for the integration / local restructuring
– Autonomy
In other words: scalability of GS
Updates
Performance
Heterogeneous views
[Figure: GS approach ("bottom-up"): ES, ES on top of one GS, built over CS, CS, CS, each with its PS]
13
MDB Architecture: Absence of GS
A user may have data in multiple ANSI-SPARC compatible DBs
One may face multiple CSs
» In general, it will be impossible to create a GS
14
MDB Architecture: Concept of the MDB Language
Language for definition and manipulation of collections of DBs (multidatabases)
» Definition of MDB ESs, perhaps GSs
» Definition of MDB dependencies: semantics, integrity, security, manipulation...
» Formulation of MDB queries (explicitly): referencing DB names, with MDB joins...
Example: find in DB michelin and in DB gaumont all '**' restaurants and cinemas on the same street
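The restaurant and cinema query can be approximated with any SQL engine that lets one statement name several databases. The following is a minimal sketch using SQLite's ATTACH DATABASE through Python's sqlite3 module. It is not MSQL syntax. The table names, column names and sample rows are assumptions made for illustration; only the database names michelin and gaumont come from the slide.

```python
import sqlite3


def open_multidatabase() -> sqlite3.Connection:
    """Open two separately named databases behind one connection."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a distinct database with its own name space.
    conn.execute("ATTACH DATABASE ':memory:' AS michelin")
    conn.execute("ATTACH DATABASE ':memory:' AS gaumont")
    conn.execute(
        "CREATE TABLE michelin.restaurants (name TEXT, stars INTEGER, street TEXT)"
    )
    conn.execute("CREATE TABLE gaumont.cinemas (name TEXT, street TEXT)")
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO michelin.restaurants VALUES (?, ?, ?)",
        [
            ("Chez Paul", 2, "Rue de Rivoli"),
            ("Le Bistrot", 1, "Rue de Rivoli"),
            ("La Table", 2, "Avenue Foch"),
        ],
    )
    conn.executemany(
        "INSERT INTO gaumont.cinemas VALUES (?, ?)",
        [
            ("Gaumont Rivoli", "Rue de Rivoli"),
            ("Gaumont Opera", "Boulevard des Capucines"),
        ],
    )
    return conn


def two_star_restaurants_near_cinemas(conn):
    """Multidatabase join: tables are qualified by their database names."""
    return conn.execute(
        """
        SELECT r.name, c.name, r.street
        FROM michelin.restaurants AS r
        JOIN gaumont.cinemas AS c ON c.street = r.street
        WHERE r.stars = 2
        ORDER BY r.name, c.name
        """
    ).fetchall()
```

The query references both databases explicitly and joins across them without any global schema, which is the point of the MDB language concept.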
15
MDB Architecture: Multidatabases
A multidatabase (MDB) is a collection of DBs with an MDB language
– E.g. MSQL
A collection of DBs without an MDB language is not an MDB, but just a collection of DBs
– Just as a collection of flat files (tables) without a DB language, SQL for example, is not a DB
16
Potential Multibases
[Figure: example databases as potential multidatabases. Labels: Paris 9, Private, Teletel; Courses & students, Library, Employees, Rest., My-friends, Folio, Cine, other DBs on the Internet]
17
MDB Architecture: Concept of Internal Logical Sublayer
The legacy data models can be heterogeneous
– Different SQL dialects
– Relational, hierarchical, network
– OO and object-relational
It is preferable to have one model at the MDB layer
– One needs a sub-layer for translations
Also, the local DBA may wish not to show some of the data at the MDB layer
Solution: ILS - internal logical schema
» Unknown to ANSI-SPARC
» Also called gateway
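The two roles of the ILS, translation to the common model and hiding of local data, can be sketched as follows. This is an illustrative toy, assuming a hierarchical legacy source held as nested Python dictionaries. The class name Gateway, its fields and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    """Hypothetical ILS: presents a hierarchical source as one flat relation."""

    departments: list  # legacy hierarchical data: departments with nested employees
    hidden: frozenset = frozenset()  # attributes the local DBA does not show

    def rows(self):
        """Yield flat rows (dicts), one per employee, without hidden attributes."""
        for dept in self.departments:
            for emp in dept["employees"]:
                # Translation step: flatten parent and child into one tuple.
                row = {"dept": dept["name"], **emp}
                # Autonomy step: drop what the DBA chose not to export.
                yield {k: v for k, v in row.items() if k not in self.hidden}
```

For example, with `Gateway(data, hidden=frozenset({"salary"}))` the MDB layer sees a relation with attributes dept and emp only, while the local applications keep using the hierarchical structure unchanged.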
18
Multidatabase Architecture: Result of All This (W. Litwin et al., 1980s)
[Figure: multidatabase architecture. Labels: users, MDB queries, multidatabase ES (external MDB layer), CS and DSs (conceptual MDB layer), ILS (internal layer), local DBs each with ES, CS and PS]
19
Federated Architecture (Heimbigner & McLeod, 1980s)
Every DB should be autonomous
In general, there will be no GS
– Global integration is against the autonomy
The DBs used in common should form a federation of autonomous DBs
Every DB in a federation should be provided with three schemas:
– ES: export schema
– IS: import schema
– PS: private schema: for all the private data, including that of the ES and of the IS
There should be some federation dictionary (FD)
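The interplay of the three schemas and the federation dictionary can be sketched in a few lines. This is a toy model under stated assumptions, not the design of the original federated architecture: the class names Component and Federation, their methods and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One autonomous DB of the federation."""

    name: str
    private: dict  # PS: relation name -> rows, all the local data
    export: set = field(default_factory=set)  # ES: relations offered to others
    imports: dict = field(default_factory=dict)  # IS: alias -> (component, relation)


class Federation:
    def __init__(self):
        self.dictionary = {}  # FD: component name -> Component

    def join(self, component: Component) -> None:
        self.dictionary[component.name] = component

    def import_relation(self, importer, alias, exporter, relation) -> None:
        """Record an import, allowed only for relations the exporter exports."""
        if relation not in self.dictionary[exporter].export:
            raise PermissionError(f"{exporter} does not export {relation}")
        self.dictionary[importer].imports[alias] = (exporter, relation)

    def read(self, importer, alias):
        """Resolve an imported alias to the exporter's data."""
        exporter, relation = self.dictionary[importer].imports[alias]
        return self.dictionary[exporter].private[relation]
```

Each component alone decides what appears in its export schema, so no global integration step is needed. The dictionary only records who participates.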
20
Federated Architecture (Heimbigner & McLeod, 1980s)
[Figure: four DBs, each with its PS, ES and IS, around a federation dictionary (FD)]
Fig 3. Federated databases architecture
21
Comparison
MDB architecture focuses on the concept of MDB language
Federated architecture focuses on the concept of autonomy
– No MDB language
– But there is the notion of autonomy also in the MDB architecture
MDB architecture is more decentralized
– No equivalent of FD
– Several DSs
Both architectures are popular
22
Comparison MDB <-> Fed.
MDB Arch.        Fed. Arch.
Multidatabase    Federation
Autonomy         Autonomy
MDB Lang.        -
¬ GS             ¬ GS
CS               PS
MDB ES           IS
DB ES            ES
ILS              ES
DSs              Fed. Dict.
23
DB Autonomy (local autonomy)
Capability to control local DB data by the DBA
– Naming
– Value types
– Data structures
– Physical structures
– Query execution
– Security
– Priority to local queries
24
Multidatabase Autonomy
Capability to control multiple DBs by a DBA
Same aspects as for the local autonomy
– Naming...
May create a conflict with a DB autonomy
Priority to local autonomy
[Figure: databases B1, B2, B3]
25
Semantic Heterogeneity
Differences in representations of the same real properties
Names
– André vs. Andrew
Value types
– Representations
– Units of measure: cm/s vs. ft/h
– Precision: 1 g vs. 1 kg
Data structures
– One table in 2NF vs. several tables in 3NF
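Two of the value-type differences above, units of measure and precision, can be made concrete with a short sketch. The function names are hypothetical, and comparing values after rounding them to the coarser precision is only one possible convention, assumed here for illustration.

```python
CM_PER_FOOT = 30.48  # exact by definition of the international foot
SECONDS_PER_HOUR = 3600


def cm_per_s_to_ft_per_h(speed_cm_s: float) -> float:
    """Convert a speed stored in cm/s to the ft/h used by another DB."""
    return speed_cm_s * SECONDS_PER_HOUR / CM_PER_FOOT


def equal_at_precision(a_grams: float, b_grams: float, precision_grams: float) -> bool:
    """Treat two weights as equal if they round to the same multiple of the precision."""
    return round(a_grams / precision_grams) == round(b_grams / precision_grams)
```

With these definitions, 1200 g and 800 g are equal when one DB records weights to the nearest kilogram (`equal_at_precision(1200, 800, 1000)` is true) but different at a precision of 1 g. A join across the two DBs therefore has to state which precision it uses.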
26
Solutions (partial)
More descriptive schemas
More descriptive protocols
Data dictionaries
Thesauri
Automatic representation conversions
Automatic unit conversion
Implicit joins
Higher level models and manipulation languages
– IDL (Krishnamurthy, Litwin, Kent, ACM SIGMOD 92)
27
Common Models
Ext. Relational
– EDA-SQL
– MSQL (research)
– ODBC Microsoft SQL
Object-Relational
– SQL 3
CCS language for information retrieval DBs
Numerous gateways towards SQL
– IMS to SQL
– CODASYL to SQL
32
Conclusion
Modern DBMSs are in general MDBMSs
– UniSQL/M, Oracle, Sybase, MsAccess, SQL Server, InterBase, DB2...
MDB access requires new functions for the management of
– autonomy
– semantic heterogeneity
– physical distribution of data
33
Conclusion
Technical solutions are based on:
– new reference architectures
» multidatabase architecture
» federated architecture
– common data models
» relational and object-relational
– gateways in rapid development
» any DBMS towards any other using ODBC, JDBC…
» relational or XML wrappers
34
Conclusion
MDB languages
– MSQL and SQL-x; x > 2
New transaction models
Protocols and standards
– ODBC, JDBC…
All this is dealt with in more depth
– later on
– in the class books & several others
35
Exercises & Research Problems
All these already in the text
Difference between the concepts of a DB, DDB, MDB and FDB.
What does "reference architecture" mean, as ANSI-SPARC for example?
Differences between the architectures: "top-down", "bottom-up", multidatabase and federated.
Comment on the actual architecture of federated DBs in DB2 V. 6 and later (see DB2 Help or RedBooks at the IBM web site).
Comment on the actual architecture of federated DBs in SQL Server 2000 (see SQL Server 2000 Help or the papers at the MS web site).
Design SQL SELECT statements realizing the fragmentation of the Hotels DB. Can you then propose how to deal with the updates?
Comment on the concept of ILS, of gateway and of mediator.
What multidatabase common model is most used today?
Comment on the concept of local autonomy (what, why, how).
Provide examples of various types of semantic heterogeneity.
Prove that the usual associativity of equijoins does not hold if the units of measure of the values to join may have different precisions. What are the consequences for relational DBMSs?
Propose an extension to SQL for units of measure and the corresponding query optimization (Ph.D. thesis).