5. distributed database design
DESCRIPTION
5. Distributed Database Design. Chapter 3 Distributed Database Design. Outline. Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分) Allocation (片段分配). Outline. Introduction Fragmentation (片段划分) - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/1.jpg)
1
5. Distributed Database Design
Chapter 3
Distributed Database Design
![Page 2: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/2.jpg)
2
Outline
Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分)
Allocation (片段分配)
![Page 3: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/3.jpg)
3
Outline
Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分)
Allocation (片段分配)
![Page 4: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/4.jpg)
4
Distributed Database Design
The design of a distributed computer system involves making decisions on the placement of data and programs across the sites of a computer network.
In distributed DBMSs, such placement involves two things: Placement of the DDBMS software Placement of the applications that run on the database
The course concentrates on distribution of data. The distribution of DDBMS and applications are given a
priori.
![Page 5: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/5.jpg)
5
Framework of Distribution
Behavior of access pattern
DynamicStatic
Data sharing
Data + program sharing
PartialInformation
CompleteInformation
Level of knowledgeon access pattern behavior
Level of sharing
No sharing
![Page 6: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/6.jpg)
6
Design Strategies
Top-Down Mostly in designing systems from scratch
Mostly in designing homogeneous systems
Bottom-up When the databases already exist at a number of sites
Combining both
![Page 7: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/7.jpg)
7
Top-Down Design Process
Requirement Analysis
View DesignConceptual Design
Distribution Design
Physical Design
Objectives
Access Information ES’sGCS
LCS’s
LIS’s
User Input
User Input User Input
View Integration
![Page 8: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/8.jpg)
8
Distribution Design Issues
Fragmentation Why fragmentation at all?
How should we fragment?
How much should we fragment?
How to test correctness of decomposition?
Allocation
Necessary information required for fragmentation
and allocation
![Page 9: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/9.jpg)
9
Why Fragment?
Can’t we just distribute relations? What is a reasonable unit of distribution? Relation
– Views are subsets of relations locality
– Extra communication cost
Fragments of relations (sub-relations)
![Page 10: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/10.jpg)
10
Fragmentation
Unit of distribution = unit of data application accesses
Reduce irrelevant data access Facilitate intra-query concurrency over different fragments Can be used with other performance enhancing methods
(e.g., indexing and clustering) Applications have conflicting requirements, making disjoint
fragmentation a very hard problem Multiple fragment access requires join or union Semantic data control (integrity enforcement) could be very
costly
![Page 11: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/11.jpg)
11
About fragmentation
How should we fragment? Vertical Fragments sub grouping of attributes
Horizontal Fragments sub grouping of tuples
Mixed/Hybrid Fragments combination of above two
How much to fragment? Too little too much of irrelevant data access
Too much too much processing cost
Need to find suitable level of fragmentation
![Page 12: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/12.jpg)
12
Correctness Criteria Completeness no loss of data Decomposition of relation R into fragments R1, R2, …, Rn is
complete if and only if each data item in R can also be found in some Ri .
Reconstruction If relation R is decomposed into fragments R1, R2, …, Rn then there
should exist some relational operator such that R = 1 i n Ri
Disjointness If relation R is decomposed into fragments R1, R2, …, Rn and data
item di is in Rj, then di should not be in any other fragment Rk (k!=j).
![Page 13: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/13.jpg)
13
Allocation Alternatives
Full Replication Each fragment resides at each site
Partial Replication Each fragment resides at some of the sites
Not-replicated (Partitioned) Each fragment resides at only one site
Q & A:
What are the advantages and disadvantages?
![Page 14: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/14.jpg)
14
Allocation Alternatives
Full-Replication Partial -replication PartitioningQueryProcessing
Easy
DirectoryManagement
Easy ornon-existent
ConcurrencyControl
Moderate Difficult Easy
Reliability High High Low
RealityPossible
applicationRealistic
Possibleapplication
Same Difficulty
Same Difficulty
![Page 15: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/15.jpg)
15
Rule of Thumb for Allocation
If number-of-read-only-queries is more than number-of-update-queries,
then replication is advantageous, otherwise replication may cause problems.
![Page 16: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/16.jpg)
16
Information Requirements
Four categories Database information
Application information
Communication network information
Computer system information
![Page 17: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/17.jpg)
17
Outline
Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分)
Allocation (片段分配)
![Page 18: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/18.jpg)
18
Fragmentation
Horizontal fragmentation (HF) Primary horizontal fragmentation (PHF)
– based on predicates accessing the relation
Derived horizontal fragmentation (DHF)– based on predicates being defined on another logically related
relation
We shall first study algorithm for horizontal fragmentation, and then study issues related to derived horizontal fragmentation.
Vertical fragmentation (VF) Hybrid fragmentation (HVF)
![Page 19: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/19.jpg)
19
PHF Information Requirements
Title, Sal
PAY
Eno,Ename, Title
EMP
Jno, Jname, Budget. Loc
PROJ
Eno, Jno, Resp, Dur
ASG
L1
L2 L3
Database Information > links
> Cardinality of each relation: card (R)
![Page 20: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/20.jpg)
20
PHF Information Requirements (cont.)
Simple predicate: Given R(A1 , A2 , ..., An ), with each Ai
having domain of values dom(Ai), a simple predicate pj is
pj : Ai Value
where {<, >, , , } and Value dom(Ai).
Example Jname=“maintenance”Budget 200000
Follow ``80/20'' rule
Application Information 1
![Page 21: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/21.jpg)
21
minterm predicate: Given R and a set of simple predicates Pr = {p1, p2, ..., pm} on R, the set of minterm
predicates M = {m 1, m 2 , ..., m z } is defined as
M = {mi | mi = Pj Pr p*j }, 1 i z, 1 j z
where p*j = pj or ¬pj
m1: (Jname=“maintenance”) (Budget 200000)
m2: ¬ (Jname=“maintenance”) (Budget 200000)
m3: (Jname=“maintenance”) ¬ (Budget 200000)
m4: ¬ (Jname=“maintenance”) ¬ (Budget 200000)
Example
PHF Information Requirements (cont.)
Application Information 2
![Page 22: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/22.jpg)
22
PHF Information Requirements (cont.)
Minterm selectivity: sel (mi)
– number of tuples of the relation that would be accessed by a
user query specified according to a given minterm predicate.
Access frequency: acc (qi)
– frequency with which user applications access data. If Q = {q1 ,
q2 , ..., qn } is the set of queries, acc (qi ) indicates access
frequency of query qi in a given period.
Application Information 3
![Page 23: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/23.jpg)
23
Primary Horizontal Fragmentation
Each horizontal fragment Ri of relation R is defined by
Ri = Fi (R), 1 i w, where Fi is a selection formula,
which is (preferably) a minterm predicate.
A horizontal fragment Ri of relation R consists of all the
tuples of R which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as
many as horizontal fragments of relation R as there
are minterm predicates.
Set of horizontal fragments also referred to as
minterm fragments
![Page 24: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/24.jpg)
24
PHF - Algorithm
Input: A relation R, a set of simple predicates Pr
Output: The set of fragments of R = {R1 , R2 , ..., Rw },
which obey the fragmentation rules.
Preliminaries:
Pr should be complete
Pr should be minimal
![Page 25: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/25.jpg)
25
Completeness of Simple Predicates
A set of simple predicates Pr is said to be
complete if and only if the accesses to the tuples of the minterm fragments defined on Pr requires
that two tuples of the same minterm fragment have the same probability of being accessed by any application.
![Page 26: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/26.jpg)
26
Completeness of Simple Predicates Example: PNO PNAME BUDGET LOC
P1 Instrumentation 150000 MntrealP2 Database 135000 New YorkP3 CAD/CAM 250000 New YorkP4 Maintenance 310000 Paris
Applications:Q1: Find the projects at each locationQ2: Find projects with budget less than $200,000
Predicates: Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”} Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET 2000000, BUDGET >200000}
incomplete
![Page 27: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/27.jpg)
27
Minimality of Simple Predicates
If a predicate influences how fragmentation is performed, (i.e., causes a fragment f to be further fragmented into, say, fi and fj), then there should be
at least one application that accesses fi and fj
differently. In other words, the simple predicate should be
relevant in determining a fragmentation.
If all the predicates of a set Pr are relevant, then Pr is
minimal.
![Page 28: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/28.jpg)
28
Minimality of Simple Predicates
Example
Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 2000000, BUDGET >200000, PNAME=“Instrumentation”}
Applications:Q1: Find the projects at each locationQ2: Find projects with budget less than $200,000
Minimal??
Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 2000000, BUDGET >200000} minimal
![Page 29: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/29.jpg)
29
COM_MIN Algorithm
Input: A relation R and a set of simple predicates Pr
Output: A complete and minimal set of simple predicates Pr ’ for Pr
Rule 1: A relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.
![Page 30: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/30.jpg)
30
COM_MIN Algorithm (cont.)
1. Initialization
• Find a pi Pr such that pi partitions R according to Rule 1
• Set Pr’ = pi ; Pr Pr - pi ; F fi
2. Iteratively add predicates to Pr ’ until it is complete
• Find a pj Pr such that pj partitions some fk defined according to
minterm predicate over Pr’ according to Rule 1
• Pr’ pj ; Pr = Pr - pj ; F fj
• If pk Pr ’ which is nonrelevant, then
Pr ’ = Pr ’ - pk ; F F - fk
![Page 31: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/31.jpg)
31
PHORIZONTAL Algorithm
Make use of COM_MIN to perform fragmentation
Input: A relation R and a set of simple predicates Pr
Output: A set of minterm predicates M according to which relation R is to be fragmented
Steps: Pr ’ COM_MIN(R, Pr) Determine the set M of minterm predicates Determine the set I of implications among pi Pr ’
Eliminate the contradictory minterms from M
![Page 32: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/32.jpg)
32
Contradictory Minterms
Given a minimal and complete set of simple predicates, containing n simple predicates
Not all the minterm fragments derived are valid A fragment can be self contradictory because of
implications among simple predicates.
Example: Dom(Sal): [10000, 200000]; Dom(Loc) = {HK,SF}
p1 : sal < 50000; p2 : Loc = HK; p3 : Loc = SF
Note: p2 (¬ p3 ); p3 (¬ p2 )
the minterm p1 p2 p3 is self contradictory.
![Page 33: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/33.jpg)
33
PHORIZONTAL Algorithm (cont.)
Input: relation R and a set of simple predicates Pr
Output: a set of minterm fragments M
begin
Pr’ = COM-MIN(R, Pr );
M = set of minterm predicates from Pr’
I = set of implications among pi Pr’
for each mi M
if mi is contradictory according to I then
M = M mi
end
![Page 34: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/34.jpg)
34
PHF Example: PAYTITLE SAL
Mech. Eng. 27000Programmer 24000Elec. Eng. 40000Syst. Anal. 34000
PAY
Application: Check the salary info and determine raiseEmployee records kept at two sites ==> application runs at two sites
Pr = {P1, P2}, which is complete and minimal Pr’ = PrMinterm predicates: m1: (SAL 30000); m2: (SAL > 30000)
TITLE SALElec. Eng. 40000Syst. Anal. 34000
PAY2
TITLE SALMech. Eng. 27000Programmer 24000
PAY1
Simple predicates: P1: SAL 30000, P2: SAL > 30000
![Page 35: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/35.jpg)
35
PHF Example: PROJ
Applications Find the name and budget of
projects given their locations – Issued at three sites
Access project information according to budget
– One site access 200000; other accesses >200000
p1: LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
p4: BUDGET 200000
p5 : BUDGET > 200000
m1: LOC = “Montreal” BUDGET 200000
m2 : LOC = “Montreal” BUDGET > 200000
m3 : LOC = “New York” BUDGET 200000
m4 : LOC = “New York” BUDGET > 200000
m5 : LOC = “Paris” BUDGET 200000
M6 : LOC = “Paris” BUDGET > 200000
PNO PNAME BUDGET LOCP1 Instrumentation 150000 MntrealP2 Database 135000 New YorkP3 CAD/CAM 250000 New YorkP4 Maintenance 310000 Paris
PROJ
![Page 36: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/36.jpg)
36
PHF Example: PROJ - Result
PNO PNAME BUDGET LOCP1 Instrumentation 150000 MntrealP2 Database 135000 New YorkP3 CAD/CAM 250000 New YorkP4 Maintenance 310000 Paris
PROJ
m1: LOC = “Montreal” BUDGET 200000
m2 : LOC = “Montreal” BUDGET > 200000
m3 : LOC = “New York” BUDGET 200000
m4 : LOC = “New York” BUDGET > 200000
m5 : LOC = “Paris” BUDGET 200000
M6 : LOC = “Paris” BUDGET > 200000
PNO PNAME BUDGET LOCP1 Instrumentation 150000 Mntreal
PROJ1
PNO PNAME BUDGET LOCP2 Database 135000 New York
PNO PNAME BUDGET LOCP3 CAD/CAM 250000 New York
PNO PNAME BUDGET LOCP4 Maintenance 310000 Paris
PROJ2
PROJ4
PROJ6
![Page 37: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/37.jpg)
37
PHF - Correctness
Completeness Since Pr is complete and minimal, the selection
predicates are complete
Reconstruction If relation R is fragmented into FR = {R1, R2, …, Rr}
R = Ri FR Ri
Disjointness Minterm predicates that form the basis of fragmentation
should be mutually exclusive
![Page 38: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/38.jpg)
38
DHF: Derived Horizontal Fragmentation
Title, Sal
PAY
Eno, Ename, Title
EMP
Jno, Jname, Budget. Loc
PROJ
Eno, Jno, Resp, Dur
ASG
L1
L2 L3
Each link is an equijoin
Owner relation
Member relation
DHF: Defined on a member relation according to a selection operation on its owner
![Page 39: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/39.jpg)
39
DHF – Example
TITLE SALMech. Eng. 27000Programmer 24000
PAY1= SAL 30000 PAY
ENO ENAME TITLEE1 J. Doe Elec.Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elec.Eng.E7 R. Davis Mech. Eng.E8 J. Jones Syst. Anal.
EMP ENO ENAME TITLEE3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE7 R. Davis Mech. Eng.
EMP1 = EMP PAY1
ENO ENAME TITLEE1 J. Doe Elec.Eng.E2 M. Smith Syst. Anal.E5 B. Casey Syst. Anal.E6 L. Chu Elec.Eng.E8 J. Jones Syst. Anal.
EMP2 = EMP PAY2
TITLE SALElec. Eng. 40000Syst. Anal. 34000
PAY2= SAL> 30000 PAY
DHF
Title, Sal
PAY
Eno, Ename, Title
EMP L1
![Page 40: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/40.jpg)
40
DHF: Derived Horizontal Fragmentation Let S be horizontally fragmented and let there be a
link L with owner(L) = S, and member(L) = R, the derived horizontal fragments of R are defined as
Ri = R Si , 1 i w
where Si is the horizontal fragment of S, is the semi join operator, and w is the maximum number of fragments
Inputs to derived horizontal fragmentation: partitions of owner relation member relation the semi join condition
The algorithm is straight forward.
![Page 41: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/41.jpg)
41
DHF: Correctness
Completeness primary horizontal fragmentation based on completeness
of selection predicates. For derived horizontal fragmentation based on referential integrity
Reconstruction Same as primary horizontal fragmentation (via union)
Disjointedness Simple join graphs between the owner and the member
fragments.
![Page 42: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/42.jpg)
42
DHF: Issues
Multiple owners for a member relation; how should we derived horizontally fragments of a member relation?
There could be a chain of derived horizontal fragmentation.
![Page 43: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/43.jpg)
43
Vertical Fragmentation (VF) Has been studied within the centralized context Vertical partitioning of a relation R produces
fragments R1, R2, ..., Rm, each of which contains a subset of R's attributes as well as the primary key of R
The object of vertical fragmentation is to reduce irrelevant attribute access, and thus irrelevant data access
“Optimal” vertical fragmentation is one that minimizes the irrelevant data access for user applications
![Page 44: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/44.jpg)
44
VF Two Approaches
Grouping: each individual attribute one fragment, at each step join some of the fragments until some criteria being satisfied Attributes to fragments
Splitting: start with global relation, and generate beneficial partitions based on access behavior of the applications Relations to fragments
![Page 45: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/45.jpg)
45
Vertical Fragmentation
Overlapping fragments grouping
Non-overlapping fragments Splitting
We do not consider the replicated key attributes to be overlapping Advantage
– easier to enforce functional dependencies (for integrity checking)
![Page 46: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/46.jpg)
46
VF Information Requirements Application Information
Attribute affinities– A measure that indicates how closely related the attributes are
– This is obtained from more primitive usage data
Attribute usage values– Given a set of queries Q = {q1 , q2, ..., qm} that will run on relation
R (A1, A2, ..., An)
use (qi, Aj ) = 1 if attribute Aj is referenced by query qi
0 otherwise
use (qi, . ) can be defined accordingly.
![Page 47: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/47.jpg)
47
VF Definition of use(qi, Aj)
Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET FROM PROJ WHERE PNO = val;
q2: SELECT PNAME,BUDGET FROM PROJ;
q3: SELECT PNAME FROM PROJ WHERE LOC = val;
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=val;
Let A1= PNO, A2= PNAME, A3 = BUDGET, A4 = LOC
1100
1010
0110
0101
A1 A2 A3 A4
q1
q2
q3
q4
![Page 48: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/48.jpg)
48
VF - Affinity Measure aff(Ai, Aj)
The attribute affinity measure between two attributes Ai and Aj of a relation R (A1, A2, ..., An) with respect to
the set of applications Q = {q1 , q2, ..., qm} is defined as:
Aff (Ai, Aj) = { k | use (qk, Ai)=1 use (qk, Aj)=1} l refl (qk) * accl (qk)
where refl (qk) is the number of accesses to attributes for each execution of application qk at site l; accl (qk) is the access frequency of query qk at site l .
![Page 49: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/49.jpg)
49
VF - Affinity Measure aff(Ai, Aj)
Assume each query in the previous example accesses the attributes once during each execution.
Also assume the access frequency of query k at different sites is:
then Aff (A1, A3) = 15*1 + 20*1 + 10*1 = 45
and the attribute affinity matrix AA is:
003
252525
005
102015S1 S2 S3
q1
q2
q3
q4
783750
353545
755800
045045A1 A2 A3 A4
A1
A2
A3
A4
1100
1010
0110
0101
A1 A2 A3 A4
q1
q2
q3
q4
![Page 50: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/50.jpg)
50
AA Matrix for Vertical Fragmentation
This affinity matrix will be used to guide the fragmentation effort. The process involves first clustering together the attributes with high affinity for each other, and then splitting the relation accordingly.
![Page 51: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/51.jpg)
51
VF - Correctness
A relation R, defined over attribute set A and key K, generates vertical partitioning
FR = {R1 , R2 , … ,Rr} Completeness
Reconstruction
Disjointness Duplicate keys are not considered to be overlapping
AiRA
kR = Ri Ri FR
![Page 52: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/52.jpg)
52
Hybrid Fragmentation
R
R1 R2
R11 R12 R21 R22 R23
HF
VF VF
![Page 53: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/53.jpg)
53
Outline
Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分)
Allocation (片段分配)
![Page 54: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/54.jpg)
54
Allocation
File Allocation vs. Database Allocation Fragments are not individual files
– Relationships have to be maintained
Access to databases is more complicated– Remote file access model not applicable
– Relationship between allocation and query processing
Cost of integrity enforcement should be considered
Cost of concurrency control should be considered
![Page 55: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/55.jpg)
55
Allocation Problem
Assuming:
nFFFF ,,, 21 A set of fragments
mSSSS ,,, 21 A set of sites
qQQQQ ,,, 21 A set of applications
Problem
Find the optimal distribution of F over S
![Page 56: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/56.jpg)
56
Optimality with Two Aspects
Minimal costStoring Fi at Sj
Querying Fi at Sj
Updating Fi at all Sj 's with a copy of Fi
Communication
PerformanceResponse time
Throughput
……Separate the two issues to reduce its complexity.
![Page 57: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/57.jpg)
57
A Simple Formulation of the Cost Problem
For a single fragment Fi
R = {r1, r2, …, rm}
rj : read-only traffic generated at Sj for Fi
U = {u1, u2, …, um}
uj : update traffic generated at Sj for Fi
F, S, Q are defined as before
1
![Page 58: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/58.jpg)
58
Assume the communication cost between any pair of sites Si and Sj is fixed
C(T ) = {c11,c12, c13, …, c1,m, …, cm-1,m}
cij : retrieval communication cost
C’(U ) = {c’11,c’12, c’13, …, c’1,m, …, c’m-1,m}
c’ij : update communication cost
A Simple Formulation of the Cost Problem (cont.)
![Page 59: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/59.jpg)
59
D = {d1, d2, …, dm}
cost for storing Fi at Sj
No capacity constraints for sites and communication links
3
A Simple Formulation of the Cost Problem (cont.)
![Page 60: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/60.jpg)
60
Let
The allocation problem is a cost minimization problem for finding the set
I i.e., the sites to store the fragment Fi, where
otherwise0
site toassigned is fragment theif1 jiij
SFx
A Simple Formulation of the Cost Problem (cont.)
mSSS ,,, 21
![Page 61: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/61.jpg)
61
This formulation only considers one fragment Fi
at site Sj. It is NP-complete.
A Simple Formulation of the Cost Problem (cont.)
]dx)cminr'cux(min[
IS|jjj
m
1i
j,iIS|j
j
IS|jj,ijj
jj
j
i ijj jS I
u c'|
Updates :)(min }|{ cr ijIji s j
Reads :j
j jS Id
|
Storage :For queries from site i
Total cost
![Page 62: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/62.jpg)
62
A precise formulation must consider: All fragments together
How query is processed
The enforcement of integrity constraint
The cost of concurrency control and transaction control
A Precise Formulation of the Cost Problem
![Page 63: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/63.jpg)
63
Allocation Model in General Allocation Model
min (total Cost)
subject to – response time constraint
– storage constraint
– processing constraint
Decision variable
0
1xij
if fragment Fi is stored at site Sj
otherwise
![Page 64: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/64.jpg)
64
Allocation Model (cont.) Total Cost
∑all queries query processing cost +
∑all sites ∑all fragments cost of storing a fragment at a site
Storage Cost (on fragment Fj at site Sk)
(unit storage cost at Sk) * (size of Fj) * xjk
Query processing Cost (for one query)
processing component + transmission component
![Page 65: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/65.jpg)
65
Allocation Model (cont.) Query Processing Cost
Processing component
access cost + integrity enforcement cost + concurrency control cost
access cost: ∑all sites ∑all fragments (number of update accesses + number of read accesses) * xjk * local processing cost at site
integrity enforcement and concurrency control costs can be similarly calculated.
![Page 66: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/66.jpg)
66
Allocation Model (cont.) Query Processing Cost
Transmission component
cost of processing updates + cost of processing retrievals
Cost of updates∑all sites ∑all fragments update message cost + ∑all sites ∑all fragments acknowledgement cost
Retrieval costs
∑all fragments minall sites (cost of retrieval command + cost of sending back the result)
![Page 67: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/67.jpg)
67
Allocation Model (cont.)
Constraints Response time for each query not longer than maximally
allowed response time for that query
Storage constraint: The total size of all fragments allocated at a site must be less than the storage capacity at that site.
Processing constraint: The total processing load because of all queries at a site must be less than the processing capacity at that site.
![Page 68: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/68.jpg)
68
Solution Methods
NP-complete Heuristics approaches Exploring techniques developed in operational research
(运筹学 )
Reduce problem complexity ignore replication first, and then improve with a greedy
algorithm
![Page 69: 5. Distributed Database Design](https://reader033.vdocuments.net/reader033/viewer/2022051214/56813a07550346895da1d22f/html5/thumbnails/69.jpg)
69
Question & Answer