15 distributed
DESCRIPTION
DatabasesTRANSCRIPT
![Page 1: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/1.jpg)
Distributed Database Management System (DDBS)
Motivation: Data is used at multiple distributed sites (e.g. Branch offices).
Communication between sites is ----- costly ----- potentially unreliable Solution: ----- Allow sites to store/maintain the data they use
most often/specialize in ----- Sharing with other sites/HQs if combinations of
data necessary
![Page 2: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/2.jpg)
Network Topology
A B
C F
E D
A
C F B
D E A B
C
D E
F
A B E
C D F
A B
E F
C
D
Fully connected network Partially connected network
Tree structured network
Star network
Ring network
Cost/reliability #of hops
![Page 3: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/3.jpg)
Tradeoffs between
1. keeping data in centralized headquarters: • simpler maintenance • simpler consistency enforcement • possibly more efficient if many updates,aggregate
computations 2. or distributed across branch offices: • lower communication cost • reliability • parallelism can be implemented locally
![Page 4: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/4.jpg)
• Interconnectivity of pre-existing DBs • Expandability (don’t need to replace whole system
to grow) • Cost (many small engines on PC’s cheaper than
mainframes) issue:communication costs vs. hardware computation costs.
• Performance (place data near where used) • Availability and reliability
Advantages of DDBS (heterogeneous)
![Page 5: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/5.jpg)
Complicating factors
• Maintaining data consistency (in face of replication and sharing)
• Distributed directory management (who controls mapping of data to sites)
• Security
• Heterogeneous Databases
different database architectures
![Page 6: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/6.jpg)
Distributed Database Design Issues
Options for storing a relation R across multiple sites:
• Replication (maintain copies/replicas of R on multiple sites)
• Fragmentation (Relation store in fragments/ pieces on multiple sites)
• combination of both
![Page 7: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/7.jpg)
R1
R2
Copy of R1 Copy of R1
Copy of R2 Copy of R2
site1 site2 site3 site4
1/3 of R1
1/3 of R1
1/3 of R1
1
2
3 4
5
6 7
8
9
½ Of R2
½ Of R2
A/B C/D
(Horizontal)
(Vertical)
FRAGMENTATION
REPLICATION
![Page 8: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/8.jpg)
Replication
• Issues: (whole database replication v.s. no replication)
- what to replicate?(all relations or only frequently user shared data)
- where to replicate? (function of communication costs,usage needs,resources)
- which relations to replicate? -” primary copy” of relation (simplifies consistency
enforcement, but where located?)
Non redundant allocation
![Page 9: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/9.jpg)
Replication (cont)
•Advantages:
–Improved availability (multiple sources for a relation if a site is down)
–Increased parallelism (sites can process (primarily) read- only operations in parallel,minimizing data transfer)
(well suited for read-only, majority read-only data access)
![Page 10: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/10.jpg)
Replication (cont)
• Disadvantages: o problems/overhead for writes/updates o costs of consistency enforcement
- updates propagated to all sites (communication costs)
- costs of synchronization/locking for consistency enforcement on update greater than in single source models.
Complicates concurrency and recover Replication inefficient in databases with frequent
updates
![Page 11: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/11.jpg)
FRAGMENTATION
• Vertical • Horizontal • mixel
Issues:
- completeness: Every tuple/attribute in some fragment
- reconstruction:easy way of reconstructing full relation
- transparency
![Page 12: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/12.jpg)
-Fragments contain subsets of complete tuples (all attributes at all sites)
How to reconstruct R=Rs1 Rs2 ……. Rsn
HORIZONTAL FRAGMENTATION
Original relation
A1 A2 ………. An 1
1
1
2
2
3
3
3
T1
T2
T3
.
.T60
T61
.
.
Tn
A1 A2 ………. An
A1 A2 ………. An T1
T2
T3
.
.T60
T61
.
.
Tn
Site 1
Site 2
![Page 13: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/13.jpg)
Horizontal Fragmentation
• Example Usefulness: - Each branch office maintains complete attribute set of its
employees (salary,benefits,address/phone,departments,projects,etc.)
- Site of Fragment easily determined by a key attribute
value -e.g. Branch_office*
![Page 14: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/14.jpg)
VERTICAL FRAGMENTATION
A1 A2 A3 A4
A1 A2 A3 A4
Original Relation (R) t1
t2
tn
RS1
RS2 t1
t2
tn
t1
t2
tn
SITE1 SITE2
How to Reconstruct:
R=Rs1 Rs2 Rsn
TID –Tuple ID Hidden Attribute to
ensure account and simple join reconstruction
RS1.TID=RS2.TID
Join condition
1 2 n
1 2 n
TID TID
![Page 15: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/15.jpg)
Example usefulness: Salary Office Benefits Office Directory (Name|address|phone|fax) Dependents Management Office each control their own appropriate attribute for all corporate branch offices VERTICAL –Attribute-centered management (keep all instances of an attribute in one place) HORIZONTAL – tuple/individual-centered management (keep all values of a tuple in one place)
VERTICAL FRAGMENTATION
![Page 16: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/16.jpg)
MIXED FRAGMENTATION
usa
Europe
A1 A2 A3
A1 A2 A3
A4 A5
A4 A5
A1 A2 A3 A4 A5
(Salary Attributes)
(Benefit Attributes)
Rs1
Rs2
Rs3
Rs4
R
![Page 17: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/17.jpg)
Partition of Attributes/tuples need not be disjoint
REPLICATION and FRAGMENTATION
A1 A2 A3 A4 A5
A1 A2 A3 A4 A2 A3 A4 A5
Overlap
(replication of attributes)
![Page 18: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/18.jpg)
TRANSPARENCY
Fragmentation Transparency -User doesn’t need to know mapping between relations and
fragmented subrelations Replication Transparency -User doesn’t need to know about existence or location of other
copies (treat as if single copy of DB) Location and Naming Transparency -Use shouldn’t need to know about location and full names
of data on the server
ΠSalary(σssn=so(Employee)) Site27, Employee. Fragment3. Replica7
Name Server Proper site, Fragment, replica for this data access
Unique name
![Page 19: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/19.jpg)
Issues1:
Parallel Processing across Fragments
ΠLName(σsalary>40,000(Employee))
⇒ ΠLName(σsalary>40,000(Emp1)) U ΠLName(σsalary>40,000(Emp2))
QUERY PROCESSING IN DDMS
=Emp1 U Emp2
2 Fragments
Site 1 Site 2
Execution in Parallel on fragments
and union results together
Horizontal fragmentations
![Page 20: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/20.jpg)
(A B) C A (B C) 50K 0.5K
1K 3K
0.5K
0.5K
Site1 Site2 Site3 50K 1K 3K Joins- symmetric and
associative
Parallel Processing
(σxx(A)) (B C)
![Page 21: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/21.jpg)
QUERY PROCESSING IN DDBS
R=Π Fnames, Cnames, Dnames (Employee Department) Strategies: 1)Ship both relations to the result site and join there 2)Ship employee to 2, join at 2, results to 3 3)Ship Department to 1, join at 1, results to 3 ⇒ minimize total communication cost of data transfer
1,003,000 bytes transfered
1,002,000 bytes transfered
5,000 bytes transfered
Join Strategies
Site 3 100 records, 2000 bytes
Site 1 10,000 records, 1,000,000 bytes
Site 2 100 records, 3000
bytes
Mg rssn to ssn
![Page 22: 15 Distributed](https://reader034.vdocuments.net/reader034/viewer/2022042717/563db9d3550346aa9aa0490a/html5/thumbnails/22.jpg)
-transaction managers / coordinators -log managers Problems: -failure of site -failure of link -loss of messages if server is down, elect new server ⇒what about network partitioning?
RECOVERY IN DDBS
Server’s link
Newly elected Server
Difficult to know which had occurred
Original Server