Data Placement Problems in Database Applications
An Zhu
Stanford University
04/19/23 AZ 2
Data Placement
Data objects, multiple disks: assign objects to disks
Goals: optimize performance, optimize I/O, handle dynamic situations
Outline
Multimedia Systems [GKKTZ 00]: maximize the total number of clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within the allowed number of moves
Multimedia Storage Systems
Movie objects, clients/subscribers, parallel disks
Limited storage: disk j can hold at most N_j movies
Limited bandwidth: disk j can serve at most C_j clients
Homogeneous system: N_j = k and C_j = L for all j
Uniform ratio: C_j / N_j = r for all j
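An Example
Three disks, each at load 0/600 (clients served / bandwidth)
Movie demands (clients per movie): 100, 100, 100, 100, 100, 100, 100, 400, 400, 100, 100, 100
Total storage: 12 movies (4 per disk), total capacity: 1800 clients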
An Example
Disk loads: 0/600, 0/600, 400/600
Remaining movie demands: 100, 100, 100, 400, 400, 100, 100, 100
Total storage: 12, total capacity: 1800
An Example
Disk loads: 400/600, 0/600, 400/600
Remaining movie demands: 400, 400, 100, 100
Total storage: 12, total capacity: 1800
Not All Clients Can Be Satisfied
Disk loads: 400/600, 600/600, 400/600
Unserved: 400 clients
Total satisfied clients: 1400/1800 = 7/9
Sliding Window Algorithm
Consider one disk at a time
Maintain a list of movies ordered by demand (number of clients)
Find the first window of k (or fewer) consecutive movies with at least L combined clients
Assign the first L clients of the window to the disk and return the leftover clients to the list
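The rule above can be sketched in Python. This is a minimal sketch under stated assumptions and is not the authors' implementation. The assumptions are:
- The list is kept sorted by non-decreasing demand.
- When a window splits a movie, its leftover clients go back into the list in sorted position.
- A disk for which no window reaches L takes the k largest remaining movies. The slides do not say which movies such a disk receives.

The function name `sliding_window` and its return format are hypothetical. The sketch rescans the list for every disk, so it is slower than the O((M+N) log(M+N)) running time quoted in the recap.

```python
from bisect import insort

def sliding_window(demands, num_disks, k, L):
    """Hypothetical sketch of sliding-window placement on a homogeneous system.

    demands   -- clients requesting each movie
    num_disks -- number of disks M
    k         -- movies per disk (storage N_j = k)
    L         -- clients per disk (bandwidth C_j = L)
    Returns (clients served, per-disk list of (movie_id, clients served)).
    """
    # Entries are (remaining demand, movie id), kept sorted by demand.
    pending = sorted((d, i) for i, d in enumerate(demands) if d > 0)
    served, layout = 0, []
    for _ in range(num_disks):
        if not pending:
            layout.append([])
            continue
        # Grow the window up to k movies, then slide it, until it holds >= L clients.
        window_sum, start, end = 0, 0, None
        for idx, (d, _) in enumerate(pending):
            window_sum += d
            if idx - start + 1 > k:
                window_sum -= pending[start][0]
                start += 1
            if window_sum >= L:
                end = idx
                break
        if end is None:
            # Assumption: with no qualifying window, take the k largest movies.
            start, end = max(0, len(pending) - k), len(pending) - 1
        chosen = pending[start:end + 1]
        del pending[start:end + 1]
        capacity, disk = L, []
        for d, movie in chosen:
            take = min(d, capacity)
            capacity -= take
            disk.append((movie, take))
            if d > take:
                # The split movie's leftover clients return to the sorted list.
                insort(pending, (d - take, movie))
        served += L - capacity
        layout.append(disk)
    return served, layout

served, layout = sliding_window([100] * 7 + [400, 400] + [100] * 3, num_disks=3, k=4, L=600)
print(served)  # 1600
```

On the example instance (3 disks, k = 4, L = 600), the sketch serves 1600 of the 1800 clients, the same 8/9 as the slides. Its intermediate windows need not match the animation frame by frame.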
An Example (max window size k = 4)
Start: three disks at 0/600, with the movie demands listed above (ten movies with 100 clients, two with 400)
Moving over the list, the window's combined demand goes 100, 200, 400 (window full), 400, and then 700, when it covers a 400-client movie together with three 100-client movies
The first 600 of these clients go to the third disk (600/600); the 400-client movie keeps 100 leftover clients, which stay in the list
The next window with at least 600 clients fills the first disk (600/600), leaving six 100-client movies
No window of four remaining movies reaches 600 (window sum 400), so the second disk takes four of them: 400/600
Total satisfied clients: 1600/1800 = 8/9
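Theoretical Bounds
Satisfies at least a $1 - \frac{1}{(1+\sqrt{k})^2}$ fraction of the total clients
In the worst case, no algorithm can satisfy more clients
Translates to a $\left(1 - \frac{1}{(1+\sqrt{k})^2}\right)$-approximation, since the optimum cannot serve more than all clients
PTAS: $(1+\epsilon)$-approximation for every $\epsilon > 0$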
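The guarantee is easy to tabulate. The short Python check below is an illustrative sketch; the helper names are hypothetical. It shows that the bound approaches 1 as k grows. For k = 4 it equals exactly 8/9, the fraction reached in the sliding window example, so that instance appears to meet the bound with equality.

```python
import math
from fractions import Fraction

def guaranteed_fraction(k):
    """Fraction of all clients guaranteed served: 1 - 1/(1 + sqrt(k))**2."""
    return 1 - 1 / (1 + math.sqrt(k)) ** 2

def guaranteed_fraction_exact(k):
    """The same bound as an exact Fraction; requires k to be a perfect square."""
    root = math.isqrt(k)
    if root * root != k:
        raise ValueError("k must be a perfect square for an exact result")
    return 1 - Fraction(1, (1 + root) ** 2)

# The guarantee improves as each disk holds more movies.
for k in (1, 4, 9, 16, 100):
    print(k, round(guaranteed_fraction(k), 4))

print(guaranteed_fraction_exact(4))  # 8/9
```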
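Proof Sketch
Load-saturated vs. storage-saturated disks: M_L disks serve their full L clients, M_S disks hold k movies but serve fewer clients; M_L + M_S = M
The least loaded disk serves cL clients, 0 < c < 1
All remaining movies each have no more than cL/k clients
The initial instance is feasible (w.l.o.g.)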
An Example
Final loads: 600/600, 400/600, 600/600; two 100-client movies left
Total satisfied clients: 1600/1800 = 8/9
M_L = 2, M_S = 1, c = 400/600, cL/k = 100
Proof Outline
If some load-saturated disk has fewer than k movies, all clients are satisfied
Otherwise, at most M_L movies are left, and at least a $1 - \frac{1}{(1+\sqrt{k})^2}$ fraction of the clients is satisfied
Lemma
If any load-saturated disk has fewer than k objects, then any k-1 remaining movies in the list have L clients or more
Lemma
Every later disk finds a window with at least L clients, so the remaining disks are all load saturated
So, all clients are satisfied
Otherwise…
Each disk has exactly k movies; total assigned movies: M·k
Initial movies: N ≤ M·k; "new" movies generated (at most one leftover per load-saturated disk): ≤ M_L
Number of movies left: ≤ M_L
Clients per remaining movie: ≤ cL/k; total number of remaining clients: ≤ cL·M_L/k
Otherwise…
Total clients: ≤ M·L
Assigned clients: ≥ M_L·L + M_S·cL
Total number of remaining clients: ≤ M_S·(1-c)L
Final bound:
$$\frac{M_L L + cL\,M_S}{M_L L + cL\,M_S + \min\left(cL\,M_L/k,\; M_S(1-c)L\right)} \;\ge\; 1 - \frac{1}{(1+\sqrt{k})^2}$$
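Plugging the example's values into the final bound is a useful sanity check. The sketch below evaluates the left-hand side of the bound, with L normalized to 1 and exact arithmetic from `fractions.Fraction`; the function name is hypothetical. With M_L = 2, M_S = 1, c = 2/3 and k = 4 it gives 8/9. In other words, at most 200 of the 1800 clients go unserved, which is exactly what the sliding window example left unserved.

```python
from fractions import Fraction

def proof_lower_bound(m_load, m_storage, c, k):
    """Left-hand side of the final bound, with L normalized to 1.

    m_load    -- M_L, disks serving their full L clients
    m_storage -- M_S, disks holding k movies and serving at least cL clients
    c         -- load of the least loaded disk, as a fraction of L
    k         -- movies per disk
    """
    served = m_load + c * m_storage                       # M_L*L + M_S*cL
    unserved = min(c * m_load / k, m_storage * (1 - c))   # min(cL*M_L/k, M_S(1-c)L)
    return served / (served + unserved)

print(proof_lower_bound(2, 1, Fraction(2, 3), 4))  # 8/9
```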
Simulation Results
M = 5, L = 100, N = M·k
Zipf with=0.0( i-1 )
Recap
The problem is NP-complete; the PTAS is the best possible approximation
The $1 - \frac{1}{(1+\sqrt{k})^2}$ bound is the best possible absolute bound
Sliding window algorithm: practical, with O((M+N) log(M+N)) running time
Outline
Multimedia Systems [GKKTZ 00]: maximize the total number of clients served
Relational Database Layout [AFMPZ 03]: minimize the total I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within the allowed number of moves
Relational Databases
Objects: indexes, tables, views
Multiple disks
Goal: minimize the total I/O access time
Past Work
Full striping: split each object uniformly across all available disks to utilize I/O parallelism
τ: transfer rate
A 200MB object on a single disk with τ = 0.05 s/MB: T_t = 200 · 0.05 = 10 s
Past Work
Striping the same 200MB object as 4 × 50MB over four disks (τ = 0.05 s/MB): the pieces transfer in parallel, so T_t = 50 · 0.05 = 2.5 s instead of 10 s
Past Work
Co-accessed objects with random I/O
Seek time per block: 0.01 s per 0.1MB block, i.e. seek rate σ = 0.1 s/MB
The smaller object dominates
Objects A (200MB, striped as 4 × 50MB) and B (400MB, striped as 4 × 100MB): T_s = 50 · 2σ = 10 s
Past Work
Combined access time for the striped A and B:
Transfer time: T_t = (50 + 100) · τ = 7.5 s
Seek time: T_s = min(50, 100) · 2σ = 10 s
Combined time: T_t + T_s = 17.5 s
Past Work
Full striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03]
Placing A on two disks (2 × 100MB) and B on the other two (2 × 200MB) avoids seeks between them
Combined time: 200 · τ = 10 s
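The three Past Work slides can be read as one cost model. On each disk, the transfer time is τ·(x_A + x_B) and the seek time is 2σ·min(x_A, x_B). Disks run in parallel, so the query finishes with the slowest disk. Taking the maximum over disks is an interpretation, but it is consistent with the 17.5 s and 10 s figures. The sketch below encodes that reading; the function names are hypothetical.

```python
from fractions import Fraction

TRANSFER = Fraction(5, 100)  # tau = 0.05 s/MB, from the slides
SEEK = Fraction(1, 10)       # sigma = 0.1 s/MB, from the slides

def disk_time(a_mb, b_mb, tau=TRANSFER, sigma=SEEK):
    """One disk serving its pieces of co-accessed objects A and B.

    Transfer: tau * (a + b). Seek: 2 * sigma * min(a, b), the smaller piece dominating.
    """
    return tau * (a_mb + b_mb) + 2 * sigma * min(a_mb, b_mb)

def layout_time(layout):
    """Disks work in parallel, so the query finishes when the slowest disk does."""
    return max(disk_time(a, b) for a, b in layout)

# (MB of A, MB of B) on each of four disks
striped = [(50, 100)] * 4                              # A = 200MB, B = 400MB, fully striped
separated = [(100, 0), (100, 0), (0, 200), (0, 200)]   # A on two disks, B on the other two

print(float(layout_time(striped)))    # 17.5
print(float(layout_time(separated)))  # 10.0
```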
Data Layout Problem
Workload (SQL DML): a set of queries and/or updates
A set of co-accessed objects (pairwise)
Access statistics (pairwise)
Goal: minimize the estimated I/O access time
Theoretical Questions
Approximation and its hardness
Transfer time: solvable in polynomial time (P)
Seek time: very hard
Combined time: hard, but minimizing transfer time alone is a "good" approximation
Transfer Time
Heterogeneous disks: disk j has transfer rate τ_j and storage limit c_j
Objects: object i has size s_i; each pair (i, i') has access frequency φ_{i,i'}
Solvable using linear programming (LP)
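LP
$$
\begin{aligned}
\min\quad & \sum_{i,i'} \phi_{i,i'}\, T_{i,i'} \\
\text{s.t.}\quad & T_{i,i'} \ge t_{i,i',j} && \forall\, i,i',j \\
& t_{i,i',j} = \tau_j\,(x_{i,j} + x_{i',j}) && \forall\, i,i',j \\
& \sum_i x_{i,j} \le c_j && \forall\, j \\
& \sum_j x_{i,j} = s_i && \forall\, i \\
& x_{i,j} \ge 0 && \forall\, i,j
\end{aligned}
$$
x_{i,j}: amount of object i assigned to disk j
Σ_j x_{i,j} = s_i: each object must be completely assigned
Σ_i x_{i,j} ≤ c_j: each disk's storage limit is kept
t_{i,i',j}: transfer time for (i, i') on disk j
T_{i,i'}: overall transfer time for (i, i')
Objective: minimize the total (frequency-weighted) transfer time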
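To make the LP concrete, the following sketch checks the constraints and evaluates the objective for a candidate fractional assignment x[i][j]. It is not an LP solver; actually minimizing would mean handing the same constraints to an LP solver. The instance, function name and variable names are hypothetical.

```python
def lp_objective(x, sizes, caps, rates, freq, tol=1e-9):
    """Evaluate the transfer-time LP for a given fractional assignment.

    x[i][j]  -- MB of object i on disk j (the LP variable x_{i,j})
    sizes[i] -- size s_i of object i (MB)
    caps[j]  -- storage limit c_j of disk j (MB)
    rates[j] -- transfer rate tau_j of disk j (s/MB)
    freq     -- {(i, i2): phi}, co-access frequency of each pair
    Returns sum(phi * T_{i,i2}); raises ValueError if x violates a constraint.
    """
    n_obj, n_disk = len(sizes), len(caps)
    for i in range(n_obj):
        if any(x[i][j] < -tol for j in range(n_disk)):
            raise ValueError(f"negative amount for object {i}")
        if abs(sum(x[i]) - sizes[i]) > tol:
            raise ValueError(f"object {i} is not completely assigned")
    for j in range(n_disk):
        if sum(x[i][j] for i in range(n_obj)) > caps[j] + tol:
            raise ValueError(f"disk {j} exceeds its storage limit")
    total = 0.0
    for (i, i2), phi in freq.items():
        # t_{i,i2,j} = tau_j * (x_{i,j} + x_{i2,j}); T_{i,i2} is its max over disks.
        t_pair = max(rates[j] * (x[i][j] + x[i2][j]) for j in range(n_disk))
        total += phi * t_pair
    return total

sizes, caps, rates = [200, 400], [1000] * 4, [0.05] * 4  # hypothetical instance
freq = {(0, 1): 1.0}
print(lp_objective([[50] * 4, [100] * 4], sizes, caps, rates, freq))                    # 7.5
print(lp_objective([[100, 100, 0, 0], [0, 0, 200, 200]], sizes, caps, rates, freq))  # 10.0
```

With transfer time alone, full striping (7.5) still beats the separated layout (10). It is the seek term in the combined model that reverses the comparison.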
Seek Time
Hard even on disks with no storage constraint
Integral assignment: each object is assigned to one disk only
A fractional assignment can be converted into an integral one with no loss
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 Want: each file is spread uniformly
across a subset of disks
100MB 100MB200MB 200MB
150MB100MB
A B ABC C
Conversion
f( , ) = 1, f( , ) = 1, f( , ) = 0
Total seek cost: 1002+1002; new cost: 1002+1252
(Figure: pieces of files A, B and C; sizes 100MB, 100MB, 200MB, 200MB, 125MB, 125MB)
Conversion
f( , ) = 1, f( , ) = 1, f( , ) = 0
Total seek cost: 1002+1002; new cost: 1002
(Figure: pieces of files A, B and C; sizes 100MB, 100MB, 200MB, 200MB, 125MB, 125MB, 250MB)
Conversion
f( , ) = 1, f( , ) = 1, f( , ) = 0
Total seek cost: 0; each file resides on only one disk
(Figure: files A, B and C, totaling 200MB, 400MB and 250MB, each placed on a single disk)
Implications
The conversion runs in polynomial time
Seek-time minimization is equivalent to Minimum Edge Deletion k-Partition, which is NP-hard to approximate within O(n²)
This forces the combined time to be hard to approximate as well
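Under an integral assignment with no storage limits, a co-accessed pair pays seek cost only when both objects share a disk. Choosing disks then amounts to splitting the objects into k parts and paying for every edge kept inside a part. The brute-force sketch below illustrates that reading on a tiny graph. It assumes each co-accessed pair contributes a fixed edge weight; the function name and the weights are hypothetical. The search is exponential (k^n), so it only serves to make the correspondence concrete.

```python
from itertools import product

def min_edge_deletion_k_partition(n, edges, k):
    """Brute-force Minimum Edge Deletion k-Partition.

    n     -- number of vertices (objects) 0..n-1
    edges -- {(u, v): weight}, the cost paid if u and v share a disk
    k     -- number of parts (disks)
    Returns (cost, parts): parts[v] is the disk of v, and cost is the total
    weight of edges whose endpoints are in the same part.
    """
    best = None
    for parts in product(range(k), repeat=n):
        cost = sum(w for (u, v), w in edges.items() if parts[u] == parts[v])
        if best is None or cost < best[0]:
            best = (cost, parts)
    return best

# Three mutually co-accessed objects on two disks: one pair must share a disk.
triangle = {(0, 1): 3, (0, 2): 1, (1, 2): 2}
print(min_edge_deletion_k_partition(3, triangle, 2))  # (1, (0, 1, 0))
```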
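Combined Time
Let $\lambda = \max_j \sigma_j / \tau_j$
Hard to approximate within … , for 1 > ε > 0
Optimizing transfer time alone gives a (1+λ)-approximation: since $2\min(x_1,x_2) \le x_1 + x_2$ and $\sigma \le \lambda\tau$ on every disk,
$$\tau(x_1+x_2) \;\le\; 2\sigma\min(x_1,x_2) + \tau(x_1+x_2) \;\le\; (1+\lambda)\,\tau(x_1+x_2)$$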
Outline
Multimedia Systems [GKKTZ 00]: maximize the total number of clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within the allowed number of moves
Load Rebalancing
Access pattern changes: the initial layout is no longer balanced
(Figure: objects 1 through 11 stacked on the disks, with the MAX LOAD line marked)
Load Rebalancing
Relocate objects: minimize the max load with k moves
(Figure: objects 1 through 11 after relocation, with the MAX LOAD line marked)
Simple Algorithm (O(n log n))
Step 1: Repeat k times: remove the largest object from the most loaded disk. The resulting max load: L(1)
Step 2: Relocate the removed k objects: assign each object to the least loaded disk. The resulting max load: L(2)
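A Python sketch of the two steps uses heaps to find the most loaded disk in Step 1 and the least loaded disk in Step 2. It is an illustration under the slide's description, not the authors' code. The function name and the toy layout are hypothetical, and Step 2 places objects in the order they were removed. On the toy instance the best layout with two moves has max load 5 (move objects 4 and 3 off the first disk). The sketch ends at 6, within the 2·OPT guarantee.

```python
import heapq

def simple_rebalance(disks, k):
    """Hypothetical sketch of the simple two-step rebalancing algorithm.

    disks -- list of lists of object sizes (current layout)
    k     -- number of allowed moves
    Returns (new layout, max load after Step 1, max load after Step 2).
    """
    layout = [sorted(d) for d in disks]  # the largest object on each disk is last
    # Max-heap on load (stored negated) to pop the most loaded disk.
    heap = [(-sum(d), j) for j, d in enumerate(layout)]
    heapq.heapify(heap)
    removed = []
    # Step 1: k times, remove the largest object from the most loaded disk.
    for _ in range(k):
        neg_load, j = heapq.heappop(heap)
        if not layout[j]:
            break  # even the most loaded disk is empty: nothing left to move
        obj = layout[j].pop()
        removed.append(obj)
        heapq.heappush(heap, (neg_load + obj, j))
    after_step1 = max(sum(d) for d in layout)
    # Step 2: place each removed object on the currently least loaded disk.
    loads = [(sum(d), j) for j, d in enumerate(layout)]
    heapq.heapify(loads)
    for obj in removed:
        load, j = heapq.heappop(loads)
        layout[j].append(obj)
        heapq.heappush(loads, (load + obj, j))
    after_step2 = max(sum(d) for d in layout)
    return layout, after_step1, after_step2

layout, l1, final = simple_rebalance([[5, 4, 3], [2], [1]], k=2)
print(l1, final)  # 3 6
```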
Example (k=3)
Step 1: L(1) ≤ OPT
(Figure: objects 1 through 11 before and after removing three objects, with the MAX LOAD line and L(1) marked)
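Example (k=3)
Step 2: L(2) ≤ OPT + S ≤ 2·OPT, where S is the size of a relocated object (S ≤ OPT) and the least loaded disk carries at most the average load, hence at most OPT
Overall: max(L(1), L(2)) ≤ 2·OPT
(Figure: the removed objects placed one at a time on the MIN LOAD disk, with L(2) marked)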
Can We Do Better?
Blindly removing the largest object is not wise
(Figure: the layout of objects 1 through 11, with the MAX LOAD line marked)
How Can We Do Better?
Take care of large objects
Large objects: size > OPT/2
Small objects: size ≤ OPT/2
(Figure: objects 1 through 11, with the OPT level marked)
Revising The Plan
Step 1: Repeat k times: remove the largest object from the most loaded disk. The resulting max load: L(1) ≤ OPT
Step 2: Relocate the removed k objects: assign each object to the least loaded disk. The resulting max load: L(2) ≤ OPT + S ≤ 2·OPT
Revised Plan
Step 1: With no more than k moves, shuffle large objects and remove small objects. The resulting max load: L(1) ≤ 3/2·OPT
Step 2: Relocate the removed objects: assign each object to the least loaded disk (they are all small). The resulting max load: L(2) ≤ OPT + S ≤ 3/2·OPT, since S ≤ OPT/2; Step 2 just fills in the space
Example
Step 1
(Figure: objects 1 through 11 before Step 1, with the MAX LOAD line, and after Step 1, below the 3/2 OPT level)
Example
Step 2
(Figure: the removed small objects placed one at a time on the MIN LOAD disk; the final max load stays within OPT + S)
Recap
Fast 1.5-approximation (O(n log n))
NP-complete
PTAS for a generalized cost
Summary
Multimedia Systems [GKKTZ 00]: maximize the total number of clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within the allowed number of moves
Other Research Interests
Algorithms for mobile and sensor networks and for privacy-preserving databases
Online Algorithms: queue management, packet switching, web caching, scheduling
Approximation Algorithms: network design, multi-product pricing
Streaming Algorithms