: tiering storage for data analytics in the cloudyuecheng/docs/hpdc15_talk.pdf · 2017-08-21 · 10...
TRANSCRIPT
: Tiering Storage for Data Analytics in the Cloud Yue Cheng★, M. Safdar Iqbal★, Aayush Gupta†, Ali R. Butt★
Virginia Tech★, IBM Research – Almaden†
2
Cloud enables cost-efficient data analytics
Cloud infrastructure
Google Cloud BigTable
Google BigQuery
Amazon EMR
Microsoft Azure HDInsight
Sahara
3
Cloud storage enables data analytics in the cloud
Cloud infrastructure
Google Cloud BigTable
Google BigQuery
Amazon EMR
Microsoft Azure HDInsight
Sahara
4
Vast variety of cloud storage services
5
Vast variety of cloud storage services
Object storage
6
Vast variety of cloud storage services
Object storage
Network-attached block storage
7
Vast variety of cloud storage services
Object storage
Network-attached block storage
VM-local ephemeral storage
8
Heterogeneity in cloud storage services
Storage type
Capacity (GB/volume)
Throughput (MB/sec)
IOPS (4KB)
Cost ($/month)
ephSSD 375 733 100000 0.218×375
persSSD 100 48 3000 0.17×100
250 118 7500 0.17×250
500 234 15000 0.17×500
persHDD 100 20 150 0.04×100
250 45 375 0.04×250
500 97 750 0.04×500
objStore N/A 265 550 0.026/GB
ephSSD: VM-local ephemeral SSD, persSSD: Network-attached persistent SSD, persHDD: Network-attached persistent HDD, objStore: Google cloud object storage
9
Heterogeneity in cloud storage services
Storage type
Capacity (GB/volume)
Throughput (MB/sec)
IOPS (4KB)
Cost ($/month)
ephSSD 375 733 100000 0.218×375
persSSD 100 48 3000 0.17×100
250 118 7500 0.17×250
500 234 15000 0.17×500
persHDD 100 20 150 0.04×100
250 45 375 0.04×250
500 97 750 0.04×500
objStore N/A 265 550 0.026/GB
ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage ephSSD offers best performance w/o data persistence.
10
Heterogeneity in cloud storage services
Storage type
Capacity (GB/volume)
Throughput (MB/sec)
IOPS (4KB)
Cost ($/month)
ephSSD 375 733 100000 0.218×375
persSSD 100 48 3000 0.17×100
250 118 7500 0.17×250
500 234 15000 0.17×500
persHDD 100 20 150 0.04×100
250 45 375 0.04×250
500 97 750 0.04×500
objStore N/A 265 550 0.026/GB
ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage Performance of the network-attached block storage depends on
the size of the volume.
11
Heterogeneity in cloud storage services
Storage type
Capacity (GB/volume)
Throughput (MB/sec)
IOPS (4KB)
Cost ($/month)
ephSSD 375 733 100000 0.218×375
persSSD 100 48 3000 0.17×100
250 118 7500 0.17×250
500 234 15000 0.17×500
persHDD 100 20 150 0.04×100
250 45 375 0.04×250
500 97 750 0.04×500
objStore N/A 265 550 0.026/GB
ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage objStore provides the cheapest service and offers comparable
sequential throughput compared to that of a 500GB persSSD.
12
Heterogeneity in data analytics jobs
Application I/O-intensive CPU-intensive
Map Shuffle Reduce
Sort ✗ ✔ ✗ ✗
Join ✗ ✔ ✔ ✗
Grep ✔ ✗ ✗ ✗
KMeans ✗ ✗ ✗ ✔
13
Decision paralysis
Cloud tenant
How do I get the MOST BANG-for-the-
buck?
Highly heterogeneous cloud storage
services
Highly heterogeneous
analytics workloads
14
Motivation
¤ A need for a comprehensive experimental analysis ¤ To study the analytics-job to cloud-storage
relationships
¤ How to exploit heterogeneity in cloud storage and analytics workloads ¤ To reduce $ cost ¤ To improve performance ¤ To meet the deadline
15
Outline
Motivation
Quantitative analysis
CAST design
Evaluation
16
Outline
Motivation
Quantitative analysis
CAST design
Evaluation
17
Experimental study methodology
¤ Experiments on Google Cloud ¤ One n1-standard-16 VM (16 vCPUs, 60GB RAM)
Application I/O-intensive CPU-intensive
Map Shuffle Reduce
Sort ✗ ✔ ✗ ✗
Join ✗ ✔ ✔ ✗
Grep ✔ ✗ ✗ ✗
KMeans ✗ ✗ ✗ ✔
18
Experimental study methodology
¤ Experiments on Google Cloud ¤ One n1-standard-16 VM (16 vCPUs, 60GB RAM)
¤ Application granularity
¤ Workload granularity
Application I/O-intensive CPU-intensive
Map Shuffle Reduce
Sort ✗ ✔ ✗ ✗
Join ✗ ✔ ✔ ✗
Grep ✔ ✗ ✗ ✗
KMeans ✗ ✗ ✗ ✔
19
Application granularity: Performance
0
500
1000
1500
2000
2500
3000
Runt
ime
(se
c)
0
200
400
600
800
1000
1200
1400
1600
0
200
400
600
800
1000
1200
1400
1600
1800
0
200
400
600
800
1000
1200
1400
1600
Sort Join Grep KMeans Input download Output upload Processing
20
Application granularity: Performance
0
500
1000
1500
2000
2500
3000
Runt
ime
(se
c)
0
200
400
600
800
1000
1200
1400
1600
0
200
400
600
800
1000
1200
1400
1600
1800
0
200
400
600
800
1000
1200
1400
1600
Sort Join Grep KMeans Input download Output upload Processing
No storage service provides the best raw performance
21
Application granularity: Tenant utility
0
0.2
0.4
0.6
0.8
1
1.2
0
500
1000
1500
2000
2500
3000
Runt
ime
(se
c)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
200
400
600
800
1000
1200
1400
1600
0
0.5
1
1.5
2
2.5
3
0
200
400
600
800
1000
1200
1400
1600
1800
0
0.5
1
1.5
2
2.5
3
0
200
400
600
800
1000
1200
1400
1600
No
rma
lize
d te
nant
util
ity
Tenant utility = 1T$
Sort Join Grep KMeans Input download Output upload Processing Tenant utility
Performance
$ cost
22
Application granularity: Tenant utility
0
0.2
0.4
0.6
0.8
1
1.2
0
500
1000
1500
2000
2500
3000
Runt
ime
(se
c)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
200
400
600
800
1000
1200
1400
1600
0
0.5
1
1.5
2
2.5
3
0
200
400
600
800
1000
1200
1400
1600
1800
0
0.5
1
1.5
2
2.5
3
0
200
400
600
800
1000
1200
1400
1600
No
rma
lize
d te
nant
util
ity
Sort Join Grep KMeans Input download Output upload Processing Tenant utility
Slower storage, in some case, may provide higher utility & comparable performance
23
Workload granularity: Data reuse
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.5
1
1.5
2
2.5
3
0
0.5
1
1.5
2
2.5
3
Sort Join Grep KMeans No reuse Reuse-lifetime (1-hr) Reuse-lifetime (1-week)
Tenant utility = 1T$
Performance
$ cost
24
Workload granularity: Data reuse
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.5
1
1.5
2
2.5
3
0
0.5
1
1.5
2
2.5
3
Sort Join Grep KMeans No reuse Reuse-lifetime (1-hr) Reuse-lifetime (1-week)
Data reuse patterns affect data placement choices
25
Workload granularity: Inter dependency
26
Workload granularity: Inter dependency
2
2.2
2.4
2.6
2.8
3
3.2
7000 8000 9000 10000 11000 12000
Co
st (
$)
Total runtime (sec)
objStore
objStore objStore
objStore
deadline
objStore
27
Workload granularity: Inter dependency
2
2.2
2.4
2.6
2.8
3
3.2
7000 8000 9000 10000 11000 12000
Co
st (
$)
Total runtime (sec)
persSSD
persSSD
persSSD
persSSD
deadline
objStore persSSD
28
Workload granularity: Inter dependency
2
2.2
2.4
2.6
2.8
3
3.2
7000 8000 9000 10000 11000 12000
Co
st (
$)
Total runtime (sec)
objStore
objStore ephSSD
ephSSD
objStore
deadline
objStore persSSD
objStore+ephSSD
29
Workload granularity: Inter dependency
2
2.2
2.4
2.6
2.8
3
3.2
7000 8000 9000 10000 11000 12000
Co
st (
$)
Total runtime (sec)
objStore
objStore ephSSD
persSSD
deadline
objStore persSSD
objStore+ephSSD
objStore+ephSSD+persSSD
30
Workload granularity: Inter dependency
2
2.2
2.4
2.6
2.8
3
3.2
7000 8000 9000 10000 11000 12000
Co
st (
$)
Total runtime (sec)
objStore
objStore ephSSD
persSSD
deadline
objStore persSSD
objStore+ephSSD
objStore+ephSSD+persSSD
Complex inter-job dependencies require rethinking about use of multiple storage services
31
CAST: Cloud Analytics Storage Tiering
Different cloud storage services
Heterogeneity in analytics workloads
...
Different application characteristics
Data reuse across jobs Inter-job dependency
...
CAST exploits
32
CAST: Cloud Analytics Storage Tiering
Different cloud storage services
...
Different application characteristics
Data reuse across jobs Inter-job dependency
...
CAST
Heterogeneity in analytics workloads
exploits
33
Outline
Motivation
Quantitative analysis
CAST design
Evaluation
34
CAST framework
Job performance estimator
Tiering solver
Job profiles
Analytics workload spec’s
Job-storage relationships
Cloud service spec’s
Tenant goals
Frameworks
Dryad ...
...
35
CAST framework
Job performance estimator
Tiering solver
Job profiles
Analytics workload spec’s
Job-storage relationships
Cloud service spec’s
Tenant goals
Frameworks
Dryad ...
...
36
CAST framework
Job performance estimator
Tiering solver
Job profiles
Analytics workload spec’s
Job-storage relationships
Cloud service spec’s
J0: <S0,C0> J1: <S1,C1>
...
objStore persSSD ephSSD persHDD
Generated job assignment tuples
Tier the cloud storage, execute the workload
Frameworks
Dryad ...
Tenant goals
37
Job performance estimator Job performance estimator
Tiering solver
EST R̂, M̂ (si,Li )( ) = mnvm ⋅mc
"
##
$
%%⋅
inputim
bwmapsi
&
'
(((
)
*
++++
rnvm ⋅ rc
"
##
$
%%⋅interi
rbwshuffle
si
&
'
(((
)
*
+++
+r
nvm ⋅ rc
"
##
$
%%⋅
ouputir
bwreducesi
&
'
(((
)
*
+++
Map phase Shuffle phase
Reduce phase
# waves Runtime/wave
* MRCute: Bazaar [SoCC’12]
38
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
max Tenant utility =1T
($vm +$store )
39
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
40
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
Space capacity constraint
41
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
Total workload runtime
42
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
VM $ cost
43
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑ Storage $ cost
44
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
Tuning knob: Capacity of Ji Tuning knob:
Storage service of Ji
J0: <s0,c0> J1: <s1,c1> J2: <s2,c2> ...
Assigned job storage, adjusted storage capacity
Simulated annealing
45
Enhancements: CAST++
¤ Enhancement 1: Data reuse awareness ¤ All jobs sharing the same dataset have the same
storage service assigned to them
¤ Enhancement 2: Workflow awareness ¤ Objective
¤ Constraints
Depth-first traversal in workflow DAG for allocating storage capacities
min $total = $vm +$store
T ≤ deadline
46
Outline
Motivation
Quantitative analysis
CAST design
Evaluation
47
Methodology
¤ 400-core Hadoop cluster in Google Cloud ¤ 25 n1-standard-16 VM (16 vCPUs, 60GB RAM)
¤ Tenant utility measurement ¤ CAST: Effectiveness for general workloads
¤ CAST++: Effectiveness for data reuse
¤ Meeting workflow deadlines with CAST++
48
Tenant utility improvement
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST
CAST++
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
Deployment configuration
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
49
Tenant utility improvement
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST
CAST++
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
Deployment configuration
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
No tiering involved
50
Tenant utility improvement
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST
CAST++
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
Deployment configuration
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
Greedy algo’s
51
Tenant utility improvement
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST
CAST++
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
Deployment configuration
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
CAST tiering
52
Tenant utility improvement
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST
CAST++
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No
rma
lize
d te
nant
util
ity
Deployment configuration
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
14%
53
$ cost vs. runtime
0
50
100
150
200
250
300
350
400
450
0
50
100
150
200
250
300
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST CAST++
Runt
ime
(m
in)
Co
st (
$)
Cost Runtime
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
54
$ cost vs. runtime
0
50
100
150
200
250
300
350
400
450
0
50
100
150
200
250
300
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST CAST++
Runt
ime
(m
in)
Co
st (
$)
Cost Runtime
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
55
$ cost vs. runtime
0
50
100
150
200
250
300
350
400
450
0
50
100
150
200
250
300
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST CAST++
Runt
ime
(m
in)
Co
st (
$)
Cost Runtime
100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster
56
Meeting workflow deadlines
0
0.2
0.4
0.6
0.8
1
1.2
0
10
20
30
40
50
60
70
80
90
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
CAST CAST++
De
ad
line
mis
s ra
te
Co
st (
$)
Cost Deadline misses
A workload consisting of 5 workflows, with a total of 31 analytics jobs
57
Meeting workflow deadlines
0
0.2
0.4
0.6
0.8
1
1.2
0
10
20
30
40
50
60
70
80
90
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
CAST CAST++
De
ad
line
mis
s ra
te
Co
st (
$)
Cost Deadline misses
A workload consisting of 5 workflows, with a total of 31 analytics jobs
58
Meeting workflow deadlines
0
0.2
0.4
0.6
0.8
1
1.2
0
10
20
30
40
50
60
70
80
90
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
CAST CAST++
De
ad
line
mis
s ra
te
Co
st (
$)
Cost Deadline misses
A workload consisting of 5 workflows, with a total of 31 analytics jobs
59
Conclusion
¤ CAST performs storage allocation and data placement for cloud analytics workloads ¤ Leverages performance and pricing models of cloud
storage services ¤ Leverages analytics workload heterogeneity
¤ CAST++ enhancements detect data reuse and inter-job dependencies ¤ To further improve tenant utility
¤ To effectively meet deadlines while minimizing $ cost
60
http://research.cs.vt.edu/dssl/
Yue Cheng M. Iqbal Safdar Aayush Gupta Ali R. Butt
61
62
Backup Slides
63
Heterogeneity in cloud storage services
Storage type
Capacity (GB/volume)
Throughput (MB/sec)
IOPS (4KB)
Cost ($/month)
ephSSD 375 733 100000 0.218×375
persSSD 100 48 3000 0.17×100
250 118 7500 0.17×250
500 234 15000 0.17×500
persHDD 100 20 150 0.04×100
250 45 375 0.04×250
500 97 750 0.04×500
objStore N/A 265 550 0.026/GB
ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage A 500GB persSSD provides 1.4X higher throughput & 19X higher
IOPS than a 500GB persHDD.
64
Straggler issue in fine-grained tiering
0%
100%
200%
300%
400%
500%
ephSSD 100% persSSD 100% persHDD 100% ephSSD 50%persSSD 50%
ephSSD 50%persHDD 50%
Nor
mal
ized
runt
ime
0%
100%
200%
300%
400%
500%
0% 30% 70% 90% 100%
Nor
mal
ized
runt
ime
%-age data stored in ephSSD
persHDD 100% ephSSD 30%persHDD 70%
ephSSD 70%persHDD 30%
ephSSD 90%persHDD 10%
ephSSD 100%
65
Tiering solver Job performance estimator
Tiering solver
¤ Optimization ¤ Objective function
¤ Constraints
=1T
($vm +$store )max Tenant utility
ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )
T = REG si,C[si ], R̂, L̂i( )i=1
J
∑ , where si ∈ F
$vm = nvm ⋅ (Pvm ⋅T )
$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#
$%)( )
f
F
∑
Tuning knob: capacity of Ji
Input size of Ji
Intermediate data size of Ji
Output size of Ji
Tuning knob: Storage service of Ji
66
Hadoop traces from Facebook
¤ More than 99% of data touched by large jobs that incur most of the storage cost
¤ The aggregated data size for small jobs is only 0.1% of the total dataset size
¤ We focus on large jobs that have enough # mappers & reducers to fully utilize the cluster computing capacity
67
Storage capacity/service breakdown
0
0.2
0.4
0.6
0.8
1
1.2
ephSSD 100%
persSSD 100%
persHDD 100%
objStore 100%
Greedy exact-fit
Greedy over-prov
CAST CAST++
No
rma
lize
d c
ap
ac
ity
ephSSD persSSD persHDD objStore