Cake: Enabling High-level SLOs on Shared Storage Systems
Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica
University of California, Berkeley
SoCC 2012
Content
Introduction
Problem And Challenge
Solutions
System Design
Implementation
Evaluation
Conclusion
Future work
Introduction
Rich web applications
A single slow storage request can dominate the
overall response time
High percentile latency SLOs
Deal with the latency present at the 95th or
99th percentile
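A minimal sketch of why high-percentile latency matters, assuming a nearest-rank percentile and made-up latency samples (neither comes from the slides). Two slow requests out of 100 barely move the mean but show up clearly at the 99th percentile.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [10] * 98 + [400, 900]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)
```

The mean hides the slow requests, the 95th percentile misses them in this sample, and only the 99th percentile exposes them.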
Introduction
Datacenter applications
Latency-sensitive
Throughput-oriented
Accessing distributed storage systems
Today these applications typically don’t share storage systems
Service-level objectives on throughput or latency
Introduction
SLOs
Reflect the performance expectations
Amazon, Google, and Microsoft have identified
SLO violations as a major cause of user dissatisfaction
For example
A web client might require a 99th percentile
latency SLO of 100ms
A batch job might require a throughput SLO of
100 scan requests per second
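The two example SLOs above could be written down as plain data. This is an illustrative sketch only: the class names `LatencySLO` and `ThroughputSLO` and their fields are hypothetical and are not Cake's actual interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatencySLO:
    percentile: float   # e.g. 99 for the 99th percentile
    limit_ms: float     # latency bound at that percentile

    def is_met(self, measured_percentile_ms: float) -> bool:
        # Met when the measured latency at this percentile is within the bound.
        return measured_percentile_ms <= self.limit_ms

@dataclass(frozen=True)
class ThroughputSLO:
    min_requests_per_s: float

    def is_met(self, measured_requests_per_s: float) -> bool:
        # Met when the client sustains at least the required request rate.
        return measured_requests_per_s >= self.min_requests_per_s

# The two examples from the slide.
web_slo = LatencySLO(percentile=99, limit_ms=100)
batch_slo = ThroughputSLO(min_requests_per_s=100)
```

Stating the objective at this level is what lets the storage system, rather than the operator, work out the low-level allocations.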
Problem And Challenge
Physically separating storage systems
Each system must be provisioned for its individual peak load
Segregation of data leads to degraded user
experience
Operational complexity
Require additional maintenance staff
More software bugs and configuration errors
Problem And Challenge
Focusing solely on controlling disk-level resources
High-level storage SLOs require consideration of
resources beyond the disk
Disconnect between the high-level SLOs and
performance parameters like MB/s
Require tedious, manual translation
Puts more work on the programmer or system operator
Solutions
Cake
A coordinated, multi-resource
scheduler for shared distributed storage
environments with the goal of achieving
both high throughput and bounded
latency.
System Design
Architecture
System Design
First-level schedulers as a client
Provide mechanisms for differentiated
scheduling
Split large requests into smaller chunks
Limit the number of outstanding device requests
System Design
Cake’s second-level scheduler is a
feedback loop
Continually adjusts resource allocation at each
of the first-level schedulers
Maximizes SLO compliance of the system
while attempting to increase utilization
First-level Resource Scheduling
Differentiated scheduling
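One way to picture differentiated scheduling is a separate queue per client, served in proportion to configured shares. The sketch below uses stride scheduling as a stand-in technique; the slides do not say how shares are enforced, and the class name, client names, and 3:1 weights are all hypothetical.

```python
from collections import deque
from fractions import Fraction

class ProportionalShareScheduler:
    """Per-client FIFO queues served in proportion to integer shares
    (stride scheduling: the backlogged client with the lowest pass goes next)."""

    def __init__(self, shares):
        # shares: client name -> positive integer weight
        self.queues = {client: deque() for client in shares}
        self.stride = {client: Fraction(1, w) for client, w in shares.items()}
        self.passes = {client: Fraction(0) for client in shares}

    def submit(self, client, request):
        self.queues[client].append(request)

    def next_request(self):
        backlogged = [c for c, q in self.queues.items() if q]
        if not backlogged:
            return None
        client = min(backlogged, key=lambda c: self.passes[c])
        self.passes[client] += self.stride[client]
        return client, self.queues[client].popleft()

# Hypothetical setup: the front-end gets three times the batch client's share.
sched = ProportionalShareScheduler({"frontend": 3, "batch": 1})
for i in range(8):
    sched.submit("frontend", f"get-{i}")
    sched.submit("batch", f"scan-{i}")
served = [sched.next_request()[0] for _ in range(8)]
```

With both queues backlogged, the first eight dispatches go to the front-end six times and to the batch client twice, matching the 3:1 shares.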
First-level Resource Scheduling
Split large requests
Control number of outstanding requests
Second-level Scheduling
Multi-resource Request Lifecycle
Request processing in a storage system
involves far more than just accessing disk
Necessitating a coordinated, multi-resource
approach to scheduling
Second-level Scheduling
Multi-resource Request Lifecycle
Second-level Scheduling
High-level SLO Enforcement
Cake’s second-level scheduler
Satisfy the latency requirements of latency-sensitive front-end clients
Maximize the throughput of throughput-oriented batch clients
Two phases of second-level scheduling decisions
For disk in the SLO compliance-based phase
For non-disk resources in the queue occupancy-based phase
Second-level Scheduling
The initial SLO compliance-based phase
Decide on disk allocations based on client performance
The queue occupancy-based phase
Balance allocation in the rest of the system to keep the
disk utilized and improve overall performance
Implementation
Chunking Large Requests
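A minimal sketch of request chunking, assuming a request is a byte range and using a hypothetical 128 KB chunk size (the slide text gives no value). Smaller chunks give the scheduler more points at which it can switch to a latency-sensitive request.

```python
def split_into_chunks(offset, length, chunk_size):
    """Yield (offset, length) pairs that cover [offset, offset + length)
    in pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    end = offset + length
    while offset < end:
        size = min(chunk_size, end - offset)
        yield offset, size
        offset += size

KB = 1024
# Hypothetical 300 KB scan split into chunks of at most 128 KB.
chunks = list(split_into_chunks(offset=0, length=300 * KB, chunk_size=128 * KB))
```

Here the 300 KB request becomes two full 128 KB chunks followed by one 44 KB remainder.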
Implementation
Number of Outstanding Requests
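Limiting outstanding device requests can be sketched with a counting semaphore: a request only reaches the device once a slot is free, so queueing happens in the scheduler, where it can be reordered. The `OutstandingLimiter` class, the limit of 2, and the `fake_read` device call are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class OutstandingLimiter:
    """Allow at most max_outstanding device calls in flight at once."""

    def __init__(self, max_outstanding):
        self._slots = threading.Semaphore(max_outstanding)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0

    def run(self, device_call, *args):
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return device_call(*args)
            finally:
                with self._lock:
                    self.in_flight -= 1

def fake_read(block_id):
    # Stand-in for a device request; sleeps to simulate service time.
    time.sleep(0.01)
    return block_id

limiter = OutstandingLimiter(max_outstanding=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda b: limiter.run(fake_read, b), range(16)))
```

Eight worker threads compete for the device, but `peak_in_flight` never exceeds the configured limit of 2.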
Implementation
Cake Second-level Scheduler — SLO
Compliance-based Scheduling
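A simplified sketch of one SLO compliance-based step, not the algorithm from the paper: it raises the front-end's disk share when the measured latency violates the SLO and lowers it when there is comfortable slack. The function name, step size, slack factor, and share bounds are all assumptions chosen for illustration.

```python
def adjust_disk_share(share, measured_p99_ms, slo_ms,
                      step=0.05, slack=0.8,
                      min_share=0.05, max_share=0.95):
    """Return the front-end's new disk share after one control interval.

    All parameters are hypothetical tuning knobs, not values from Cake.
    """
    if measured_p99_ms > slo_ms:
        share += step   # SLO violated: give the front-end more of the disk
    elif measured_p99_ms < slack * slo_ms:
        share -= step   # well within the SLO: hand capacity back to batch
    # Otherwise the SLO is met without much slack, so leave the share alone.
    return min(max_share, max(min_share, share))

# Hypothetical interval: measured p99 of 150 ms against a 100 ms SLO.
new_share = adjust_disk_share(0.5, measured_p99_ms=150, slo_ms=100)
```

Called once per interval, this forms a feedback loop: violations push the share up, slack lets it drift back down so batch throughput is not wasted.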
Implementation
Cake Second-level Scheduler — Queue
Occupancy-based Scheduling
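A simplified sketch of one queue occupancy-based step for a non-disk layer, again under assumptions rather than taken from the paper. Occupancy here is assumed to be the fraction of the last interval in which the client had a request waiting at a layer; the function name, tolerance, and step size are hypothetical.

```python
def adjust_upstream_share(share, upstream_occupancy, disk_occupancy,
                          step=0.05, tolerance=0.1,
                          min_share=0.05, max_share=0.95):
    """Return the client's new share at the layer above the disk.

    Occupancies are fractions in [0, 1]; every default is an
    illustrative assumption.
    """
    if upstream_occupancy > disk_occupancy + tolerance:
        share += step   # requests pile up before the disk: upstream is the bottleneck
    elif upstream_occupancy < disk_occupancy - tolerance:
        share -= step   # disk queue is well fed: upstream has more than it needs
    # Otherwise the two layers are roughly balanced, so leave the share alone.
    return min(max_share, max(min_share, share))

# Hypothetical interval: the upstream queue is busy while the disk queue is mostly empty.
new_share = adjust_upstream_share(0.5, upstream_occupancy=0.8, disk_occupancy=0.2)
```

The intent is to keep enough requests flowing to the disk layer that the disk allocation chosen in the SLO compliance-based phase is actually used.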
Evaluation
Proportional Shares and Reservations
When the front-end client is sending low throughput, reservations are an
effective way of reducing queue time at HDFS
Evaluation
Proportional Shares and Reservations
When the front-end is sending high throughput, proportional share
is an effective mechanism for reducing latency
Evaluation
Single vs. Multi-resource Scheduling
CPU contention within HBase when running many concurrent threads
and without separate queues and differentiated scheduling
Evaluation
Single vs. Multi-resource Scheduling
Thread-per-request displays greatly increased latency with chunked
request sizes
Evaluation
Convergence Time
Diurnal Workload
Spike Workload
Latency Throughput Trade-off
Quantifying Benefits of Consolidation
Conclusion
Coordinating resource allocation across
multiple software layers
Allowing application programmers to specify
high-level SLOs directly to the storage system
Allowing consolidation of latency-sensitive
and throughput-oriented workloads
Conclusion
Allowing users to flexibly move within the
storage latency vs. throughput trade-off by
choosing different high-level SLOs
Using Cake has concrete economic and
business advantages
Future work
SLO admission control
Influence of DRAM and SSDs
Composable application-level SLOs
Automatic parameter tuning
Generalization to multiple SLOs
Thank You