GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems with Heterogeneous VMs
Zhengyu Yang (1), Jianzhe Tai (2), Ningfang Mi (1); (1) Northeastern University, (2) VMware Inc.
Undergraduate/Graduate Category: Computer and Information Sciences. Degree Level: PhD Candidate in ECE. Abstract ID #: 1382.

Abstract

In a shared virtualized storage system that runs heterogeneous VMs with diverse IO demands, it is challenging for the hypervisor to cost-effectively partition and allocate SSD resources among multiple VMs. We design a Global SSD Resource Management solution, GREM, which aims to fully utilize SSD resources as a "smart" cache while taking performance isolation into consideration.
[Figure: SSD partition layouts of (a) GREM_EQ, (b) GREM, and (c) D_GREM. The SSD (total capacity C_T) is split into a long-term zone Z_L (capacity C_ZL, per-VM allocation under performance isolation) and a short-term zone Z_S (capacity C_ZS, shared under fair competition). GREM allocates the long-term zone dynamically to each VM; D_GREM additionally adjusts the split between the two zones dynamically.]
[Figure (cache-friendly and cache-unfriendly traces): working set size (MB) per epoch (5 min) over roughly 2000 epochs for traces (a) mds0 and (b) src12.]
[Figure (partition inside the long-term zone): per-VM allocation and occupancy ratio in Z_L (%) per epoch (5 min) for (a) MSR-F1, (b) MSR-U, and (c) MSR-F1U; VMs include mds0, src12, stg0, usr0, stg1, usr2, web2, and src21.]
[Figure (workflow of the decision maker): a bursty detector uses workload-change and cache-hit feedback; bursty epochs trigger aggressive allocation, non-bursty epochs trigger conservative adjustment operations, and the outcome feeds back into the next epoch.]
[Figure (IO costs of traces): normalized IO cost (%) versus cache size (GB) for (a) MSR-F1, (b) MSR-U, (c) MSR-F1U, (d) FIU-F1U, (e) FIU-F2U, and (f) UMASS, comparing Global-LRU, Global-CAR, vFRM, Global-vFRM, GREM_EQ, and D_GREM.]
[Figure (IO hit ratios of traces): hit ratio (%) versus cache size (GB) for the MSR-F1, MSR-U, MSR-F1U, MSR-F2U, FIU-F1U, FIU-F2U, and UMASS traces, comparing LRU, CAR, Global-LRU, Global-CAR, vFRM, GREM_EQ, and D_GREM.]
[Figure (GREM partitions): (a) non-bursty case: the next epoch's SSD partition between Z_L and Z_S is derived from the hit contribution ratio ρ of hits in Z_L versus hits in Z_S over a sliding window SW_Lt; (b) bursty case: bins are sorted by hits in Z_L in descending order, a qualification threshold defines SW_Qf, and Z_S is aggressively allocated based on SW_Qf.]
[Figure (GREM architecture): VMs on a hypervisor host access the VM file system; a tier manager keeps mapping tables in RAM and works with the I/O controller in front of a storage pool of SSD and HDD.]
[Figure (the two approaches): per-VM SSD occupancy ratio (%) per epoch (5 min) under (a) performance isolation and (b) fair competition for VMs src12, stg0, usr0, stg1, usr2, web2, and src21, together with per-epoch curves for the ISO and COMP settings.]
[Figure (tuning the sizes of the two zones): cache partition sizes (MB) of the short-term and long-term zones and the bursty degree Diff(WSSW) per epoch (5 min); epochs whose bursty degree exceeds the threshold B_th = 0.6 use feedback from both the workload and the cache, while non-bursty epochs use feedback from the cache only.]
[Figure (partition of the two zones): cache partition size (MB) of the short-term and long-term zones per epoch (5 min) for (a) MSR-F1, (b) MSR-U, and (c) MSR-F1U.]
Background

There are two straightforward approaches: (1) equally assigning SSD space to each VM, and (2) managing SSD resources in a fair-competition mode. Unfortunately, neither can fully exploit the benefits of the SSD, particularly when workloads change frequently and bursts or spikes of IOs occur from time to time.
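To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical names and capacities that are not from the poster) of the two baseline policies: a static equal split with per-VM LRU lists versus a single globally shared LRU cache.

```python
# Minimal sketch (hypothetical names and sizes) of the two baseline policies;
# not the poster's actual implementation.
from collections import OrderedDict

SSD_BLOCKS = 1024                      # assumed total SSD cache capacity in blocks
VMS = ["vm1", "vm2", "vm3", "vm4"]

# (1) Performance isolation: each VM gets an equal, private LRU partition.
ISO_SHARE = SSD_BLOCKS // len(VMS)
iso_caches = {vm: OrderedDict() for vm in VMS}

def access_isolated(vm, block):
    lru = iso_caches[vm]
    if block in lru:                   # hit: refresh recency
        lru.move_to_end(block)
        return True
    if len(lru) >= ISO_SHARE:          # miss: evict only inside this VM's share
        lru.popitem(last=False)
    lru[block] = True
    return False

# (2) Fair competition: one global LRU cache shared by all VMs.
shared_cache = OrderedDict()

def access_shared(vm, block):
    key = (vm, block)
    if key in shared_cache:
        shared_cache.move_to_end(key)
        return True
    if len(shared_cache) >= SSD_BLOCKS:
        shared_cache.popitem(last=False)   # a bursty VM can evict other VMs' blocks
    shared_cache[key] = True
    return False
```

Under the shared policy, a bursty VM can push other VMs' blocks out of the cache; under the static split, idle capacity in one VM's share cannot help the others. This is the trade-off GREM targets.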
Design

GREM takes the dynamic IO demands of all VMs into consideration: it splits the entire SSD space into a long-term zone and a short-term zone, and cost-effectively updates the SSD contents in these two zones.
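The following sketch illustrates one possible shape of this two-zone layout, assuming bin-granularity caching and an end-of-epoch refresh; the structure and names (TwoZoneSSDCache, end_of_epoch_update) are illustrative assumptions, not GREM's actual code.

```python
# Sketch of the two-zone split: a long-term zone Z_L with per-VM reservations
# (performance isolation) and a shared short-term zone Z_S (fair competition),
# both refreshed once per epoch rather than on every IO.
from dataclasses import dataclass, field

@dataclass
class TwoZoneSSDCache:
    c_total: int                                  # C_T : total SSD cache bins
    c_zl: int                                     # C_ZL: long-term zone capacity
    zl: dict = field(default_factory=dict)        # VM id -> set of reserved bins
    zs: set = field(default_factory=set)          # bins in the shared short-term zone

    @property
    def c_zs(self) -> int:                        # C_ZS = C_T - C_ZL
        return self.c_total - self.c_zl

    def end_of_epoch_update(self, per_vm_hot_bins, global_hot_bins):
        """Re-populate both zones from this epoch's access statistics.

        per_vm_hot_bins: VM id -> list of that VM's bins, hottest first.
        global_hot_bins: list of all bins, hottest first.
        """
        # Long-term zone: each VM keeps its hottest bins inside its reservation
        # (equal reservations here; GREM adjusts them per VM, see Design (1)).
        share = self.c_zl // max(len(per_vm_hot_bins), 1)
        self.zl = {vm: set(bins[:share]) for vm, bins in per_vm_hot_bins.items()}
        # Short-term zone: the remaining hottest bins compete for the C_ZS slots.
        reserved = {b for bins in self.zl.values() for b in bins}
        picked = []
        for b in global_hot_bins:
            if b not in reserved and len(picked) < self.c_zs:
                picked.append(b)
        self.zs = set(picked)
```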
[Figure (per-VM partition example): (a) fair competition: the top global hot bins (topGlobalBins, drawn from globalBinCount and overlapping with prevCachedBins) compete for the whole SSD, and displaced bins are evicted to the HDD; (b) performance isolation: each VM's short-term hot bins compete only within its own partition, and extra bins are evicted to the HDD.]
(1) GREM adaptively adjusts each VM's reservation inside the long-term zone based on that VM's IO changes.
(2) GREM dynamically partitions the SSD between the long- and short-term zones at runtime by leveraging feedback from both cache performance and workload burstiness.
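As a rough illustration of capability (1), the sketch below resizes each VM's long-term-zone reservation in proportion to its hit contribution ratio (the ρ shown in the GREM partition figure); the function name and the equal-split fallback are assumptions rather than the paper's exact formula.

```python
# Resize per-VM reservations inside the long-term zone from last epoch's Z_L hits.

def partition_long_term_zone(c_zl: int, hits_in_zl: dict) -> dict:
    """hits_in_zl: VM id -> number of long-term-zone hits in the last epoch."""
    total_hits = sum(hits_in_zl.values())
    if total_hits == 0:
        share = c_zl // len(hits_in_zl)            # no signal: fall back to equal split
        return {vm: share for vm in hits_in_zl}
    alloc = {}
    for vm, hits in hits_in_zl.items():
        rho = hits / total_hits                    # this VM's hit contribution ratio
        alloc[vm] = int(rho * c_zl)                # reservation proportional to rho
    # Give bins lost to rounding to the largest contributor.
    alloc[max(hits_in_zl, key=hits_in_zl.get)] += c_zl - sum(alloc.values())
    return alloc

# Example: with C_ZL = 1000 bins and per-VM Z_L hits of 600/300/100 in the last
# epoch, the next epoch reserves roughly 600/300/100 bins for those VMs.
print(partition_long_term_zone(1000, {"mds0": 600, "src12": 300, "stg0": 100}))
```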
Methods

GREM's decision maker distinguishes bursty from non-bursty I/O phases using feedback from both the workload and the cache hits. During a bursty phase, GREM aggressively allocates more space to the short-term zone according to the workload change; otherwise, it conservatively adjusts the short-term zone based on the cache-hit status. The evaluation tracks the IO hit ratio and the IO update cost under this dynamic partitioning.
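A hedged sketch of this decision logic follows; the burstiness measure, the 0.6 threshold (taken from the zone-tuning figure), and the adjustment step are illustrative choices, not the paper's exact rules.

```python
# Decide the short-term zone size for the next epoch from burstiness and hit feedback.

B_TH = 0.6                     # bursty-degree threshold (the figures use Bth = 0.6)

def next_short_term_zone_size(c_total: int, c_zs: int, bursty_degree: float,
                              short_term_wss: int,
                              zs_hits: int, zl_hits: int) -> int:
    """Return the short-term zone size (in bins) for the next epoch."""
    if bursty_degree > B_TH:
        # Bursty epoch: aggressively grow Z_S toward the short-term working-set
        # size observed in the sliding window (feedback from the workload).
        return min(max(c_zs, short_term_wss), c_total)
    # Non-bursty epoch: conservatively nudge Z_S toward whichever zone is
    # currently contributing more cache hits (feedback from the cache only).
    step = max(c_total // 100, 1)              # assumed 1% adjustment step
    if zs_hits > zl_hits:
        return min(c_zs + step, c_total)
    return max(c_zs - step, 0)

# The long-term zone gets the remainder: C_ZL = C_T - C_ZS.
```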
Figure index:
Fig 1: Two approaches.
Fig 2: Example of the partition of each VM.
Fig 3: Cache-friendly and cache-unfriendly traces.
Fig 4: GREM architecture.
Fig 5: GREM partitions.
Fig 6: Workflow of the decision maker.
Fig 7: Tuning the sizes of the two zones.
Fig 8: IO costs of traces.
Fig 9: IO hit ratios of traces.
Fig 10: Partition of the two zones.
Fig 11: Partition inside the long-term zone.
Fig 12: Bursty degree vs. zone sizes.
Experimental Results

See Figs. 8 to 12: IO costs of traces, IO hit ratios of traces, the partition of the two zones, per-VM allocation inside the long-term zone, and bursty degree versus zone sizes.

Conclusions

Experimental results show that GREM captures cross-VM IO changes and makes correct resource-allocation decisions, and thus achieves higher IO hit ratios and lower IO costs than both traditional and recent caching algorithms.

References

[1] Jianzhe Tai, Deng Liu, Zhengyu Yang, Xiaoyun Zhu, Jack Lo, and Ningfang Mi, "Improving Flash Resource Utilization at Minimal Management Cost in Virtualized Flash-based Storage Systems", IEEE Transactions on Cloud Computing, 2015.
[2] Zhengyu Yang, Jianzhe Tai, and Ningfang Mi, "GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems with Heterogeneous VMs", 2016, under review.