GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems with Heterogeneous VMs



Undergraduate/Graduate Category: Computer and Information Sciences
Degree Level: PhD Candidate in ECE
Abstract ID#: 1382

GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems with Heterogeneous VMs
Zhengyu Yang¹, Jianzhe Tai², Ningfang Mi¹; ¹Northeastern University, ²VMware Inc.

Abstract
In a shared virtualized storage system that runs heterogeneous VMs with diverse IO demands, it is challenging for the hypervisor to cost-effectively partition and allocate SSD resources among multiple VMs. We design a Global SSD Resource Management solution, GREM, which aims to fully utilize SSD resources as a "smart" cache while taking performance isolation into account.

[Fig 5 (GREM partitions): SSD layouts of (a) GREM_EQ, (b) GREM, and (c) D_GREM. Each splits the SSD cache (total size CT) into a long-term zone ZL (size CZL) providing per-VM performance isolation for VM1-VM4 and a short-term zone ZS (size CZS) shared under fair competition. GREM dynamically allocates ZL space per VM; D_GREM additionally adjusts the allocation between the two zones dynamically.]

[Fig 3 (cache-friendly and unfriendly traces): working set size (MB) per 5-minute epoch for (a) mds0 and (b) src12.]

[Fig 11 (partition inside the long-term zone): per-VM occupancy ratio in ZL (%) over 5-minute epochs for (a) MSR-F1, (b) MSR-U, and (c) MSR-F1U, with VMs mds0, src12, stg0, usr0, stg1, usr2, web2, and src21.]

[Fig 6 (workflow of the decision maker): each epoch, a bursty detector takes feedback from workload changes and cache hits; a bursty phase triggers aggressive allocation and a non-bursty phase triggers conservative adjustment operations for the next epoch.]

[Fig 8 (IO costs of traces): normalized IO cost (%) vs. cache size (GB) for (a) MSR-F1, (b) MSR-U, (c) MSR-F1U, (d) FIU-F1U, (e) FIU-F2U, and (f) UMASS, comparing D_GREM, GREM_EQ, vFRM, LRU, and CAR.]

[Fig 9 (IO hit ratios of traces): hit ratio (%) vs. cache size (GB) for the MSR-F1, MSR-U, MSR-F1U, MSR-F2U, FIU-F1U, FIU-F2U, and UMASS traces, comparing LRU, CAR, vFRM, GREM_EQ, and D_GREM.]

[Fig 7 (tuning the sizes of the two zones): (a) in the non-bursty case, the ZL/ZS partition for the next epoch is set from the hit contribution ratio ρ of hits in ZL versus ZS over a sliding window (SWLt); (b) in the bursty case, hits in ZL are sorted in descending order, a qualification threshold selects the qualified set SWQf, and ZS is aggressively enlarged based on SWQf.]

[Fig 4 (GREM architecture): VMs 1-3 run on a hypervisor host containing the VM file system, RAM, and an I/O controller; a tier manager keeps mapping tables over a storage pool of SSD and HDD.]

[Fig 1 (two straightforward approaches): per-VM occupancy ratio (%) over 5-minute epochs under (a) performance isolation (ISO) and (b) fair competition (COMP), for VMs src12, stg0, usr0, stg1, usr2, web2, and src21.]

[Fig 12 (burstiness vs. zone sizes): per-epoch bursty degree Diff(WSSW) against the threshold Bth = 0.6, together with the resulting short-term and long-term zone partition sizes (MB); bursty epochs use feedback from both the workload and the cache, while non-bursty epochs use feedback from the cache only.]

[Fig 10 (partition of the two zones): short-term and long-term zone partition sizes (MB) over 5-minute epochs for (a) MSR-F1, (b) MSR-U, and (c) MSR-F1U.]

Background
There are two straightforward approaches: (1) equally assigning SSD space to each VM, and (2) managing SSD resources in a fair-competition mode. Unfortunately, neither fully exploits the SSD, particularly when workloads change frequently and bursts or spikes of IOs occur from time to time.
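To make the two baselines concrete, the sketch below (illustrative only; the poster gives no code, and all names here are made up) contrasts a static equal split of the SSD cache with a single globally shared pool in which all VMs compete.

```python
from collections import OrderedDict

def equal_partition(ssd_blocks, vms):
    """Approach (1): statically give every VM the same share of the SSD cache."""
    share = ssd_blocks // len(vms)
    return {vm: share for vm in vms}

class GlobalLRUCache:
    """Approach (2): one shared LRU pool; a hot VM can crowd out a cold one."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()          # (vm, block_id) -> True, LRU order

    def access(self, vm, block_id):
        key = (vm, block_id)
        hit = key in self.blocks
        if hit:
            self.blocks.move_to_end(key)     # refresh recency on a hit
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)   # evict the global LRU victim
            self.blocks[key] = True
        return hit

# Example: a burst from "usr0" evicts most of "mds0"'s cached blocks.
cache = GlobalLRUCache(capacity_blocks=4)
for b in range(3):
    cache.access("mds0", b)
for b in range(4):
    cache.access("usr0", b)
print(equal_partition(ssd_blocks=8, vms=["mds0", "usr0"]))
```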

Design
GREM takes the dynamic IO demands of all VMs into consideration, splitting the entire SSD space into a long-term zone and a short-term zone and cost-effectively updating the SSD contents in these two zones.
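The sketch below is a minimal illustration of this two-zone layout (the sizing rule is an assumption for illustration; the poster's actual policy is driven by the feedback loop described under Methods): a long-term zone with per-VM reservations plus a short-term zone shared by all VMs.

```python
from dataclasses import dataclass, field

@dataclass
class TwoZoneCache:
    total_mb: int                       # C_T: total SSD cache size
    long_zone_mb: int                   # C_ZL: long-term zone (per-VM reservations)
    reservations: dict = field(default_factory=dict)   # vm -> MB inside Z_L

    @property
    def short_zone_mb(self):            # C_ZS = C_T - C_ZL, shared by all VMs
        return self.total_mb - self.long_zone_mb

    def reserve_long_zone(self, demand_mb):
        """Split Z_L among VMs in proportion to each VM's recent demand
        (e.g., working-set size or hit contribution in the last epoch)."""
        total_demand = sum(demand_mb.values()) or 1
        self.reservations = {
            vm: self.long_zone_mb * d // total_demand
            for vm, d in demand_mb.items()
        }

cache = TwoZoneCache(total_mb=3000, long_zone_mb=1800)
cache.reserve_long_zone({"mds0": 400, "src12": 200, "stg0": 100, "usr0": 300})
print(cache.reservations, cache.short_zone_mb)
```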

[Fig 2 (example of the per-VM partition): (a) fair competition: the top hot bins (topGlobalBins out of globalBinCount) compete for the whole SSD, bins overlapping with prevCachedBins are kept, and the remaining previously cached bins are evicted to HDD; (b) performance isolation: each VM's short-term hot bins compete only within its own partition, and extra bins are evicted to HDD.]

GREM's design has two key properties:
(1) GREM adaptively adjusts the reservation for each VM inside the long-term zone based on that VM's IO changes.
(2) GREM dynamically partitions the SSD between the long-term and short-term zones at runtime, leveraging feedback from both cache performance and workload burstiness.

Methods
GREM's decision maker classifies each epoch as bursty or non-bursty based on feedback from both the workload and the cache hit statistics. During a bursty phase, GREM aggressively allocates more space to the short-term zone based on the workload change; otherwise, it conservatively adjusts the short-term zone based on the cache hit status.
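As a rough illustration of this decision loop, the sketch below (not taken from the poster; the burstiness measure, threshold value, and step size are assumptions loosely based on the Diff(WSSW) curve and Bth = 0.6 shown in Fig 12) classifies an epoch and resizes the short-term zone accordingly.

```python
def bursty_degree(wss_window):
    """Normalized change of the working-set size over a sliding window
    (an assumed stand-in for Diff(WSSW) in Fig 12)."""
    if len(wss_window) < 2 or max(wss_window) == 0:
        return 0.0
    return (max(wss_window) - min(wss_window)) / max(wss_window)

def next_short_zone(total_mb, short_mb, wss_window, hits_zl, hits_zs,
                    b_th=0.6, step_mb=300):
    """Decide the short-term zone size for the next epoch."""
    if bursty_degree(wss_window) > b_th:
        # Bursty: aggressively grow Z_S toward the current working-set size.
        short_mb = min(total_mb, max(short_mb, wss_window[-1]))
    else:
        # Non-bursty: conservative adjustment from the hit contribution ratio.
        rho = hits_zl / (hits_zl + hits_zs) if (hits_zl + hits_zs) else 0.5
        short_mb += -step_mb if rho > 0.5 else step_mb   # shift toward the zone that hits more
        short_mb = max(step_mb, min(total_mb - step_mb, short_mb))
    return short_mb

# One epoch: a burst of new data more than doubles the working set.
print(next_short_zone(total_mb=3000, short_mb=600,
                      wss_window=[500, 520, 1200], hits_zl=80, hits_zs=40))
```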

Experimental Results
The evaluation reports the IO hit ratio and the IO update cost, and examines how the two zones are dynamically partitioned (Figs 8-12).

Figure index:
Fig 1: Two straightforward approaches.
Fig 2: Example of the partition for each VM.
Fig 3: Cache-friendly and cache-unfriendly traces.
Fig 4: GREM architecture.
Fig 5: GREM partitions.
Fig 6: Workflow of the decision maker.
Fig 7: Tuning the sizes of the two zones.
Fig 8: IO costs of the traces.
Fig 9: IO hit ratios of the traces.
Fig 10: Partition between the two zones.
Fig 11: Partition inside the long-term zone.
Fig 12: Burstiness vs. zone sizes.

Conclusions
Experimental results show that GREM captures cross-VM IO changes and makes correct resource-allocation decisions, achieving high IO hit ratios and low IO costs compared with both traditional and recent caching algorithms.

References
[1] Jianzhe Tai, Deng Liu, Zhengyu Yang, Xiaoyun Zhu, Jack Lo, and Ningfang Mi, "Improving Flash Resource Utilization at Minimal Management Cost in Virtualized Flash-based Storage Systems", IEEE Transactions on Cloud Computing, 2015.
[2] Zhengyu Yang, Jianzhe Tai, and Ningfang Mi, "GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems with Heterogeneous VMs", 2016, under review.