

Self-Adaptive Provisioning of Virtualized Resources in Cloud Computing

Jia Rao, Student Member, IEEE, Xiangping Bu, Student Member, IEEE, Kun Wang, Student Member, IEEE, and Cheng-Zhong Xu, Senior Member, IEEE


Abstract—Although cloud computing has gained considerable popularity recently, there are still some key impediments to enterprise adoption, and cloud management is one of the top challenges. The ability to partition hardware resources into virtual machine (VM) instances on the fly provides users with an elastic computing environment, but the extra layer of resource virtualization poses challenges for effective cloud management. Time-varying user demand, the complicated interplay between co-hosted VMs and the arbitrary deployment of multi-tier applications make it difficult for administrators to plan good VM configurations. In this paper, we propose a distributed learning mechanism that facilitates self-adaptive provisioning of virtual machine resources. We treat cloud resource allocation as a distributed learning task, in which each VM, acting as a highly autonomous agent, submits resource requests according to its own benefit; the mechanism evaluates the requests and replies with feedback. We develop a reinforcement learning algorithm with a highly efficient representation of experiences as the heart of the VM-side learning engine. We prototype the mechanism and the distributed learning algorithm in the iBalloon system. Experiment results on a Xen-based cloud testbed demonstrate the effectiveness of iBalloon: the distributed VM agents are able to reach near-optimal configuration decisions within 7 iteration steps at no more than 5% performance cost. Most importantly, iBalloon shows good scalability, handling resource allocation for up to 128 correlated VMs.

1 INTRODUCTION

One important offering of cloud computing is to deliver computing Infrastructure-as-a-Service (IaaS). In this type of cloud, raw hardware infrastructure, such as CPU, memory and storage, is provided to users as an on-demand virtual server. Aside from the client-side reduction in total cost of ownership due to a usage-based payment scheme, a key benefit of IaaS for cloud providers is the increased resource utilization in data centers. Due to the high flexibility in adjusting virtual machine (VM) capacity, cloud providers can consolidate traditional web applications onto a smaller number of physical servers, given the fact that the peak loads of individual applications have few overlaps with each other [3].

In the case of IaaS, the performance of hosted applications relies on the effective management of VM capacity. However, the additional layer of resource abstraction introduces unique requirements for this management. First, effective cloud management should be able to resize individual VMs in response to changes in application demand. More importantly, besides the objective of satisfying the Service Level Agreement (SLA) of individual applications, system-wide resource utilization ought to be optimized. In addition, the real-time requirements that pay-per-use cloud computing places on VM resource provisioning make the problem even more challenging.

• The authors are with the Department of Electrical and Computer Engineering, Wayne State University, 5050 Anthony Wayne Drive, Detroit, MI 48202. E-mail: {jrao,xpbu,kwang,czxu}@wayne.edu

Although server virtualization helps realize performance isolation to some extent, in practice VMs still have chances to interfere with each other. It is possible that one rogue application could adversely affect the others [20], [10]. In [7], the authors showed that for VM CPU scheduling alone, it is already too complicated to determine the optimal parameter settings. Taking memory, I/O and network bandwidth into provisioning further complicates the problem. Time-varying application demands add one more dimension to the configuration task: dynamics in incoming traffic can make previously good VM configurations no longer suitable and result in significant performance degradation.

Furthermore, practical issues exist in fine-grained VM resource provisioning. With the management interval set to 30 seconds, the authors in [23] observed that under sustained resource demands, a VM needs minutes for its performance to stabilize after a memory reconfiguration. A similar delayed effect can also be observed in CPU reconfiguration, partially due to the backlog of requests from prior intervals. The difficulty in evaluating the immediate outcome of management decisions makes the modeling of application performance even harder.

Exporting infrastructure as a service gives cloud users the flexibility to select VM operating systems (OS) and the hosted applications. But this poses new challenges to the underlying VM management as well. Because public IaaS providers assume no knowledge of the hosted applications, VM clusters of different users may overlap on physical servers. The overall VM deployment can show a dependent topology with respect to resources on physical hosts. The bottleneck of multi-tier applications can shift between tiers either due to workload dynamics or mis-configurations on one tier. Mis-configured VMs can possibly become rogue ones affecting others. In the worst case, all nodes in the cloud may be correlated, and mistakes in the capacity management of one VM may spread across the entire cloud.


Our previous work [23] demonstrated the efficacy of reinforcement learning (RL)-based resource allocation in a static cloud environment in which VMs are deployed on one physical machine. Based on a state space defined on co-running VM configurations, we optimized system-wide VM performance on one machine under different workload combinations. Although it effectively manages the configurations of VMs with distinct resource demands, the approach in [23] assumes a static environment and relies on workload-specific environment models to map VM configurations to a system-wide performance index. This approach cannot be easily extended to a dynamic cloud environment, in which VMs are hosted on a cluster of physical machines. First, it becomes prohibitively expensive to maintain models for different workload combinations as the number of VMs increases. Second, possible VM join/leave or migration makes the optimization of cluster-wide performance difficult. Finally, the state space defined on VM configurations is not robust to workload dynamics.

In this paper, we address these issues and present a distributed learning approach for cloud management. We decompose the cluster-wide cloud management problem into sub-problems concerning individual VM resource allocations, and consider cluster-wide performance to be optimized if individual VMs meet their SLAs with high resource utilization. To handle workload dynamics, we extend the state definition in [23] from VM configurations to VM running status and address the issues raised by the use of continuous running status as the state space. More specifically, our contributions are as follows:

(1) Distributed learning mechanism. We treat VM resource allocation as a distributed learning task. Instead of cloud resource providers, cloud users manage individual VM capacity and submit resource requests based on application demands. The host agent evaluates the aggregated requests on one machine and gives feedback to individual VMs. Based on the feedback, each VM learns its capacity management policy accordingly. The distributed approach is scalable because the complexity of the management is not affected by the number of VMs, and we rely on implicit coordination between VMs belonging to the same virtual cluster.

(2) Self-adaptive capacity management. We develop an efficient reinforcement learning approach for the management of individual VM capacity. The learning agent operates on a VM's running status, which is defined on the utilization of multiple resources. We employ a Cerebellar Model Articulation Controller-based Q table for continuous state representation. The resulting RL approach is robust to workload changes because states defined on low-level statistics accommodate workload dynamics to a certain extent.

(3) Resource efficiency metric. We explicitly optimize resource efficiency by introducing a metric that evaluates a VM's capacity settings. The metric synthesizes application performance and resource utilization. When employed as feedback, it effectively punishes decisions that violate applications' SLAs and gives users incentives to release unused resources.

(4) Design and implementation of iBalloon. Our prototype implementation of the distributed learning mechanism, namely iBalloon, demonstrated its effectiveness in a Xen-based cloud testbed. iBalloon was able to find near-optimal configurations for a total of 128 VMs on a 16-node, closely correlated cluster with no more than 5% performance overhead. We note that there have been reports in the literature on the automatic configuration of multiple VMs in a cluster of machines. This is the first work that scales the auto-configuration of VMs to a cluster of correlated nodes under work-conserving mode.

The rest of this paper is organized as follows. Section 2 discusses the challenges in cloud management. Section 3 and Section 4 elaborate the key designs and the implementation of iBalloon, respectively. Section 5 and Section 6 present the experiment settings and results. Related work is discussed in Section 7. We conclude this paper and discuss future work in Section 8.

2 CHALLENGES IN CLOUD MANAGEMENT

In this section, we review the complications of CPU, memory and I/O resource allocation in the cloud and discuss the practical issues behind on-the-fly VM reconfiguration and large-scale VM management.

2.1 Complex Resource-to-Performance Relationship

In cloud computing, application performance depends on the application's ability to simultaneously access multiple types of resources [21]. In this work, we consider CPU, memory and I/O bandwidth as the building blocks of a VM's capacity. An accurate resource-to-performance model is crucial to the design of automatic capacity management. However, workload and cloud dynamics make the determination of the system model challenging. Our discussions are based on Xen virtualization platforms, but they are applicable to other virtualization platforms such as VMware and VirtualBox. In a Xen-based platform, the driver domain (dom0) is a privileged VM that manages the other guest VMs (domU) and performs the resource allocations. In the rest of this paper, we use dom0 and the host interchangeably; VMs always refer to the guest VMs, or domUs.

2.1.1 CPU

The CPU(s) can be time-shared by multiple VMs at a fine granularity. For example, the Credit Scheduler, the default CPU scheduler in Xen, can perform CPU allocation at a granularity of 30 ms. On boot, each resident VM is assigned a certain number of virtual CPUs (VCPUs), and the number can be changed on the fly. Although the number of VCPUs does not determine the actual allocation of CPU cycles, it decides the maximum concurrency and CPU time the VM can achieve. In general, CPU scheduling works in either a work-conserving (WC) or a non-work-conserving (NWC) mode.


It is convenient to obtain the VMs' CPU utilization: the usage can be reported by dom0 using xentop or by the VM's OS (e.g., the top command in Linux). However, it is not as straightforward to determine how CPU resources should be allocated to VMs. In general, there are three ways of CPU allocation (a hedged sketch of the corresponding toolstack operations follows the list):

1) Under WC mode, set each VM's VCPU count to the number of available physical CPUs and change the CPU allocations by altering the VMs' priorities (or weights in Xen).

2) Under WC mode, change the CPU allocation by altering the VCPU number. This is equivalent to setting an upper limit on the CPU allocation equal to the VCPU number; within the limit, a VM can use CPU freely.

3) Under NWC mode, proceed as in the first method, except that the allocations are specified as cap values. All the cap values add up to the total available CPU resource.
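For concreteness, the sketch below shows how these three modes map onto toolstack operations. It is a minimal illustration in Python, assuming the classic xm toolstack with the Credit Scheduler (xm vcpu-set and xm sched-credit); the domain names and parameter values are hypothetical and the exact commands may differ across Xen versions.

```python
import subprocess

def run(cmd):
    """Run a Xen toolstack command; errors propagate to the caller."""
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

PHYS_CPUS = 4  # the physical cores the consolidated VMs are restricted to

# Method 1 (WC mode): give each VM as many VCPUs as physical cores and steer
# the allocation through Credit Scheduler weights (relative priorities).
def wc_by_weight(dom, weight):
    run(["xm", "vcpu-set", dom, str(PHYS_CPUS)])
    run(["xm", "sched-credit", "-d", dom, "-w", str(weight)])

# Method 2 (WC mode): bound the allocation by the VCPU count itself; within
# that limit the VM consumes CPU freely (the WC-2VCPU setting in Figure 1).
def wc_by_vcpu_count(dom, vcpus):
    run(["xm", "vcpu-set", dom, str(vcpus)])

# Method 3 (NWC mode): keep all VCPUs but bound CPU time with a cap value,
# e.g. on four cores a cap of 400 means no limit and 200 means half capacity.
def nwc_by_cap(dom, cap):
    run(["xm", "vcpu-set", dom, str(PHYS_CPUS)])
    run(["xm", "sched-credit", "-d", dom, "-c", str(cap)])

# Example (hypothetical domain name):
# wc_by_vcpu_count("tpcw-db-1", 2)
```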

To determine the best CPU mode for cloud management, we compared the above three methods on a host machine with two quad-core Intel Xeon CPUs. Two instances of the TPC-W database (DB) tier were consolidated on the host. For more details about the TPC-W application, please refer to Section 5. The DB tier is primarily CPU-intensive, and the VMs were limited to use the first four cores only. We made sure that the aggregated CPU demand was beyond the total capacity of the four cores.

Figure 1 shows the aggregated throughput and average response time of the two TPC-W instances under different CPU allocation modes. WC-4VCPU refers to the first method with equal weights for the two VMs. Although the aggregated CPU demand is beyond four cores, each VM actually needs a little more than two cores; this mode is therefore equivalent to work-conserving with "over-provisioning" of CPU to individual VMs. WC-2VCPU is similar except that there is a 2-VCPU upper limit for each VM. In NWC-capped, we set the VMs to have 4 VCPUs and each VM was capped to half of the CPU time. For example, in the case of four cores, a cap of 400 means no limit while 200 refers to half of the capacity.

The figure shows that WC-2VCPU provided the best performance in terms of both throughput and response time. The compromised performance in the other two modes can plausibly be attributed to wasted CPU time: CPU contention in WC-4VCPU may lower the CPU efficiency in serving requests. In principle, NWC-capped should deliver performance similar to WC-2VCPU; in practice, the results of WC-2VCPU turned out to be better than those of NWC-capped.

Under NWC mode, there is usually a simple (and often linear) relationship between CPU resource and application performance. In [21], the authors showed that an auto-regressive-moving-average model can represent this relationship well. However, in WC mode, the actual CPU time allocated to a VM is determined by the total CPU demand on the host, which makes the modeling harder. We take on the challenge of considering WC mode in VM capacity management because it provides better performance and avoids possible waste of CPU resources.

Fig. 1. Performance (throughput and response time) of TPC-W under different CPU allocation modes.

2.1.2 Memory

Unlike CPU, memory is usually shared by dividing the physical address space into non-overlapping regions, each of which is used exclusively by one VM. Although it is possible for a VM to give up unused memory through self-ballooning [19], within each management interval we consider the allocated memory to be used exclusively by one VM. The objective of cloud memory management is to dynamically move "unused" memory from idle VMs to busy ones. Identifying "unused" memory pages, or calculating the memory utilization of a running VM, is not trivial. Different from free pages, "unused" pages refer to those that were once touched but are not actively being accessed by the system. Their amount can be calculated as the total memory minus the system working set.

The system working set size (WSS) can be estimated either by monitoring disk I/O and major page faults [13], or by using the miss ratio curve [33]. But these methods are only sensitive to memory pressure and can therefore only increase a VM's memory size accordingly; a decrease in memory usage cannot be quickly detected. As a result, the memory of a VM may not be shrunk promptly.

Conceptually, the relationship between VM memory size and application-level performance is simple: performance drops dramatically when the memory size is smaller than the application's WSS. The open cloud environment, however, adds one more uncertainty to VM memory management. Modern OSes usually design their write-back policies based on system-wide memory statistics. For example, in Linux, by default write-back is triggered when 10% of the total memory is dirty. A change of the VM memory size may therefore trigger background write-backs that affect application performance considerably, even though the new memory size is well above the WSS.
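As a small illustration of this interaction, the snippet below estimates how close a Linux guest is to its background write-back threshold by reading /proc/meminfo and /proc/sys/vm/dirty_ratio. It is only an approximate check for illustration; the default ratio and the exact accounting differ across kernel versions.

```python
def read_meminfo():
    """Parse /proc/meminfo into a dict of integer values (KiB)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])
    return info

def writeback_headroom():
    """Rough estimate of how close the guest is to triggering write-back.

    Linux begins flushing when dirty pages exceed vm.dirty_ratio percent of
    (roughly) total memory, so ballooning a VM down lowers the absolute
    threshold even when the working set still fits.
    """
    mem = read_meminfo()
    with open("/proc/sys/vm/dirty_ratio") as f:
        dirty_ratio = int(f.read())
    threshold_kb = mem["MemTotal"] * dirty_ratio / 100.0
    return mem["Dirty"], threshold_kb

if __name__ == "__main__":
    dirty_kb, threshold_kb = writeback_headroom()
    print("dirty: %d KiB, approx. write-back threshold: %.0f KiB"
          % (dirty_kb, threshold_kb))
```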

2.1.3 I/O Bandwidth

All the I/O requests from VMs are serviced by the host's I/O system. If the host's I/O scheduler is selected properly, e.g., the CFQ scheduler in Linux, VMs can receive differentiated I/O services: setting a VM to a higher priority leads to higher I/O bandwidth or lower latency. The achieved I/O performance depends heavily on the sequentiality of the co-hosted I/O streams as well as their request sizes. Thus, the I/O usage, e.g., the achieved I/O bandwidth reported by commands like iostat, does not directly connect to application performance.

Fig. 2. Delayed effect of VCPU reconfiguration (throughput and response time over time).

There are two key impediments in mapping the memory or I/O resources to application performance. First, it is difficult to accurately measure the utilization of these resources. Second, the actual resource allocation (e.g., the achieved I/O bandwidth) is determined by the characteristics of the applications as well as the co-running VMs.

2.2 Issues of VM Reconfiguration

VM capacity management relies on precise operations that set resources to desired values, assuming the effect of a reconfiguration can be observed immediately. However, in fine-grained cloud management, such as in [21], [23], the effect of a reconfiguration cannot be correctly perceived within one management interval. The work in [23] showed a delay of up to 10 minutes before a memory reconfiguration stabilizes. A similar phenomenon was also observed for CPU.

We ran tests measuring the dead time between a change in VCPU count and the time the performance stabilizes. A single TPC-W DB tier was tested by changing its VCPUs. Figure 2 plots the application-level performance over time. Starting from 4 VCPUs, the VM had one VCPU removed every 5 minutes until only one was left at the 15th minute. Then the VCPUs were added back one by one. At the 20th minute, the number of VCPUs increased from 1 to 2. We observed a delay of more than 5 minutes before the response time stabilized at the 25th minute. The delay was due to the resource contention caused by the backlogged requests once more CPU became available; the VM took a few minutes to digest the congested requests.

2.3 Cluster-Wide Correlation

In a public cloud, multi-tier applications spanning multiple physical hosts require all tiers to be configured appropriately. In most multi-tier applications, request processing involves several stages at different tiers. These stages are usually synchronous in the sense that one stage is blocked until the completion of other stages on other tiers. Thus, a change in the capacity of one tier may affect the resource requirements of other tiers. In Table 1, we list the resource usage of the front-end application tier of TPC-W as the CPU capacity of the back-end tier changes.

TABLE 1
Configuration dependencies of multi-tier VMs.

DB VCPU    1 VCPU   2 VCPU   3 VCPU   4 VCPU
APP MEM    790 MB   600 MB   320 MB   290 MB
APP CPU%   61%      47%      15%      10%


Fig. 3. The architecture and working flow of iBalloon. (1) The VM reports its running status. (2) Decision-maker replies with a capacity suggestion. (3) The VM submits the resource request. (4) Host-agent synchronously collects all VMs' requests, reconfigures VM resources and sleeps for a management interval. (5)-(6) Host-agent queries App-agent for the VMs' application-level performance. (7)-(8) Host-agent calculates and sends the feedback. (9) The VM wraps the information about this interaction and reports to Decision-maker. (10) Decision-maker updates the capacity management policy for this VM accordingly.

APP MEM refers to the minimum memory size that prevents the application server from doing significant swapping I/O; APP CPU% denotes the measured CPU utilization. The table suggests that, as the capacity of the back-end tier increases, the demand for memory and CPU in the front tier decreases considerably. An explanation is that without prompt completion of requests at the back-end tier, the front tier needs to spend resources on unfinished requests. Therefore, any mistake in one VM's capacity management may spread to other hosts. In the worst case, all nodes in the cloud could be correlated by multi-tier applications.

In summary, the aforementioned challenges in cloud computing bring unique requirements to VM capacity management. (1) It should guarantee VMs' application-level performance in the presence of complicated resource-to-performance relationships. (2) It should appropriately resize the VMs in response to time-varying resource demands. (3) It should be able to work in an open cloud environment, without any assumptions about VM membership and deployment topology.

3 THE DESIGN OF IBALLOON

3.1 Overview

We design iBalloon as a distributed management framework, in which individual VMs initiate their own capacity management. iBalloon provides the hosted VMs with capacity directions as well as evaluative feedback. Once a VM is registered, iBalloon maintains its application profile and history records, which can be analyzed for future capacity management. For better portability and scalability, we decouple the functionality of iBalloon into three components: Host-agent, App-agent and Decision-maker.

Figure 3 illustrates the architecture of iBalloon as well as its interactions with a VM. Host-agent, one per physical machine, is responsible for allocating the host's hardware resources to VMs and giving feedback. App-agent maintains application SLA profiles and reports run-time application performance. Decision-maker hosts a learning agent for each VM for automatic capacity management. We make two assumptions on the self-adaptive VM capacity management. First, capacity decisions are made based on the VM's running status. Second, a VM relies on the feedback signals, which evaluate previous capacity management decisions, to revise the policy currently employed by its learning agent.

The assumptions together define the VM capacity management task as an autonomous learning process in an interactive environment. The framework is general in the sense that various learning algorithms can be incorporated. Although the efficacy or the efficiency of the capacity management may be compromised, the complexity of the management task does not grow exponentially with the number of VMs or the number of resources. After a VM submits its SLA profile to App-agent and registers with Host-agent and Decision-maker, iBalloon works as illustrated in Figure 3. iBalloon considers the VM capacity to be multidimensional, including CPU, memory and I/O bandwidth. This is one of the earliest works that consider these three types of resources together. A VM's capacity can be changed by altering the VCPU number, memory size and I/O bandwidth. The management operation on one VM is defined as the combination of three meta operations, one per resource: increase, decrease and nop.

3.2 Key Designs

3.2.1 VM Running Status

The VM running status has a direct impact on management decisions. A running status should provide insight into the resource usage of the VM, from which constrained or over-provisioned resources can be inferred. We define the VM running status as a four-element vector

$(u_{cpu}, u_{io}, u_{mem}, u_{swap})$,

where $u_{cpu}$, $u_{io}$, $u_{mem}$ and $u_{swap}$ denote the utilization of CPU, I/O, memory and disk swap, respectively. As discussed above, memory utilization cannot be trivially determined; we turn to a guest-OS-reported metric to calculate $u_{mem}$ (see Section 4 for details). Since disk swapping activities are closely related to memory usage, adding $u_{swap}$ to the running status provides insight into memory idleness and pressure.

3.2.2 Feedback Signal

The feedback signal ought to explicitly punish resource allocations that lead to degraded application performance, and meanwhile encourage the release of unused capacity. It also acts as an arbiter when resources are contended. We define a real-valued reward as the feedback. Whenever there is a conflict in the aggregated resource demand, e.g., the available memory becomes less than the total requested memory, iBalloon sets the reward to −1 (penalty) for the VMs that require an increase in the resource and to 0 (neutral) for the other VMs. In this way, some of the conflicting VMs may back off, relaxing the contention. Note that, although conflicting VMs may give up previous requests, Decision-maker will suggest a second-best plan, which may be the best solution to the resource contention.

When there is no conflict over resources, the reward directly reflects application performance and resource efficiency. We define the reward as a ratio of yield to cost:

$$reward = \frac{yield}{cost},$$

where

$$yield = Y(x_1, x_2, \ldots, x_m) = \frac{\sum_{i=1}^{m} y(x_i)}{m}, \qquad
y(x_i) = \begin{cases} 1 & \text{if } x_i \text{ satisfies its SLA;} \\ e^{-p \cdot \left| \frac{x_i - x'_i}{x'_i} \right|} - 1 & \text{otherwise,} \end{cases}$$

and

$$cost = 1 + \frac{\sum_{i=1}^{n} (1 - u_i^{k})^{1/k}}{n}.$$

Note that the metric yield is a summarized gain over the m performance metrics $x_1, x_2, \cdots, x_m$. The utility function $y(x_i)$ decays when metric $x_i$ violates its performance objective $x'_i$ in the SLA. cost is calculated as a summarized utility over the n utilization values $u_1, u_2, \cdots, u_n$. Both utility functions decay under the control of the decay factors p and k, respectively. We consider throughput and response time as the performance metrics and $u_{cpu}$, $u_{io}$, $u_{mem}$, $u_{swap}$ as the utilization metrics. The reward punishes any capacity setting that violates the SLA and gives incentives for high resource efficiency.
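A minimal sketch of the reward computation, written directly from the definitions above. The metric names, SLA targets and the way utilizations are passed in are illustrative placeholders; Section 6.1 later settles on the decay factors p = 10 and k = 1.

```python
import math

def metric_yield(value, target, p=10.0, lower_is_better=True):
    """Per-metric utility y(x): 1 when the SLA is met, otherwise an
    exponentially decaying penalty in the relative deviation from the target."""
    meets_sla = value <= target if lower_is_better else value >= target
    if meets_sla:
        return 1.0
    deviation = abs((value - target) / target)
    return math.exp(-p * deviation) - 1.0

def reward(perf, sla, utilizations, p=10.0, k=1.0):
    """reward = yield / cost for one management interval.

    perf/sla: dicts of performance metrics (e.g. throughput, response time).
    utilizations: the (u_cpu, u_io, u_mem, u_swap) running status in [0, 1].
    """
    yields = [
        metric_yield(perf["response_time"], sla["response_time"], p, lower_is_better=True),
        metric_yield(perf["throughput"], sla["throughput"], p, lower_is_better=False),
    ]
    y = sum(yields) / len(yields)
    cost = 1.0 + sum((1.0 - u ** k) ** (1.0 / k) for u in utilizations) / len(utilizations)
    return y / cost

# Example: SLA met with moderate utilization yields a positive reward;
# an SLA violation drives yield (and hence the reward) negative.
print(reward({"response_time": 0.4, "throughput": 900},
             {"response_time": 0.5, "throughput": 800},
             (0.7, 0.4, 0.6, 0.0)))
```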

3.2.3 Self-adaptive Learning Engine

At the heart of iBalloon is a self-adaptive learning agent responsible for each VM's capacity management. Reinforcement learning is concerned with how an agent ought to take actions in a dynamic environment so as to maximize a long-term reward [27]. It fits naturally within iBalloon's feedback-driven, interactive framework. RL offers opportunities for highly autonomous and adaptive capacity management under cloud dynamics: it assumes no a priori knowledge about the VM's running environment, and it is able to capture the delayed effect of reconfigurations to a large extent.

Fig. 4. CMAC-based Q table.

An RL problem is usually modeled as a Markov Decision Process (MDP). Formally, for a set of environment states S and a set of actions A, the MDP is defined by the transition probability $P_a(s, s') = Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ and an immediate reward function $R = E[r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s']$. At each step t, the agent perceives its current state $s_t \in S$ and the available action set $A(s_t)$. By taking action $a_t \in A(s_t)$, the agent transits to the next state $s_{t+1}$ and receives an immediate reward $r_{t+1}$ from the environment. The value function of taking action a in state s is defined as:

$$Q(s, a) = E\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s, a_t = a\right\},$$

where $0 \le \gamma < 1$ is a discount factor that helps Q(s, a) converge. The optimal policy is then simple: always select the action a that maximizes the value function Q(s, a) in state s. Finding the optimal policy is equivalent to obtaining an estimate of Q(s, a) that approximates its actual value. The estimate of Q(s, a) can be updated each time an interaction $(s_t, a_t, r_{t+1})$ is completed:

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \cdot \left[ r_{t+1} + \gamma \cdot Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right],$$

where α is the learning rate. The interactions consist of exploitations and explorations. Exploitation is to follow the policy obtained so far; in contrast, exploration is the selection of random actions to capture changes in the environment so as to refine the existing policy. We follow the ε-greedy policy to design the RL agent: with a small probability ε, the agent picks a random action, and follows the best policy it has found for the rest of the time.

In VM capacity management, the state s corresponds to the VM's running status and the action a is the management operation. For example, the action a can take the form (nop, increase, decrease), which indicates no change in VCPUs, an increase in the VM's memory size and a decrease in I/O bandwidth. Because actions in continuous space remain an open research problem in the RL field, we limit the RL agent to discrete actions. The actions are discretized by setting a step size on each resource: the VCPU count is incremented or decremented by one at a time, memory is reconfigured in steps of 256 MB, and I/O bandwidth is changed in steps of 0.5 MB/s.
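The sketch below enumerates this discrete action space and the ε-greedy selection rule. The step sizes follow the text; the q_value argument stands in for the CMAC-backed estimate described next and is an assumption of this illustration.

```python
import itertools
import random

# One meta operation per resource: (VCPU, memory, I/O bandwidth).
STEPS = {"vcpu": 1, "mem_mb": 256, "io_mbps": 0.5}
META_OPS = (-1, 0, +1)  # decrease, nop, increase

# 3 resources x 3 meta operations = 27 discrete actions.
ACTIONS = list(itertools.product(META_OPS, repeat=3))

def apply_action(config, action):
    """Return the new (vcpu, mem_mb, io_mbps) configuration after an action."""
    vcpu, mem, io = config
    d_vcpu, d_mem, d_io = action
    return (vcpu + d_vcpu * STEPS["vcpu"],
            mem + d_mem * STEPS["mem_mb"],
            io + d_io * STEPS["io_mbps"])

def epsilon_greedy(state, q_value, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(state, a))
```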

The requirement of autonomy in VM capacity management poses two key questions for the design of the RL engine. First, how do we overcome the scalability and adaptability problems in RL? Second, how would multiple RL agents, each of which represents a VM, coordinate and optimize system-wide performance? We answer these questions by designing the VM capacity management agent as a distributed RL agent with a highly efficient representation of the Q table. Unlike multi-agent RL, in which each agent needs to maintain information about the other competing agents, distributed RL has no explicit coordination scheme. Instead, it relies on the feedback signals for coordination. For example, when resources are contended, negative feedback helps resolve the contention; VMs belonging to the same application receive the same feedback, which coordinates resource allocations within the virtual cluster. An immediate benefit of distributed learning is that the complexity of the learning problem does not grow exponentially with the number of VMs.

The VM running status is naturally defined in a multi-dimensional continuous space. Although we limit the actions to discrete operations, the state itself can render the Q value function intractable. Due to its critical impact on learning performance, there are many studies on the representation of the Q function [27], [28]. We carefully reviewed these works and decided to borrow the design of the Cerebellar Model Articulation Controller (CMAC) [2] to represent the Q function. It maintains multiple coarse-grained Q tables, or so-called tilings, each of which is shifted by a random offset with respect to the others. With CMAC, we can achieve higher resolution in the Q table at less cost. For example, if each status input (an element of the status vector) is discretized into five intervals (a resolution of 20%), 32 tilings give a resolution of less than 1% (20%/32). The total size of the Q tables is reduced to 32 * 5^4, compared to a size of 100^4 if a plain look-up table were used. In CMAC, the actual Q table is stored in a one-dimensional memory table and each cell in the table stores a weight value. Figure 4 illustrates the architecture of a one-dimensional CMAC; the VM running status shown in Figure 4 is only for illustration purposes, as the state needs to work with a four-dimensional CMAC. Given a state s, CMAC uses a hash function, which takes a pair of state and action as input, to generate indexes for the (s, a) pair. CMAC uses the indexes to access the memory cells and calculates Q(s, a) as the sum of the weights in these cells.

One advantage of CMAC is its efficiency in handling limited data. Similar VM states generate CMAC indexes with a large overlap; thus, updates to one state generalize to the others, leading to an accelerated learning process. One update of the CMAC-based Q table takes only 6.5 milliseconds in our testbed, compared with the 50-second update time of a multi-layer neural network [23]. Once a VM finishes an iteration, it submits the four-tuple (s_t, a_t, s_{t+1}, r_t) to Decision-maker. The corresponding RL agent then updates the VM's Q table using Algorithm 1. In the algorithm, we further enhance the CMAC-based Q table with fast adaptation when the SLA is violated: we set the learning rate α to 1 whenever a negative penalty is received. This ensures that "bad" news travels faster than good news, allowing the learning agent to respond quickly to the performance violation.


Algorithm 1 Update the CMAC-based Q value function
 1: Input s_t, a_t, s_{t+1}, r_t
 2: Initialize δ = 0
 3: I[a_t][0] = get_index(s_t)
 4: Q(s_t, a_t) = Σ_{j=1}^{num_tilings} Q[I[a_t][j]]
 5: a_{t+1} = get_next_action(s_{t+1})
 6: I[a_{t+1}][0] = get_index(s_{t+1})
 7: Q(s_{t+1}, a_{t+1}) = Σ_{j=1}^{num_tilings} Q[I[a_{t+1}][j]]
 8: δ = r_t + γ * Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)
 9: for i = 0; i < num_tilings; i++ do
10:     /* If the SLA is violated, enable fast adaptation */
11:     if r_t < 0 then
12:         θ[I[a_t][i]] += (1.0 / num_tilings) * δ
13:     else
14:         θ[I[a_t][i]] += (α / num_tilings) * δ
15:     end if
16: end for
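A compact Python rendering of the same update, under stated assumptions: the CMAC index function is replaced by Python's built-in hashing over discretized, offset state intervals, and the tiling count, resolution and table size are illustrative rather than the paper's exact settings.

```python
import random

class CMACQTable:
    """Tile-coding (CMAC-style) approximation of Q(s, a).

    Each of the num_tilings coarse tables is shifted by a fixed random offset;
    Q(s, a) is the sum of one weight per tiling, so similar states share
    weights and generalize to each other.
    """

    def __init__(self, num_tilings=32, resolution=0.2, table_size=4096,
                 alpha=0.1, gamma=0.9, seed=1):
        rng = random.Random(seed)
        self.num_tilings = num_tilings
        self.resolution = resolution
        self.alpha, self.gamma = alpha, gamma
        self.theta = [0.0] * table_size
        # One offset per tiling and state dimension (4 utilization inputs).
        self.offsets = [[rng.uniform(0, resolution) for _ in range(4)]
                        for _ in range(num_tilings)]

    def _indexes(self, state, action):
        """Map (tiling, discretized state, action) to memory-table cells.
        The built-in hash is a stand-in for the paper's CMAC hashing scheme."""
        idx = []
        for t, offs in enumerate(self.offsets):
            tile = tuple(int((s + o) / self.resolution) for s, o in zip(state, offs))
            idx.append(hash((t, tile, action)) % len(self.theta))
        return idx

    def q(self, state, action):
        return sum(self.theta[i] for i in self._indexes(state, action))

    def update(self, s_t, a_t, r_t, s_next, a_next):
        """One SARSA-style update with fast adaptation on SLA violations."""
        delta = r_t + self.gamma * self.q(s_next, a_next) - self.q(s_t, a_t)
        # "Bad news travels faster": use a learning rate of 1 on penalties.
        rate = 1.0 if r_t < 0 else self.alpha
        for i in self._indexes(s_t, a_t):
            self.theta[i] += (rate / self.num_tilings) * delta
```

Because the 32 tilings are shifted copies of a coarse grid, nearby states share most of their indexes, which is what gives the generalization and the fast per-update time discussed above.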

4 IMPLEMENTATION

iBalloon has been implemented as a set of user-level daemons in the guest and host OSes. The communication between the host and guest VMs is carried out through an inter-domain channel. In our Xen-based testbed, we used Xenstore for host and guest information exchange. Xenstore is a centralized configuration database that is accessible by all domains on the same host. The domains involved in the communication place "watches" on a group of pre-defined keys in the database; whenever the sender initiates a communication by writing to a key, the receiver is notified, possibly triggering a callback function. When a new VM joins a host, Host-agent, one per machine, creates a new key under the VM's path in Xenstore. Host-agent launches a worker thread for the VM and the worker "watches" any change of the key. Whenever a VM submits a resource request via the key, the worker thread retrieves the request details and activates the corresponding handler in dom0 to handle the request. The VM receives the feedback from Host-agent in a similar way.

We implemented resource allocation in dom0 in a synchronous way. VMs send out resource requests at a fixed interval (30 seconds in our experiments) and Host-agent waits for all the VMs before satisfying any request. It is often desirable to allow users to submit requests with different management intervals for flexibility and reliability in resource allocation; we leave the extension of iBalloon to asynchronous resource allocation to future work. After the VMs and Host-agent agree on the resource allocations, Host-agent modifies the individual VMs' configurations accordingly. We change the memory size of a VM by writing the new size to the domain's memory/target key in Xenstore. The VCPU number is altered by turning individual CPUs on or off via the key cpu/CPUID/availability. For I/O bandwidth control, we use the lsof command to correlate VMs' virtual disks to processes and change the corresponding processes' bandwidth allocation via the Linux device-mapper driver dm-ioband [30].
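The reconfiguration paths described above can be driven from dom0 roughly as sketched below. This is a hedged illustration: it assumes the stock xenstore-write utility, the Xenstore keys named in the text, and that memory/target takes a value in KiB; domain IDs and the dm-ioband step are placeholders.

```python
import subprocess

def xenstore_write(path, value):
    """Write a Xenstore key from dom0 (assumes the xenstore-write utility)."""
    subprocess.check_call(["xenstore-write", path, str(value)])

def set_memory(domid, target_mb):
    # The guest balloon driver watches memory/target; the value is in KiB.
    xenstore_write("/local/domain/%d/memory/target" % domid, target_mb * 1024)

def set_vcpus(domid, online_vcpus, max_vcpus=4):
    # VCPUs are hot-plugged by flipping the cpu/<id>/availability keys.
    for cpu in range(max_vcpus):
        state = "online" if cpu < online_vcpus else "offline"
        xenstore_write("/local/domain/%d/cpu/%d/availability" % (domid, cpu), state)

# I/O bandwidth is adjusted separately by re-weighting the dm-ioband device
# backing the VM's virtual disk (details omitted here).
```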

App-agent, one per host, maintains the hosted applications' SLA profiles. In our experiments, it periodically queries participant machines through standard socket communication and reports application performance, such as throughput and response time, to Host-agent. In a more practical scenario, the application performance should be reported by a third-party application monitoring tool instead of the clients; iBalloon can be easily modified to integrate such tools.

We also consider two possible implementations of Decision-maker.

1) Centralized decision maker. In this approach, a designated server maintains all the Q learning tables of the VMs. Although the learning traces are maintained centrally, the VMs' capacity management decisions are independent of each other. The advantages include simplicity of management (learning algorithms can be modified or reused across a group of VMs) and avoidance of learning overhead (the possible overhead incurred by learning is removed from individual VMs). However, the centralized server can become a single point of failure as well as a performance bottleneck as the number of VMs increases. We use asynchronous sockets and multiple threads to improve concurrency in the server; a minimal sketch of this variant follows the list.

2) Distributed decision agent. In this approach, learning is local to individual VMs and Decision-maker is a process residing in the guest OS. The scalability of iBalloon is then not limited by the processing power of a centralized decision server, but this comes at the cost of CPU and memory overhead in each VM.
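As referenced in option 1 above, the following is a minimal sketch of the centralized variant: a threaded socket server that keeps one learning agent per VM and performs an update for every reported (s_t, a_t, s_{t+1}, r_t) tuple. The JSON-lines wire format and the agent interface (suggest/update) are assumptions of this sketch, not the paper's implementation.

```python
import json
import socketserver
from collections import defaultdict

class DecisionMakerHandler(socketserver.StreamRequestHandler):
    """Each connection carries newline-delimited JSON experience reports:
    {"vm_id": ..., "s_t": [...], "a_t": [...], "s_next": [...], "r_t": ...}."""

    def handle(self):
        for line in self.rfile:
            msg = json.loads(line)
            agent = self.server.agents[msg["vm_id"]]          # one learner per VM
            a_next = agent.suggest(tuple(msg["s_next"]))      # capacity suggestion
            agent.update(tuple(msg["s_t"]), tuple(msg["a_t"]),
                         msg["r_t"], tuple(msg["s_next"]), a_next)
            self.wfile.write((json.dumps({"action": a_next}) + "\n").encode())

class DecisionMakerServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self, addr, agent_factory):
        super().__init__(addr, DecisionMakerHandler)
        self.agents = defaultdict(agent_factory)  # lazily create per-VM agents

# Usage: DecisionMakerServer(("0.0.0.0", 9000), MyAgent).serve_forever(),
# where MyAgent exposes suggest(state) and update(s, a, r, s_next, a_next).
```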

A quantitative comparison of the two approaches is presented in Section 6.5.

We use the xentop utility to report VM CPU utilization. xentop is instrumented to redirect the utilization of each VM to separate log files in the tmpfs folder /dev/shm every second; a small utility program parses the logs and calculates the average CPU utilization for every management interval. The disk I/O utilization is calculated as the ratio of achieved bandwidth to allocated bandwidth. The achieved bandwidth can be obtained by monitoring the disk activities in /proc/PID/io, where PID is the process number of a VM's virtual disk in dom0. The swap rate can be collected in a similar way. We consider memory utilization to be the guest OS metric Active divided by the memory size. The Active metric in /proc/meminfo is a coarse estimate of the actively used memory size; however, it is lazily updated by the guest kernel, especially during memory idle periods. We therefore combine the guest-reported metric and the swap rate for a better estimate of memory usage. With explorations from the learning engine, iBalloon has a better chance to reclaim idle memory without causing significant swapping.
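The running-status vector could be assembled from these sources roughly as follows. The files parsed here are real Linux procfs interfaces, but the overall assembly (and the omission of the xentop-log parsing for u_cpu) is an illustrative assumption rather than iBalloon's code.

```python
import time

def guest_memory_utilization():
    """u_mem: the guest-reported Active working memory over total memory."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])
    return info["Active"] / float(info["MemTotal"])

def swap_rate(interval=1.0):
    """u_swap proxy: pages swapped in/out per second, from /proc/vmstat."""
    def read():
        counts = {}
        with open("/proc/vmstat") as f:
            for line in f:
                key, val = line.split()
                counts[key] = int(val)
        return counts["pswpin"] + counts["pswpout"]
    before = read()
    time.sleep(interval)
    return (read() - before) / interval

def io_utilization(pid, allocated_bps, interval=1.0):
    """u_io: achieved bytes/s of the VM's virtual-disk process (dom0 side)
    over the bandwidth allocated to it."""
    def read_bytes():
        with open("/proc/%d/io" % pid) as f:
            stats = dict(line.split(": ") for line in f)
        return int(stats["read_bytes"]) + int(stats["write_bytes"])
    before = read_bytes()
    time.sleep(interval)
    achieved = (read_bytes() - before) / interval
    return achieved / float(allocated_bps)
```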


5 EXPERIMENT DESIGN

5.1 Methodology

To evaluate the efficacy of iBalloon, we attempt to answer the following questions: (1) How well does iBalloon perform in the case of single-VM capacity management? Can the learned policy be re-used to control a similar application or on a different platform? (Section 6.3) (2) When there is resource contention, can iBalloon properly distribute the constrained resources and optimize overall system performance? (Section 6.4) (3) What are iBalloon's scalability and overhead? (Section 6.5) We selected three representative server workloads as the hosted applications. TPC-W [32] is an e-commerce benchmark that models an online book store and is primarily CPU-intensive. It consists of two tiers, i.e., the front-end application (APP) tier and the back-end database (DB) tier. SPECweb [31] is a web server benchmark suite that delivers dynamic web content; it is a CPU- and network-intensive server application. TPC-C [32] is an online transaction processing benchmark that contains lightweight disk reads and sporadic heavy writes; its performance is sensitive to memory size and I/O bandwidth.

To create dynamic variations in resource demand, we instrumented the workload generators of TPC-W and TPC-C to change the client traffic level at run-time. The workload generator reads the traffic level from a trace file, which is modeled after a real Internet traffic pattern [29]. We scaled down the Internet traces to match the capacity of our platform.

5.2 Testbed Configurations

Two clusters of nodes were used for the experiments. The first cluster (CIC100) is a shared research environment consisting of a total of 22 DELL and SUN machines. Each machine in CIC100 is equipped with 8 CPU cores and 8 GB of memory. The CPU and memory configurations limit the number of VMs that can be consolidated on each machine; thus, we use CIC100 as a resource-constrained cloud testbed to verify iBalloon's effectiveness for small-scale capacity management. Once iBalloon gained enough experience to make decisions, we applied the learned policies to manage a large number of VMs. CIC200 is a cluster of 16 DELL machines dedicated to the cloud management project. Each node features a configuration of 12 CPU cores (with hyperthreading enabled) and 32 GB of memory. In the scale-out testing, we deployed 64 TPC-W instances, i.e., a total of 128 VMs, on CIC200. To generate sufficient client traffic to these VMs, all the nodes in CIC100 were used to run client generators, with 3 copies running on each node.

We used Xen version 4.0 as our virtualization environment. dom0 and the guest VMs ran Linux kernels 2.6.32 and 2.6.18, respectively. To enable on-the-fly reconfiguration of CPU and memory, all the VMs were para-virtualized. The VM disk images were stored locally on a second hard drive on each host. We created the dm-ioband device mapper on the partition containing the images to control the disk bandwidth. For the benchmark applications, MySQL, Tomcat and Apache were used for the database, application and web servers.

6 EXPERIMENTAL RESULTS

6.1 Evaluation of the Reward Metric

The reward metric synthesizes multi-dimensional application performance and resource utilizations. We are interested in how the reward signal guides capacity management. The decay rates p and k reflect how important it is for an application to meet the performance objectives in its SLA and how aggressively the user increases resource utilization, even at the risk of overload. Figure 5 plots the application yield with different decay rates p. The reward data was drawn from a 2-hour test run of TPC-W with limited resources, during which there were considerable SLA violations. The x (response time) and y (throughput) axes show the difference (in percentage) between the achieved performance and the objective when the SLA is violated; the larger the difference, the greater the deviation from the target. From Figure 5, we can see that a larger decay rate imposes stricter requirements on meeting the SLA: a small deviation from the target effectively generates a large penalty. Similarly, k controls how the cost changes with resource usage; a larger value of k encourages the user to use the resources more aggressively. Finally, we decided to guarantee user satisfaction, assume risk-neutral users, and set p = 10 and k = 1.

Figure 6 shows how the reward reflects the status of VM capacity. In this experiment, we varied the client traffic to occasionally exceed the VM's capacity. The reward is calculated from the DB tier of a TPC-W instance with a fixed configuration of 3 VCPUs, 2 GB memory and 2 MB/s disk bandwidth. As shown in Figure 6, when the load is light, the performance objectives are met; during this period, yield is 1 and cost dominates the change of reward. As traffic increases, resource utilization goes up, incurring a smaller cost. Similarly, reward drops when traffic goes down because of the increase in the cost factor. In contrast, when the VM becomes overloaded with SLA violations, the yield factor dominates reward by imposing a large penalty. In conclusion, reward effectively punishes performance violations and gives users incentives to release unused resources.

6.2 Exploitations vs. Explorations

Reinforcement learning is a direct adaptive optimal control approach that relies on interactions with the environment. Therefore, the performance of the learning algorithm depends critically on how the interactions are defined. Explorations are often considered sub-optimal actions that lead to degraded performance. However, without enough exploration, the RL agent tends to be trapped in locally optimal policies, failing to adapt to changes in the environment. On the other hand, too much exploration would certainly result in unacceptable application performance.

Fig. 5. Application yield with different decay rates.

Fig. 6. Reward with different traffic levels.

Before iBalloon is actually deployed, we need to determine the exploration rate that best fits our platform.

In this experiment, we dedicated a physical host to one application and initialized the VM's Q table to all zeros. We varied the exploration rate of the learning algorithm and plot the application performance of TPC-W in Figure 7. The bars represent the averages and variations of 5 one-hour runs with the same exploration rate. From the figure, we can see that the response time of TPC-W is convex with respect to the exploration rate, with ε = 0.1 being optimal. The same exploration rate also gives the best throughput as well as the smallest variations. Experiments with TPC-C suggested a similar exploration rate. We also empirically determined the learning rate and discount factor. For the rest of this paper, we set the RL parameters to the following values: ε = 0.1, α = 0.1, γ = 0.9.

6.3 Single Application Capacity Management

In its simplest form, iBalloon manages a single VM or application's capacity. In this subsection, we study its effectiveness in managing different types of applications with distinct resource demands. RL-based auto-configuration can suffer from poor initial performance due to explorations of the environment. To better understand the performance of RL-based capacity management, we tested two variations of iBalloon, one with an initialization of the management policy and one without, denoted as iBalloon w/ init and iBalloon w/o init, respectively. The initial policy was obtained by running the application workload for 10 hours, during which iBalloon interacted with the environment using only exploration actions.

Fig. 7. Application performance with different exploration rates.

Fig. 8. Response time under various reconfiguration strategies: (a) TPC-W, (b) TPC-C.

Figure 8(a) and Figure 8(b) plot the performance of iBalloon and its variations in a 5-hour run of the TPC-W and TPC-C workloads. Note that during each experiment the host was dedicated to TPC-W or TPC-C, so no resource contention existed. In this simple setting, we can obtain the upper and lower bounds of iBalloon's performance. The upper bound is due to resource over-provisioning, which allocates more than enough resources for the applications. The lower-bound performance was derived from a VM template whose capacity is not changed during the test; we refer to it as static. We configured the VM template with 1 VCPU and 512 MB memory in the experiment. If not otherwise specified, we used the same template as the default configuration for all VMs in the remainder of this paper.

From Figure 8(a), we can see that iBalloon achieved performance close to that of over-provisioning. iBalloon w/o init managed to keep almost 90% of the requests below the SLA response time threshold, although a few percent of requests had wild response times. This suggests that, although it started with a poor policy, iBalloon was able to quickly adapt to good policies and maintain the performance at a stable level. We attribute the good performance to the highly efficient representation of the Q table: the CMAC-enhanced Q table was able to generalize over the continuous state space with a limited number of interactions. Not surprisingly, static's poor result again calls for appropriate VM capacity management.

TABLE 2
Performance improvement due to initial policy learned from different applications and cloud platforms.

                                        Throughput   Response time
Trained on TPC-W, tested on SPECweb        40%            80%
Trained on CIC100, tested on CIC200        20%            30%

As shown in Figure 8(b), iBalloon w/ init showed almost optimal performance for the TPC-C workload too. But without policy initialization, iBalloon could only prevent around 80% of the requests from SLA violations; more than 15% of the requests had response times larger than 30 seconds. This barely acceptable performance stresses the importance of a good policy in more complicated environments. Unlike CPU, memory sometimes has an unpredictable impact on performance, and the dead time due to memory is much longer than that of CPU (10 minutes compared to 5 minutes in our experiments). In this case, iBalloon needs a longer time to obtain a good policy. Fortunately, the derived policy, which is embedded in the Q table, can possibly be re-used to manage similar applications.

Table 2 lists the performance improvement when the learned management policies are applied to a different application or a different platform. The improvement is calculated against the performance of iBalloon without an initial policy. SPECweb [31] is a web server benchmark suite that contains representative web workloads; its E-Commerce workload is similar to TPC-W (CPU-intensive) except that its performance is also sensitive to memory size. The results in Table 2 suggest that the Q table learned for TPC-W also worked for SPECweb. An examination of iBalloon's log revealed that the learned policy was able to successfully match the CPU allocation to the incoming traffic. A policy learned on cluster CIC100 also gave more than 20% performance improvement to the same TPC-W application on cluster CIC200. Given that the nodes in CIC100 and CIC200 differ by more than 30% in CPU speed and disk bandwidth, we conclude that iBalloon policies are applicable to heterogeneous platforms across cloud systems.

The reward signal provides strong incentives to give up unnecessary resources. In Figure 9, we plot the configuration of VCPU, memory and I/O bandwidth of TPC-W, SPECweb and TPC-C as the client workload varied. Recall that we do not have an accurate estimation of memory utilization; we rely on the Active metric in meminfo and the swap rate to infer memory idleness. The Apache web server used in SPECweb periodically frees unused httpd processes, so the memory usage information in meminfo is more accurate. As shown in Figure 9, with a 10-hour trained policy, iBalloon was able to expand and shrink CPU and I/O bandwidth resources as the workload varied. As for memory, iBalloon was able to quickly respond to memory pressure; it released part of the unused memory, although not completely. The agreement in the shapes of each resource curve verifies the accuracy of the reward metric.

We note that the above results only show the performance of iBalloon statistically. In practice, service providers care more about user-perceived performance, because in production systems, mistakes made by autonomous capacity management can be prohibitively expensive. To test iBalloon's ability to determine the appropriate capacity online, we ran the workload generators at full speed and reduced the VM's capacity every 15 management intervals. Figure 10 plots the client-perceived results for TPC-W and TPC-C. In both experiments, iBalloon was configured with initial policies. Each point in the figures represents the average of a 30-second management interval. As shown in Figure 10(a), iBalloon is able to promptly detect the mis-configurations and reconfigure the VM to an appropriate capacity. On average, throughput and response time can be recovered within 7 management intervals. Similar results can also be observed in Figure 10(b), except that the client-perceived response times have larger fluctuations for the TPC-C workload.

6.4 Coordination in Multiple Applications

iBalloon is designed as a distributed management framework that handles multiple applications simultaneously. The VMs rely on the feedback signals to form their capacity management policies. Different from the case of a single application, in which the feedback signal only depends on the resources allocated to the hosting VM, when multiple applications are hosted the feedback signals also reflect possible performance interference between VMs.

We designed experiments to study iBalloon's performance in coordinating multiple applications. As above, iBalloon was configured to manage only the DB tiers of the TPC-W workload. All the DB VMs were homogeneously hosted on one physical host while the APP VMs were over-provisioned on another node. The baseline capacity strategy statically assigns 4 VCPUs and 1GB memory to all the DB VMs, which is considered over-provisioning for one VM. iBalloon starts with a VM template, which has 1 VCPU and 512MB memory. Figure 11 plots the performance of iBalloon normalized to the baseline capacity strategy in a 5-hour test. The workload to each TPC-W instance varied dynamically, but the aggregated resource demand was beyond the capacity of the machine that hosts the DB VMs. Figure 11 shows that, as the number of DB VMs increases, iBalloon gradually beats the baseline in both throughput and response time.

An examination of the iBalloon logs revealed that iBalloon suggested a smaller number of VCPUs for the DB VMs, which possibly alleviated the contention for CPU. The baseline strategy encouraged resource contention and resulted in wasted work. In summary, iBalloon, driven by the feedback, successfully coordinated the competing VMs to use the resources more efficiently.

Fig. 9. Resource allocations changing with workload. Panels plot the VCPU count (TPC-W), memory size in GB (SPECweb) and I/O bandwidth in MB/s (TPC-C), showing the traffic level and the allocated resource level over 30-second time intervals.

Fig. 10. User-perceived performance under iBalloon: (a) TPC-W throughput (req/s) and response time (ms) against the reference and the SLA; (b) TPC-C throughput (req/s) and response time (sec) against the reference and the SLA, over 30-second time intervals.

6.5 Scalability and Overhead Analysis

We scaled iBalloon out to the large dedicated CIC200 cluster and deployed 64 TPC-W instances, each with two tiers, on the cluster. We randomly deployed the 128 VMs on the 16 nodes, assuming no topology information. To avoid possible hotspots and load imbalance, each node hosted 8 VMs, 4 APP and 4 DB tiers. We implemented the Decision-maker as distributed decision agents. The deployment is challenging for autonomous capacity management for two reasons. First, iBalloon ought to coordinate VMs on different hosts, each of which runs its own resource allocation policy. The dependency relationships make it harder to orchestrate all the VMs. Second, consolidating APP (network-intensive) tiers with DB (CPU-intensive) tiers onto the same host poses challenges in finding a balanced configuration.

Figure 12 plots the average performance of the 64 TPC-W instances in a 10-hour test. In addition to iBalloon, we also experimented with four other strategies. The optimal strategy was obtained by tuning the cluster manually. It turned out that the setting of 3 VCPUs and 1GB memory for each DB VM and 1 VCPU and 1GB memory for each APP VM delivered the best performance. The work-conserving scheme is similar to the baseline in the last subsection; it sets all VMs to a fixed 4 VCPUs and 1GB memory. The adaptive proportional-integral (PI) method [22] directly tracks the error between the measured response time and the SLO and adjusts resource allocations to minimize the error. The auto-regressive-moving-average (ARMA) method [21] builds a local linear relationship between the allocated resources and the response time from recently collected samples, from which the resource reconfiguration is calculated. The performance is normalized to optimal. For throughput, higher is better; for response time, lower is better.
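To make the PI baseline concrete, the sketch below shows how a proportional-integral controller of the kind described in [22] might adjust a CPU allocation to track a response-time SLO. The gains, bounds and update interval are placeholder assumptions, and the online gain adaptation of the adaptive-PI method is omitted.

```python
# Minimal sketch of a PI controller tracking a response-time SLO with a CPU
# allocation (illustrative of the adaptive-PI baseline; gains and bounds are
# assumptions, and the online gain adaptation of [22] is not shown).

class PIAllocator:
    def __init__(self, slo_ms: float, kp: float = 0.001, ki: float = 0.0002,
                 min_cap: float = 0.1, max_cap: float = 4.0):
        self.slo_ms = slo_ms          # target response time in milliseconds
        self.kp, self.ki = kp, ki     # proportional and integral gains
        self.min_cap, self.max_cap = min_cap, max_cap
        self.integral = 0.0
        self.cap = 1.0                # current CPU allocation (in VCPUs)

    def step(self, measured_ms: float) -> float:
        """Update the CPU cap from the latest response-time measurement."""
        error = measured_ms - self.slo_ms      # positive when the SLO is violated
        self.integral += error
        self.cap += self.kp * error + self.ki * self.integral
        self.cap = max(self.min_cap, min(self.max_cap, self.cap))
        return self.cap

# Example: observing a 3500 ms response time against a 3000 ms SLO raises
# the cap from 1.0 to roughly 1.6 VCPUs in a single control step.
ctrl = PIAllocator(slo_ms=3000)
new_cap = ctrl.step(measured_ms=3500)
```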

From the figure, iBalloon achieved throughput close to optimal while incurring a 20% degradation in request latency. This is understandable because any change in a VM's capacity, especially memory reconfiguration, brings in unstable periods. iBalloon outperformed the work-conserving scheme by more than 20% in throughput. Although work-conserving had compromised throughput, it achieved a similar response time as optimal because it did not perform any reconfigurations. Adaptive-PI and ARMA achieved similar throughput to iBalloon but with more than 100% degradation in response time. These control methods, which are based either on system identification or on local linearization, can suffer poor performance under both workload and cloud dynamics. We conclude that iBalloon scales to 128 VMs on a correlated cluster with near-optimal application performance. Next, we perform tests to narrow down the overhead incurred on request latency.

In the previous experiments, iBalloon incurred a non-negligible cost in response time. The cost was due to the real overhead of iBalloon as well as the performance degradation caused by reconfiguration. To study the overhead incurred by iBalloon, we repeated the above experiment except that iBalloon operated on VMs with the optimal configurations and reconfigurations were disabled in the Host-agent. In this setting, the overhead only comes from the interactions between the VMs and iBalloon. Figure 13 shows the overhead of iBalloon with two different implementations of the Decision-maker, namely the centralized and the distributed implementations. In the centralized approach, a designated server performs the RL algorithms for all the VMs. Again, the overhead is normalized to the performance of the optimal scheme.
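For intuition about what the distributed Decision-maker computes on each VM, the sketch below shows a generic epsilon-greedy Q-learning step of the kind such an agent could run locally. It is the textbook update [27], not iBalloon's exact algorithm; the action set, the dictionary-based Q representation and the hyperparameters are our own illustrative assumptions.

```python
import random

# Generic epsilon-greedy Q-learning step for a per-VM decision agent
# (textbook form [27]; iBalloon's efficient Q representation is not shown).
ACTIONS = ["cpu+1", "cpu-1", "mem+256MB", "mem-256MB",
           "io+5MB/s", "io-5MB/s", "no-op"]          # assumed action set
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1                # assumed hyperparameters

def choose_action(q: dict, state) -> int:
    """Epsilon-greedy selection over the discrete reconfiguration actions."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))

def q_update(q: dict, state, action: int, reward: float, next_state) -> None:
    """One-step Q-learning update after the host replies with a feedback reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(len(ACTIONS)))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```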

Figure 13 suggests that the centralized decision server becomes the bottleneck, with as much as 50% overhead on request latency and 20% on throughput as the number of VMs increases. In contrast, the distributed approach, which computes capacity decisions on the local VMs, incurred less than 5% overhead on both response time and throughput. To further confirm that the centralized decision server is the limiting factor, we split the centralized decision work onto two separate machines (denoted as Hierarchical) in the case of 128 VMs. As shown in Figure 13, the overhead on request latency is reduced by more than half. Additional experiments revealed that computing the capacity management decisions locally in the VMs requires no more than 3% CPU resources for Q computation and approximately 18MB of memory for Q-table storage. The resource overhead is insignificant

Fig. 11. Performance of multiple applications due to iBalloon: normalized response time and throughput as the number of VM instances grows from 2 to 6.

Fig. 12. Performance due to various reconfiguration approaches on a cluster of 128 correlated VMs: throughput and response time of iBalloon, Work-conserving, Adaptive-PI and ARMA, normalized to Optimal.

Fig. 13. Runtime overhead of iBalloon: response time and throughput overhead (%) of the Centralized, Distributed and Hierarchical implementations as the number of VMs varies from 2 to 128.

compared to the capacity of the VM template (1 VCPU, 512MB). These results show that iBalloon adds no more than 5% overhead to the application performance with a manageable resource cost.
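As a back-of-envelope check (our own arithmetic, assuming 8-byte double-precision Q values and ignoring indexing overhead), the reported 18MB table corresponds to roughly 2.4 million state-action entries:

```python
# Rough estimate of how many Q values fit in the reported ~18 MB table,
# assuming 8-byte doubles and no indexing overhead (our assumption).
table_bytes = 18 * 1024 * 1024
entries = table_bytes // 8        # ~2.36 million (state, action) pairs
print(entries)                    # 2359296
```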

7 RELATED WORK

Cloud computing allows cost-efficient server consolidation to increase system utilization and reduce cost. Resource management of virtualized servers is an important and challenging task, especially when dealing with fluctuating workloads and performance interference. Traditional control theory and machine learning have been applied with success to resource allocation in physical servers; see [1], [18], [25], [15], [16], [24], [8] for examples. Recent work demonstrated the feasibility of these methods for automatic virtualized resource allocation.

Early work [22], [26] focused on tuning the CPU resource only. Padala et al. employed a proportional controller to allocate CPU shares to VM-based multi-tier applications [22]. This approach assumes a non-work-conserving CPU mode and no interference between co-hosted VMs, which can lead to resource under-provisioning. Recent work [14] enhanced traditional control theory with Kalman filters for stability and adaptability, but it remains under the assumption of CPU-only allocation. The authors in [26] applied domain-knowledge-guided regression analysis for CPU allocation in database servers. The method is hardly applicable to other applications for which domain knowledge is not available.

The allocation of memory is more challenging. The work in [11] dynamically controlled the VM's memory allocation based on memory utilization. Their approach is application specific, in which the Apache web server optimizes its memory usage by freeing unused httpd processes. Other applications, such as the MySQL database, tend to cache data aggressively, and calculating the memory utilization of VMs hosting these applications is much more difficult. Xen employs Self-Ballooning [19] for dynamic memory allocation. It estimates the VM's memory requirement based on the OS-reported metric Committed_AS. It is effective in expanding a VM under memory pressure, but it is not able to shrink the memory appropriately. More accurate estimation of the actively used memory (i.e., the working set size) can be obtained by either monitoring the disk I/O [13] or tracking the memory miss curves [33]. However, these event-driven updates of memory information cannot promptly shrink the memory size during memory idleness. Although we have not found a good way to estimate the VM's working set size, iBalloon relies on a combination of performance metrics to decide memory allocation. With more information on the VMs' busyness and past trial-and-error experiences, iBalloon achieves more accurate memory allocation.

Automatic allocation of multiple resources [21] or allocation for multiple objectives [17], [9] poses challenges in the design of the management scheme. The complicated relationship between resources and performance makes modeling the underlying systems hard. Padala et al. applied an auto-regressive-moving-average (ARMA) model with success to represent the relationship between resource allocation and application performance. They used a MIMO controller to automatically allocate CPU shares and I/O bandwidth to multiple VMs. However, the ARMA model may not be effective under workloads with large variations, and its performance can also be affected by VM interferences.

Different from the above approaches to designing a self-managed system, RL realizes autonomic management by performing trial-and-error interactions with the environment and can possibly be applied to dynamic and complex problems. Recently, RL has been successfully applied to automatic application parameter tuning [6], [4], optimal server allocation [28] and self-optimizing memory controller design [12]. Autonomous resource management in cloud systems introduces unique requirements and challenges for RL-based automation, due to dynamic resource demand, changing VM deployment and frequent VM interference. More importantly, user-perceived quality of service should also be guaranteed. The RL-based methods should therefore be scalable and highly adaptive. The authors in [23] attempted to apply RL to host-wide VM resource management. They addressed the scalability and adaptability issues using model-based RL, which builds environment models in order to accelerate the learning process. However, the complexity of maintaining the models for the systems under different scenarios becomes prohibitively expensive when the number of VMs increases. Bu et al. proposed system-knowledge-guided trimming of the RL state space for coordinated configuration of virtual resources and application parameters [5]. Although it effectively reduces the initial search space in RL problems, it is still unable to deal with resource allocation across multiple machines, and the state reduction becomes less effective as the number of VMs increases. In contrast, we design VM resource allocation as a distributed task working directly on system-level metrics. In a distributed learning process, iBalloon demonstrated scalability up to 128 correlated VMs on 16 nodes under work-conserving mode.

8 CONCLUSION

In this work, we presented iBalloon, a generic framework that allows self-adaptive virtual machine resource provisioning. The heart of iBalloon is the distributed reinforcement learning agents that coordinate in a dynamic environment. Our prototype implementation of iBalloon, which uses a highly efficient reinforcement learning algorithm as the learning engine, was able to find near-optimal configurations for a total of 128 VMs on a closely correlated cluster with no more than 5% overhead on application throughput and response time.

Nevertheless, there are several limitations to this work. First, the management operations are discrete and of a relatively coarse granularity. Second, the RL-based capacity management still suffers considerably from poor initial performance. Future work can extend iBalloon by combining control theory with reinforcement learning, which offers opportunities for control theory to provide fine-grained operations and stable initial performance.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their constructive comments. This work was supported in part by U.S. NSF grants CNS-0702488, CRI-0708232, CNS-0914330, and CCF-1016966.

REFERENCES

[1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Trans. Parallel Distrib. Syst., 13, January 2002.

[2] J. S. Albus. A new approach to manipulator control: the Cerebellar Model Articulation Controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 1975.

[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical report, EECS Department, University of California, Berkeley, Feb 2009.

[4] X. Bu, J. Rao, and C.-Z. Xu. A reinforcement learning approach to online web systems auto-configuration. In ICDCS, 2009.

[5] X. Bu, J. Rao, and C.-Z. Xu. A model-free learning approach for coordinated configuration of virtual machines and appliances. In MASCOTS, 2011.

[6] H. Chen, G. Jiang, H. Zhang, and K. Yoshihira. Boosting the performance of computing systems through adaptive configuration tuning. In SAC, 2009.

[7] L. Cherkasova, D. Gupta, and A. Vahdat. When virtual is harder than real: Resource allocation challenges in virtual machine based IT environments. Technical report, HP Labs, Feb 2007.

[8] J. Gong and C.-Z. Xu. A gray-box feedback control approach for system-level peak power management. In ICPP, 2010.

[9] J. Gong and C.-Z. Xu. vPnP: Automated coordination of power and performance in virtualized datacenters. In IWQoS, 2010.

[10] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat. Enforcing performance isolation across virtual machines in Xen. In Middleware, 2006.

[11] J. Heo, X. Zhu, P. Padala, and Z. Wang. Memory overbooking and dynamic control of Xen virtual machines in consolidated environments. In IM, 2009.

[12] E. Ipek, O. Mutlu, J. F. Martinez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.

[13] S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Geiger: monitoring the buffer cache in a virtual machine environment. In ASPLOS, 2006.

[14] E. Kalyvianaki, T. Charalambous, and S. Hand. Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In ICAC, 2009.

[15] A. Kamra, V. Misra, and E. M. Nahum. Yaksha: a self-tuning controller for managing the performance of 3-tiered web sites. In IWQoS, 2004.

[16] M. Karlsson, C. T. Karamanolis, and X. Zhu. Triage: performance isolation and differentiation for storage systems. In IWQoS, 2004.

[17] S. Kumar, V. Talwar, V. Kumar, P. Ranganathan, and K. Schwan. vManage: loosely coupled platform and virtualization management in data centers. In ICAC, 2009.

[18] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son. A feedback control approach for guaranteeing relative delays in web servers. In RTAS, 2001.

[19] D. Magenheimer. Memory overcommit... without the commitment. Technical report, Xen Summit, June 2009.

[20] D. Ongaro, A. L. Cox, and S. Rixner. Scheduling I/O in virtual machine monitors. In VEE, 2008.

[21] P. Padala, K.-Y. Hou, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, and A. Merchant. Automated control of multiple virtualized resources. In EuroSys, 2009.

[22] P. Padala, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, and K. Salem. Adaptive control of virtualized resources in utility computing environments. In EuroSys, 2007.

[23] J. Rao, X. Bu, C.-Z. Xu, L. Wang, and G. Yin. VCONF: a reinforcement learning approach to virtual machines auto-configuration. In ICAC, 2009.

[24] J. Rao and C.-Z. Xu. Online measurement of the capacity of multi-tier websites using hardware performance counters. In ICDCS, 2008.

[25] P. P. Renu, P. Pradhan, R. Tewari, S. Sahu, A. Ch, and P. Shenoy. An observation-based approach towards self-managing web servers. In IWQoS, 2002.

[26] A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD Conference, 2008.

[27] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[28] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing, 2007.

[29] The ClarkNet Internet traffic trace. http://ita.ee.lbl.gov/html/contrib/ClarkNet-HTTP.html.

[30] The dm-ioband bandwidth controller. http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband.

[31] The SPECweb benchmark. http://www.spec.org/web2005.

[32] The Transaction Processing Council (TPC). http://www.tpc.org.

[33] W. Zhao and Z. Wang. Dynamic memory balancing for virtual machines. In VEE, 2009.