

arXiv:1305.0209v1 [cs.NI] 1 May 2013

Stratos: A Network-Aware Orchestration Layer for Middleboxes in the Cloud

Aaron Gember, Anand Krishnamurthy, Saul St. John, Robert Grandl, Xiaoyang Gao, Ashok Anand, Theophilus Benson, Aditya Akella, Vyas Sekar

University of Wisconsin - Madison, Bell Labs India, Princeton University, Stony Brook University

ABSTRACT

We see an increasing demand for in-the-cloud middlebox processing as applications and enterprises want their cloud deployments to leverage the same benefits that such services offer in traditional deployments. Unfortunately, today's cloud middlebox deployments lack the same abstractions for flexible deployment and elastic scaling that have been instrumental to the adoption and success of cloud-based compute and storage services. The key challenge here is that such network processing workloads are fundamentally different from traditional virtualized compute and storage services. These differences arise as a consequence of the ways in which tenants need to compose different middlebox services and the network-level factors (e.g., placement, load balancing) that impact application performance.

To address this challenge, we present the design and implementation of Stratos. Stratos allows tenants to specify logical middlebox deployments and provides efficient scaling, placement, and distribution algorithms that abstract away low-level issues in ensuring effective application performance. We demonstrate the effectiveness of Stratos using an experimental prototype, a limited deployment over EC2, and large-scale simulations.

1. INTRODUCTION

Surveys show that enterprises rely heavily on in-network middleboxes such as load balancers, intrusion prevention systems, and WAN optimizers to ensure application security and improve performance [35, 33]. As many of these applications and services move to the cloud, enterprises would naturally like to leverage the same performance and security benefits in the cloud. This is evidenced by an increasing number of commercial middlebox vendors providing virtual appliances [3, 12, 13], research prototypes and startups proposing in-the-cloud network processing services [35, 22, 9], and the emergence of similar (albeit limited) offerings from cloud providers themselves [2].

The ability to elastically scale deployments to match demand and to flexibly manage virtual compute and storage resources has been a driving factor contributing to the adoption of cloud deployments. Unfortunately, cloud customers today lack similar support and abstractions for their in-the-cloud virtual middlebox (MB) deployments. Existing abstractions treat MBs the same as any other compute nodes, leading to brittleness, inflexibility, and poor elasticity (§2).

MB deployments are different from traditional virtualized compute or storage resources in three respects:

Composition: MBs are rarely used in isolation. Deployments are typically structured as physical or logical chains where a given flow/packet is processed by a sequence of heterogeneous MBs that lie on critical forwarding paths. The MB processing required may change depending on observed traffic patterns. Thus, (i) there needs to be intrinsic support for static and dynamic MB composition, and (ii) more importantly, management functions must consider chain-level performance.

Network-aware scaling: Since MBs are on the data path, their network footprint more critically impacts application performance compared to virtual compute services. In particular, the contention between MB and other network traffic can vary dynamically in time and space for complex chains. Coupled with MB heterogeneity, variable MB performance in virtualized environments, and heavy resource multiplexing in clouds, this necessitates a new approach to identify bottlenecks and make informed horizontal scaling decisions. As we show, traditional "network-agnostic" scaling approaches based on monitoring CPU/memory do not work even for simple MB chains.

Fine-tuning network interactions: Network bottlenecks can hurt MB performance and hence tenant applications. Yet, the presence of multiple MBs in a chain provides many useful knobs for minimizing the potential for contention between MB sourced/destined traffic and other traffic. Tuning these knobs is crucial because it helps optimally leverage the processing capacity of MB instances. Aside from extracting more out of MBs, this helps (i) improve the effectiveness of the scaling decisions, and (ii) support a greater number of elastic tenant MB chains at the same or lower cost.

In essence, providing the management flexibility and horizontal scalability for MB deployments similar to compute and storage services requires designing new cloud network functions that explicitly manage the network configuration and interactions of MBs. Thus, we design and implement Stratos, a new network-aware orchestration layer for MBs.

Stratos's configuration plane allows a cloud tenant to flexibly compose and dynamically alter virtual topologies that contain arbitrary MB chains (§3.1). The configuration plane exports an annotated logical topology view to tenants, where the annotations are hints on MB network footprint. Stratos's management plane implements efficient algorithms to map the logical view to an appropriate physical realization.

Stratos's management plane also automatically and accurately determines the bottleneck for a tenant's deployment using an application-aware heuristic that relies on application-reported performance measures (§4). The heuristic implicitly takes into account MBs' holistic resource consumption, including compute, memory, and the network.

Finally, Stratos explicitly manages the network interactions of MBs in order to maximize the network capacity between them. Specifically, the management plane implements two functions that both take profiles of MB network footprint and logical MB topologies as input: (i) a placement algorithm that logically partitions the physical MB topology into per-rack partitions and places them with minimal inter-partition communication (§5); and (ii) a traffic distribution algorithm to route traffic across the different MBs/replicas that further reduces the network footprint (§6). Placement is triggered when a new tenant arrives, scaling decisions are made, or network-wide management actions occur (e.g., VM migration). Traffic distribution is invoked periodically to re-balance traffic based on changing MB network footprint, changing network load from other tenants, or a placement decision.

We implement Stratos as a collection of modules running atop Floodlight [6] (≈7500 LOC). These modules (i) parse tenant chain configuration files, (ii) gather performance metrics from network switches, applications, and MBs using SNMP, (iii) execute Stratos's scaling, placement, and flow distribution algorithms, (iv) launch and terminate VMs using Xen [15], and (v) install forwarding rules in hypervisor-resident Open vSwitches [8].

We conduct controlled experiments of our prototype over a 24-node/72-VM data center testbed. We also evaluate a stripped-down Stratos for EC2, which only implements our scaling and load distribution heuristics. Finally, we conduct simulations to study Stratos's impact at scale.

Our central goal is to verify the importance of the network-awareness embedded into Stratos, be it in scaling, placement, or distribution, in supporting MB services in the most effective fashion. To this end, we find:

• Stratos helps optimally meet application demand by accurately identifying bottlenecks and either adding the appropriate number of MB replicas or redistributing traffic at coarse and fine timescales to overcome congestion.

• Network-agnostic approaches use up to 2X as many MBs as Stratos, yet they cannot meet application demand, resulting in severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos imposes little setup overhead. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

Figure 1: Example middlebox and server topology.

2. BACKGROUND

MBs play a key role in enterprises and private data centers [33], with application traffic often traversing multiple MB appliances. With enterprises migrating their applications to the cloud, a wide variety of cloud-provided services (e.g., Amazon's Elastic Load Balancer [2]) and third-party VM images [3, 12, 13] have emerged to supply the desired MB functionality. In fact, recent surveys show that 87% of IT professionals believe that network-level MB services should be a key part of cloud-based IaaS offerings [1].

In this section, we describe typical approaches used today to leverage MBs in the cloud and show that, due to the lack of suitable abstractions and intrinsic management functionality, these approaches offer limited to no flexibility and impede elastic scaling. Our observations are derived on the basis of our own experience in trying to deploy such network services in Amazon EC2 [2].

Composition: In contrast to traditional compute applications, network services are frequently deployed as a "chain" of several MBs [24]. For example, traffic may enter the data center through a WAN optimizer or redundancy elimination (RE) MB, be mirrored to an intrusion detection system (IDS), directed to a load balancer, and assigned to one of several application servers (Figure 1).

Since today's cloud providers are largely geared toward traditional applications, they provide little control over network topology and routing [2, 10], and third-party overlay services [14] only facilitate topologies containing directly addressed endpoints (in contrast, MBs should frequently be transparent). As a result, tenants are forced to run MBs as generic VMs and manually piece together tunnels, traffic splitters, and other software to route the desired traffic. Such manual and distributed configuration makes it hard to dynamically add new functionality, add replica MBs to manage load, or route around failed MBs. As an anecdote, implementing the relatively simple set of MB traversals shown in Figure 1 required several days of trial-and-error to obtain a working setup in EC2, which relied on several third-party tools and configurations strewn across VMs.

Automation scripts are insufficient, since they make dynamic changes possible but not easy. Indeed, the tenant still has to implement extra logic, e.g., to distribute appropriate traffic subsets to MB replicas, which may change when new types of MBs are deployed in a chain (e.g., transcoding or compression engines, which change expected load). More importantly, the implemented logic may be fundamentally insufficient due to lack of intrinsic support from the cloud provider. We highlight this next.

Figure 2: Lack of scaling due to network bottlenecks.

Figure 3: Ineffective scaling due to poor placement.

Elastic Scaling: Being on the critical forwarding path, MBs' performance and network footprint can significantly impact end-to-end application performance. Unfortunately, there are no effective schemes today to identify bottlenecks in and elastically scale MB chains. This is because existing approaches [11, 4] do not recognize the chain as an entity, and there are no intrinsic mechanisms to help control MB chain performance.

Given today's compute-centric view, tenants could monitor basic resource consumption (CPU, memory, I/O) to identify if individual MBs are bottlenecks. Unfortunately, this may not be sufficient because the bottleneck may be a network link on the path between two MBs in a chain. While network bottlenecks also impact regular cloud applications, these effects get magnified in the context of MBs because they lie on the critical forwarding path. We illustrate this in Figure 2, where the IPS and RE MBs run at 50% utilization and hence no scaling is triggered. Yet the application's performance, which is bottlenecked by the congested link, can be improved by adding an RE instance (outside rack-2) and sending some part of the traffic to it. In general, unless the performance constraints imposed by all elements in a chain, MBs and network links alike, are taken into account, bottlenecks cannot be identified and overcome effectively.

One of the key reasons that effective elastic MB chain scaling is hard today is that cloud providers have no mechanisms to actively manage the network resources available to the chains. For example, it has been shown that EC2's VM placement algorithm is essentially random given the instance size [32]. As such, it is quite possible that a new replica is launched behind a congested network link, in which case the bottleneck would not be overcome effectively. We illustrate this in Figure 3, where the IPS instance runs at 80% utilization, triggering scaling, but the added replica does not improve end-to-end performance because of network congestion at the replica's location.

Furthermore, it is important to allocate the amount of traffic going to different replicas in a manner that takes prevalent network congestion into account and, equally importantly, to re-allocate as network conditions (as well as MB load) change; otherwise, the scaling decision may not have the desired effect. This is impossible to do in any effective manner today, as network utilization information is unavailable to tenants. We show this in Figure 4, where N/2 flows are sent over the congested inter-rack link. An optimal network-aware solution in this case would be to only send N/6 flows on the congested link.

Figure 4: Ineffective flow allocation due to lack of visibility.

3. Stratos OVERVIEW

Our vision is to enable the same degree of flexibility and elasticity that we have with other aspects of cloud computation (virtual computing, virtual storage) for in-the-cloud MBs. In this section, we start with an overview of our system, Stratos, which addresses this challenge.

At a high level, Stratos can be viewed as a network-aware orchestration layer that enables cloud tenants to easily manage MB deployments in the cloud without any of the complexity discussed earlier. We envision that all of the needed network-aware functionality to enable flexibility and elasticity is implemented by the cloud provider.

3.1 Stratos tenant interface

Instead of composing middlebox and application server topologies through a smattering of third-party tools and configurations, tenants define logical topologies using high-level abstractions (Figure 5). These topologies are automatically transformed into a set of forwarding rules defining how application traffic flows between server and MB instances. In doing so, Stratos abstracts away the physical realization of how many and where these MB functions are realized.

Here we use the notion of a chain as the basic abstraction for describing the direction specific traffic flows should take. A chain begins with a source of traffic (e.g., Internet clients), contains a sequence of one or more middleboxes the traffic should traverse (e.g., IDS and load balancer), and ends with a destination (e.g., a set of web servers). Each edge in a chain is annotated with an expected traffic gain/drop factor that specifies the ratio of input-to-output packets (bytes) on each specific middlebox in the chain. For instance, a firewall may drop packets, and an RE module may compress packets on the fly. The traffic gain factors capture these effects, since they impact the amount of traffic that traverses links between MBs. A tenant's topology could contain multiple chains with overlapping middleboxes.
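To make the chain abstraction concrete, the following is a minimal sketch of how a tenant's annotated logical topology might be expressed; the field names and Python representation are illustrative assumptions, not Stratos's actual configuration syntax.

    # Illustrative sketch of a tenant chain specification (hypothetical field
    # names; not Stratos's actual configuration format).
    chain = {
        "source": "internet_clients",
        "middleboxes": ["re", "ids", "load_balancer"],   # traversal order
        "destination": "web_servers",
        # Per-edge traffic gain/drop factor: ratio of input to output volume
        # at the upstream element. E.g., an RE middlebox decompressing
        # 50%-redundant traffic has factor 2; a firewall dropping packets < 1.
        "gain_factors": {
            ("internet_clients", "re"): 1.0,
            ("re", "ids"): 2.0,
            ("ids", "load_balancer"): 1.0,
            ("load_balancer", "web_servers"): 1.0,
        },
    }

    # A tenant topology may contain several chains with overlapping middleboxes.
    topology = {"chains": [chain]}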

3.2 Stratos internals

Figure 5: Example tenant logical topologies. (a) Tenant logical topology with a single chain; (b) tenant logical topology with two overlapping chains.

In mapping this logical view to an actual physical realization, Stratos needs to address three key challenges, each addressed by a corresponding Stratos component, as shown:

• Elastic Scaling: How many physical MB instances of each type need to be deployed? This module takes as input the logical topology given by the cloud tenant, the tenant's current physical configuration, and any service-level requirement that the tenant desires (e.g., upper bounds on cost or lower bounds on application latency). It uses periodic measurements of the end-to-end application performance to decide the optimal number of instances of different middleboxes necessary to meet the given service requirement.

• Placement: Where should these MBs be placed inside the cloud provider's network? The placement module takes as input the current state of the cloud provider's physical network topology (e.g., available racks, available slots, available bandwidth between racks), the logical topology of the client, the current physical instantiation of this topology across the provider network, and the number of new MBs of different types that need to be initiated. Given these inputs, it decides where to place the new MBs to avoid network bottlenecks. As a special case, it also implements an initial placement interface, which starts with zero MBs.

• Flow Distribution: How should the traffic be routed through the different MBs? The distribution module takes as input a given physical instantiation of a tenant chain (i.e., the number and placement of the MBs), measured (or statically specified) traffic gain/drop factors for the MBs, and the current network topology with link utilizations, to optimally distribute the processing load between the different MBs. The goal here is to reduce network congestion effects for the traffic flowing between MB instances as well as balance the CPU/memory utilization of MB instances.

In designing the individual modules and in integrating them, Stratos takes into account both computational loads and network-level effects. This helps ensure that the scaling step hones in on the true bottlenecks and that good placement and load balancing are implemented for the current workload. This also ensures that there is sufficient capacity to efficiently add new MBs in the future.

Figure 6: Overview of the high-level functionality and interfaces to the client and the cloud provider in Stratos to enable flexible middlebox deployment in the cloud.

More precisely, when the scaling module decides to increase the number of MBs, it invokes the network-aware placement module to decide where the new MBs need to be placed. The placement module in turn calls the flow distribution module to decide the optimal distribution strategy for the chosen placement, taking into account network-level effects. As MB network footprints change, the flow distribution module can redistribute load to further improve the chain's end-to-end performance.

3.3 Interacting with other Provider Functions

In order to achieve network-aware orchestration, we need new management APIs to facilitate interaction between Stratos and existing cloud functions. Specifically, Stratos interacts with the cloud provider's monitoring and VM deployment components, as shown by the dotted arrows in Figure 6. The interaction occurs at two different timescales (downward arrows). First, on a coarse-grained timescale, Stratos's placement logic may be invoked (left down arrow) whenever network-wide management actions occur (e.g., VM migration). Second, the monitoring layer periodically reports link utilizations to Stratos's flow distribution module (right down arrow). If there is a significant change in background (non-Stratos) network traffic, the flow distribution module can invoke redistributions across tenant chains. Last, Stratos's placement logic specifies constraints on the location of new MBs at the end of scaling, or that of MBs and application VMs at chain initialization time, to the cloud provider's VM deployment module (upward dotted arrow).

The focus of this paper is on the internal logic of Stratos, i.e., addressing the challenges highlighted in Section 3.2. In the next three sections, we discuss the algorithmic frameworks underlying the above Stratos modules. We do so in a top-down fashion, starting with the application-aware scaling (§4), followed by the rack-aware placement (§5), and the network-aware traffic distribution mechanism (§6).

4. ELASTIC SCALING

The ability to scale capacity as needed is a major benefit of deploying applications in the cloud. This means that the chain traversed by application traffic must also be scaled to avoid becoming a performance bottleneck.

To illustrate the difficulty in scaling tenant chains, we start by considering several strawman approaches and discuss why these solutions are ineffective. Building on the insight that a tenant's ultimate concern is the end-to-end application performance, we then design a practical heuristic for elastically scaling a tenant's chain.

4.1 Strawman approaches

We considered several strawman approaches for deciding which MBs to scale, but they turned out to be ineffective.

1. Scale all MB types:¹ The simplest solution for a bottlenecked chain is to add extra instances for each MB type in the chain. This guarantees the bottleneck will be eliminated, but it potentially wastes significant resources and imposes unneeded costs (especially when only one MB is bottlenecked).²

2. Per-packet processing time: The average per-packet processing time at each MB provides a common middlebox-agnostic metric. If a chain is bottlenecked, the MB with the greatest increase in per-packet processing time is likely the culprit. However, not all MBs follow a one-packet-in, one-packet-out convention (e.g., a WAN optimizer), and it is unclear if we can calculate a useful per-packet processing time in this case.

3. Offered load: Alternatively, we could leverage CPU and memory utilization or other load metrics (e.g., connections/second). However, different types of MBs have different resource or functional bottlenecks [21], and these bottlenecks may vary with the workload itself (e.g., a high-redundancy workload may stress an RE module more). Even if we set this aside, this approach, along with approach 2 above, is network-agnostic and can lead to poor scaling decisions, as we argued in Section 2.

Another candidate, benchmarking MB throughput offline, is also unsuitable since it is based on a fixed traffic mix; a change in the traffic mix may cause the MB to bottleneck at a rate lower or higher than the benchmarked throughput. In Section 8, we use approach 3 as an example to show that naive approaches either identify the wrong bottleneck or make scaling decisions that result in using 2X more MBs than needed.

Ultimately, a tenant is concerned with (i) the performance of their applications and (ii) the cost of running their deployments. Together, these motivate the need to scale the deployment up/down depending on an application-reported performance metric to minimize aggregate cost while ensuring acceptable performance. Many cloud applications already track such metrics for elastic scaling (e.g., requests per second served) and could easily export them to Stratos.

¹ "MB type" refers to a specific type of middlebox.
² Unless otherwise specified, we use "MB" to refer to a single instance of a specific type of middlebox.

scale_up_single(MboxArray M):
 1  for j ∈ [0, |M|]:
 2    do:
 3      improves ← False
 4      add_instance(M[j])
 5      wait(Duration)
 6      foreach app ∈ Apps:
 7        if PerfImprovement(app) > thresh: improves ← True
 8      if improves = False: remove_instance(M[j])
 9    while improves = True
    Fallback: scale all MB types in the chain simultaneously

scale_multiple(BottleneckedChains):
10  foreach C ∈ Chains:
11    Overlap ← ∅; SharedBottlenecks ← ∅
12    foreach C' ≠ C ∈ Chains:
13      if overlap(C, C'): add C' to Overlap
14      if Bottlenecked(C'): add C' to SharedBottlenecks
15    if |Overlap| = 0:
16      scale_up_single(C.mbs)
17    else if |SharedBottlenecks| = 0:
18      scale_up_single(unique_mbs(C, Overlap))
19    else:
20      scale_up_single(shared_mbs(C, Overlap))
    Fallback: scale each chain sequentially

Figure 7: High-level sketch of the scaling heuristic in Stratos. For clarity, we only show the common-case operation and highlight one possible fallback solution. Note that in the multi-chain case, non-overlapping scaling trials can be run in parallel.

4.2 Application-Aware Scaling Heuristic

We design a heuristic approach that leverages an application-reported metric for scaling tenant chains. Our intuitive goal here is to ensure that application SLAs are met, even if it means erring on the conservative side and launching a few more instances than what is needed optimally. The scaling process is triggered by a significant change in the performance of any of the applications in a tenant deployment for a sustained period of time (our prototype checks to see if there is sustained unmet demand or the average end-to-end latency increases by 15 percent over a 30s interval). We first describe the scaling process for a single chain and then extend it to multiple chains. The latter can be extended in a straightforward manner to scaling across multiple tenants.

Single Chain: Our heuristic performs a set of scaling trials, scaling each MB type in a tenant-specified chain one instance at a time, as shown in lines 1-9 in Figure 7. We iterate through the chain and keep an added instance as long as we observe an improvement in the applications' performance (in our prototype we look for a 15% improvement in throughput and unmet load dropping). Note that multiple applications could share the same chain; thus we look for an improvement in at least one such application. (As an optimization, we only need to look for improvement in bottlenecked applications.) If we see no improvement, then we revert the added instance and move to the next MB type in the chain. The scaling procedure terminates when we reach the end of the chain or we see no more improvement.

Scale-down occurs in a similar fashion, except that we look for demand drops; our prototype checks if there is no unmet demand and the application's throughput drops by a certain percentage over a 1-minute interval. Our current prototype selects replicas in increasing order of volume served to try scaling them down (i.e., removing them). To prevent scale up/down oscillations, we use a "damping" factor and wait for some time (our prototype uses 25 seconds) before re-attempting scaling.
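The scale-down check can be summarized in a few lines; the sketch below is illustrative only (the window and damping values mirror the prototype settings quoted above, while the drop fraction, helper names, and data structures are hypothetical).

    import time

    DAMPING_INTERVAL = 25     # seconds to wait between successive scaling attempts
    DROP_WINDOW      = 60     # window (seconds) over which a throughput drop is measured
    DROP_FRACTION    = 0.15   # assumed "certain percentage" drop that triggers scale-down

    def maybe_scale_down(chain, unmet_demand, samples, last_action, now=None):
        """Return True if a replica was removed. `samples` is a list of
        (timestamp, requests_per_sec); `chain.replicas_by_volume()` is assumed
        to return replicas sorted by traffic served (ascending)."""
        now = now if now is not None else time.time()
        if now - last_action < DAMPING_INTERVAL:
            return False                  # damping: too soon after the last action
        if unmet_demand > 0:
            return False                  # never scale down while demand is unmet
        recent = [r for t, r in samples if now - t <= DROP_WINDOW]
        if len(recent) < 2 or recent[-1] >= recent[0] * (1 - DROP_FRACTION):
            return False                  # no sustained drop in throughput
        victim = chain.replicas_by_volume()[0]   # lightest-loaded replica first
        chain.remove_instance(victim)
        return True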

We make a practical choice here to scale one MB type at a time. We view this as a reasonable choice because the scaling decision for an MB type (and indeed each scaling trial) is accompanied by careful placement of scaled instances (Section 5) and redistribution of load across all MBs in the chain (Section 6). The placement and distribution steps help address network bottlenecks at downstream MBs.

Nevertheless, it is possible that our scaling approach does not improve application performance, e.g., when two MB types are equally bottlenecked by compute resources. In such cases, we use a conservative fallback to the simple scale-all approach and add new instances for all MB types in the chain.

Multi-chain Topologies: When a tenant has multiple chains in their deployment, we could consider running scaling trials in parallel for each chain. However, MB types can be shared across chains, and thus a scaling trial will influence the outcome of other concurrent trials and result in unnecessary or inadequate scaling.

Another option is to scale each chain sequentially. We use this as a starting point and speed it up by identifying the set of overlapping chains for each bottlenecked chain.

Our approach to scaling in multi-chain topologies is shown in lines 10-20 in Figure 7. In the simplest case, if a bottlenecked chain shares no MB types, then we simply run the single-chain scaling procedure as discussed earlier (lines 15-16). If one or more MB types overlap with another chain and the overlapping chains are also bottlenecked, then we guess that the common MB instances are the bottlenecks and only run the scaling trial for these shared MB types (lines 19-20). On the other hand, if we have overlapping chains with no bottlenecks, then we speculate that the MB types unique to the current chain are bottlenecked and focus on these instead (lines 17-18). The intuition here is that identifying shared/isolated chains allows us to zoom in on the bottlenecks faster. In the case where this heuristic fails to improve performance (e.g., chains C1 and C2 share MB type M that is a bottleneck for C1 but not C2), we err on the side of caution, adopt a conservative approach, and rerun the scaling procedure considering the union of MBs across all the chains in the set Overlap.³

³ This fallback requires a minimal amount of state at the Stratos controller to track whether it has recently attempted a scaling trial for a given chain.

Network-awareness: Since each scale up/down trial relies on the end-to-end application performance metrics, our approach is implicitly network-aware. It may be possible to design explicit approaches that combine monitoring CPU, memory, and I/O resources with utilization of the network links used by a tenant's chain. However, it appears difficult to precisely identify bottlenecks in such a setting and, more importantly, to determine the extent to which they should be scaled to meet application performance goals. We leave such explicit approaches as a subject for future work. Nevertheless, our evaluation of this implicit scheme shows a lower bound on the benefits of network-awareness in scaling (§8).

Since our approach does not rely on VM-level measurements, it can be applied to tenant deployments with arbitrary MBs. In particular, tenants can compose cloud provider-offered MBs with those from third-party vendors, creating diverse chains.

5. RACK-AWARE PLACEMENT

The bandwidth available on network links impacts several aspects of tenant deployments. Greater available network bandwidth on the path to and from an MB means better use of the MB's processing functionality. Greater network-wide available bandwidth also translates to more effective scaling decisions. Together, these imply better application performance per unit cost (a function of MBs in the chain) for a tenant. Optimal use of network capacity also allows the cloud provider to elastically scale more tenant chains.

As such, Stratos incorporates a placement module that maximizes the bandwidth available to a chain while also controlling the chain's network-wide footprint, even as the chain scales elastically. In what follows, we describe algorithms for two aspects of placement: initially mapping the MBs in a tenant's topology and placing new MB instances.

5.1 Initial Placement

Initial MB placement is triggered whenever a new tenant arrives or network-wide management actions occur (e.g., VM migration).

There are two main inputs we use for initial placement. (1) The tenant-specified logical chains between MB types and application VMs, along with the number of physical instances of each MB type or application VM. Edges are annotated with the gain/drop factor for each MB instance, which is the ratio of the net traffic entering the MB versus that leaving it. We assume the tenant estimates these based on prior history or expected traffic patterns. For example, with an expected 50% redundancy in traffic, an RE MB would have a gain/drop factor of 2 (compressed traffic entering the MB is decompressed). These factors serve as weights on the edges in a chain. And (2) the available slots across different racks and the available bandwidth of different links in the data center topology. The latter is based on historical estimates (e.g., mean, maximum, or kth percentile) of link utilizations. We assume a uniform distribution of load across all MBs of the same type.


While this is a simplistic model, it still forms a helpful basis for placement (especially vis-a-vis existing naive VM placement schemes that consider individual VMs in isolation; see §10). Given this, the placement algorithm has three logical stages.

Partitioning: First, we partition the topology (the entire graph corresponding to a tenant) with the goal of placing each partition in its entirety on a single rack, so that we incur minimal inter-rack communication. That is, we partition the tenant's topology into K partitions such that for each partition there is at least one rack with enough available VM slots to accommodate the partition. We adapt the classical min-K-cut algorithm [28] to identify the partitions, starting with K = 1 and increasing K until all partitions are small enough to be accommodated.

Assigning partitions to racks: The next stage is to assign racks for each partition. Here we use a greedy approach that proceeds by sorting pairs of partitions in decreasing order of inter-partition communication. For each pair, if both partitions are unassigned to racks, we find a pair of racks with the highest available bandwidth to accommodate these two partitions. If one of the partitions in the pair is already assigned to a rack, then we simply find a new rack for the unassigned partition. (If both are assigned, we simply move to the next pair.)

Assigning VMs to slots: Last, we assign VMs (i.e., MBs and application VMs) within each partition to slots in the racks. In case there is just one slot per (physical) machine, we randomly pick a slot and assign it to a VM. If there are more available slots, we follow a similar procedure to partition the VMs, so that VMs that communicate more among each other are assigned closer to each other.
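The second stage, assigning partitions to racks, can be sketched as follows. This is a hedged rendering of the greedy procedure described above; the data structures (dictionaries keyed by partition and rack identifiers) are simplifications, and the bandwidth map is assumed to contain entries for both orderings of each rack pair.

    def assign_partitions_to_racks(partition_size, inter_partition_traffic,
                                   rack_free_slots, inter_rack_bw):
        """Greedy sketch: walk partition pairs in decreasing order of
        inter-partition traffic and place unassigned partitions on the rack
        (pair) with the most available bandwidth that still has enough slots.

        partition_size:          {partition_id: number_of_vms}
        inter_partition_traffic: {(p1, p2): estimated_traffic}
        rack_free_slots:         {rack_id: free_vm_slots}
        inter_rack_bw:           {(rack_a, rack_b): available_bandwidth}
        """
        assignment = {}

        def fits(p, rack):
            return rack_free_slots[rack] >= partition_size[p]

        def place(p, rack):
            assignment[p] = rack
            rack_free_slots[rack] -= partition_size[p]

        pairs = sorted(inter_partition_traffic,
                       key=inter_partition_traffic.get, reverse=True)
        for a, b in pairs:
            if a in assignment and b in assignment:
                continue                               # nothing left to place
            if a not in assignment and b not in assignment:
                # find the rack pair with the highest available bandwidth
                options = [(bw, ra, rb) for (ra, rb), bw in inter_rack_bw.items()
                           if ra != rb and fits(a, ra) and fits(b, rb)]
                if options:
                    _, ra, rb = max(options)
                    place(a, ra)
                    place(b, rb)
            else:
                # one partition already placed: pick the best rack for the other
                placed, pending = (a, b) if a in assignment else (b, a)
                anchor = assignment[placed]
                options = [(bw, rb) for (ra, rb), bw in inter_rack_bw.items()
                           if ra == anchor and fits(pending, rb)]
                if options:
                    place(pending, max(options)[1])
        return assignment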

5.2 Placing New Middlebox Instances

New MBs launched after scaling a chain need to be placed efficiently for scaling to be effective. Ideally, the new MB placement should also help support future scale-up, both for the tenant in question and for other tenants. Our heuristic is driven by these goals.

To more accurately account for the network interactions of the scaled MBs, we dynamically track the gain/drop factors for MBs in the tenant's topology based on prevalent traffic patterns at each MB (using an EWMA). Placement of the scaled MB takes as input the estimated ratios for the flows from the MB's input and output VMs (those supplying traffic to and receiving traffic from the MB, respectively). Placement then works as follows.

If the new instance can be accommodated in the same rack as its input MBs (or VMs) and output MBs (or VMs), then we place the new instance in that rack. However, if the new instance cannot be accommodated in the same rack, we select a candidate rack (a rack with free slots) that has the maximum available bandwidth to the rack housing the input and output MBs. When the input and output MBs are in different racks, we consider each candidate rack and estimate the inter-rack MB traffic using network-aware flow distribution (discussed in the next section), assuming that the new MB is placed in the candidate rack. We select the rack that minimizes the weighted sum of inter-rack flows (or maximizes the bandwidth available to inter-rack flows).
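A simplified sketch of the scaled-instance placement decision follows; it covers the same-rack and bandwidth-based cases described above, while the full heuristic additionally estimates inter-rack flows via the distribution module. All structure and function names are illustrative.

    def place_new_instance(neighbor_racks, racks, free_slots, avail_bw):
        """Pick a rack for a newly scaled MB instance (illustrative sketch).

        neighbor_racks: racks hosting the new instance's input and output MBs/VMs
        racks:          all candidate racks
        free_slots:     {rack: free_vm_slots}
        avail_bw:       {(rack_a, rack_b): available_bandwidth} (assumed symmetric)
        """
        # Case 1: co-locate with the input/output MBs if they share a rack
        # that still has a free slot.
        if len(set(neighbor_racks)) == 1 and free_slots[neighbor_racks[0]] > 0:
            return neighbor_racks[0]

        # Case 2: otherwise choose the free rack with the most available bandwidth
        # toward the racks of the input/output MBs. (Stratos refines this choice
        # by estimating inter-rack flows with network-aware flow distribution, §6.)
        def bw_to_neighbors(rack):
            return sum(avail_bw.get((rack, n), 0) for n in neighbor_racks if n != rack)

        candidates = [r for r in racks if free_slots[r] > 0]
        return max(candidates, key=bw_to_neighbors) if candidates else None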

6. NETWORK-AWARE FLOW DISTRIBUTION

Akin to placement, Stratos's flow distribution module actively manages how MBs use network capacity. In contrast with placement, however, flow distribution can be invoked at fine time-scales.

Flow distribution is triggered whenever a scale up/down decision is made. In particular, the new instance placement heuristic in Section 5.2 invokes flow distribution when considering the optimal location for the scaled instance. Flow re-distribution can be triggered whenever the gain factor of an MB instance changes significantly (e.g., from 2 to 1 for the RE MB in §5). Stratos periodically monitors each chain for such changes. Finally, based on periodic input about network utilization from the cloud provider's monitoring functionality, flow re-distribution can be triggered across multiple tenant chains in response to changes in background (non-Stratos) network traffic. This helps maximize the bandwidth available for intra-chain communications and improves tenants' application performance. The latter two re-distribution attempts happen at the same periodicity in our prototype.

In essence, flow distribution helps provide fine-grained optimization of chain performance, as well as control over chain network footprint, for a given physical deployment of the chain. The key here is that we need to adjust traffic across the entire set of chains of a tenant, as focusing just on the scaled instance may result in less-than-ideal improvements in tenant applications' performance.

Figure 8: Example tenant topology to explain the terms in the LP framework for network-aware distribution. For clarity, we do not show the gain factors on the edges.

Next, we describe a systematic linear-programming (LP) based framework that formally captures the problem of network-aware flow distribution. As such, the logic we describe here is general and applies to multiple scenarios in which such flow distribution is invoked; for instance, the common case is when the distribution module is triggered as a result of elastic scaling. The module may also be triggered due to changes in the background traffic, as well as changes in the gain factors for different MBs in a chain as a result of workload changes for a given tenant. Furthermore, this logic easily extends to the multi-tenant scenario with multiple chains per tenant; we simply consider the union of all chains across all tenants.

Notation: Let $c$ denote a specific chain and $V_c$ be the total volume (flows) of traffic that requires processing via this chain. There may be different types of MBs (e.g., IDS, RE) within a chain; $|c|$ is the number of MBs in a given chain $c$. Let $c[j]$ be the type of the middlebox at position $j$ in chain $c$ (e.g., IDS, RE). Let $k$ denote the type of a middlebox and $M_k$ be the set of MB instances of type $k$ that the scaling module has launched. Thus, $M_{c[j]}$ is the set of MB instances of type $c[j]$; we use $i \in M_{c[j]}$ to specify that an MB instance $i$ belongs to this type. Figure 8 gives a quick overview of the different entities involved in this formulation.

LP Formulation: Our goal is to split the traffic across the instances of each type such that (a) the processing responsibilities are distributed roughly equally across them, and (b) the aggregate network footprint is minimized. Thus, we need to determine how the traffic is routed between different MBs. Let $f(c, i, i')$ denote the volume of traffic in chain $c$ being routed from middlebox instance $i$ to instance $i'$ (see Figure 8). As a special case, $f(c, i)$ denotes traffic routed to the first middlebox in a chain from a source element.⁴

Suppose each unit of traffic flowing between a pair of instances incurs some network-level cost; $Cost(i \rightarrow i')$ denotes the network-level cost between two instances. In the simplest case, this is a binary variable: 1 if the two MBs are in different racks, and 0 otherwise. (We can use more advanced measures to capture latency or available bandwidth as well.)

Given this setup, Figure 9 formalizes the network-aware flow distribution problem that Stratos solves. Here, Eq. (1) captures the network-wide footprint of routing traffic between potential instances of the $j$th MB in a chain and the $(j+1)$th MB in that chain. For completeness, we consider all possible combinations of routing traffic from one instance to another. In practice, the optimization will prefer only combinations that have low footprints.

Eq. (2) models a flow conservation principle. For each chain and for each position in the chain, the volume of traffic entering a middlebox has to be equal to the volume exiting it to the next middlebox type in the sequence. Since middleboxes may change the aggregate volume (e.g., a firewall may drop traffic, or RE may compress traffic), we consider a generalized notion of conservation that also takes into account the expected gain/drop factor $\gamma(c, j)$, which is the ratio of incoming-to-outgoing traffic at position $j$ for chain $c$. For initial placement, we expect the tenant to provide these factors as annotations to the logical topology specification.

⁴ For clarity, we focus only on the forward direction of the chain, noting that our implementation uses an extended formulation that captures bidirectional chains as well.

Minimize
$$\sum_{c} \sum_{j=1}^{|c|-1} \sum_{i \in M_{c[j]},\, i' \in M_{c[j+1]}} Cost(i, i') \times f(c, i, i') \qquad (1)$$

subject to

$$\forall i, \forall c \ \text{s.t.}\ i \in M_{c[j]},\, j > 1: \quad \sum_{i' \in M_{c[j-1]}} f(c, i', i) = \sum_{i' \in M_{c[j+1]}} f(c, i, i') \times \gamma(c, j) \qquad (2)$$

$$\forall c: \quad \sum_{i \in M_{c[1]}} f(c, i) = V_c \qquad (3)$$

$$\forall i: \quad \sum_{c:\, i \in M_{c[j]},\, j \neq 1} \; \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;+\; \sum_{c:\, i \in M_{c[1]}} f(c, i) \;\approx\; \sum_{c:\, i \in M_{c[j]}} \frac{V_c}{|M_{c[j]}|} \times \prod_{l=1}^{j} \gamma(c, l) \qquad (4)$$

Figure 9: LP formulation for the network-aware flow distribution problem. The $\approx$ in the last equation simply represents that we have some leeway in allowing the load to be within 10-20% of the mean.

The tenant could derive these based on expected traffic patterns or history. Stratos periodically recomputes these gain factors based on the observed input-output ratios for each chain.

In addition to this flow conservation, we also need to ensure that each chain's aggregate traffic will be processed; thus, we also model this coverage constraint in Eq. (3). Finally, we want to ensure that, within each middlebox type, the load is roughly evenly distributed across the instances of that type, in Eq. (4). Here we use a general notion of load balancing where we can allow for some leeway, say within 10-20% of the targeted average load.
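For concreteness, the sketch below encodes a single-chain instance of the LP in Figure 9 using the PuLP modeling library (an assumed choice; the paper does not name a solver). It implements the objective (1), a generalized flow conservation (2) applied at every position, the coverage constraint (3), and the load-balance band (4) with a configurable slack.

    import pulp

    def distribute_flows(chain, V, gamma, cost, slack=0.2):
        """Hedged sketch of the network-aware flow distribution LP (one chain).

        chain : list of lists; chain[j] = MB instance ids at position j
        V     : total traffic volume to be processed by the chain
        gamma : gamma[j] = input-to-output traffic ratio at position j
        cost  : cost[(i, i2)] = per-unit network cost between instances
                (e.g., 1 if the instances sit in different racks, else 0)
        slack : allowed deviation from the mean per-instance load (Eq. 4)
        """
        prob = pulp.LpProblem("flow_distribution", pulp.LpMinimize)

        # f0[i]: traffic from the chain's source to first-position instance i
        f0 = {i: pulp.LpVariable("f0_%s" % i, lowBound=0) for i in chain[0]}
        # f[(j, i, i2)]: traffic from instance i (position j) to i2 (position j+1)
        f = {(j, i, i2): pulp.LpVariable("f_%d_%s_%s" % (j, i, i2), lowBound=0)
             for j in range(len(chain) - 1)
             for i in chain[j] for i2 in chain[j + 1]}

        # Objective (1): minimize the aggregate network footprint.
        prob += pulp.lpSum(cost[(i, i2)] * var for (j, i, i2), var in f.items())

        # Coverage (3): all of the chain's traffic enters the first position.
        prob += pulp.lpSum(f0.values()) == V

        def inbound(j, i):
            return f0[i] if j == 0 else pulp.lpSum(f[(j - 1, p, i)] for p in chain[j - 1])

        # Conservation (2): incoming traffic = outgoing traffic x gain factor.
        for j in range(len(chain) - 1):
            for i in chain[j]:
                outbound = pulp.lpSum(f[(j, i, i2)] for i2 in chain[j + 1])
                prob += inbound(j, i) == outbound * gamma[j]

        # Load balance (4): each instance stays within slack of its position's mean.
        expected = V
        for j, instances in enumerate(chain):
            target = expected / len(instances)
            for i in instances:
                prob += inbound(j, i) >= (1 - slack) * target
                prob += inbound(j, i) <= (1 + slack) * target
            expected = expected / gamma[j]   # volume flowing on to the next position

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return ({i: v.value() for i, v in f0.items()},
                {k: v.value() for k, v in f.items()})

For example, for a chain with two RE instances feeding one IPS instance, distribute_flows([["re1", "re2"], ["ips1"]], V=100, gamma=[2.0, 1.0], cost={...}) would split the 100 units of source traffic across the RE instances and route the 50 units of compressed output to the IPS instance, preferring low-cost (same-rack) edges.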

We must ensure that the periodic flow redistributions and the flow distribution accompanying scaling do not enter into race conditions. We take two steps for this. First, any scaling attempt in a chain is preceded by a redistribution; only if redistribution does not suffice does Stratos initiate scaling trials. Second, Stratos suspends all redistributions while scaling trials are being run across a given tenant's deployment.

7. IMPLEMENTATION

We have implemented a full-featured Stratos prototype capable of running on commodity x86-64 hardware. Figure 10 shows an overview of the components involved.

Stratos Data Plane: The Stratos data plane is a configurable overlay network realized through packet encapsulation and programmable software switches. Each tenant VM has a pair of virtual interfaces that tap one of two Open vSwitches within the host's privileged domain. Packets sent to one of the virtual interfaces are transmitted via a GRE tunnel to the software switch on the host of the destination VM, from whence they are bridged to the appropriate destination interface. The other interface is reserved for management traffic. Open vSwitch holds the responsibility for encapsulating packets for transmission across the network.

Figure 10: Stratos prototype implementation.

Traffic is directed between the local host and the correct destination server using Open vSwitch. A single bridge (i.e., switch) on each privileged domain contains a virtual interface per tenant VM. Forwarding rules are matched based on the switch port on which a packet arrived, the final destination of the packet, and a tag stored in the IP Type of Service (TOS) field. Using tags reduces the number of flow entries in the switches, providing an important performance boost. Forwarding rules are installed by the central Stratos controller.
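As an illustration of this tag-based matching (a hedged sketch; the key names loosely follow OpenFlow 1.0 match fields and are not Floodlight's actual API), a single forwarding rule combines the ingress port, the final destination, and the TOS tag, so the rule count grows with the number of chain hops rather than with the number of individual flows.

    # Hypothetical representation of one Stratos forwarding rule; key names
    # mirror OpenFlow 1.0 match fields but are illustrative, not Floodlight's API.
    rule = {
        "match": {
            "in_port": 3,            # switch port attached to the upstream VM's tap
            "nw_dst": "10.1.4.7",    # final destination of the packet (server VM)
            "nw_tos": 0x20,          # chain/position tag carried in the IP TOS field
        },
        "actions": [
            {"type": "output", "port": 7},   # GRE tunnel toward the next hop's host
        ],
        "priority": 100,
    }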

Stratos Controller: The Stratos controller is implemented as an application running atop Floodlight [6] and interfaces with the Open vSwitch instances using the OpenFlow protocol [27]. The controller application takes a logical topology as input, which defines the tenant's chains and the VM instances of each client/server/MB in the chains. The controller transforms this topology into a set of forwarding rules, which are installed in the Open vSwitch instances in each physical host. The controller also gathers performance metrics from network switches, application end-points, and MBs using SNMP. These inputs are used in the rest of the modules in the controller, namely those for scaling, placement, and flow distribution. Our controller launches and terminates VMs using Xen [15].

8. EVALUATION

We evaluate Stratos in three different ways. First, we conduct controlled testbed experiments using our prototype to examine in detail the benefits of different components of Stratos: application-aware scaling, placement, and load distribution. Second, we run a modified version of our prototype on EC2 to understand the performance of Stratos in a dynamic scenario. Since EC2 does not provide control over placement, this prototype can only perform network-aware scaling and load distribution. Finally, we simulate Stratos to understand its benefits at scale.

There are three dimensions in our evaluation: (1) choice of scaling approach: leveraging CPU and memory utilization at an MB to determine if it is a bottleneck (threshold) vs. using application-aware scaling (aware); (2) placement: randomly selecting a rack (rand) or using our network-aware placement (aware); (3) flow distribution: either uniform or network-aware flow distribution. We assume that both initial and scaled instance deployments use identical placement and load distribution schemes.

We study a variety of metrics: the effectiveness of scaling decisions, both in terms of when they are triggered and how many MBs are used; the throughput of tenant applications; unmet demand; and the utilization of MBs and the provider's infrastructure.

8.1 Controlled Testbed Experiments

Our testbed consists of 24 machines with 3 VM slots each, deployed uniformly across 8 racks. The Stratos controller runs on a separate, purpose-specific machine. Unless otherwise specified, we consider a single tenant whose logical topology is a single chain consisting of clients, an RE MB, an IPS MB (standalone throughputs of 240 and 80 Mbps, respectively), and servers. The RE and IPS MBs use Click [16] and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
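A minimal sketch of such a workload generator is shown below (illustrative; this is not the actual tool used in the experiments). The shared bucket fills according to a rate function, client threads block when no tokens are available, and the outstanding token count serves as the unmet-demand signal.

    import threading
    import time

    class TokenBucketWorkload:
        """Shared token bucket for a multi-threaded workload generator (sketch)."""

        def __init__(self, rate_fn, fill_period=0.1):
            self.rate_fn = rate_fn        # tokens/sec as a function of elapsed time
            self.fill_period = fill_period
            self.tokens = 0.0
            self.cond = threading.Condition()
            threading.Thread(target=self._fill, daemon=True).start()

        def _fill(self):
            start = time.time()
            while True:
                time.sleep(self.fill_period)
                with self.cond:
                    self.tokens += self.rate_fn(time.time() - start) * self.fill_period
                    self.cond.notify_all()

        def draw(self):
            """Block until a token is available, then consume it."""
            with self.cond:
                while self.tokens < 1.0:
                    self.cond.wait()
                self.tokens -= 1.0

        def unmet_demand(self):
            """Outstanding (unconsumed) tokens approximate unmet demand."""
            with self.cond:
                return int(self.tokens)

    def client_thread(bucket, issue_request):
        # Each 100KB request requires one token; a new connection is issued only
        # after the previous one completes and a fresh token has been obtained.
        while True:
            bucket.draw()
            issue_request()

    # Example: a linearly increasing pattern of 20 req/s plus 1 req/s every 10s.
    # workload = TokenBucketWorkload(lambda t: 20 + t / 10.0)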

We impose background traffic in our experiments by running our workload generator ("steady" pattern) across specific pairs of MBs in our testbed. We experiment with both fixed and variable background traffic patterns; we focus largely on results for the former for brevity.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at such a rate that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach, labeled threshold/rand/uniform, wherein scaling decisions are entirely based on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos sustains high throughput (measured in requests per second per process, while nine processes execute concurrently) with no significant unmet demand.

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom).

Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. Moreover, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Next, we attempt to tease apart the relative contributions of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes: threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first examine the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions (a - top) on throughput and unmet demand (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown in Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often and that, when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next examine the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload, so as to ascertain their behavioral differences.

We see that, in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, we see that Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests more quickly, with less scaling, and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

[Figure 13: Effect of flow distribution decisions on scaling (a - top: middlebox allocations over time) and on demand satisfaction (b - bottom: requests per second over time), with fixed background traffic. Unmet demand is shown using dashed lines. Curves compare aware/aware/aware (Stratos) and aware/aware/uniform.]

We also experimented with variable background traffic and different workload patterns, and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype minus network-aware placement. Instead, we rely on EC2 to place any and all MBs; this is something we cannot control. To enable network-aware load distribution, we periodically collect available bandwidth between adjacent MBs in a tenant's deployment using a packet-pair-based measurement tool [31].

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the clients, servers, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as being bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25-second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
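The bottleneck test and trial pacing above can be summarized as a small detection routine. The sketch below is illustrative only; the names (ScalingConfig, chain_is_bottlenecked) are hypothetical and not taken from the Stratos code.

```python
from dataclasses import dataclass

@dataclass
class ScalingConfig:
    # Thresholds matching the EC2 experiments described above (assumed encoding).
    unmet_demand_mbps: float = 28.0       # sustained unmet demand that flags a bottleneck
    sustain_seconds: int = 20             # how long the unmet demand must persist
    trial_gap_seconds: int = 25           # damping gap between scaling trials
    retain_improvement_mbps: float = 2.0  # minimum gain needed to keep a new instance

def chain_is_bottlenecked(unmet_samples, cfg: ScalingConfig, sample_period_s: int = 1) -> bool:
    """Return True if unmet demand stayed above the threshold for the whole sustain window.

    unmet_samples is the most recent series of unmet-demand measurements (Mbps),
    one sample per sample_period_s seconds.
    """
    window = cfg.sustain_seconds // sample_period_s
    recent = unmet_samples[-window:]
    return len(recent) == window and all(u >= cfg.unmet_demand_mbps for u in recent)
```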

EC2 Setup Latency: We first measure the setup overhead associated with Stratos.

Task                                                 Time
Logical-to-Physical                                  5 ms
Data Plane Setup (Create Tunnels)                    24 s per VM
Data Plane Config (Install Rules in Open vSwitch)    3 ms per VM

Table 1: Stratos setup latency

[Figure 14: Multiple chain scaling. (a) MB instance counts over time for MBs W, X, Y, and Z; (b) application performance: demand and served throughput (Mbps) for chains C1 and C2.]

The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is ≈12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16Mbps on each chain, increasing demand by 4Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time. No gap between demand and served traffic persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60Mbps and increases by 10Mbps every 2 minutes.


[Figure 15: Application goodput (percent of demand served) with uniform and network-aware flow distribution at a fixed level of scaling.]

Figure 15 compares the percent of application demand served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input (1) a data center topology consisting of racks and switches, (2) the number of tenants, (3) a chain with its elements and initial instances (all tenants use the same deployment pattern), and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110Mbps, respectively. The application demand between each client and server pair is 100Mbps, for a total traffic demand of 300Mbps. We assume intra-rack links are very high capacity.

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled

with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance as expected, with rand/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaled instances (note that all chains are initially placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned. Thus, it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate the benefits of this next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

[Figure 16: Tenant load served (a - top: CDF over tenants of the percent of demand served) and traffic served divided by number of instances (b - bottom: CDF over tenants of MBps per instance), for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.]

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percent of the data center's network to be utilized with aware placement and load distribution. This is because, when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

8.4 Summary of Key Results

Our key findings are that:


[Figure 17: Inter-rack tenant traffic: CDF over tenants of the amount of inter-rack traffic (in MB), for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.]

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by making it aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMware's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPN-Cubed [14] aim to provide flexible composition of virtual topologies; however, they lack mechanisms for scaling networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services. However, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual machine migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs.


To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation, using a testbed deployment, experiments on EC2, and large-scale simulations, to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES
[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon web services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263-297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69-74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.



exports an annotated logical topology view to tenants, where the annotations are hints on MB network footprint. Stratos's management plane implements efficient algorithms to map the logical view to an appropriate physical realization.

Stratos's management plane also automatically and accurately determines the bottleneck for a tenant's deployment using an application-aware heuristic that relies on application-reported performance measures (§4). The heuristic implicitly takes into account the MBs' holistic resource consumption, including compute, memory, and the network.

Finally, Stratos explicitly manages the network interactions of MBs in order to maximize the network capacity between them. Specifically, the management plane implements two functions that both take profiles of MB network footprint and logical MB topologies as input: (i) a placement algorithm that logically partitions the physical MB topology into per-rack partitions and places them with minimal inter-partition communication (§5); and (ii) a traffic distribution algorithm that routes traffic across the different MBs/replicas to further reduce the network footprint (§6). Placement is triggered when a new tenant arrives, scaling decisions are made, or network-wide management actions occur (e.g., VM migration). Traffic distribution is invoked periodically to re-balance traffic based on changing MB network footprint, changing network load from other tenants, or a placement decision.

We implement Stratos as a collection of modules running atop Floodlight [6] (≈7500 LOC). These modules (i) parse tenant chain configuration files, (ii) gather performance metrics from network switches, applications, and MBs using SNMP, (iii) execute Stratos's scaling, placement, and flow distribution algorithms, (iv) launch and terminate VMs using Xen [15], and (v) install forwarding rules in hypervisor-resident Open vSwitches [8].

We conduct controlled experiments of our prototype over a 24-node, 72-VM data center testbed. We also evaluate a stripped-down Stratos for EC2, which only implements our scaling and load distribution heuristics. Finally, we conduct simulations to study Stratos's impact at scale.

Our central goal is to verify the importance of the network-awareness embedded into Stratos, be it in scaling, placement, or distribution, in supporting MB services in the most effective fashion. To this end, we find:

• Stratos helps optimally meet application demand by accurately identifying bottlenecks and either adding the appropriate number of MB replicas or redistributing traffic at coarse and fine timescales to overcome congestion.

• Network-agnostic approaches use up to 2X as many MBs as Stratos, yet they cannot meet application demand, resulting in severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos imposes little setup overhead. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

Figure 1: Example middlebox and server topology.

2. BACKGROUND

MBs play a key role in enterprises and private data centers [33], with application traffic often traversing multiple MB appliances. With enterprises migrating their applications to the cloud, a wide variety of cloud-provided services (e.g., Amazon's Elastic Load Balancer [2]) and third-party VM images [3, 12, 13] have emerged to supply the desired MB functionality. In fact, recent surveys show that 87% of IT professionals believe that network-level MB services should be a key part of cloud-based IaaS offerings [1].

In this section, we describe typical approaches used today to leverage MBs in the cloud and show that, due to the lack of suitable abstractions and intrinsic management functionality, these approaches offer limited to no flexibility and impede elastic scaling. Our observations are derived from our own experience in trying to deploy such network services in Amazon EC2 [2].

Composition: In contrast to traditional compute applications, network services are frequently deployed as a "chain" of several MBs [24]. For example, traffic may enter the data center through a WAN optimizer or redundancy elimination (RE) MB, be mirrored to an intrusion detection system (IDS), directed to a load balancer, and assigned to one of several application servers (Figure 1).

Since today's cloud providers are largely geared toward traditional applications, they provide little control over network topology and routing [2, 10], and third-party overlay services [14] only facilitate topologies containing directly addressed endpoints (in contrast, MBs should frequently be transparent). As a result, tenants are forced to run MBs as generic VMs and manually piece together tunnels, traffic splitters, and other software to route the desired traffic. Such manual and distributed configuration makes it hard to dynamically add new functionality, add replica MBs to manage load, or route around failed MBs. As an anecdote, implementing the relatively simple set of MB traversals shown in Figure 1 required several days of trial-and-error to obtain a working setup in EC2, which relied on several third-party tools and configurations strewn across VMs.

Automation scripts are insufficient, since they make dynamic changes possible but not easy. Indeed, the tenant still has to implement extra logic, e.g., to distribute appropriate traffic subsets to MB replicas, which may change when new types of MBs are deployed in a chain (e.g., transcoding or compression engines, which change the expected load).


Figure 2: Lack of scaling due to network bottlenecks.

Figure 3: Ineffective scaling due to poor placement.

More importantly, the implemented logic may be fundamentally insufficient due to the lack of intrinsic support from the cloud provider. We highlight this next.

Elastic Scaling: Being on the critical forwarding path, MBs' performance and network footprint can significantly impact end-to-end application performance. Unfortunately, there are no effective schemes today to identify bottlenecks in, and elastically scale, MB chains. This is because existing approaches [11, 4] do not recognize the chain as an entity, and there are no intrinsic mechanisms to help control MB chain performance.

Given today's compute-centric view, tenants could monitor basic resource consumption (CPU, memory, I/O) to identify whether individual MBs are bottlenecks. Unfortunately, this may not be sufficient, because the bottleneck may be a network link on the path between two MBs in a chain. While network bottlenecks also impact regular cloud applications, these effects get magnified in the context of MBs because they lie on the critical forwarding path. We illustrate this in Figure 2, where the IPS and RE MBs run at 50% utilization and hence no scaling is triggered. Yet the application's performance, which is bottlenecked by the congested link, can be improved by adding an RE instance (outside rack-2) and sending some part of the traffic to it. In general, unless the performance constraints imposed by all elements in a chain, MBs and network links alike, are taken into account, bottlenecks cannot be identified or overcome effectively.

One of the key reasons that effective elastic MB chain scaling is hard today is that cloud providers have no mechanisms to actively manage the network resources available to the chains. For example, it has been shown that EC2's VM placement algorithm is essentially random given the instance size [32]. As such, it is quite possible that a new replica is launched behind a congested network link, in which case the bottleneck would not be overcome effectively. We illustrate this in Figure 3, where the IPS instance runs at 80% utilization, triggering scaling, but the added replica does not improve end-to-end performance because of network congestion at the replica's location.

Furthermore, it is important to allocate the amount of traffic going to different replicas in a manner that takes prevalent network congestion into account and, equally importantly, to re-allocate as network conditions (as well as MB load) change.

Figure 4: Ineffective flow allocation due to lack of visibility.

Otherwise, the scaling decision may not have the desired effect. This is impossible to do in any effective manner today, as network utilization information is unavailable to tenants. We show this in Figure 4, where N/2 flows are sent over the congested inter-rack link. An optimal network-aware solution in this case would be to send only N/6 flows on the congested link.

3. STRATOS OVERVIEW

Our vision is to enable the same degree of flexibility and elasticity that we have with other aspects of cloud computation (virtual computing, virtual storage) for in-the-cloud MBs. In this section, we start with an overview of our system, Stratos, which addresses this challenge.

At a high level, Stratos can be viewed as a network-aware orchestration layer that enables cloud tenants to easily manage MB deployments in the cloud without any of the complexity discussed earlier. We envision that all of the needed network-aware functionality to enable flexibility and elasticity is implemented by the cloud provider.

3.1 Stratos tenant interface

Instead of composing middlebox and application server topologies through a smattering of third-party tools and configurations, tenants define logical topologies using high-level abstractions (Figure 5). These topologies are automatically transformed into a set of forwarding rules defining how application traffic flows between server and MB instances. In doing so, Stratos abstracts away the physical realization of how many and where these MB functions are realized.

Here we use the notion of a chain as the basic abstraction for describing the direction specific traffic flows should take. A chain begins with a source of traffic (e.g., Internet clients), contains a sequence of one or more middleboxes the traffic should traverse (e.g., an IDS and a load balancer), and ends with a destination (e.g., a set of web servers). Each edge in a chain is annotated with an expected traffic gain/drop factor that specifies the ratio of input-to-output packets (bytes) at each specific middlebox in the chain. For instance, a firewall may drop packets, and an RE module may compress packets on the fly. The traffic gain factors capture these effects, since they impact the amount of traffic that traverses links between MBs. A tenant's topology could contain multiple chains with overlapping middleboxes.
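For illustration, such a logical chain could be written down in a small declarative description like the one below. The field names and format are hypothetical, not Stratos's actual tenant interface.

```python
# A hypothetical logical-chain specification: a traffic source, an ordered list
# of MB types, a destination, and a gain/drop factor per MB giving the ratio of
# traffic entering it to traffic leaving it (e.g., a firewall that drops 20% of
# packets has factor 1.25; a pass-through IDS has factor 1.0).
chain = {
    "source": "internet_clients",
    "middleboxes": [
        {"type": "Firewall",     "gain_drop": 1.25},
        {"type": "IDS",          "gain_drop": 1.0},
        {"type": "LoadBalancer", "gain_drop": 1.0},
    ],
    "destination": "web_servers",
}

# A tenant topology may hold several such chains that share MB types.
topology = {"chains": [chain]}
```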

3.2 Stratos internals


[Figure 5: Example tenant logical topologies. (a) A tenant logical topology with a single chain; (b) a tenant logical topology with two overlapping chains.]

In mapping this logical view to an actual physical realization, Stratos needs to address three key challenges, each handled by a corresponding Stratos component as shown:

• Elastic Scaling: How many physical MB instances of each type need to be deployed? This module takes as input the logical topology given by the cloud tenant, the tenant's current physical configuration, and any service-level requirement that the tenant desires (e.g., upper bounds on cost or lower bounds on application latency). It uses periodic measurements of the end-to-end application performance to decide the optimal number of instances of different middleboxes necessary to meet the given service requirement.

• Placement: Where should these MBs be placed inside the cloud provider's network? The placement module takes as input the current state of the cloud provider's physical network topology (e.g., available racks, available slots, available bandwidth between racks), the logical topology of the client, the current physical instantiation of this topology across the provider network, and the number of new MBs of different types that need to be initiated. Given these inputs, it decides where to place the new MBs to avoid network bottlenecks. As a special case, it also implements an initial placement interface, which starts with zero MBs.

• Flow Distribution: How should the traffic be routed through the different MBs? The distribution module takes as input a given physical instantiation of a tenant chain (i.e., the number and placement of the MBs), measured (or statically specified) traffic gain/drop factors for the MBs, and the current network topology with link utilizations, to optimally distribute the processing load between the different MBs. The goal here is to reduce network congestion effects for the traffic flowing between MB instances as well as to balance the CPU/memory utilization of MB instances. (A rough sketch of these three interfaces follows this list.)
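The division of labor above can be summarized as three narrow interfaces. The signatures below are only a sketch of how such modules might be wired together, not Stratos's actual API; all names and parameter types are assumptions.

```python
from typing import Dict, Tuple

def elastic_scaling(logical_topology: dict,
                    physical_config: dict,
                    app_performance: Dict[str, float],
                    slo: dict) -> Dict[str, int]:
    """Decide how many instances of each MB type are needed to meet the SLO,
    driven by periodic end-to-end application performance measurements."""
    ...

def placement(provider_topology: dict,
              logical_topology: dict,
              physical_config: dict,
              new_instances: Dict[str, int]) -> Dict[str, str]:
    """Map each new MB instance to a rack/slot while avoiding network
    bottlenecks; with an empty physical_config this doubles as initial placement."""
    ...

def flow_distribution(physical_config: dict,
                      gain_drop: Dict[Tuple[str, str], float],
                      link_utilization: Dict[Tuple[str, str], float]) -> Dict[tuple, float]:
    """Split traffic across the instances of each MB type, balancing load while
    minimizing the network footprint of inter-MB traffic."""
    ...
```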

In designing the individual modules and in integrating them, Stratos takes into account both computational loads and network-level effects. This helps ensure that the scaling step hones in on the true bottlenecks and that good placement and load balancing are implemented for the current workload.

Figure 6: Overview of the high-level functionality and interfaces to the client and the cloud provider in Stratos to enable flexible middlebox deployment in the cloud.

This also ensures that there is sufficient capacity to efficiently add new MBs in the future.

More precisely, when the scaling module decides to increase the number of MBs, it invokes the network-aware placement module to decide where the new MBs need to be placed. The placement module in turn calls the flow distribution module to decide the optimal distribution strategy for the chosen placement, taking network-level effects into account. As MB network footprints change, the flow distribution module can redistribute load to further improve the chain's end-to-end performance.

3.3 Interacting with other Provider Functions

In order to achieve network-aware orchestration, we need new management APIs to facilitate interaction between Stratos and existing cloud functions. Specifically, Stratos interacts with the cloud provider's monitoring and VM deployment components, as shown by the dotted arrows in Figure 6. The interaction occurs at two different timescales (downward arrows). First, on a coarse-grained timescale, Stratos's placement logic may be invoked (left down arrow) whenever network-wide management actions occur (e.g., VM migration). Second, the monitoring layer periodically reports link utilizations to Stratos's flow distribution module (right down arrow). If there is a significant change in background (non-Stratos) network traffic, the flow distribution module can invoke redistributions across tenant chains. Last, Stratos's placement logic specifies constraints on the location of new MBs at the end of scaling, or of MBs and application VMs at chain initialization time, to the cloud provider's VM deployment module (upward dotted arrow).

The focus of this paper is on the internal logic of Stratos, i.e., addressing the challenges highlighted in Section 3.2. In the next three sections, we discuss the algorithmic frameworks underlying the above Stratos modules. We do so in a top-down fashion, starting with the application-aware scaling (§4), followed by the rack-aware placement (§5) and the network-aware traffic distribution mechanism (§6).

4. ELASTIC SCALING

The ability to scale capacity as needed is a major benefit of deploying applications in the cloud. This means that the chain traversed by application traffic must also be scaled to avoid becoming a performance bottleneck.

To illustrate the difficulty in scaling tenant chains, we start by considering several strawman approaches and discuss why these solutions are ineffective. Building on the insight that a tenant's ultimate concern is the end-to-end application performance, we design a practical heuristic for elastically scaling a tenant's chain.

4.1 Strawman approaches

We considered several strawman approaches for deciding which MBs to scale, but they turned out to be ineffective:

1. Scale all MB types.^1 The simplest solution for a bottlenecked chain is to add extra instances for each MB type in the chain. This guarantees the bottleneck will be eliminated, but it potentially wastes significant resources and imposes unneeded costs (especially when only one MB is bottlenecked).^2

2. Per-packet processing time. The average per-packet processing time at each MB provides a common middlebox-agnostic metric. If a chain is bottlenecked, the MB with the greatest increase in per-packet processing time is likely the culprit. However, not all MBs follow a one-packet-in, one-packet-out convention (e.g., a WAN optimizer), and it is unclear whether we can calculate a useful per-packet processing time in this case.

3. Offered load. Alternatively, we could leverage CPU and memory utilization or other load metrics (e.g., connections/second). However, different types of MBs have different resource or functional bottlenecks [21], and these bottlenecks may vary with the workload itself (e.g., a high-redundancy workload may stress an RE module more). Even if we set this aside, this approach, along with 2 above, is network-agnostic and can lead to poor scaling decisions, as we argued in Section 2.

Another candidate, benchmarking MB throughput offline, is also unsuitable, since it is based on a fixed traffic mix; a change in the traffic mix may cause the MB to bottleneck at a rate lower or higher than the benchmarked throughput. In Section 8, we use 3 as an example to show that naive approaches either identify the wrong bottleneck or take scaling decisions that result in using 2X more MBs than needed.

Ultimately, a tenant is concerned with (i) the performance of their applications and (ii) the cost of running their deployments. Together, these motivate the need to scale the deployment up or down depending on an application-reported performance metric, to minimize aggregate cost while ensuring acceptable performance. Many cloud applications already track such metrics for elastic scaling (e.g., requests served per second) and could easily export them to Stratos.

^1 "MB type" refers to a specific type of middlebox.
^2 Unless otherwise specified, we use "MB" to refer to a single instance of a specific type of middlebox.

scale_up_single(MboxArray M):
 1  for j in [0, |M|]:
 2    do:
 3      improves <- False
 4      add_instance(M[j])
 5      wait(Duration)
 6      foreach app in Apps:
 7        if PerfImprovement(app) > thresh: improves <- True
 8      if improves == False: remove_instance(M[j])
 9    while improves == True
    // Fallback: scale all MB types in the chain simultaneously

scale_multiple(BottleneckedChains):
10  foreach C in Chains:
11    Overlap <- {}; SharedBottlenecks <- {}
12    foreach C' != C in Chains:
13      if overlap(C, C'): add C' to Overlap
14      if Bottlenecked(C'): add C' to SharedBottlenecks
15    if |Overlap| == 0:
16      scale_up_single(C.mbs)
17    else if |SharedBottlenecks| == 0:
18      scale_up_single(unique_mbs(C, Overlap))
19    else:
20      scale_up_single(shared_mbs(C, Overlap))
    // Fallback: scale each chain sequentially

Figure 7: High-level sketch of the scaling heuristic in Stratos. For clarity, we only show the common-case operation and highlight one possible fallback solution. Note that, in the multi-chain case, non-overlapping scaling trials can be run in parallel.

4.2 Application-Aware Scaling Heuristic

We design a heuristic approach that leverages an application-reported metric for scaling tenant chains. Our intuitive goal here is to ensure that the application SLAs are met, even if it means erring on the conservative side and launching a few more instances than what is optimally needed. The scaling process is triggered by a significant change in the performance of any of the applications in a tenant deployment for a sustained period of time (our prototype checks whether there is sustained unmet demand or the average end-to-end latency increases by 15 percent over a 30s interval). We first describe the scaling process for a single chain and then extend it to multiple chains. The latter can be extended in a straightforward manner to scaling across multiple tenants.

Single Chain: Our heuristic performs a set of scaling trials, scaling each MB type in a tenant-specified chain one instance at a time, as shown in lines 1-9 in Figure 7. We iterate through the chain and keep an added instance as long as we observe an improvement in the applications' performance (in our prototype, we look for a 15% improvement in throughput and unmet load dropping). Note that multiple applications could share the same chain; thus, we look for an improvement in at least one such application. (As an optimization, we only need to look for improvement in bottlenecked applications.) If we see no improvement, then we revert the added instance and move to the next MB type in the chain. The scaling procedure terminates when we reach the end of the chain or we see no more improvement.

Scale-down occurs in a similar fashion, except that we look for demand drops; our prototype checks whether there is no unmet demand and the application's throughput drops by a certain percentage over a 1-minute interval. Our current prototype selects replicas in increasing order of volume served to try scaling them down (i.e., removing them). To prevent scale up/down oscillations, we use a "damping" factor and wait for some time (our prototype uses 25 seconds) before re-attempting scaling.

We make a practical choice here to scale one MB type at a time. We view this as a reasonable choice because the scaling decision for an MB type (and indeed each scaling trial) is accompanied by careful placement of scaled instances (Section 5) and redistribution of load across all MBs in the chain (Section 6). The placement and distribution steps help address network bottlenecks at downstream MBs.

Nevertheless, it is possible that our scaling approach does not improve application performance, e.g., when two MB types are equally bottlenecked by compute resources. In such cases, we use a conservative fallback to the simple scale-all approach and add new instances for all MB types in the chain.

Multi-chain Topologies: When a tenant has multiple chains in their deployment, we could consider running scaling trials in parallel for each chain. However, MB types can be shared across chains, and thus a scaling trial will influence the outcome of other concurrent trials and result in unnecessary or inadequate scaling.

Another option is to scale each chain sequentially. We use this as a starting point and speed it up by identifying the set of overlapping chains for each bottlenecked chain.

Our approach to scaling in multi-chain topologies is shown in lines 10-20 in Figure 7. In the simplest case, if a bottlenecked chain shares no MB types, then we simply run the single-chain scaling procedure as discussed earlier (lines 15-16). If one or more MB types overlap with another chain, and the overlapping chains are also bottlenecked, then we guess that the common MB instances are the bottlenecks and only run the scaling trial for these shared MB types (lines 19-20). On the other hand, if we have overlapping chains with no bottlenecks, then we speculate that the MB types unique to the current chain are bottlenecked and focus on these instead (lines 17-18). The intuition here is that identifying shared/isolated chains allows us to zoom in on the bottlenecks faster. In the case where this heuristic fails to improve performance (e.g., chains C1 and C2 share MB type M that is a bottleneck for C1 but not C2), we err on the side of caution, adopt a conservative approach, and rerun the scaling procedure considering the union of MBs across all the chains in the set Overlap.^3

^3 This fallback requires a minimal amount of state at the Stratos controller to track whether it has recently attempted a scaling trial for a given chain.

Network-awareness: Since each scale up/down trial relies on the end-to-end application performance metrics, our approach is implicitly network-aware. It may be possible to design explicit approaches that combine monitoring of CPU, memory, and I/O resources with the utilization of the network links used by a tenant's chain. However, it appears difficult to precisely identify bottlenecks in such a setting and, more importantly, to determine the extent to which they should be scaled to meet application performance goals. We leave such explicit approaches as a subject for future work. Nevertheless, our evaluation of this implicit scheme shows a lower bound on the benefits of network-awareness in scaling (§8).

Since our approach does not rely on VM-level measurements, it can be applied to tenant deployments with arbitrary MBs. In particular, tenants can compose cloud provider-offered MBs with those from third-party vendors, creating diverse chains.

5. RACK-AWARE PLACEMENT

The bandwidth available on network links impacts several aspects of tenant deployments. Greater available network bandwidth on the path to and from an MB means better use of the MB's processing functionality. Greater network-wide available bandwidth also translates to more effective scaling decisions. Together, these imply better application performance per unit cost (a function of the MBs in the chain) for a tenant. Optimal use of network capacity also allows the cloud provider to elastically scale more tenant chains.

As such, Stratos incorporates a placement module that maximizes the bandwidth available to a chain while also controlling the chain's network-wide footprint, even as the chain scales elastically. In what follows, we describe algorithms for two aspects of placement: initially mapping the MBs in a tenant's topology, and placing new MB instances.

5.1 Initial Placement

Initial MB placement is triggered whenever a new tenant arrives or network-wide management actions occur (e.g., VM migration).

There are two main inputs we use for initial placement: (1) The tenant-specified logical chains between MB types and application VMs, along with the number of physical instances of each MB type or application VM. Edges are annotated with the gain/drop factor for each MB instance, which is the ratio of the net traffic entering the MB versus that leaving it. We assume the tenant estimates these based on prior history or expected traffic patterns. For example, with an expected 50% redundancy in traffic, an RE MB would have a gain/drop factor of 2 (compressed traffic entering the MB is decompressed). These factors serve as weights on the edges in a chain. And (2) the available slots across different racks and the available bandwidth of different links in the data center topology. The latter is based on historical estimates (e.g., mean, maximum, or kth percentile) of link utilizations. We assume a uniform distribution of load across all MBs of the same type.


While this is a simplistic model, it still forms a helpful basis for placement (especially vis-a-vis existing naive VM placement schemes that consider individual VMs in isolation; see §10). Given this, the placement algorithm has three logical stages.

Partitioning: First, we partition the topology (the entire graph corresponding to a tenant) with the goal of placing each partition in its entirety on a single rack, so that we incur minimal inter-rack communication. That is, we partition the tenant's topology into K partitions such that, for each partition, there is at least one rack with enough available VM slots to accommodate the partition. We adapt the classical min-K-cut algorithm [28] to identify the partitions, starting with K = 1 and increasing K until all partitions are small enough to be accommodated.

Assigning partitions to racks: The next stage is to assign racks to each partition. Here, we use a greedy approach that proceeds by sorting pairs of partitions in decreasing order of inter-partition communication. For each pair, if both partitions are unassigned to racks, we find a pair of racks with the highest available bandwidth between them to accommodate these two partitions. If one of the partitions in the pair is already assigned to a rack, then we simply find a new rack for the unassigned partition. (If both are assigned, we simply move to the next pair.)

Assigning VMs to slots: Last, we assign the VMs (i.e., MBs and application VMs) within each partition to slots in the racks. In case there is just one slot per (physical) machine, we randomly pick a slot and assign it to a VM. If there are more available slots, we follow a similar partitioning procedure for the VMs, so that VMs that communicate more with each other are assigned closer to each other.
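A compact way to read the first two stages is the sketch below: grow K until every partition fits into some rack's free slots, then greedily pin the most talkative partition pairs to the rack pairs with the most spare bandwidth. The helpers min_k_cut, inter_partition_traffic, and available_bw are assumed inputs, not part of Stratos, and capacity bookkeeping is omitted for brevity.

```python
from itertools import combinations

def partition_topology(tenant_graph, rack_free_slots, min_k_cut):
    """Stage 1: split the tenant graph into K partitions, increasing K until
    every partition fits into at least one rack's free VM slots."""
    k = 1
    while True:
        partitions = min_k_cut(tenant_graph, k)   # assumed helper
        if all(any(len(p) <= free for free in rack_free_slots.values())
               for p in partitions):
            return partitions
        k += 1

def assign_partitions_to_racks(partitions, racks, inter_partition_traffic, available_bw):
    """Stage 2: assign partitions to racks, heaviest-communicating pairs first,
    choosing rack pairs with the most available bandwidth between them."""
    assignment = {}
    pairs = sorted(combinations(range(len(partitions)), 2),
                   key=lambda ab: inter_partition_traffic(partitions[ab[0]], partitions[ab[1]]),
                   reverse=True)
    for a, b in pairs:
        if a in assignment and b in assignment:
            continue                                        # both already placed
        if a not in assignment and b not in assignment:
            r1, r2 = max(((x, y) for x in racks for y in racks if x != y),
                         key=lambda xy: available_bw(*xy))
            assignment[a], assignment[b] = r1, r2
        else:
            placed, unplaced = (a, b) if a in assignment else (b, a)
            assignment[unplaced] = max(racks, key=lambda r: available_bw(assignment[placed], r))
    return assignment
```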

5.2 Placing New Middlebox Instances

New MBs launched after scaling a chain need to be placed efficiently for scaling to be effective. Ideally, the new MB placement should also help support future scale-up, both for the tenant in question and for other tenants. Our heuristic is driven by these goals.

To more accurately account for the network interactions of the scaled MBs, we dynamically track the gain/drop factors for MBs in the tenant's topology based on the prevalent traffic patterns at each MB (using an EWMA). Placement of the scaled MB considers the estimated ratios for the flows from the MB's input and output VMs (those supplying traffic to and receiving traffic from the MB, respectively) as input. Placement then works as follows.

If the new instance can be accommodated in the same rack as its input MBs (or VMs) and output MBs (or VMs), then we place the new instance in that rack. However, if the new instance cannot be accommodated in the same rack, we select a candidate rack (a rack with free slots) that has the maximum available bandwidth to the rack of the input and output MBs. When the input and output MBs are in different racks, we consider each candidate rack and estimate the inter-rack MB traffic using network-aware flow distribution (discussed in the next section), assuming that the new MB is placed in the candidate rack. We select the rack that minimizes the weighted sum of inter-rack flows (or, equivalently, maximizes the bandwidth available to inter-rack flows).

6. NETWORK-AWARE FLOW DISTRIBUTION

Akin to placement, Stratos's flow distribution module actively manages how MBs use network capacity. In contrast with placement, however, flow distribution can be invoked at fine time-scales.

Flow distribution is triggered whenever a scale up/down decision is made. In particular, the new instance placement heuristic in Section 5.2 invokes flow distribution when considering the optimal location for the scaled instance. Flow re-distribution can be triggered whenever the gain factor of an MB instance changes significantly (e.g., from 2 to 1 for the RE MB in §5); Stratos periodically monitors each chain for such changes. Finally, based on periodic input about network utilization from the cloud provider's monitoring functionality, flow re-distribution can be triggered across multiple tenant chains in response to changes in background (non-Stratos) network traffic. This helps maximize the bandwidth available for intra-chain communications and improves tenants' application performance. The latter two re-distribution attempts happen at the same periodicity in our prototype.

In essence, flow distribution provides fine-grained optimization of chain performance, as well as control over the chain's network footprint, for a given physical deployment of the chain. The key here is that we need to adjust traffic across the entire set of chains of a tenant, as focusing just on the scaled instance may result in less-than-ideal improvements in tenant applications' performance.

Figure 8: Example tenant topology to explain the terms in the LP framework for network-aware distribution. For clarity, we do not show the gain factors on the edges.

Next, we describe a systematic linear-programming (LP) based framework that formally captures the problem of network-aware flow distribution. The logic we describe here is general and applies to multiple scenarios in which such flow distribution is invoked: the common case is when the distribution module is triggered as a result of elastic scaling, but the module may also be triggered by changes in the background traffic as well as by changes in the gain factors of different MBs in a chain as a result of workload changes for a given tenant. Furthermore, this logic easily extends to the multi-tenant scenario with multiple chains per tenant; we simply consider the union of all chains across all tenants.

Notation: Let c denote a specific chain and V_c be the total volume (flows) of traffic that requires processing via this chain. There may be different types of MBs (e.g., IDS, RE) within a chain; |c| is the number of MBs in a given chain c. Let c[j] be the type of the middlebox at position j in chain c (e.g., IDS, RE). Let k denote the type of a middlebox and M_k be the set of MB instances of type k that the scaling module has launched. Thus, M_c[j] is the set of MB instances of type c[j]; we use i ∈ M_c[j] to specify that MB instance i belongs to this type. Figure 8 gives a quick overview of the different entities involved in this formulation.

LP Formulation: Our goal is to split the traffic across the instances of each type such that (a) the processing responsibilities are distributed roughly equally across them and (b) the aggregate network footprint is minimized. Thus, we need to determine how the traffic is routed between different MBs. Let f(c, i, i') denote the volume of traffic in chain c being routed from middlebox instance i to instance i' (see Figure 8). As a special case, f(c, i) denotes traffic routed to the first middlebox in a chain from a source element.^4

Suppose each unit of traffic flowing between a pair of instances incurs some network-level cost; Cost(i → i') denotes the network-level cost between two instances. In the simplest case, this is a binary variable: 1 if the two MBs are in different racks and 0 otherwise. (We can use more advanced measures to capture latency or available bandwidth as well.)

Given this setup, Figure 9 formalizes the network-aware flow distribution problem that Stratos solves. Here, Eq. (1) captures the network-wide footprint of routing traffic between potential instances of the j-th MB in a chain and the (j+1)-th MB in that chain. For completeness, we consider all possible combinations of routing traffic from one instance to another; in practice, the optimization will prefer only combinations that have low footprints.

Eq. (2) models a flow conservation principle. For each chain and for each position in the chain, the volume of traffic entering the middlebox has to be equal to the volume exiting it to the next middlebox type in the sequence. Since middleboxes may change the aggregate volume (e.g., a firewall may drop traffic, or RE may compress traffic), we consider a generalized notion of conservation that also takes into account the expected gain/drop factor γ(c, j), which is the ratio of incoming-to-outgoing traffic at position j of chain c. For initial placement, we expect the tenant to provide these factors as annotations to the logical topology specification;

^4 For clarity, we focus only on the forward direction of the chain, noting that our implementation uses an extended formulation that captures bidirectional chains as well.

\[
\text{Minimize} \quad \sum_{c}\ \sum_{j=1}^{|c|-1}\ \sum_{\substack{i \in M_{c[j]},\; i' \in M_{c[j+1]}}} Cost(i, i') \times f(c, i, i') \qquad (1)
\]

subject to

\[
\forall i,\ \forall c \ \text{s.t.}\ i \in M_{c[j]},\ j > 1:\quad \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;=\; \gamma(c, j) \times \sum_{i' \in M_{c[j+1]}} f(c, i, i') \qquad (2)
\]

\[
\forall c:\quad \sum_{i \in M_{c[1]}} f(c, i) \;=\; V_c \qquad (3)
\]

\[
\forall i:\quad \sum_{\substack{c\,:\, i \in M_{c[j]},\, j \neq 1}}\ \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;+\; \sum_{c\,:\, i \in M_{c[1]}} f(c, i) \;\approx\; \sum_{c\,:\, i \in M_{c[j]}} \frac{V_c}{|M_{c[j]}| \times \prod_{l=1}^{j} \gamma(c, l)} \qquad (4)
\]

Figure 9: LP formulation for the network-aware flow distribution problem. The ≈ in the last equation simply represents that we have some leeway in allowing the load to be within 10-20% of the mean.

the tenant could derive these based on expected traffic patterns or history. Stratos periodically recomputes these gain factors based on the observed input-output ratios for each chain.

In addition to this flow conservation, we also need to ensure that each chain's aggregate traffic will be processed; thus, we also model this coverage constraint in Eq. (3). Finally, we want to ensure that, within each middlebox type, the load is roughly evenly distributed across the instances of that type, as captured in Eq. (4). Here we use a general notion of load balancing, where we can allow for some leeway, say within 10-20% of the targeted average load.
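To make Figure 9 concrete, the following sketch encodes Eqs. (1)-(4) for a toy single-chain instance using the PuLP LP library. All inputs (instance names, rack locations, gain factors, demand) are made up for illustration, and the load-balance band of Eq. (4) is written explicitly as a ±15% slack around the per-type average; this is not Stratos's solver, just a minimal rendering of the same LP.

```python
import math
import pulp

# Toy single-chain instance: two MB types ("RE" then "IPS"), a few instances,
# illustrative rack locations, a gain factor, and a fixed demand.
chain = ["RE", "IPS"]
instances = {"RE": ["re1", "re2"], "IPS": ["ips1", "ips2", "ips3"]}
rack = {"re1": "r1", "re2": "r2", "ips1": "r1", "ips2": "r1", "ips3": "r2"}
gamma = {1: 0.5}     # gamma(c, j): input/output ratio at position j (RE doubles traffic here)
V = 100.0            # total chain demand
slack = 0.15         # leeway around the per-instance average load

def cost(a, b):
    # Eq. (1) cost: 1 if the two instances sit in different racks, else 0
    return 0 if rack[a] == rack[b] else 1

prob = pulp.LpProblem("flow_distribution", pulp.LpMinimize)

# f_src[i]: traffic from the source to first-position instance i
f_src = {i: pulp.LpVariable(f"src_{i}", lowBound=0) for i in instances[chain[0]]}
# f[j][(i, ip)]: traffic from instance i at position j to instance ip at position j+1
f = {j: {(i, ip): pulp.LpVariable(f"f{j}_{i}_{ip}", lowBound=0)
         for i in instances[chain[j - 1]] for ip in instances[chain[j]]}
     for j in range(1, len(chain))}

# (1) minimize the network footprint of inter-MB traffic
prob += pulp.lpSum(cost(i, ip) * var for j in f for (i, ip), var in f[j].items())

# (3) coverage: the chain's entire demand enters the first MB type
prob += pulp.lpSum(f_src.values()) == V

def inflow(j, i):
    # traffic arriving at instance i, which sits at position j of the chain
    return f_src[i] if j == 1 else pulp.lpSum(f[j - 1][(ip, i)] for ip in instances[chain[j - 2]])

# (2) generalized flow conservation with the gain/drop factor
for j in range(1, len(chain)):
    for i in instances[chain[j - 1]]:
        outflow = pulp.lpSum(f[j][(i, ip)] for ip in instances[chain[j]])
        prob += inflow(j, i) == gamma.get(j, 1.0) * outflow

# (4) keep per-instance load within +/- slack of the per-type average, accounting
# for how upstream gain factors shrink or grow the traffic reaching position j
for j, mb_type in enumerate(chain, start=1):
    upstream_gain = math.prod(gamma.get(l, 1.0) for l in range(1, j))
    expected = V / (len(instances[mb_type]) * upstream_gain)
    for i in instances[mb_type]:
        prob += inflow(j, i) >= (1 - slack) * expected
        prob += inflow(j, i) <= (1 + slack) * expected

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], {v.name: v.value() for v in prob.variables() if v.value()})
```

In this toy instance the solver should keep most traffic intra-rack, pushing across racks only the overflow that the load-balance band forces.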

We must ensure that the periodic flow redistributions and the flow distribution accompanying scaling do not enter into race conditions. We take two steps for this. First, any scaling attempt in a chain is preceded by a redistribution; only if redistribution does not suffice does Stratos initiate scaling trials. Second, Stratos suspends all redistributions while scaling trials are being run across a given tenant's deployment.

7. IMPLEMENTATION

We have implemented a full-featured Stratos prototype capable of running on commodity x86-64 hardware. Figure 10 shows an overview of the components involved.

Stratos Data Plane: The Stratos data plane is a configurable overlay network realized through packet encapsulation and programmable software switches.


Figure 10: Stratos prototype implementation.

Each tenant VM has a pair of virtual interfaces that tap one of two Open vSwitches within the host's privileged domain. Packets sent to one of the virtual interfaces are transmitted via a GRE tunnel to the software switch on the host of the destination VM, from where they are bridged to the appropriate destination interface. The other interface is reserved for management traffic. Open vSwitch holds the responsibility for encapsulating packets for transmission across the network.

Traffic is directed between the local host and the correct destination server using Open vSwitch. A single bridge (i.e., switch) on each privileged domain contains a virtual interface per tenant VM. Forwarding rules are matched based on the switch port on which a packet arrived, the final destination of the packet, and a tag stored in the IP Type of Service (TOS) field. Using tags reduces the number of flow entries in the switches, providing an important performance boost. Forwarding rules are installed by the central Stratos controller.
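For illustration only (the prototype installs rules via the OpenFlow protocol from the controller, as described next), an equivalent tag-based rule could be expressed with the stock Open vSwitch CLI; the bridge name, port numbers, destination address, and TOS value below are assumptions:

```python
# Sketch: install a tag-based forwarding rule in Open vSwitch via ovs-ofctl.
# Bridge name, port numbers, destination IP, and TOS value are hypothetical.
import subprocess

def install_chain_rule(bridge, in_port, dst_ip, tos_tag, out_port):
    # Match on ingress port, final destination, and the TOS-based chain tag;
    # forward out of the port leading to the GRE tunnel for the next hop.
    flow = (f"in_port={in_port},ip,nw_dst={dst_ip},"
            f"nw_tos={tos_tag},actions=output:{out_port}")
    subprocess.check_call(["ovs-ofctl", "add-flow", bridge, flow])

if __name__ == "__main__":
    install_chain_rule("br-stratos", in_port=3, dst_ip="10.1.0.7",
                       tos_tag=8, out_port=5)
```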

Stratos Controller: The Stratos controller is implemented as an application running atop Floodlight [6] and interfaces with the Open vSwitch instances using the OpenFlow protocol [27]. The controller application takes a logical topology as input, which defines the tenant's chains and the VM instances of each client/server/MB in the chains. The controller transforms this topology into a set of forwarding rules, which are installed in the Open vSwitch instances in each physical host. The controller also gathers performance metrics from network switches, application end-points, and MBs using SNMP. These inputs are used by the rest of the modules in the controller, namely those for scaling, placement, and flow distribution. Our controller launches and terminates VMs using Xen [15].
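As an illustration, a logical-topology input of the kind the controller consumes might look like the following; the field names and values are made up, since the concrete input format is not specified here:

```python
# Hypothetical example of a tenant's logical-topology input to the controller:
# one chain of client -> RE -> IPS -> server, with per-edge gain/drop factors.
logical_topology = {
    "tenant": "tenant-42",
    "chains": [
        {
            "name": "web-chain",
            "elements": ["client", "RE", "IPS", "server"],
            "instances": {"client": 1, "RE": 1, "IPS": 1, "server": 2},
            # ratio of traffic entering each element to traffic leaving it
            "gain_factors": {"RE": 2.0, "IPS": 1.0},
        }
    ],
}

if __name__ == "__main__":
    for chain in logical_topology["chains"]:
        print(chain["name"], "->", " -> ".join(chain["elements"]))
```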

8. EVALUATION

We evaluate Stratos in three different ways. First, we conduct controlled testbed experiments using our prototype to examine in detail the benefits of different components of Stratos: application-aware scaling, placement, and load distribution. Second, we run a modified version of our prototype on EC2 to understand the performance of Stratos in a dynamic scenario. Since EC2 does not provide control over placement, this prototype can only perform network-aware scaling and load distribution. Finally, we simulate Stratos to understand the benefits of Stratos at scale.

There are three dimensions in our evaluation: (1) choice of scaling approach: leveraging CPU and memory utilization at a MB to determine if it is a bottleneck (threshold) vs. using application-aware scaling (aware); (2) placement: randomly selecting a rack (rand) or using our network-aware placement (aware); (3) flow distribution: either uniform or network-aware flow distribution. We assume that both initial and scaled instance deployment use identical placement and load distribution schemes.

We study a variety of metrics: the effectiveness of scaling decisions, both in terms of when they are triggered and how many MBs are used; the throughput of tenant applications; unmet demand; and the utilization of MBs and the provider's infrastructure.

8.1 Controlled Testbed Experiments

Our testbed consists of 24 machines with 3 VM slots each, deployed uniformly across 8 racks. The Stratos controller runs on a separate, purpose-specific machine. Unless otherwise specified, we consider a single tenant whose logical topology is a single chain consisting of a client, an RE MB, an IPS MB (standalone throughputs of 240 and 80 Mbps, respectively), and servers. The RE and IPS MBs use Click [16] and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
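A simplified, single-process sketch of this token-bucket generator is shown below (the request function is a placeholder for the real client threads fetching 100KB over HTTP):

```python
# Simplified sketch of the token-bucket workload generator (not the actual
# tool): threads block on a shared bucket; outstanding tokens = unmet demand.
import threading, time

class TokenBucket:
    def __init__(self):
        self.tokens = 0.0
        self.cond = threading.Condition()

    def fill(self, rate_fn, period=0.1):
        while True:                       # rate_fn(t) encodes the demand pattern
            with self.cond:
                self.tokens += rate_fn(time.time()) * period
                self.cond.notify_all()
            time.sleep(period)

    def take(self):
        with self.cond:
            while self.tokens < 1.0:      # block until a token is available
                self.cond.wait()
            self.tokens -= 1.0

def do_request():
    time.sleep(0.05)                      # stand-in for fetching 100KB

def client_thread(bucket):
    while True:
        bucket.take()                     # one token per 100KB request
        do_request()                      # next request only after this finishes

if __name__ == "__main__":
    bucket = TokenBucket()
    threading.Thread(target=bucket.fill, args=(lambda t: 20.0,), daemon=True).start()
    for _ in range(4):
        threading.Thread(target=client_thread, args=(bucket,), daemon=True).start()
    time.sleep(2)                          # run briefly, then exit
```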

We impose background traffic in our experiments by running our workload generator ("steady" pattern) across specific pairs of MBs in our testbed. We experiment both with fixed and variable background traffic patterns; we focus largely on results for the former for brevity.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at such a rate that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach labeled threshold/rand/uniform, wherein scaling decisions are entirely based on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos sustains high throughput (measured in requests per second per process, while nine processes execute concurrently) and no significant unmet demand.


[Figure 11 plots: requests/s served (solid) and unserved (dashed) vs. time (sec), and middlebox allocations vs. time (s), for aware/aware/aware, threshold/rand/uniform, thresh/aware/aware, and aware/rand/uniform.]

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom).

Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. Moreover, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Next, we attempt to tease apart the relative contribution of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes, threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first understand the impact of network-aware placement decisions in Stratos.

[Figure 12 plots: middlebox allocations vs. time (s), and requests per second vs. time (s), for aware/aware/aware and aware/rand/aware.]

Figure 12: Effect of placement decisions (a - top) on throughput and unmet demand (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown in Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often and that, when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next understand the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload, so as to ascertain their behavioral differences.

We see that in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests quicker, with less scaling and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand.


[Figure 13 plots: middlebox allocations vs. time (s), and requests per second vs. time (s), for aware/aware/aware and aware/aware/uniform.]

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

We also experimented with variable background traffic and different workload patterns and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype, minus network-aware placement; instead, we rely on EC2 to place any and all MBs, as this is something we cannot control. To enable network-aware load distribution, we periodically collect available bandwidth, using a packet-pair-based measurement tool [31], between adjacent MBs in a tenant's deployment.
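The tool we use is pathChirp [31]; purely to illustrate the underlying packet-pair idea, a toy estimator might look like the following (host, port, and packet size are placeholders, and a single pair is far noisier than what pathChirp actually does):

```python
# Toy packet-pair estimator (illustration only; the prototype uses pathChirp).
# The receiver times the gap between two back-to-back packets; bandwidth is
# approximated as packet_size / gap. Host and port below are placeholders.
import socket, sys, time

PKT_SIZE, PORT = 1400, 50007

def sender(host):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * PKT_SIZE
    s.sendto(payload, (host, PORT))       # two packets sent back to back
    s.sendto(payload, (host, PORT))

def receiver():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", PORT))
    s.recvfrom(PKT_SIZE)
    t1 = time.time()
    s.recvfrom(PKT_SIZE)
    t2 = time.time()
    gap = max(t2 - t1, 1e-9)
    print("estimated bandwidth: %.1f Mbps" % (PKT_SIZE * 8 / gap / 1e6))

if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] != "recv":
        sender(sys.argv[1])               # e.g. python pair.py 10.0.0.2
    else:
        receiver()                        # e.g. python pair.py recv
```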

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus, client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the client, server, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as being bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25-second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
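A sketch of how these thresholds could be checked is shown below; the class and function names are hypothetical, but the constants are the ones quoted above:

```python
# Sketch of the bottleneck test used in the EC2 experiments: a chain is marked
# bottlenecked after 28 Mbps of sustained unmet demand for 20 s, trials are
# spaced 25 s apart, and an instance is retained only if it adds >= 2 Mbps.
import time

UNMET_THRESHOLD_MBPS = 28.0
SUSTAIN_SECONDS = 20.0
TRIAL_GAP_SECONDS = 25.0
RETAIN_IMPROVEMENT_MBPS = 2.0

class BottleneckDetector:
    def __init__(self):
        self.unmet_since = None
        self.last_trial = float("-inf")

    def update(self, unmet_mbps, now=None):
        """Return True when a scaling trial should be triggered."""
        now = time.time() if now is None else now
        if unmet_mbps < UNMET_THRESHOLD_MBPS:
            self.unmet_since = None
            return False
        if self.unmet_since is None:
            self.unmet_since = now
        sustained = (now - self.unmet_since) >= SUSTAIN_SECONDS
        spaced = (now - self.last_trial) >= TRIAL_GAP_SECONDS
        if sustained and spaced:
            self.last_trial = now
            return True
        return False

def retain_instance(throughput_before, throughput_after):
    return (throughput_after - throughput_before) >= RETAIN_IMPROVEMENT_MBPS

if __name__ == "__main__":
    d = BottleneckDetector()
    for t in range(0, 60, 5):     # synthetic trace with 30 Mbps unmet demand
        print(t, d.update(30.0, now=float(t)))
```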

EC2 Setup Latency: We first measure the setup overhead associated with Stratos.

Task                                                Time
Logical-to-Physical                                 5 ms
Data Plane Setup (Create Tunnels)                   2.4 s per VM
Data Plane Config (Install Rules in Open vSwitch)   3 ms per VM

Table 1: Stratos setup latency.

[Figure 14 plots: (a) MB instance counts for MBs W, X, Y, and Z vs. time (s); (b) throughput (Mbps) vs. time (s) for C1/C2 demand and served.]

Figure 14: Multiple chain scaling. (a) MB instance counts; (b) application performance.

The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is ≈12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16 Mbps on each chain, increasing demand by 4 Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36 Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time; no gap between demand and served persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second.


[Figure 15 plot: percent of demand served (60–100%) vs. time (s) for uniform and network-aware distribution.]

Figure 15: Application goodput with uniform and network-aware flow distribution at a fixed level of scaling.

Application demand starts at 60 Mbps and increases by 10 Mbps every 2 minutes.

Figure 15 compares the percent of application demand served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100 Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches, (2) the number of tenants, (3) a chain with its elements and initial instances (all tenants use the same deployment pattern), and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1 Gbps on each network link. All tenants use the same deployment: a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus, every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110 Mbps, respectively. The application demand between each client and server pair is 100 Mbps, for a total traffic demand of 300 Mbps. We assume intra-rack links are very high capacity.
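In code form, the inputs for this scenario amount to something like the following (the parameter names are illustrative; the simulator's real input format is not shown):

```python
# Illustrative encoding of the simulation inputs described above
# (parameter names are made up; the simulator's real format is not shown).
simulation_config = {
    "num_racks": 500,
    "vm_slots_per_rack": 10,
    "link_capacity_mbps": 1000,
    "num_tenants": 200,
    "chain": ["client"] * 3 + ["MB-type1"] * 2 + ["MB-type2"] * 1
             + ["MB-type3"] * 2 + ["server"] * 4,           # 12 VMs per tenant
    "mb_capacity_mbps": {"MB-type1": 60, "MB-type2": 50, "MB-type3": 110},
    "demand_per_client_server_pair_mbps": 100,               # 300 Mbps per tenant
}

if __name__ == "__main__":
    print(len(simulation_config["chain"]), "VMs per tenant deployment")
```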

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations.

At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance, as expected, with random/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaled instances (note that all chains are initially placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned. Thus, it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate these benefits next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

[Figure 16 plots: CDFs (fraction of tenants) of the percent of demand served, and of MBps per number of instances, for rand/uniform, rand/aware, aware/uniform, and aware/aware.]

Figure 16: Tenant load served (a - top) and traffic served divided by number of instances (b - bottom).

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percentage of the data center's network to be utilized with aware placement and load distribution. This is because, when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

8.4 Summary of Key Results

Our key findings are that:


[Figure 17 plot: CDF (fraction of tenants) of the amount of inter-rack traffic (in MB) for rand/uniform, rand/aware, aware/uniform, and aware/aware.]

Figure 17: Inter-rack tenant traffic.

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by having it be aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.
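This extension amounts to filtering the available-slot map before placement runs, along the following lines (a sketch with hypothetical data structures):

```python
# Sketch of the availability extension: restrict the available-slot map so a
# tenant sees at most one free slot per rack (or per machine) before placement.
def restrict_slots(available_slots, granularity="rack"):
    """available_slots: dict mapping (rack, machine) -> number of free slots."""
    seen = set()
    restricted = {}
    for (rack, machine), free in available_slots.items():
        if free <= 0:
            continue
        key = rack if granularity == "rack" else (rack, machine)
        if key in seen:
            continue
        seen.add(key)
        restricted[(rack, machine)] = 1      # expose at most one slot per key
    return restricted

if __name__ == "__main__":
    slots = {("r1", "m1"): 2, ("r1", "m2"): 1, ("r2", "m1"): 3}
    print(restrict_slots(slots))             # one slot in r1, one in r2
```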

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMWare's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPN-Cubed [14] aim to provide flexible composition of virtual topologies; however, they don't have the mechanisms for scaling of networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services. However, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, they do not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, and that tenant applications make more effective use of MBs while providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon Web Services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace Cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In Co-NEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263–297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, Melbourne, Australia, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.


Suppose each unit of traffic flowing between a pair of instances incurs some network-level cost; $Cost(i \rightarrow i')$ denotes the network-level cost between two instances. In the simplest case this is a binary variable: 1 if the two MBs are in different racks, and 0 otherwise. (We can use more advanced measures to capture latency or available bandwidth as well.)

Given this setup, Figure 9 formalizes the network-aware flow distribution problem that Stratos solves. Here, Eq. (1) captures the network-wide footprint of routing traffic between potential instances of the $j$-th MB in a chain to the $(j+1)$-th MB in that chain. For completeness, we consider all possible combinations of routing traffic from one instance to another; in practice, the optimization will prefer only combinations that have low footprints.

Eq. (2) models a flow conservation principle: for each chain and for each position in the chain, the volume of traffic entering the middlebox has to be equal to the volume exiting it to the next middlebox type in the sequence. Since middleboxes may change the aggregate volume (e.g., a firewall may drop traffic, or RE may compress traffic), we consider a generalized notion of conservation that also takes into account the expected gain/drop factor $\gamma(c, j)$, which is the ratio of incoming-to-outgoing traffic at the position $j$ for the chain $c$. For initial placement, we expect the tenant to provide these factors as annotations to the logical topology specification;

⁴ For clarity, we focus only on the forward direction of the chain, noting that our implementation uses an extended formulation that captures bidirectional chains as well.

Minimize
\[
\sum_{c}\ \sum_{j=1}^{|c|-1}\ \sum_{\substack{i,\,i'\ \mathrm{s.t.}\\ i \in M_{c[j]},\ i' \in M_{c[j+1]}}} Cost(i, i') \times f(c, i, i') \tag{1}
\]
subject to
\[
\forall i,\ \forall c\ \mathrm{s.t.}\ i \in M_{c[j]}\ \&\ j > 1:\quad
\sum_{i' \in M_{c[j-1]}} f(c, i', i) \;=\; \sum_{i' \in M_{c[j+1]}} f(c, i, i') \times \gamma(c, j) \tag{2}
\]
\[
\forall c:\quad \sum_{i \in M_{c[1]}} f(c, i) \;=\; V_c \tag{3}
\]
\[
\forall i:\quad
\sum_{c:\ i \in M_{c[j]},\ j \neq 1}\ \sum_{i' \in M_{c[j-1]}} f(c, i', i)
\;+\; \sum_{c:\ i \in M_{c[1]}} f(c, i)
\;\approx\;
\sum_{c:\ i \in M_{c[j]}} \frac{V_c}{|M_{c[j]}| \times \prod_{l=1}^{j}\gamma(c, l)} \tag{4}
\]

Figure 9: LP formulation for the network-aware flow distribution problem. The ≈ in the last equation simply represents that we have some leeway in allowing the load to be within 10–20% of the mean.

the tenant could derive these based on expected traffic patterns or history. Stratos periodically recomputes these gain factors based on the observed input-output ratios for each chain.

In addition to this flow conservation, we also need to ensure that each chain's aggregate traffic will be processed; thus we also model this coverage constraint in Eq. (3). Finally, we want to ensure that within each middlebox type the load is roughly evenly distributed across the instances of that type, in Eq. (4). Here we use a general notion of load balancing where we can allow for some leeway, say within 10–20% of the targeted average load.
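For concreteness, the following sketch encodes a single-chain instance of Eqs. (1)-(4) using the open-source PuLP library (an assumption for illustration; our prototype does not use it). The chain, gain factors, instance-to-rack mapping, and costs are placeholder values, and the conservation constraint is applied at every position that has a downstream hop, a slight simplification of Eq. (2).

```python
# Sketch: a single-chain instance of the flow distribution LP (Eqs. 1-4), encoded with
# the PuLP library. All inputs below (chain, gain factors, racks, costs) are
# illustrative placeholders, not values from the Stratos prototype.
import pulp

chain = ["RE", "IPS"]                       # MB types at positions 1..|c|
V_c = 100.0                                 # total volume for this chain (Mbps)
gamma = {1: 2.0, 2: 1.0}                    # gain/drop factor per position
instances = {"RE": ["re1", "re2"], "IPS": ["ips1", "ips2", "ips3"]}
rack = {"re1": 0, "re2": 1, "ips1": 0, "ips2": 1, "ips3": 1}
cost = lambda a, b: 0.0 if rack[a] == rack[b] else 1.0   # binary inter-rack cost
slack = 0.15                                # 10-20% leeway allowed by Eq. (4)

prob = pulp.LpProblem("flow_distribution", pulp.LpMinimize)

# f0[i] is f(c, i): traffic from the source to first-position instance i.
f0 = {i: pulp.LpVariable(f"f0_{i}", lowBound=0) for i in instances[chain[0]]}
# f[j, i, ip] is f(c, i, i'): traffic from instance i at position j to i' at j+1.
f = {(j, i, ip): pulp.LpVariable(f"f_{j}_{i}_{ip}", lowBound=0)
     for j in range(1, len(chain))
     for i in instances[chain[j - 1]] for ip in instances[chain[j]]}

# Eq. (1): minimize the network footprint of traffic between successive positions.
prob += pulp.lpSum(cost(i, ip) * var for (j, i, ip), var in f.items())

# Eq. (3): coverage -- the chain's entire volume enters the first position.
prob += pulp.lpSum(f0.values()) == V_c

def inflow(j, i):
    """Traffic entering instance i at position j (from the source if j == 1)."""
    if j == 1:
        return f0[i]
    return pulp.lpSum(f[j - 1, ip, i] for ip in instances[chain[j - 2]])

# Eq. (2): generalized conservation, applied at each position with a downstream hop.
for j in range(1, len(chain)):
    for i in instances[chain[j - 1]]:
        prob += inflow(j, i) == gamma[j] * pulp.lpSum(
            f[j, i, ip] for ip in instances[chain[j]])

# Eq. (4): each instance's load stays within +/- slack of its type's mean inflow.
for j in range(1, len(chain) + 1):
    group = instances[chain[j - 1]]
    mean = pulp.lpSum(inflow(j, i) for i in group) * (1.0 / len(group))
    for i in group:
        prob += inflow(j, i) <= (1 + slack) * mean
        prob += inflow(j, i) >= (1 - slack) * mean

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (j, i, ip), var in sorted(f.items()):
    print(f"position {j}: {i} -> {ip}: {var.value():.1f} Mbps")
```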

We must ensure that the periodic flow redistributions and the flow distribution accompanying scaling don't enter into race conditions. We take two steps for this. First, any scaling attempt in a chain is preceded by a redistribution; only if redistribution does not suffice does Stratos initiate scaling trials. Second, Stratos suspends all redistributions during the time when scaling trials are being run across a given tenant's deployment.
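A minimal sketch of this coordination logic, with redistribute(), demand_met(), and run_scaling_trials() as hypothetical stand-ins for the corresponding Stratos modules:

```python
# Sketch: serializing periodic redistributions and scaling trials for one tenant.
# redistribute(), demand_met(), and run_scaling_trials() are hypothetical stand-ins
# for the corresponding Stratos modules.
import threading

def redistribute(tenant): pass           # placeholder: re-solve the flow distribution LP
def demand_met(chain): return False      # placeholder: check unmet demand / latency
def run_scaling_trials(chain): pass      # placeholder: the Section 4 scaling heuristic

class TenantCoordinator:
    def __init__(self, tenant):
        self.tenant = tenant
        self.scaling_in_progress = threading.Event()

    def periodic_redistribution(self):
        # Redistributions (periodic or background-traffic-triggered) are suppressed
        # while scaling trials run for this tenant's deployment.
        if not self.scaling_in_progress.is_set():
            redistribute(self.tenant)

    def on_performance_drop(self, chain):
        redistribute(self.tenant)         # a redistribution always precedes scaling
        if demand_met(chain):
            return                        # redistribution sufficed; no trial needed
        self.scaling_in_progress.set()    # suspend redistributions during the trials
        try:
            run_scaling_trials(chain)
        finally:
            self.scaling_in_progress.clear()
```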

7. IMPLEMENTATION

We have implemented a full-featured Stratos prototype capable of running on commodity x86-64 hardware. Figure 10 shows an overview of the components involved.

Figure 10: Stratos prototype implementation.

Stratos Data Plane: The Stratos data plane is a configurable overlay network realized through packet encapsulation and programmable software switches. Each tenant VM has a pair of virtual interfaces that tap one of two Open vSwitches within the host's privileged domain. Packets sent to one of the virtual interfaces are transmitted via a GRE tunnel to the software switch on the host of the destination VM, from whence they are bridged to the appropriate destination interface. The other interface is reserved for management traffic. Open vSwitch holds the responsibility for encapsulating packets for transmission across the network.

Traffic is directed between the local host and the correct destination server using Open vSwitch. A single bridge (i.e., switch) on each privileged domain contains a virtual interface per tenant VM. Forwarding rules are matched based on the switch port on which a packet arrived, the final destination of the packet, and a tag stored in the IP Type of Service (TOS) field. Using tags reduces the number of flow entries in the switches, providing an important performance boost. Forwarding rules are installed by the central Stratos controller.
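For illustration, the sketch below assembles a data plane of this shape with standard Open vSwitch commands; the bridge and port names, addresses, and TOS value are hypothetical, and in Stratos the flow rules are generated and pushed by the controller rather than by hand.

```python
# Sketch: assembling one host's overlay data plane with Open vSwitch, via subprocess.
# Bridge/port names, the peer host address, and the TOS tag are hypothetical examples.
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# One data-plane bridge per host; each tenant VM's data interface taps it.
sh("ovs-vsctl --may-exist add-br br-data")
sh("ovs-vsctl --may-exist add-port br-data vif-tenant1-vm3")

# GRE tunnel toward the host of a peer VM, so packets are encapsulated across the fabric.
sh("ovs-vsctl --may-exist add-port br-data gre-host7 "
   "-- set interface gre-host7 type=gre options:remote_ip=10.10.0.7")

# A chain-hop forwarding rule: match ingress port, final destination, and a TOS tag
# (one tag per logical hop keeps the number of flow entries small).
sh('ovs-ofctl add-flow br-data '
   '"in_port=1,ip,nw_dst=192.168.1.7,nw_tos=32,actions=output:2"')
```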

Stratos Controller: The Stratos controller is implemented as an application running atop Floodlight [6] and interfaces with the Open vSwitch instances using the OpenFlow protocol [27]. The controller application takes a logical topology as input, which defines the tenant's chains and the VM instances of each client/server/MB in the chains. The controller transforms this topology into a set of forwarding rules, which are installed in the Open vSwitch instances in each physical host. The controller also gathers performance metrics from network switches, application end-points, and MBs using SNMP. These inputs are used in the rest of the modules in the controller, namely those for scaling, placement, and flow distribution. Our controller launches and terminates VMs using Xen [15].
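The sketch below gives a flavor of this logical-to-physical transformation for one chain: it walks consecutive hops and emits per-host (match, action) pairs keyed on ingress port, final destination, and a per-hop TOS tag. The data structures, tag numbering, and tunnel-port bookkeeping are assumptions for illustration, not the prototype's actual rule generator.

```python
# Sketch: a controller-side pass that turns one physically instantiated chain into
# per-host (match, action) rules keyed on ingress port, final destination, and a
# per-hop TOS tag. Hosts, ports, addresses, and tag values are illustrative assumptions.
from collections import defaultdict

# Position p -> (instance name, host, OVS port where its traffic enters the bridge).
chain = [("clientA", "host1", 1), ("re1", "host1", 2),
         ("ips1", "host2", 2), ("serverA", "host3", 1)]
FINAL_DST = "192.168.1.100"          # the application server's address
gre_port = defaultdict(lambda: 10)   # OVS port of the GRE tunnel toward each peer host
TOS_STEP = 4                         # one DSCP-aligned tag per chain hop

rules = defaultdict(list)            # host -> list of (match, action)
for pos in range(len(chain) - 1):
    _, src_host, src_port = chain[pos]
    _, dst_host, dst_port = chain[pos + 1]
    tag = TOS_STEP * (pos + 1)
    out = dst_port if src_host == dst_host else gre_port[dst_host]
    rules[src_host].append((
        {"in_port": src_port, "nw_dst": FINAL_DST, "nw_tos": tag},
        {"output": out, "set_nw_tos": tag + TOS_STEP}))

for host, host_rules in rules.items():
    for match, action in host_rules:
        print(f"{host}: match={match} action={action}")
```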

8. EVALUATION

We evaluate Stratos in three different ways. First, we conduct controlled testbed experiments using our prototype to examine in detail the benefits of the different components of Stratos: application-aware scaling, placement, and load distribution. Second, we run a modified version of our prototype on EC2 to understand the performance of Stratos in a dynamic scenario; since EC2 does not provide control over placement, this prototype can only perform network-aware scaling and load distribution. Finally, we simulate Stratos to understand its benefits at scale.

There are three dimensions in our evaluation: (1) choice of scaling approach, leveraging CPU and memory utilization at a MB to determine if it is a bottleneck (threshold) vs. using application-aware scaling (aware); (2) placement, randomly selecting a rack (rand) or using our network-aware placement (aware); and (3) flow distribution, either uniform or network-aware. We assume that both initial and scaled instance deployment use identical placement and load distribution schemes.

We study a variety of metrics: the effectiveness of scaling decisions, both in terms of when they are triggered and how many MBs are used; the throughput of tenant applications; unmet demand; and utilization of MBs and the provider's infrastructure.

8.1 Controlled Testbed Experiments

Our testbed consists of 24 machines with 3 VM slots each, deployed uniformly across 8 racks. The Stratos controller runs on a separate, purpose-specific machine. Unless otherwise specified, we consider a single tenant whose logical topology is a single chain consisting of clients, an RE MB, an IPS MB (standalone throughputs of 240 and 80 Mbps, respectively), and servers. The RE and IPS MBs use Click [16] and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
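A minimal sketch of such a generator, with the fill rate, thread count, request framing, and server address as placeholder values:

```python
# Sketch: the token-bucket workload generator between one client-server pair.
# The fill rate, thread count, request framing, and server address are placeholders.
import socket
import threading
import time

REQUEST_BYTES = 100 * 1024            # each token corresponds to a 100KB request
tokens = 0                            # outstanding tokens == unmet demand
bucket = threading.Condition()

def fill(rate_fn):
    """Add tokens at the rate (tokens/sec) dictated by the workload pattern."""
    global tokens
    while True:
        time.sleep(1.0)
        with bucket:
            tokens += int(rate_fn(time.time()))
            bucket.notify_all()

def client(server=("10.0.0.2", 8080)):
    global tokens
    while True:
        with bucket:                  # draw one token, blocking if none are available
            while tokens <= 0:
                bucket.wait()
            tokens -= 1
        with socket.create_connection(server) as s:   # one request per connection
            s.sendall(b"GET %d\n" % REQUEST_BYTES)
            remaining = REQUEST_BYTES
            while remaining > 0:
                chunk = s.recv(min(65536, remaining))
                if not chunk:
                    break
                remaining -= len(chunk)

if __name__ == "__main__":
    threading.Thread(target=fill, args=(lambda t: 50,), daemon=True).start()  # "steady"
    for _ in range(8):
        threading.Thread(target=client, daemon=True).start()
    time.sleep(60)
```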

We impose background traffic in our experiments by running our workload generator ("steady" pattern) across specific pairs of MBs in our testbed. We experiment both with fixed and variable background traffic patterns; we focus largely on results for the former for brevity.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at such a rate that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach, labeled threshold/rand/uniform, wherein scaling decisions are entirely based on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos has sustained high throughput (measured in requests per second per process, while nine processes execute concurrently) and no significant unmet demand.

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom). [Plots omitted: requests/s served (solid) and unserved (dashed) vs. time, and middlebox allocations vs. time, for aware/aware/aware, threshold/aware/aware, aware/rand/uniform, and threshold/rand/uniform.]

Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. However, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Next, we attempt to tease apart the relative contributions of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes: threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first examine the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions (a - top) on throughput and unmet demand (b - bottom), with fixed background traffic. Unmet demand is shown using dashed lines. [Plots omitted: middlebox allocations and requests per second vs. time for aware/aware/aware and aware/rand/aware.]

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown by Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often, and that when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next examine the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload, so as to ascertain their behavioral differences.

We see that, in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests more quickly, with less scaling, and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns, and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom), with fixed background traffic. Unmet demand is shown using dashed lines. [Plots omitted: middlebox allocations and requests per second vs. time for aware/aware/aware and aware/aware/uniform.]

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype, minus network-aware placement. Instead, we rely on EC2 to place any and all MBs; this is something we cannot control. To enable network-aware load distribution, we periodically collect available bandwidth, using a packet-pair-based measurement tool [31], between adjacent MBs in a tenant's deployment.

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the clients, servers, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as being bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25 second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
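A compact sketch of how these thresholds could drive the scaling-trial loop; the constants mirror the values above, while the monitoring and actuation helpers are hypothetical placeholders:

```python
# Sketch: bottleneck detection and instance retention with the thresholds above.
# The monitoring/actuation helpers are hypothetical placeholders.
import time

BOTTLENECK_MBPS = 28        # sustained unmet demand that marks a chain as bottlenecked
SUSTAIN_SECS = 20           # ...for at least this long
TRIAL_GAP_SECS = 25         # damping gap between scaling trials
RETAIN_THRESHOLD_MBPS = 2   # keep a new instance only if throughput improves this much

def unmet_demand_mbps(chain): return 0.0             # placeholder for real monitoring
def throughput_mbps(chain): return 0.0               # placeholder for real monitoring
def add_instance(mb_type): return f"new-{mb_type}"   # placeholder for VM launch
def remove_instance(instance): pass                  # placeholder for VM teardown

def is_bottlenecked(chain):
    start = time.time()
    while time.time() - start < SUSTAIN_SECS:
        if unmet_demand_mbps(chain) < BOTTLENECK_MBPS:
            return False
        time.sleep(1)
    return True

def scaling_trial(chain, mb_type):
    before = throughput_mbps(chain)
    instance = add_instance(mb_type)
    time.sleep(TRIAL_GAP_SECS)                        # wait out the damping interval
    if throughput_mbps(chain) - before < RETAIN_THRESHOLD_MBPS:
        remove_instance(instance)                     # revert: the trial did not help
        return False
    return True
```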

EC2 Setup Latency: We first measure the setup overhead associated with Stratos. The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is ≈12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Task                                                 Time
Logical-to-Physical                                  5 ms
Data Plane Setup (Create Tunnels)                    2.4 s per VM
Data Plane Config (Install Rules in Open vSwitch)    3 ms per VM

Table 1: Stratos setup latency.

Figure 14: Multiple chain scaling. [Plots omitted: (a) MB instance counts over time for MBs W, X, Y, and Z; (b) application performance, i.e., demand and served throughput (Mbps) for chains C1 and C2.]

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16 Mbps on each chain, increasing demand by 4 Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36 Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time; no gap between demand and served throughput persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60 Mbps and increases by 10 Mbps every 2 minutes.

Figure 15: Application goodput with uniform and network-aware flow distribution at a fixed level of scaling. [Plot omitted: percent of demand served vs. time for uniform and network-aware distribution.]

Figure 15 compares the percent of application demand served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100 Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results for somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches, (2) the number of tenants, (3) a chain with its elements and initial instances (all tenants use the same deployment pattern), and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1 Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110 Mbps, respectively. The application demand between each client and server pair is 100 Mbps, for a total traffic demand of 300 Mbps. We assume intra-rack links have very high capacity.

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance; as expected, rand/aware outperforms aware/uniform, which indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains initially are placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned. Thus, it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate the benefits of this next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a - top) and traffic served divided by number of instances (b - bottom). [Plots omitted: CDFs of tenants vs. percent of demand served and vs. MBps per instance, for rand/uniform, rand/aware, aware/uniform, and aware/aware.]

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percent of the data center's network to be utilized with aware placement and load distribution. This is because when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

Figure 17: Inter-rack tenant traffic. [Plot omitted: CDF of tenants vs. amount of inter-rack traffic (in MB) for rand/uniform, rand/aware, aware/uniform, and aware/aware.]

8.4 Summary of Key Results

Our key findings are that:

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by having it be aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally-optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMWare's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPNCubed [14] aim to provide flexible composition of virtual topologies; however, they don't have the mechanisms for scaling of networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services. However, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus there is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations, showing that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon Web Services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow Controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace Cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263–297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, Melbourne, Australia, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.


Page 4: Stratos: A Network-Aware Orchestration Layer for

Tenant logical topology with a single chain

Tenant logical topology with two overlapping chains

Figure 5Example tenant logical topologies

In mapping this logical view to an actual physical realiza-tion Stratos needs to address three key challenges with eachaddressed by a corresponding Stratos component as shown

bull Elastic ScalingHow many physical MB instances ofeach type need to be deployedThis module takes in as input the logical topology givenby the cloud tenant the tenantrsquos current physical con-figuration and any service-level requirement that thetenant desires (eg upper bounds on cost or lower boundson application latency) It uses periodic measurementsof the end-to-end application performance to decidethe optimal number of instances of different middle-boxes necessary to meet the given service requirement

bull PlacementWhere should these MBs be placed insidethe cloud providerrsquos networkThe placement module takes in as input the currentstate of the cloud providerrsquos physical network topology(eg available racks available slots available band-width between racks) the logical topology of the clientthe current physical instantiation of this topology acrossthe provider network and the number of new MBs ofdifferent types that need to be initiated Given theseinputs it decides where to place the new MBs to avoidnetwork bottlenecks As a special case it also imple-ments an initial placement interface which starts withzero MBs

bull Flow Distribution How should the traffic be routedthrough the different MBsThe distribution module takes as input a given physi-cal instantiation of a tenant chain (ie the number andplacement of the MBs) measured (or statically spec-ified) traffic gaindrop factors for the MBs and thecurrent network topology with link utilizations to op-timally distribute the processing load between the dif-ferent MBs The goal here is to reduce network con-gestion effects for the traffic flowing between MB in-stances as well as balance the CPUmemory utiliza-tion of MB instances

In designing the individual modules and in integrating themStratos takes into account both computational loads and network-level effects This helps ensure that the scaling step honesinon the true bottlenecks and that good placement and load

Figure 6Overview of the high-level functionality and inter-faces to the client and the cloud provider in Stratos to enableflexible middlebox deployment in the cloud

balancing are implemented for the current workload Thisalso ensures that there is sufficient capacity to efficientlyaddnew MBs in the future

More precisely when the scaling module decides to in-crease the number of MBs it invokes the network-awareplacement module to decide where the new MBs need to beplaced The placement module in turn calls the flow distri-bution module to decide the optimal distribution strategy forthe chosen placement that takes into account network-leveleffects As MB network footprints change the flow distri-bution module can redistribute load to further improve thechainrsquos end-to-end performance

33 Interacting with other Provider FunctionsIn order to achieve the network-aware orchestration we

need new management APIs to facilitate interaction betweenStratos and existing cloud functions Specifically Stratos in-teracts with the cloud providerrsquos monitoring and VM deploy-ment components as shown by the dotted arrows in Figure 6The interaction occurs at two different timescales (down-ward arrows) First on a coarse-grained timescale Stratosrsquosplacement logic may be invoked (left down arrow) whenevernetwork-wide management actions occur (eg VM migra-tion) Second the monitoring layer periodically reports linkutilizations to Stratosrsquos flow distribution module (right downarrow) If there is significant change in background (non-Stratos) network traffic the flow distribution module caninvoke redistributions across tenant chains Last Stratosrsquosplacement logic specifies constraints on the location of newMBs at the end of scaling or that of MBs and applicationVMs at chain initialization time to the cloud providerrsquos VMdeployment module (upward dotted arrow)

The focus of this paper is on the internal logic of Stratosie addressing the challenges highlighted in Section 32 Inthe next three sections we discuss the algorithmic frame-works underlying the above Stratos modules We do so in atop-down fashion starting with the application-aware scal-ing (sect4) followed by the rack-aware placement (sect5) andthe network-aware traffic distribution mechanism (sect6)

4 ELASTIC SCALINGThe ability to scale capacity as needed is a major benefit

of deploying applications in the cloud This means that the

4

chain traversed by application traffic must also be scaled toavoid becoming a performance bottleneck

To illustrate the difficulty in scaling tenant chains we startby considering several strawman approaches and discuss whythese solutions are ineffective Building on the insight that atenantrsquos ultimate concern is the end-to-end application per-formance we design a practical scaling heuristic for elasti-cally scaling a tenantrsquos chain

41 Strawman approachesWe considered several strawman approaches for deciding

which MBs to scale but they turned out to be ineffective

1 Scale all MB types1 The simplest solution for a bot-tlenecked chain is to add extra instances for each MBtype in the chain This guarantees the bottleneck willbe eliminated but it potentially wastes significant re-sources and imposes unneeded costs (especially whenonly one MB is bottlenecked)2

2 Per-packet processing timeThe average per-packetprocessing time at each MB provides a common middlebox-agnostic metric If a chain is bottlenecked the MBwith the greatest increase in per-packet processing timeis likely the culprit However not all MBs follow a onepacket in one packet out convention eg a WAN op-timizer and it is unclear if we can calculate a usefulper-packet processing time in this case

3 Offered load Alternatively we could leverage CPUand memory utilization or other load metrics (eg con-nectionssecond) However different types of MBshave different resource or functional bottlenecks [21]and these bottlenecks may vary with the workload it-self (eg a high redundancy workload may stress a REmodule more) Even if we set this aside this approachalong with 2 and 3 above is network-agnostic andcan lead to poor scaling decisions as we argued in Sec-tion 2

Another candidate benchmarking MB throughput offlineis also unsuitable since it is based on a fixed traffic mix achange in the traffic mix may cause the MB to bottleneck ata rate lower or higher than the benchmarked throughput InSection 8 we use 3 as an example to show that naive ap-proaches either identify the wrong bottleneck or take scalingdecisions that result in using 2X more MBs than needed

Ultimately a tenant is concerned with (i) the performanceof their applications and (ii ) the cost of running their de-ployments Together these motivate the need to scale thedeployment updown depending on an application-reportedperformance metric to minimize aggregate cost while ensur-ing acceptable performance Many cloud applications al-ready track such metrics for elastic scaling (eg requestsper second served) and could easily export them to Stratos

1ldquoMB typerdquo refers to a specific type of middlebox2Unless other specified we us ldquoMBrdquo to refer to a single instanceofa specific type of middlebox

scale_up_single(MboxArray M )1 for j isin [0 |M |]3 Do2 improves larr False

4 add_instance(M [j ])5 wait(Duration )6 foreachapp isin Apps 7 if PerfImprovement (app) gt thresh 5 improves larr True

6 if improves = False8 remove_instance(M [j ])9 while improves = True

Fallback scale all in chain simultaneously

scale_multiple(BottleneckedChains)10 foreachC isin Chains 11 Overlap larr SharedBottlenecks larr 12 foreachC prime 6= C isin Chains 13 if overlap(C C prime) Add C prime to OverLap

14 if Bottlenecked(C prime ) Add C prime to SharedBottlenecks15 if |Overlap| =016 scale_up_single(C mbs)17 else if|SharedBottlenecks | = 018 scale_up_single(unique_mbs(C Overlap ))19 else20 scale_up_single(shared_mbs(C )Overlap )

Fallback scale each chain sequentially

Figure 7 High-level sketch of the scaling heuristic inStratos For clarity we only show the common case opera-tion and highlight one possible fall back solution Note thatin the multi-chain case non-overlapping scaling trials canbe run in parallel

42 Application-Aware Scaling HeuristicWe design a heuristic approach that leverages an application-

reported metric for scaling tenant chains Our intuitive goalhere is to ensure that the application SLAs are met even ifit means erring on the conservative side and launching a fewmore instances than what is needed optimally The scalingprocess is triggered by a significant change in the perfor-mance of any of the applications in a tenant deployment fora sustained period of time (our prototype checks to see ifthere is sustained unmet demand or the average end-to-endlatency increases by 15 percent over a 30s interval) Wefirst describe the scaling process for a single chain and thenextend it to multiple chains The latter can be extended in astraightforward manner to scaling across multiple tenants

Single Chain Our heuristic performs a set of scaling tri-als scaling each MB type in a tenant-specified chain oneinstance at a time as shown in lines 1ndash9 in Figure 7 We it-erate through the chain and keep an added instances as longas we observe an improvement in the applicationsrsquo perfor-mance (in our prototype we look for a 15 improvement inthrough and unmet load dropping) Note that multiple ap-plications could share the same chain thus we look for animprovement in at least one such application (As an opti-mization we only need to look for improvement in bottle-necked applications) If we see no improvement then werevert the added instance and move to the next MB type in

5

the chain The scaling procedure terminates when we reachthe end of the chain or we see no more improvement

Scale down occurs in a similar fashion except that we lookfor demand drops our prototype checks if there is no unmetdemand and the applicationrsquos throughput drops by a certainpercentage over a 1 minute interval Our current prototypeselects replicas in increasing order of volume served to tryscaling them down (ie removing them) To prevent scaleupdown oscillations we use a ldquodampingrdquo factor and waitfor some time (our prototype uses 25 seconds) before re-attempting scaling

We make a practical choice here to scale one MB type at atime We view this as a reasonable choice because the scal-ing decision for a MB type (and indeed each scaling trial) isaccompanied by careful placement of scaled instances (Sec-tion 5) and redistribution of load across all MBs in the chain(Section 6) The placement and distribution steps help ad-dress network bottlenecks at downstream MBs

Nevertheless it is possible our scaling approach does notimprove application performance eg when two MB typesare equally bottlenecked by compute resources In such caseswe use a conservative fall back to the simplescale allap-proach and add new instances for all MB types in the chain

Multi-chain Topologies When a tenant has multiple chainsin their deployment we could consider running scaling trialsin parallel for each chain However MB types can be sharedacross chains and thus a scaling trial will influence the out-come of other concurrent trials and result in unnecessary orinadequate scaling

Another option is to scale each chain sequentially We usethis as a starting point and speed it up by identifying the setof overlappingchains for each bottlenecked chain

Our approach to scaling in multi-chain topologies is shownin lines 10ndash20 in Figure 7 In the simplest case if a bottle-necked chain shares no MB types then we simply run thesingle chain scaling procedure as discussed earlier (lines15ndash16) If one or more MB types overlaps with another chainand the overlapping chains are also bottlenecked then weguess that the common MB instances are the bottlenecks andonly run the scaling trial for these shared MB types (lines19ndash20) On the other hand if we have overlapping chainswith no bottlenecks then we speculate that the MB typesunique to the current chain are bottlenecked and focus onthese instead (lines 17ndash18) The intuition here is that iden-tifying sharedisolated chains allows us to zoom in on thebottlenecks faster In the case where this heuristic fails toimprove performance (eg chains C1 and C2 share MB typeM that is a bottleneck for C1 but not C2) we err on the sideof caution and adopt a conservative approach and rerun thescaling procedure considering the union of MBs across allthe chains in the setOverlap3

Network-awarenessSince each scale updown trial relies3This fall back requires a minimal amount of state at the Stratoscontroller to track whether it has recently attempted a scaling trialfor a given chain

on the end-to-end application performance metrics our ap-proach isimplicitly network-aware It may be possible todesign explicit approaches that combine monitoring CPUmemory and IO resources with utilization of the networklinks used by a tenantrsquos chain However it appears difficultto precisely identify bottlenecks in such a setting and moreimportantly to determine the extent to which they should bescaled to meet application performance goals We leave suchexplicit approaches as a subject for future work Neverthe-less our evaluation of this implicit scheme shows a lowerbound on the benefits of network-awareness in scaling (sect8)

Since our approach does not rely on VM-level measure-ments it can be applied to tenant deployments with arbitraryMBs In particular tenants can compose cloud provider-offered MBs with those from third-party vendors creatingdiverse chains

5 RACK-AWARE PLACEMENTThe bandwidth available on network links impacts several

aspects of tenant deployments Greater available networkbandwidth on the path to and from an MB means better useof the MBrsquos processing functionality Greater network-wideavailable bandwidth also translates to more effective scalingdecisions Together these imply better application perfor-mance per unit cost (a function of MBs in the chain) fora tenant Optimal use of network capacity also allows thecloud provider to help elastically scale more tenant chains

As such Stratos incorporates a placement module thatmaximizes the bandwidth available to a chain while alsocontrolling the chainrsquos network-wide footprint even as thechain scales elastically In what follows we describe algo-rithms for two aspects of placement initially mapping theMBs in a tenantrsquos topology and placing new MB instances

51 Initial PlacementInitial MB placement is triggered whenever a new tenant

arrives or network-wide management actions occur (egVM migration)

There are two main inputs we use for initial placement(1) The tenant-specified logical chains between MB typesand application VMs along with the number of physical in-stances of MB type or application VM Edges are annotatedwith the gaindrop factorfor each MB instance which isratio of the net traffic entering the MB versus that leavingit We assume the tenant estimates these based on prior his-tory or expected traffic patterns For example with an ex-pected 50 redundancy in traffic an RE MB would have againdrop factor of 2 (compressed traffic entering the MB isdecompressed) These factors serve as weights to the edgesin a chain And (2) the available slots across different racksand available bandwidth of different links in the data centertopology The latter is based on historical estimates (egmean maximum or kth percentile) of link utilizations Weassume a uniform distribution of load across all MBs of thesame type

6

While this is a simplistic model it still forms a helpfulbasis for placement (especially vis-a-vis existing naive VMplacement schemes that consider individual VMs in isola-tion See sect10) Given this the placement algorithm has threelogical stages

Partitioning First we partition the topology (entire graphcorresponding to a tenant) with the goal of placing each par-tition in its entirety on a single rack so that we incur minimalinter-rack communication That is we partition the tenantrsquostopology intoK partitions such that for each partition thereis at least one rack with enough available VM slots to ac-commodate the partition We adapt the classical min-K-cutalgorithm [28] to identify the partitions starting withK = 1and increasingK until all partitions are small enough to beaccommodated

Assigning partitions to racks The next stage is to assignracks for each partition Here we use a greedy approachthat proceeds by sorting pairs of partitions in the decreasingorder of the inter-partition communication For each pairif both partitions are unassigned to racks we find a pair ofracks with the highest available bandwidth to accommodatethese two partitions If one of the partitions in the pair isalready assigned to a rack then we simply find a new rackfor the unassigned partition (If both are assigned we simplymove to the next pair)

Assigning VMs to slots Last we assign VMs (ie MBsand application VMs) within each partition to slots in theracks In case there is just one slot per (physical) machinewe randomly pick a slot and assign it to a VM If there aremore available slots we follow a similar procedure to par-tition the VMs so that VMs that communicate more amongeach other can be assigned closer to each other

52 Placing New Middlebox InstancesNew MBs launched after scaling a chain need to be placed

efficiently for scaling to be effective Ideally the new MBplacement should also help support future scale up for boththe tenant in question as for other tenants Our heuristic isdriven by these goals

To more accurately account for the network interaction ofthe scaled MBs we dynamically track the gaindrop factorsfor MBs in the tenantrsquos topology based on prevalent traf-fic patterns at each MB (using EWMA) Placement of thescaled MB considers the estimated ratios for the flows fromMBrsquos input and output VMs (those supplying traffic to andreceiving from the MB respectively) as input Placementthen works as follows

If the new instance can be accommodated in the same rackas its input MBs (or VMs) and output MBs (or VMs) then weplace the new instance in the same rack However if the newinstance cannot be accommodated in the same rack we se-lect a candidate rack (rack with free slots) that has the max-imum available bandwidth to the rack for input and outputMBs When the input and output MBs are in different rackswe consider each candidate rack and estimate the inter-rack

MB traffic using network-aware flow distribution (discussedin the next section) assuming that the new MB is placed inthe candidate rack We select the rack that minimizes theweighted sum of inter-rack flows (or maximizes the band-width available to inter-rack flows)

6 NETWORK-AWAREFLOW DISTRIBUTION

Akin to placement Stratosrsquos flow distribution module ac-tively manages how MBs use network capacity In contrastwith placement however flow distribution can be invoked atfine time-scales

Flow distribution is triggered whenever a scale updowndecision is made In particular the new instance placementheuristic in Section 52 invokes flow distribution when con-sidering the optimal location for the scaled instance Flowre-distribution can be triggered whenever the gain factor ofa MB instance changes significantly (eg from 2 to 1 forthe RE MB in sect5) Stratos periodically monitors each chainfor such changes Finally based on periodic input about net-work utilization from the cloud providerrsquos monitoring func-tionality flow re-distribution can be triggered across multi-ple tenant chains in response to changes in background (non-Stratos) network traffic This helps maximize the bandwidthavailable for intra-chain communications and improves ten-antsrsquo application performance The latter two re-distributionattempts happen at the same periodicity in our prototype

In essence flow distribution helps provide fine-grainedoptimization of chain performance as well as control overchain network footprint for a given physical deployment ofthe chain The key here is that we need to adjust traffic acrossthe entire set of chains of a tenant as focusing just on thescaled instance may result in less-than-ideal improvementsin tenant applicationsrsquo performance

Figure 8 Example tenant topology to explain the terms inthe LP framework for network-aware distribution For clar-ity we do not show the gain factors on the edges

Next we describe a systematic linear-programming (LP)based framework that formally captures the problem of network-aware flow distribution As such the logic we describe hereis generaland applies to multiple scenarios in which suchflow distribution is invoked for instance the common caseis when the distribution module is triggered as a result ofelastic scaling The module may also be triggered due tochanges in the background traffic as well has changes in the

7

gain factors for different MBs in a chain as a result of work-load changes for a given tenant Furthermore this logic eas-ily extends to the multi-tenant scenario with multiple chainsper tenant we simply consider the union of all chains acrossall tenants

Notation Let c denote a specific chain andVc be the to-tal volume (flows) of traffic that require processing via thischain There may be different types of MBs (ie IDS RE)within a chain|c| is the number of MBs in a given chainc Let c[j ] be the type of the middlebox that is at positionj in the chainc (eg IDS RE) Letk denote the type of amiddlebox andMk be the set of MB instances of typek thatthe scaling module has launched ThusMc[j ] is the set ofMB instances of typec[j ] we usei isin Mc[j ] to specify thata MB instancei belongs to this type Figure 8 gives a quickoverview of the different entities involved in this formula-tion

LP Formulation Our goal is to split the traffic across theinstances of each type such that (a) the processing respon-sibilities are distributed roughly equally across them and(b)the aggregatenetwork footprintis minimized Thus we needto determine how the traffic is routed between different MBsLet f (c i i

prime

) denote the volume of traffic in chainc beingrouted from middleboxi to the instancei

prime

(see Figure 8)As a special casef (c i) denotes traffic routed to the firstmiddlebox in a chain from a source element4

Suppose each unit traffic of flowing between a pair of in-stances incurs some network-level costCost(i rarr i prime) de-notes the network-level cost between two instances In thesimplest case this is a binary variablemdash1 if the two MBsare in different racks and 0 otherwise (We can use more ad-vanced measures to capture latency or available bandwidthas well)

Given this setup Figure 9 formalizes the network-awareflow distribution problem that Stratos solves Here Eq (1)captures the network-wide footprint of routing traffic be-tween potential instances of thej th MB in a chain to thej + 1th MB in that chain For completeness we consider allpossible combinations of routing traffic from one instance toanother In practice the optimization will prefer only com-binations that have low footprints

Eq (2) models aflow conservationprinciple For eachchain and for each position in the chain the volume of trafficentering the middlebox has to be equal to the volume exitingit to the next middlebox type in the sequence Since middle-boxes may change the aggregate volume (eg a firewall maydrop traffic or RE may compress traffic) we consider a gen-eralized notion of conservation that also takes into accountthe expected gaindrop factorγ(c j ) which is the ratio ofincoming-to-outgoing traffic at the positionj for the chaincFor initial placement we expect the tenant to provide thesefactors as annotations to the logical topology specification4For clarity we focus only on the forward direction of the chainnoting that our implementation uses an extended formulation thatcaptures bidirectional chains as well

Minimize:

$$\sum_{c} \sum_{j=1}^{|c|-1} \sum_{i \in M_{c[j]},\; i' \in M_{c[j+1]}} Cost(i, i') \times f(c, i, i') \quad (1)$$

subject to:

$$\forall i, \forall c \;\text{s.t.}\; i \in M_{c[j]},\, j > 1:\quad \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;=\; \sum_{i' \in M_{c[j+1]}} f(c, i, i') \times \gamma(c, j) \quad (2)$$

$$\forall c:\quad \sum_{i \in M_{c[1]}} f(c, i) \;=\; V_c \quad (3)$$

$$\forall i:\quad \sum_{c:\, i \in M_{c[j]},\, j \neq 1} \;\sum_{i' \in M_{c[j-1]}} f(c, i', i) \;+\; \sum_{c:\, i \in M_{c[1]}} f(c, i) \;\approx\; \sum_{c:\, i \in M_{c[j]}} \frac{V_c}{|M_{c[j]}| \times \prod_{l=1}^{j} \gamma(c, l)} \quad (4)$$

Figure 9: LP formulation for the network-aware flow distribution problem. The ≈ in the last equation simply represents that we have some leeway in allowing the load to be within 10-20% of the mean.

The tenant could derive these based on expected traffic patterns or history; Stratos periodically recomputes these gain factors based on the observed input-output ratios for each chain.

In addition to this flow conservation, we also need to ensure that each chain's aggregate traffic will be processed; thus, we also model this coverage constraint in Eq. (3). Finally, in Eq. (4) we want to ensure that, within each middlebox type, the load is roughly evenly distributed across the instances of that type. Here, we use a general notion of load balancing where we can allow for some leeway, say within 10-20% of the targeted average load.
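To make the formulation concrete, the sketch below encodes Eqs. (1)-(4) for a toy two-type chain using the open-source PuLP library (the paper does not name a solver, so this is purely illustrative). The chain, instance placement, gain factors, and the 20% leeway are invented inputs, and the load-balance target of Eq. (4) is interpreted here as the expected per-instance inflow implied by the upstream gain factors.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum

chain = ["RE", "IPS"]                                  # MB types at positions 1..|c|
inst = {"RE": ["re1", "re2"], "IPS": ["ips1", "ips2", "ips3"]}
V_c = 300.0                                            # total chain volume (e.g., Mbps)
gamma = [2.0, 1.0]                                     # incoming/outgoing ratio per position
rack = {"re1": 0, "re2": 1, "ips1": 0, "ips2": 1, "ips3": 1}
cost = lambda a, b: 0 if rack[a] == rack[b] else 1     # binary inter-rack cost
slack = 0.2                                            # 20% leeway on the load-balance target

prob = LpProblem("flow_distribution", LpMinimize)
f_src = {i: LpVariable("src_" + i, lowBound=0) for i in inst[chain[0]]}   # f(c, i)
f = {(j, a, b): LpVariable("f%d_%s_%s" % (j, a, b), lowBound=0)           # f(c, i, i')
     for j in range(len(chain) - 1)
     for a in inst[chain[j]] for b in inst[chain[j + 1]]}

prob += lpSum(cost(a, b) * v for (j, a, b), v in f.items())               # Eq. (1)
prob += lpSum(f_src.values()) == V_c                                      # Eq. (3)

def inflow(j, i):
    # Traffic entering instance i at (0-indexed) position j.
    return f_src[i] if j == 0 else lpSum(v for (jj, a, b), v in f.items()
                                         if jj == j - 1 and b == i)

for j in range(len(chain) - 1):                                           # Eq. (2)
    for a in inst[chain[j]]:
        prob += inflow(j, a) == gamma[j] * lpSum(
            v for (jj, aa, b), v in f.items() if jj == j and aa == a)

expected_in = V_c
for j, mtype in enumerate(chain):                                         # Eq. (4), with leeway
    target = expected_in / len(inst[mtype])
    for i in inst[mtype]:
        prob += inflow(j, i) >= (1 - slack) * target
        prob += inflow(j, i) <= (1 + slack) * target
    expected_in /= gamma[j]            # volume shrinks (or grows) downstream

prob.solve()
for (j, a, b), v in sorted(f.items()):
    print(a, "->", b, ":", v.value())

Under these assumptions, the solver keeps as much traffic as possible within a rack while respecting the per-instance load targets.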

We must ensure that the periodic flow redistributions and the flow distribution accompanying scaling do not enter into race conditions. We take two steps for this. First, any scaling attempt in a chain is preceded by a redistribution; only if redistribution does not suffice does Stratos initiate scaling trials. Second, Stratos suspends all redistributions while scaling trials are being run across a given tenant's deployment.

7 IMPLEMENTATION

We have implemented a full-featured Stratos prototype capable of running on commodity x86-64 hardware. Figure 10 shows an overview of the components involved.

Figure 10: Stratos prototype implementation.

Stratos Data Plane: The Stratos data plane is a configurable overlay network realized through packet encapsulation and programmable software switches. Each tenant VM has a pair of virtual interfaces that tap one of two Open vSwitches within the host's privileged domain. Packets sent to one of the virtual interfaces are transmitted via a GRE tunnel to the software switch on the host of the destination VM, from whence they are bridged to the appropriate destination interface. The other interface is reserved for management traffic. Open vSwitch holds the responsibility for encapsulating packets for transmission across the network.

Traffic is directed between the local host and the correct destination server using Open vSwitch. A single bridge (i.e., switch) in each privileged domain contains a virtual interface per tenant VM. Forwarding rules are matched based on the switch port on which a packet arrives, the final destination of the packet, and a tag stored in the IP Type of Service (TOS) field. Using tags reduces the number of flow entries in the switches, providing an important performance boost. Forwarding rules are installed by the central Stratos controller.

Stratos Controller: The Stratos controller is implemented as an application running atop Floodlight [6] and interfaces with the Open vSwitch instances using the OpenFlow protocol [27]. The controller application takes a logical topology as input, which defines the tenant's chains and the VM instances of each client/server/MB in the chains. The controller transforms this topology into a set of forwarding rules, which are installed in the Open vSwitch instances in each physical host. The controller also gathers performance metrics from network switches, application end-points, and MBs using SNMP. These inputs are used by the rest of the modules in the controller, namely those for scaling, placement, and flow distribution. Our controller launches and terminates VMs using Xen [15].
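As a rough illustration of what this transformation produces, the sketch below compiles one logical chain into per-host Open vSwitch match/action rules keyed by ingress port, destination, and a per-chain TOS tag. The hosts, ports, addresses, tag value, and rule strings are hypothetical; the actual controller emits OpenFlow rules via Floodlight rather than text.

CHAIN_TAG = 0x20                      # hypothetical per-chain tag carried in the IP TOS field
GRE_PORT = {"hostA": 10, "hostB": 10, "hostC": 10}   # assumed OVS port of the GRE tunnel on each host

# (VM, host, local OVS port, IP) for the hops client -> RE -> IPS -> server
chain = [("client", "hostA", 1, "10.1.0.1"),
         ("re1",    "hostB", 4, "10.1.0.2"),
         ("ips1",   "hostB", 5, "10.1.0.3"),
         ("server", "hostC", 2, "10.1.0.4")]

def compile_rules(chain, dst_ip):
    # Emit (host, rule) pairs steering tagged traffic for dst_ip hop by hop
    # along the chain (forward direction only).
    rules = []
    for (_, host, port, _), (_, nhost, nport, _) in zip(chain, chain[1:]):
        out = nport if host == nhost else GRE_PORT[host]   # local bridge vs. GRE tunnel
        rules.append((host, "in_port=%d,ip,nw_dst=%s,nw_tos=%d,actions=output:%d"
                            % (port, dst_ip, CHAIN_TAG, out)))
    return rules

for host, rule in compile_rules(chain, "10.1.0.4"):
    print(host, rule)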

8 EVALUATION

We evaluate Stratos in three different ways. First, we conduct controlled testbed experiments using our prototype to examine in detail the benefits of the different components of Stratos: application-aware scaling, placement, and load distribution. Second, we run a modified version of our prototype on EC2 to understand the performance of Stratos in a dynamic scenario; since EC2 does not provide control over placement, this prototype can only perform network-aware scaling and load distribution. Finally, we simulate Stratos to understand its benefits at scale.

There are three dimensions in our evaluation: (1) choice of scaling approach, leveraging CPU and memory utilization at a MB to determine if it is a bottleneck (threshold) vs. using application-aware scaling (aware); (2) placement, randomly selecting a rack (rand) or using our network-aware placement (aware); and (3) flow distribution, either uniform or network-aware. We assume that both initial and scaled instance deployment use identical placement and load distribution schemes.

We study a variety of metrics: the effectiveness of scaling decisions, both in terms of when they are triggered and how many MBs are used; the throughput of tenant applications; unmet demand; and utilization of MBs and the provider's infrastructure.

8.1 Controlled Testbed Experiments

Our testbed consists of 24 machines with 3 VM slots each, deployed uniformly across 8 racks. The Stratos controller runs on a separate, purpose-specific machine. Unless otherwise specified, we consider a single tenant whose logical topology is a single chain consisting of clients, an RE MB, an IPS MB (standalone throughputs of 240 and 80 Mbps, respectively), and servers. The RE and IPS MBs use Click [16] and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
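A condensed sketch of how such a token-bucket-driven generator could be structured is shown below; the threading layout, fill pattern, and request URL are assumptions for illustration, not the authors' tool.

import threading, time, urllib.request

class TokenBucket:
    def __init__(self):
        self.tokens = 0.0
        self.cv = threading.Condition()
    def add(self, n):
        with self.cv:
            self.tokens += n
            self.cv.notify_all()
    def take(self):
        with self.cv:
            while self.tokens < 1:
                self.cv.wait()          # block until a credit is available
            self.tokens -= 1
    def outstanding(self):
        with self.cv:
            return self.tokens          # outstanding tokens indicate unmet demand

def client(bucket, url):
    while True:
        bucket.take()                   # one token per connection
        try:
            urllib.request.urlopen(url).read()   # fetch a ~100KB object, then repeat
        except OSError:
            time.sleep(0.1)             # placeholder server may be unreachable

bucket = TokenBucket()
for _ in range(8):                      # concurrent client threads share one bucket
    threading.Thread(target=client, args=(bucket, "http://server/100KB"), daemon=True).start()

start = time.time()
while time.time() - start < 600:
    rate = 5 + (time.time() - start) / 60    # linearly increasing fill pattern (tokens/sec)
    bucket.add(rate)                         # fill once per second
    time.sleep(1)
    print("unmet demand (tokens):", bucket.outstanding())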

We impose background traffic in our experiments by running our workload generator (with the "steady" pattern) across specific pairs of MBs in our testbed. We experiment with both fixed and variable background traffic patterns; for brevity, we focus largely on results for the former.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at such a rate that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach, labeled threshold/rand/uniform, wherein scaling decisions are based entirely on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos has sustained high throughput (measured in requests per second per process

while nine processes execute concurrently) and no significant unmet demand. Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. However, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Figure 11: Throughput and unmet demand (a - top; requests/s served shown solid, unserved dashed) and number of MBs used (b - bottom).

Next, we attempt to tease apart the relative contributions of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes: threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first examine the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions on scaling (a - top) and on throughput and unmet demand (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown in Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt scaling much less often and that, when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next examine the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload so as to ascertain their behavioral differences.

We see that, in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see that Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests more quickly, with less scaling, and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and a substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns, and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype, minus network-aware placement. Instead, we rely on EC2 to place any and all MBs; this is something we cannot control. To enable network-aware load distribution, we periodically collect the available bandwidth between adjacent MBs in a tenant's deployment using a packet-pair-based measurement tool [31].

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus, client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; each client, server, and tagger is a large instance, and the controller runs on a micro instance. We mark a chain as bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25-second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
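These thresholds translate into a simple monitoring and trial-gating policy; the sketch below captures that logic, with the monitoring and orchestration hooks (get_unmet_demand, add_instance, remove_instance, app_throughput) left as hypothetical callables rather than real Stratos APIs.

import time

UNMET_THRESH_MBPS = 28.0       # sustained unmet demand that marks a chain as bottlenecked
SUSTAIN_SECS = 20              # ...for at least this long
TRIAL_GAP_SECS = 25            # wait between successive scaling trials
RETAIN_THRESH_MBPS = 2.0       # keep a new instance only if throughput improves by this much

def bottlenecked(chain, get_unmet_demand, poll=1.0):
    # True if the chain's unmet demand stays above threshold for SUSTAIN_SECS.
    start = None
    deadline = time.time() + 2 * SUSTAIN_SECS
    while time.time() < deadline:
        if get_unmet_demand(chain) >= UNMET_THRESH_MBPS:
            start = start or time.time()
            if time.time() - start >= SUSTAIN_SECS:
                return True
        else:
            start = None
        time.sleep(poll)
    return False

def scaling_trial(mb_type, add_instance, remove_instance, app_throughput):
    # Add one instance of mb_type and retain it only if it helps enough.
    before = app_throughput()
    inst = add_instance(mb_type)
    time.sleep(TRIAL_GAP_SECS)             # let the new instance take load
    if app_throughput() - before < RETAIN_THRESH_MBPS:
        remove_instance(inst)              # revert: improvement below 2 Mbps
        return False
    return True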

EC2 Setup Latency: We first measure the setup overhead associated with Stratos. The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is approximately 12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Table 1: Stratos setup latency

Task                                                Time
Logical-to-Physical                                 5 ms
Data Plane Setup (Create Tunnels)                   2.4 s per VM
Data Plane Config (Install Rules in Open vSwitch)   3 ms per VM

Figure 14: Multiple chain scaling. (a) MB instance counts over time for MB types W, X, Y, and Z; (b) application performance: demand and served throughput (Mbps) for chains C1 and C2.

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 55K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16 Mbps on each chain, increasing demand by 4 Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36 Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time; no gap between demand and served throughput persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60 Mbps and increases by 10 Mbps every 2 minutes.

Figure 15: Application goodput with uniform and network-aware flow distribution at a fixed level of scaling (percent of demand served over time).

Figure 15 compares the percent of application demand

served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100 Mbps, 90% is served under network-aware distribution, versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches; (2) the number of tenants; (3) a chain, with its elements and initial instances (all tenants use the same deployment pattern); and (4) a fixed application demand (in Mbps), common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1 Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus, every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110 Mbps, respectively. The application demand between each client and server pair is 100 Mbps, for a total traffic demand of 300 Mbps. We assume intra-rack links have very high capacity.
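As a sanity check on these parameters, the short calculation below estimates how many instances of each MB type a tenant eventually needs to carry the full 300 Mbps demand, ignoring gain factors and network constraints (assumptions made only for this back-of-the-envelope estimate; the simulator itself accounts for link capacities and placement).

import math

demand_mbps = 300                      # 3 client-server pairs x 100 Mbps
capacity = {"MB-type1": 60, "MB-type2": 50, "MB-type3": 110}
initial = {"MB-type1": 2, "MB-type2": 1, "MB-type3": 2}

for mb, cap in capacity.items():
    needed = math.ceil(demand_mbps / cap)
    print("%s: start with %d, need ~%d instances to process %d Mbps (%d to add)"
          % (mb, initial[mb], needed, demand_mbps, needed - initial[mb]))
# -> MB-type1 needs ~5, MB-type2 ~6, and MB-type3 ~3 instances under these assumptions.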

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance, as expected, with rand/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains are initially placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned, so it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate these benefits next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a - top; CDF across tenants of the percent of demand served) and traffic served divided by number of instances (b - bottom; CDF of MBps per instance), for rand/uniform, rand/aware, aware/uniform, and aware/aware.

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percentage of the data center's network to be utilized with aware placement and load distribution. This is because, when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

Figure 17: Inter-rack tenant traffic (CDF across tenants of the amount of inter-rack traffic, in MB), for rand/uniform, rand/aware, aware/uniform, and aware/aware.

8.4 Summary of Key Results

Our key findings are that:

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting its ideal overall benefits.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9 DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by making it aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.
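A minimal sketch of that extension, assuming the availability map is keyed by (rack, machine) pairs (a representation invented here for illustration, not the actual Stratos data structure):

def availability_filter(free_slots, per="rack"):
    # Return a copy of the slot map exposing at most one free slot per machine
    # (per="machine") or per rack (per="rack") to the placement algorithm.
    filtered, seen = {}, set()
    for (rack, machine), slots in sorted(free_slots.items()):
        if slots <= 0:
            continue
        key = rack if per == "rack" else (rack, machine)
        if key in seen:
            continue
        seen.add(key)
        filtered[(rack, machine)] = 1     # expose a single slot
    return filtered

free = {("r1", "m1"): 2, ("r1", "m2"): 1, ("r2", "m1"): 3}
print(availability_filter(free, per="rack"))     # one slot in each of r1 and r2
print(availability_filter(free, per="machine"))  # one slot on every machine with space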

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally-optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available in clouds today, e.g., VMware's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10 RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPN-Cubed [14] aim to provide flexible composition of virtual topologies; however, they lack mechanisms for scaling networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services; however, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and the performance of virtual machine migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus there is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11 CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty of intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of Stratos, a network-aware orchestration layer for MBs. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. The key components of Stratos, namely an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution, work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12 REFERENCES

[1] 2012 Cloud Networking Report. http://www.webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon Web Services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace Cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://www.computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263-297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69-74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.

Figure 8 Example tenant topology to explain the terms inthe LP framework for network-aware distribution For clar-ity we do not show the gain factors on the edges

Next we describe a systematic linear-programming (LP)based framework that formally captures the problem of network-aware flow distribution As such the logic we describe hereis generaland applies to multiple scenarios in which suchflow distribution is invoked for instance the common caseis when the distribution module is triggered as a result ofelastic scaling The module may also be triggered due tochanges in the background traffic as well has changes in the

7

gain factors for different MBs in a chain as a result of work-load changes for a given tenant Furthermore this logic eas-ily extends to the multi-tenant scenario with multiple chainsper tenant we simply consider the union of all chains acrossall tenants

Notation Let c denote a specific chain andVc be the to-tal volume (flows) of traffic that require processing via thischain There may be different types of MBs (ie IDS RE)within a chain|c| is the number of MBs in a given chainc Let c[j ] be the type of the middlebox that is at positionj in the chainc (eg IDS RE) Letk denote the type of amiddlebox andMk be the set of MB instances of typek thatthe scaling module has launched ThusMc[j ] is the set ofMB instances of typec[j ] we usei isin Mc[j ] to specify thata MB instancei belongs to this type Figure 8 gives a quickoverview of the different entities involved in this formula-tion

LP Formulation Our goal is to split the traffic across theinstances of each type such that (a) the processing respon-sibilities are distributed roughly equally across them and(b)the aggregatenetwork footprintis minimized Thus we needto determine how the traffic is routed between different MBsLet f (c i i

prime

) denote the volume of traffic in chainc beingrouted from middleboxi to the instancei

prime

(see Figure 8)As a special casef (c i) denotes traffic routed to the firstmiddlebox in a chain from a source element4

Suppose each unit traffic of flowing between a pair of in-stances incurs some network-level costCost(i rarr i prime) de-notes the network-level cost between two instances In thesimplest case this is a binary variablemdash1 if the two MBsare in different racks and 0 otherwise (We can use more ad-vanced measures to capture latency or available bandwidthas well)

Given this setup Figure 9 formalizes the network-awareflow distribution problem that Stratos solves Here Eq (1)captures the network-wide footprint of routing traffic be-tween potential instances of thej th MB in a chain to thej + 1th MB in that chain For completeness we consider allpossible combinations of routing traffic from one instance toanother In practice the optimization will prefer only com-binations that have low footprints

Eq (2) models aflow conservationprinciple For eachchain and for each position in the chain the volume of trafficentering the middlebox has to be equal to the volume exitingit to the next middlebox type in the sequence Since middle-boxes may change the aggregate volume (eg a firewall maydrop traffic or RE may compress traffic) we consider a gen-eralized notion of conservation that also takes into accountthe expected gaindrop factorγ(c j ) which is the ratio ofincoming-to-outgoing traffic at the positionj for the chaincFor initial placement we expect the tenant to provide thesefactors as annotations to the logical topology specification4For clarity we focus only on the forward direction of the chainnoting that our implementation uses an extended formulation thatcaptures bidirectional chains as well

Minimize

sum

c

|c|minus1sum

j=1

sum

iiprime

st

iisinMc[j ]iprime

isinMc[j+1]

Cost(i iprime

)times f (c i iprime

) (1)

subject to

foralli forallc s t i isin Mc[j ] amp j gt 1 sum

iprime iprimeisinMc[jminus1]

f (c iprime

i) =sum

iprime iprimeisinMc[j+1]

f (c i iprime

)times γ(c j )

(2)

forallc sum

iiisinMc[1]

f (c i) = Vc (3)

foralli sum

ciisinMc[j ]j 6=1

sum

iprime

iprime

isinMc[jminus1]

f (c iprime

i)

+sum

ciisinMc[1]

f (c i) asympsum

ciisinciisinMc[j ]

Vc

|Mc [j ]|timesΠj

l=1γ(c l)

(4)

Figure 9 LP formulation for the network-aware flow dis-tribution problem Theasymp term in the last equation simplyrepresents that we have some leeway in allowing the load tobe within 10ndash20 of the mean

the tenant could derived these based on expected traffic pat-terns or history Stratos periodically recomputes these gainfactors based on the observed input-output ratios for eachchain

In addition to this flow conservation we also need to en-sure that each chainrsquos aggregate traffic will be processedthus we also model thiscoverageconstraint in Eq (3) Fi-nally we want to ensure that within each middlebox typethe load is roughly evenly distributed across the instancesofthat type in Eq (4) Here we use a general notion of loadbalancing where we can allow for some leeway say within10-20 of the targeted average load

We must ensure that the periodic flow redistributions andflow distribution accompanying scaling donrsquot enter into raceconditions We take two steps for this First any scalingattempt in a chain is preceded by a redistribution first Onlyif redistribution does not suffice does Stratos initial scalingtrials Second Stratos suspends all redistributions during thetime when scaling trials are being run across a given tenantrsquosdeployment

7 IMPLEMENTATIONWe have implemented a full featured Stratos prototype ca-

pable of running on commodity x86-64 hardware Figure 10shows an overview of the components involved

Stratos Data PlaneThe Stratos data plane is a configurableoverlay network realized through packet encapsulation andprogrammable software switches Each tenant VM has a

8

Figure 10Stratos prototype implementation

pair of virtual interfaces that tap one of two Open vSwitcheswithin the hostrsquos privileged domain Packets sent to one ofthe virtual interfaces are transmitted via a GRE tunnel tothe software switch on the host of the destination VM fromwhence it is bridged to the appropriate destination interfaceThe other interface is reserved for management traffic OpenvSwitch holds the responsibility for encapsulating packetsfor transmission across the network

Traffic is directed between the local host and the correctdestination server using Open vSwitch A single bridge (ieswitch) on each privileged domain contains a virtual inter-face per tenant VM Forwarding rules are matched based onthe switch port on which it arrived the final destination ofthe packet and a tag stored in the IP Type of Service (TOS)field Using tags reduces the number of flow entries in theswitches providing an important performance boost For-warding rules are installed by the central Stratos controller

Stratos Controller The Stratos controller is implementedas an application running atop Floodlight [6] and interfaceswith the Open vSwitch instances using the OpenFlow pro-tocol [27] The controller application takes a logical topol-ogy as input which defines the tenants chains and the VMinstances of each clientserverMB in the chains The con-troller transforms this topology into a set of forwarding ruleswhich are installed in the Open vSwitch instances in eachphysical host The controller also gathers performance met-rics from network switches application end-points and MBsusing SNMP These inputs are using in the rest of the mod-ules in the controller namely those for scaling placementand flow distribution Our controller launches and termi-nates VMs using Xen [15]

8 EVALUATIONWe evaluate Stratos in three different ways First we con-

duct controlled testbed experiments using our prototype toexamine in detail the benefits of different components ofStratosndash application-aware scaling placement and load dis-tribution Second we run a modified version of our proto-type on EC2 to understand the performance of Stratos in adynamic scenario Since EC2 does not provide control overplacement this prototype can only perform network-awarescaling and load distribution Finally we simulate Stratos tounderstand the benefits of Stratos at scale

There are three dimensions in our evaluation (1) Choiceof scaling approach leveraging CPU and memory utiliza-tion at a MB to determine if it is a bottleneck (threshold) vsusing application-aware scaling (aware) (2) Placement ran-domly selecting a rack (rand) or using our network-awareplacement (aware) (3) Flow distribution eitheruniformornetwork-awareflow distribution We assume that both ini-tial and scaled instance deployment use identical placementand load distribution schemes

We study a variety of metrics the effectiveness of scal-ing decisions both in terms of when they are triggered andhow many MBs are used the throughput of tenant applica-tions unmet demand and utilization of MBs and providerrsquosinfrastructure

81 Controlled Testbed ExperimentsOur testbed consists of 24 machines with 3 VM slots

each deployed uniformly across 8 racks The Stratos con-troller runs on a seperate purpose specific machine Unlessotherwise specified we consider a single tenant whose logi-cal topology is a single chain consisting of client an RE MBan IPS MB (standalone throughputs of 240 and 80Mbps re-spectively) and servers The RE and IPS MBs use Click [16]and Suricata 111 [13] respectively

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
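A condensed sketch of such a token-bucket workload generator is shown below, assuming a stubbed-out fetch_100kb() in place of the actual HTTP request logic; the rates, patterns, and names are illustrative rather than the exact generator used in our experiments.

```python
# Sketch of the shared token-bucket workload generator described above.
import threading
import time

REQUEST_BYTES = 100 * 1024  # each token corresponds to one 100KB request

class TokenBucket:
    def __init__(self):
        self.tokens = 0.0
        self.cond = threading.Condition()

    def fill(self, rate_fn, step=0.1):
        """Add tokens forever at the (time-varying) rate rate_fn(t) tokens/sec."""
        start = time.time()
        while True:
            with self.cond:
                self.tokens += rate_fn(time.time() - start) * step
                self.cond.notify_all()
            time.sleep(step)

    def take(self):
        """Block until a token is available, then consume it."""
        with self.cond:
            while self.tokens < 1:
                self.cond.wait()
            self.tokens -= 1

    def unmet_demand(self):
        """Outstanding (unconsumed) tokens indicate unserved demand."""
        with self.cond:
            return int(self.tokens)

def client_thread(bucket, fetch_100kb):
    while True:
        bucket.take()      # wait for a credit before opening a connection
        fetch_100kb()      # issue one 100KB request, then repeat

def linear_rate(t, base=5.0, slope=0.05):
    return base + slope * t    # an "increasing" workload pattern (tokens/sec)
```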

We impose background traffic in our experiments by running our workload generator ("steady" pattern) across specific pairs of MBs in our testbed. We experiment both with fixed and variable background traffic patterns; we focus largely on results for the former for brevity.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at a rate such that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach, labeled threshold/rand/uniform, wherein scaling decisions are based entirely on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos sustains high throughput (measured in requests per second per process, with nine processes executing concurrently) and no significant unmet demand.

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom)

Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. However, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Next, we attempt to tease apart the relative contribution of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes, threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first understand the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions on scaling (a - top) and on throughput and unmet demand (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown by Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often, and that when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next understand the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload, so as to ascertain their behavioral differences.

We see that, in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see that Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests quicker, with less scaling, and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns, and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines.

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype minus network-aware placement; instead, we rely on EC2 to place any and all MBs, as this is something we cannot control. To enable network-aware load distribution, we periodically collect the available bandwidth between adjacent MBs in a tenant's deployment using a packet-pair-based measurement tool [31].
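The sketch below illustrates how such periodic pairwise bandwidth collection might be structured; measure_bandwidth stands in for the packet-pair tool cited above [31], whose actual interface we do not reproduce here, and the stage/instance representation is our own.

```python
# Sketch of periodic available-bandwidth collection between adjacent MB stages.
import itertools
import threading
import time

def collect_bandwidth(chain_stages, measure_bandwidth, period_sec=60):
    """chain_stages: list of stages, each a list of MB instance identifiers.
    measure_bandwidth(src, dst) -> estimated available Mbps between two instances."""
    estimates = {}

    def loop():
        while True:
            # Measure between every pair of instances in adjacent stages of the chain.
            for stage, next_stage in zip(chain_stages, chain_stages[1:]):
                for src, dst in itertools.product(stage, next_stage):
                    estimates[(src, dst)] = measure_bandwidth(src, dst)
            time.sleep(period_sec)

    threading.Thread(target=loop, daemon=True).start()
    return estimates  # read by the flow-distribution module
```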

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus, client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the clients, servers, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as being bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25 second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
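For concreteness, the following sketch encodes the bottleneck test and instance-retention rule with the parameters quoted above; the sample format and function names are our own and purely illustrative.

```python
# Sketch of the chain bottleneck test and scaling-trial bookkeeping described above.
UNMET_THRESHOLD_MBPS = 28.0     # sustained unmet demand that marks a bottleneck
SUSTAIN_SEC = 20                # how long the unmet demand must persist
TRIAL_GAP_SEC = 25              # minimum gap between successive scaling trials
RETAIN_IMPROVEMENT_MBPS = 2.0   # improvement needed to keep a newly added instance

def chain_is_bottlenecked(unmet_samples):
    """unmet_samples: list of (timestamp, unmet_mbps) tuples, newest last."""
    if not unmet_samples:
        return False
    now = unmet_samples[-1][0]
    window = [u for (t, u) in unmet_samples if now - t <= SUSTAIN_SEC]
    return bool(window) and min(window) >= UNMET_THRESHOLD_MBPS

def retain_new_instance(throughput_before_mbps, throughput_after_mbps):
    """Keep a scaled-up instance only if it improved served demand enough."""
    return (throughput_after_mbps - throughput_before_mbps) >= RETAIN_IMPROVEMENT_MBPS

def next_trial_time(last_trial_ts):
    """Earliest time the next scaling trial may start."""
    return last_trial_ts + TRIAL_GAP_SEC
```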

EC2 Setup Latency: We first measure the setup overhead associated with Stratos.

Task                                                Time
Logical-to-Physical                                 5 ms
Data Plane Setup (Create Tunnels)                   2.4 s per VM
Data Plane Config (Install Rules in Open vSwitch)   3 ms per VM

Table 1: Stratos setup latency

Figure 14: Multiple chain scaling. (a) MB instance counts; (b) application performance.

The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is ≈12s; this is high because EC2 does not allow parallel deployment/setup of VMs. Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16 Mbps on each chain, increasing demand by 4 Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36 Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time. No gap between demand and served demand persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60 Mbps and increases by 10 Mbps every 2 minutes.

Figure 15: Application goodput with uniform and network-aware flow distribution at a fixed level of scaling

Figure 15 compares the percent of application demand served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100 Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches, (2) the number of tenants, (3) a chain with elements and initial instances (all tenants use the same deployment pattern), and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. We run the network-aware scaling heuristic for each tenant until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1 Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus, every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110 Mbps, respectively. The application demand between each client and server pair is 100 Mbps, for a total traffic demand of 300 Mbps. We assume intra-rack links have very high capacity.
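The configuration below is an illustrative encoding of this simulation setup; the simulator itself is not shown, and all field names are our own rather than its actual input format.

```python
# Illustrative encoding of the simulation inputs described above.
from dataclasses import dataclass, field

@dataclass
class SimulationConfig:
    num_racks: int = 500
    vm_slots_per_rack: int = 10
    link_capacity_mbps: int = 1000          # 1 Gbps per network link
    num_tenants: int = 200
    # Per-tenant chain: element -> number of initial instances (12 VMs total).
    chain: dict = field(default_factory=lambda: {
        "clients": 3, "MB-type1": 2, "MB-type2": 1, "MB-type3": 2, "servers": 4,
    })
    # Per-instance processing capacity for each MB type (Mbps).
    mb_capacity_mbps: dict = field(default_factory=lambda: {
        "MB-type1": 60, "MB-type2": 50, "MB-type3": 110,
    })
    demand_per_pair_mbps: int = 100          # 3 client-server pairs -> 300 Mbps total

config = SimulationConfig()
assert sum(config.chain.values()) == 12      # forces each tenant's VMs across racks
```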

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement / aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance, as expected, with rand/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains are initially placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned. Thus, it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate the benefits of this next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a - top) and traffic served divided by number of instances (b - bottom)

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percentage of the data center's network to be utilized with aware placement and load distribution. This is because, when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

Figure 17: Inter-rack tenant traffic

8.4 Summary of Key Results

Our key findings are that:


• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by having it be aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.
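A minimal sketch of this slot-map restriction is shown below; the (rack, machine, slot) slot-map representation is an assumption made purely for illustration.

```python
# Restrict the free-slot map handed to the placement heuristic so that a tenant
# receives at most one slot per machine (or per rack) for availability.
def restrict_slots(free_slots, granularity="machine"):
    """free_slots: iterable of (rack_id, machine_id, slot_id) tuples."""
    seen = set()
    restricted = []
    for rack, machine, slot in free_slots:
        key = (rack, machine) if granularity == "machine" else rack
        if key not in seen:
            seen.add(key)
            restricted.append((rack, machine, slot))
    return restricted
```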

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally-optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available in clouds today, e.g., VMWare's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPNCubed [14] aim to provide flexible composition of virtual topologies; however, they don't have the mechanisms for scaling of networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services; however, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, they do not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new, ground-up framework that explicitly manages the network configuration and network interactions of MBs.


To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation, using a testbed deployment, EC2, and large-scale simulations, to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively utilized.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon Web Services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow Controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace Cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18(3):263-297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69-74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.


Figure 10Stratos prototype implementation

pair of virtual interfaces that tap one of two Open vSwitcheswithin the hostrsquos privileged domain Packets sent to one ofthe virtual interfaces are transmitted via a GRE tunnel tothe software switch on the host of the destination VM fromwhence it is bridged to the appropriate destination interfaceThe other interface is reserved for management traffic OpenvSwitch holds the responsibility for encapsulating packetsfor transmission across the network

Traffic is directed between the local host and the correctdestination server using Open vSwitch A single bridge (ieswitch) on each privileged domain contains a virtual inter-face per tenant VM Forwarding rules are matched based onthe switch port on which it arrived the final destination ofthe packet and a tag stored in the IP Type of Service (TOS)field Using tags reduces the number of flow entries in theswitches providing an important performance boost For-warding rules are installed by the central Stratos controller

Stratos Controller The Stratos controller is implementedas an application running atop Floodlight [6] and interfaceswith the Open vSwitch instances using the OpenFlow pro-tocol [27] The controller application takes a logical topol-ogy as input which defines the tenants chains and the VMinstances of each clientserverMB in the chains The con-troller transforms this topology into a set of forwarding ruleswhich are installed in the Open vSwitch instances in eachphysical host The controller also gathers performance met-rics from network switches application end-points and MBsusing SNMP These inputs are using in the rest of the mod-ules in the controller namely those for scaling placementand flow distribution Our controller launches and termi-nates VMs using Xen [15]

8 EVALUATIONWe evaluate Stratos in three different ways First we con-

duct controlled testbed experiments using our prototype toexamine in detail the benefits of different components ofStratosndash application-aware scaling placement and load dis-tribution Second we run a modified version of our proto-type on EC2 to understand the performance of Stratos in adynamic scenario Since EC2 does not provide control overplacement this prototype can only perform network-awarescaling and load distribution Finally we simulate Stratos tounderstand the benefits of Stratos at scale

There are three dimensions in our evaluation (1) Choiceof scaling approach leveraging CPU and memory utiliza-tion at a MB to determine if it is a bottleneck (threshold) vsusing application-aware scaling (aware) (2) Placement ran-domly selecting a rack (rand) or using our network-awareplacement (aware) (3) Flow distribution eitheruniformornetwork-awareflow distribution We assume that both ini-tial and scaled instance deployment use identical placementand load distribution schemes

We study a variety of metrics the effectiveness of scal-ing decisions both in terms of when they are triggered andhow many MBs are used the throughput of tenant applica-tions unmet demand and utilization of MBs and providerrsquosinfrastructure

81 Controlled Testbed ExperimentsOur testbed consists of 24 machines with 3 VM slots

each deployed uniformly across 8 racks The Stratos con-troller runs on a seperate purpose specific machine Unlessotherwise specified we consider a single tenant whose logi-cal topology is a single chain consisting of client an RE MBan IPS MB (standalone throughputs of 240 and 80Mbps re-spectively) and servers The RE and IPS MBs use Click [16]and Suricata 111 [13] respectively

We build a multi-threaded workload generator that worksbetween a client-server pair in the following manner thethreads running at a client share a (sufficiently large) tokenbucket that fills at a rate specified by a workload pattern (egsteady increasing or sine-wave) A client thread draws asingle token from the bucket prior to initiating a connectionto the server if none are available it blocks New connec-tions are issued by a client only after the previous connectionfinishes and another credit has been obtained The numberof outstanding tokens indicates the unmet demand and eachtoken corresponds to a request of 100KB

We impose background traffic in our experiments by run-ning our workload generator (ldquosteadyrdquo pattern) across spe-cific pairs of MBs in our testbed We experiment both withfixed and variable background traffic patterns we focus largelyon results for the former for brevity

Overall benefitsWe ran Stratos atop the testbed using a lin-early increasing workload pattern Background traffic wasfixed at such a rate that utilization of the aggregation linksin our topology varied from 25 to 50 Figure 11 shows anexecution of Stratos which we describe asaware aware aware meaning that scaling is initiated in response to ap-plication demand and that MB placement and flow distribu-tion are both network-aware We first compare it against acompletely network-agnostic approach labeledthreshold rand uniform wherein scaling decisions are entirely basedon CPU load exceeding 80 percent for a period of five sec-onds From Figure 11(a) we note that the naive approachrsquosthroughput starts to drop at around 300s when the unmetdemand skyrockets In contrast Stratos has sustained highthroughput (measured in requests per second per process

9

0

2

4

6

8

10

12

14

260 280 300 320 340 360 380 400

Req

uest

ss

solid

ser

ved

das

hed

uns

erve

d

Time (sec)

AwareAwareAwareThresholdUniformRandom

0

5

10

15

20

0 100 200 300 400 500 600

Mid

dleb

ox a

lloca

tions

Time (s)

AwareAwareAwareThreshAwareAware

AwareRandUniThreshRandUni

Figure 11 Number of MBs used (a - top) and throughputand unmet demand (b - bottom)

while nine processes execute concurrently) and no signif-icant unmet demand Figure 11(b) shows the correspond-ing scaling decisions We see that Stratos uses 2X fewerinstances than the naive threshold rand uniform approachyet it offers better throughput However comparing the fig-ures describing Stratosrsquos scaling behavior with correspond-ing demand graphs it is apparent that Stratosrsquos ability toscale to meet increasing demand is unhindered by its initialeconomy of MB allocation

Next we attempt to tease apart the relative contribution ofthe three network-aware components in Stratos

Application-aware scaling benefitsFigure 11(b) also showsthe number of MB instances used by two other schemesthreshold aware aware and aware rand uniform Takingall the four schemes into account together we notice that theapplication-aware scaling heuristic outperforms naive scal-ing (aware versus threshold) using nearly 2X fewer in-stances In terms of throughput we noticed that aware aware aware is about 10 better than thresholdawareawarewhereas aware rand uniform is actually about 10lowerin throughput than threshold rand uniform (results omittedfor brevity)

Taken together these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs re-sulting in fewer MBs being used it critically relies on place-ment and load-balancing to be network aware in order tomake effective use of MB capacity and to offer optimal application-level performace We explore the role of placement and loadbalancing in more detail next

PlacementWe first understand the impact of network-awareplacement decisions in Stratos We run Stratos and aware

0

5

10

15

20

200 250 300 350 400 450 500 550 600

Mid

dleb

ox a

lloca

tions

Time (s)

AwareAwareAwareAwareRandAware

0

10

20

30

40

50

60

200 250 300 350 400 450 500 550 600

Req

uest

s pe

r se

cond

Time (s)

AwareAwareAwareAwareRandAware

Figure 12 Effect of placement decisions (a - top) onthroughput and unmet demand (b - bottom) with fixed back-ground traffic Unmet demand is shown using dashed lines

rand aware against the same fixed background traffic andworkload

We compare the two schemesrsquo performance against thisworkload The results are shown in Figure 12 (a) We imme-diately see that aware rand aware attempts to scale signif-icantly more frequently than Stratos and that those attemptsusually fail As is shown by Figure 12 (b) these attempts toscale up are the result of spikes in unsatisfied demand whichrequire multiple scaling attempts to accommodate

By contrast it is apparent from these figures both thatStratos needs to attempt to scale much less often and thatwhen it does those attempts are significantly more likely tobe successful

Flow Distribution We next understand the impact of network-aware flow distribution in Stratos As before we run Stratosand aware aware uniform against the same backgroundtraffic and workload so as to ascertain their behavioral dif-ferences

We see that in order to satisfy the same demand aware aware uniform requires more middlebox instances thanStratos More significantly though we see Stratos is nonethe-less better situated to respond to surges in demand it is ableto satisfy queued requests quicker with less scaling andwith less turbulence in subsequent traffic

Although these results employ a small scale testbed withsynthetic traffic patterns they serve to highlight the impor-tance of the individual components of Stratos Specificallymaking any one component network-agnostic results in us-ing more MBs than necessary poor throughput and substan-tial buildup of unmet demand We also experiments with

10

0

5

10

15

20

200 250 300 350 400 450 500 550 600

Mid

dleb

ox a

lloca

tions

Time (s)

AwareAwareAwareAwareAwareUni

0

10

20

30

40

50

60

200 250 300 350 400 450 500 550 600

Req

uest

s pe

r se

cond

Time (s)

AwareAwareAwareAwareAwareUni

Figure 13 Effect of flow distribution decisions on scaling(a - top) and on demand satisfaction (b - bottom) with fixedbackground traffic Unmet demand is shown using dashedlines

variable background traffic different workload patterns andfound the above observations to hold qualitatively We pro-vide further evidence using our EC2 prototype and simula-tions

82 (Restricted) Stratos in a Dynamic Scenario

Prototype detailsOur EC2 prototype is similar to our full-fledged prototype minus network-aware placement Insteadwe rely on EC2 to place any and all MBs this is somethingwe cannot control To enable network-aware load distri-bution we periodically collect available bandwidth usingapacket-pair-based measurement tool [31] between adjacentMBs in a tenantrsquos deployment

Multi-chain tenant deployment Whereas the previous ex-periments used a simple chain we now have the tenant de-ploy the multi-chain setup shown in Figure 5 Each clientVM runs httperf [7] to request a 50KB file from a corre-sponding server VM running Apache (thus client A requestsfrom server A) We deploy each MB as a small EC2 instanceto emulate bottlenecks client server and tagger are largeinstances the controller runs on a micro instance A clientrequests a 50KB file from a server running Apache each is alarge EC2 instance We mark a chain as being bottleneckedif there is a sustained unmet demand of 28 Mbps for a pe-riod of at least 20 seconds We use a 25 second gap betweenscaling trials and we use a 2 Mbps improvement thresholdto retain an instance

EC2 Setup Latency We first measure the setup overhead

Task TimeLogical-to-Physical 5msData Plane Setup (Create Tunnels) 24s per VMData Plane Config (Install Rules in Open vSwitch)3ms per VM

Table 1Stratos setup latency

Time (s)0 120 240 360 480 600 720 840

1234

1234

1234

123

MB

W

MB

X

MB

Y

MB

Z

(a) MB Instance CountsTime (s)

0 120 240 360 480 600 720 840

Thr

ough

put (

Mbp

s)

0

10

20

30

40

50

C1 DemandC1 ServedC2 DemandC2 Served

(b) Application PerformanceFigure 14Multiple chain scaling

associated with Stratos The setup cost includes the time re-quired to launch the data plane components (taps and switch)on each VM transform the logical chains into per-VM con-figurations and configure each VMrsquos data plane components(Table 1) The total setup time for our example chain (withone instance of each MB) isasymp12s (high because EC2 doesnot allow parallel deploymentsetup of VMs) Relative to thetime to launch a VM (on the order of few tens of seconds)this represents a small overhead

Effectiveness of ScalingTo emulate induced bottlenecksin the shared (X Y) or unshared (W Z) MBs (See Figure 5)we use artificial Click [25] MBs that rate limit packets at55K 9K 7K and 10K packetssecond for instances of WX Y and Z respectively We impose an initial demand of16Mbps on each chain increasing demand by 4Mbps ev-ery 2 minutes Figure 14 shows the scaling result and theapplication performance The shared MBs become bottle-necked first because they incur load from both clients Ourheuristic accurately attempts to scale these MBs first it doesnot attempt to scale the unshared MBs because the bottle-neck is eliminated by first adding two instances of Y andthen an instance of X When demand increases to 36Mbpson each chain W becomes a bottleneck for Chain 1 whichour heuristic rightly scales without conducting unnecessaryscaling trials for X Y or Z

Our approach ensures that application demand is entirelyserved most of the time No gap between demand and servedpersists for longer than 60 seconds Without our extensionchains would need to be scaled sequentially increasing theduration of these gaps For example the gap at 240s wouldpersist for an additional 25s while an unnecessary scalingtrial was conducted with W prior to scaling trials with X andY

Effectiveness of Flow Distribution We now evaluate thebenefits of network-aware flow distribution We compareuniform and network-aware flow distribution for a singlepoint in the scaling spacemdash3 RE and 4 IPSmdashfor the sin-gle chain The MB instances are clustered into two groupslimiting the flow of traffic between the groups to 12K pack-ets per second Application demand starts at 60Mbps and

11

Time (s)0 120 240 360 480

S

erve

d60

70

80

90

100

UniformNetwork

Figure 15Application goodput with uniform and network-aware flow distribution at a fixed level of scaling

increases by 10Mbps every 2 minutesFigure 15 compares the percent of application demand

served under the two distribution mechanisms We observethat the same set of MBs is able to serve higher demandwhen network-aware flow distribution is employed with ademand of 100Mbps 90 is served under network-awaredistribution versus only about 75 with uniform distribu-tion (The consistent 5 of unserved demand with network-aware distribution is a result of EC2 network variability be-tween our runs which further highlights the need for a Stratos-like approach for simplifying MB management)

8.3 Simulations: Stratos at Scale

Simulation setup. We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches, (2) the number of tenants, (3) a chain with elements and initial instances (all tenants use the same deployment pattern), and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110Mbps, respectively. The application demand between each client and server pair is 100Mbps, for a total traffic demand of 300Mbps. We assume intra-rack links are very high capacity.
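A minimal sketch of this simulation setup, expressed as Python data structures (illustrative names only; the placement and scaling logic are omitted):

    # Simulation parameters from the setup described above; names are illustrative.
    RACKS = 500
    SLOTS_PER_RACK = 10
    LINK_CAPACITY_MBPS = 1000        # per network link; intra-rack links treated as unconstrained
    NUM_TENANTS = 200

    # Every tenant deploys the same logical chain: (element type, initial instance count).
    CHAIN_TEMPLATE = [
        ("client",   3),
        ("MB-type1", 2),
        ("MB-type2", 1),
        ("MB-type3", 2),
        ("server",   4),
    ]                                # 12 VMs total, so no tenant fits in a single 10-slot rack

    MB_CAPACITY_MBPS = {"MB-type1": 60, "MB-type2": 50, "MB-type3": 110}
    DEMAND_PER_PAIR_MBPS = 100       # per client-server pair; 3 pairs -> 300 Mbps per tenant

    def make_tenants():
        """Build the identical tenant deployments fed to the simulator."""
        return [
            {
                "id": t,
                "chain": [{"type": typ, "instances": n} for typ, n in CHAIN_TEMPLATE],
                "demand_mbps": 3 * DEMAND_PER_PAIR_MBPS,
            }
            for t in range(NUM_TENANTS)
        ]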

First we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance; as expected, random/aware outperforms aware/uniform, which indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains initially are placed in a network-aware fashion).

Performance per $. Tenants are currently charged based on the number of instances provisioned. Thus it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate the benefits of this next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8MBps with aware placement and 2MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a, top: CDF of the percent of demand served per tenant) and traffic served divided by number of instances (b, bottom: CDF of MBps per instance), for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.

Provider view. Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percent of the data center's network to be utilized with the aware placement and load distribution. This is because when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

8.4 Summary of Key Results

Our key findings are that:


Figure 17: Inter-rack tenant traffic (CDF of the amount of inter-rack traffic, in MB, per tenant for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware).

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs. Stratos can be improved overall by having it be aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too, e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.
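As a hedged illustration of the first direction, the sketch below groups flows by a content-similarity key so that flows likely to carry duplicate content are steered to the same RE replica; the signature function and replica identifiers are assumptions, not part of Stratos or any MB API.

    # Hypothetical duplication-aware flow steering for a redundancy elimination MB.
    from collections import defaultdict

    def assign_flows_to_re_replicas(flows, replicas, signature):
        """flows: iterable of flow ids; replicas: list of RE instance ids;
        signature(flow) -> key such that flows sharing a key likely carry duplicate
        content (e.g., requests for the same object). Keeps each similarity group on
        one replica while roughly balancing the number of flows per replica."""
        groups = defaultdict(list)
        for f in flows:
            groups[signature(f)].append(f)
        load = {r: 0 for r in replicas}
        assignment = {}
        # place the largest similarity groups first, each on the least-loaded replica
        for _, grp in sorted(groups.items(), key=lambda kv: -len(kv[1])):
            target = min(load, key=load.get)
            for f in grp:
                assignment[f] = target
            load[target] += len(grp)
        return assignment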

Failure Resilience. Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.
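A minimal sketch of this slot-map restriction (a hypothetical helper assuming a slot map keyed by rack and machine; not the actual Stratos placement code):

    # Expose at most one free slot per machine (or per rack) to a given tenant,
    # so its VMs spread out for availability while the placement logic stays unchanged.
    def restrict_slot_map(slot_map, level="machine"):
        """slot_map: {rack_id: {machine_id: free_slots}} -> restricted copy."""
        restricted = {}
        for rack, machines in slot_map.items():
            if level == "rack":
                # one slot for the whole rack, on any machine that has space
                machine = next((m for m, free in machines.items() if free > 0), None)
                restricted[rack] = {machine: 1} if machine is not None else {}
            else:
                restricted[rack] = {m: min(free, 1) for m, free in machines.items()}
        return restricted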

Zero Downtime. As mentioned in Section 3, when a collection of VMs are ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally-optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMWare's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud. Recent proposals [9, 5, 14, 19] and third party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPNCubed [14] aim to provide flexible composition of virtual topologies; however, they don't have the mechanisms for scaling of networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services. However, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters. Issues in deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement. Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling. Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming the challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then the key components of Stratos, namely an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution, work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conduct a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://www.webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon Web Services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace Cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://www.computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In Co-NEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263–297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, Melbourne, Australia, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.

82 (Restricted) Stratos in a Dynamic Scenario

Prototype detailsOur EC2 prototype is similar to our full-fledged prototype minus network-aware placement Insteadwe rely on EC2 to place any and all MBs this is somethingwe cannot control To enable network-aware load distri-bution we periodically collect available bandwidth usingapacket-pair-based measurement tool [31] between adjacentMBs in a tenantrsquos deployment

Multi-chain tenant deployment Whereas the previous ex-periments used a simple chain we now have the tenant de-ploy the multi-chain setup shown in Figure 5 Each clientVM runs httperf [7] to request a 50KB file from a corre-sponding server VM running Apache (thus client A requestsfrom server A) We deploy each MB as a small EC2 instanceto emulate bottlenecks client server and tagger are largeinstances the controller runs on a micro instance A clientrequests a 50KB file from a server running Apache each is alarge EC2 instance We mark a chain as being bottleneckedif there is a sustained unmet demand of 28 Mbps for a pe-riod of at least 20 seconds We use a 25 second gap betweenscaling trials and we use a 2 Mbps improvement thresholdto retain an instance

EC2 Setup Latency We first measure the setup overhead

Task TimeLogical-to-Physical 5msData Plane Setup (Create Tunnels) 24s per VMData Plane Config (Install Rules in Open vSwitch)3ms per VM

Table 1Stratos setup latency

Time (s)0 120 240 360 480 600 720 840

1234

1234

1234

123

MB

W

MB

X

MB

Y

MB

Z

(a) MB Instance CountsTime (s)

0 120 240 360 480 600 720 840

Thr

ough

put (

Mbp

s)

0

10

20

30

40

50

C1 DemandC1 ServedC2 DemandC2 Served

(b) Application PerformanceFigure 14Multiple chain scaling

associated with Stratos The setup cost includes the time re-quired to launch the data plane components (taps and switch)on each VM transform the logical chains into per-VM con-figurations and configure each VMrsquos data plane components(Table 1) The total setup time for our example chain (withone instance of each MB) isasymp12s (high because EC2 doesnot allow parallel deploymentsetup of VMs) Relative to thetime to launch a VM (on the order of few tens of seconds)this represents a small overhead

Effectiveness of ScalingTo emulate induced bottlenecksin the shared (X Y) or unshared (W Z) MBs (See Figure 5)we use artificial Click [25] MBs that rate limit packets at55K 9K 7K and 10K packetssecond for instances of WX Y and Z respectively We impose an initial demand of16Mbps on each chain increasing demand by 4Mbps ev-ery 2 minutes Figure 14 shows the scaling result and theapplication performance The shared MBs become bottle-necked first because they incur load from both clients Ourheuristic accurately attempts to scale these MBs first it doesnot attempt to scale the unshared MBs because the bottle-neck is eliminated by first adding two instances of Y andthen an instance of X When demand increases to 36Mbpson each chain W becomes a bottleneck for Chain 1 whichour heuristic rightly scales without conducting unnecessaryscaling trials for X Y or Z

Our approach ensures that application demand is entirelyserved most of the time No gap between demand and servedpersists for longer than 60 seconds Without our extensionchains would need to be scaled sequentially increasing theduration of these gaps For example the gap at 240s wouldpersist for an additional 25s while an unnecessary scalingtrial was conducted with W prior to scaling trials with X andY

Effectiveness of Flow Distribution We now evaluate thebenefits of network-aware flow distribution We compareuniform and network-aware flow distribution for a singlepoint in the scaling spacemdash3 RE and 4 IPSmdashfor the sin-gle chain The MB instances are clustered into two groupslimiting the flow of traffic between the groups to 12K pack-ets per second Application demand starts at 60Mbps and

11

Time (s)0 120 240 360 480

S

erve

d60

70

80

90

100

UniformNetwork

Figure 15Application goodput with uniform and network-aware flow distribution at a fixed level of scaling

increases by 10Mbps every 2 minutesFigure 15 compares the percent of application demand

served under the two distribution mechanisms We observethat the same set of MBs is able to serve higher demandwhen network-aware flow distribution is employed with ademand of 100Mbps 90 is served under network-awaredistribution versus only about 75 with uniform distribu-tion (The consistent 5 of unserved demand with network-aware distribution is a result of EC2 network variability be-tween our runs which further highlights the need for a Stratos-like approach for simplifying MB management)

83 Simulations Stratos at Scale

Simulation setupWe developed a simulator to evaluate themacroscopic benefits of Stratos at large scales While weexamined complex scenarios using the simulator we presentresults using somewhat restrictive setups for clarity Specif-ically for the scenarios below the simulator takes as input(1) a data center topology consisting of racks and switches(2) the number of tenants (3) chain with elements and initialinstances (all tenant use the same deployment pattern) and(4) a fixed application demand (in Mbps) common acrosstenants

We run our simulator to place 200 tenants within a 500-rack data center We run the network-aware scaling heuristicfor each tenant runs until the tenantrsquos full demand is satisfiedor no further performance improvement can be achievedThe data center is arranged in a tree topology with 10 VMslots per rack and a capacity of 1Gbps on each network linkAll tenants use the same deploymentmdasha simple chain con-taining clients (3 instances) MB-type1 (2) MB-type2 (1)MB-type3 (2) and servers (4)mdashwhich initially consists of12VMs thus every tenant is forced to spread her VMs acrossracks The capacity of each instance of the MB-type1 type2and type3 is fixed at 60 50 and 110Mbps respectively Theapplication demand between each client and server pair is100Mbps for a total traffic demand of 300Mbps We as-sume intra-rack links are very high capacity

First we look at the tenant demand that can be servedunder different combinations of placement and flow distri-bution during scaling (Figure 16(a)) we assume all tenantdeployments are initially placed in a network-aware fashionWe observe immediately thatawareplacementawaredistri-bution is the best in that a greater fraction of the demandcan be served across all tenants than then remaining com-binations At the other extreme random placement coupled

with uniform distribution results in less than 30 of demandserved across all tenants The other possibilities offer inter-mediate performance as expected with randomaware out-performing awareuniform this indicates the relative impor-tant of network-aware load distribution compared to networkaware placement of scaling instances (note that all chainsinitially are placed in a network-aware fashion)

Performance per $ Tenants are currently charged basedon the number of instances provisioned Thus it is crucialthat tenants maximally utilize their MB instances BecauseStratos actively managed MB interactions it helps improvethe bandwidth available between successive MBs in a de-ployment thereby helping MB resources to be used more ef-fectively We illustrate the benefits of this next Figure 16(b)presents a CDF of the amount of traffic served for each ten-ant relative to the number of instances deployedAwaredis-tribution results in a significant increase in the amount oftraffic served per-instance for the median tenant with bothplacement algorithms 8MBps withaware placement and2MBps with rand As before we again see the greater im-protance of network-aware load distribution relative to place-ment

Percent of demand served30 40 50 60 70 80 90 100

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

MBps num Instances0 10

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

Figure 16 Tenant load served (a - top) and Traffic serveddivided by number of instances (b - bottom)

Provider view Figure 17 presents a CDF of the amount ofinter-rack traffic generated by each tenantrsquos chain Interest-ingly tenants cause a high percent of the data centerrsquos net-work to be utilized with theawareplacement and load distri-bution This is because when both network aware placementand load distribution are used tenants are able to scale outmore and more closely match their demand thereby pushingmore bytes out into the data center network One the wholethe data center infrastructure is more effectively utilized

84 Summary of Key ResultsOur key findings are that

12

Amount of Interminusrack Traffic (in MB)0 100 200 300 400 500

Fra

ctio

n of

tena

nts

0

02

04

06

08

1RandUniformRandAwareAwareUniformAwareAware

Figure 17Inter-rack tenant traffic

bull Stratos helps optimally meet application demand byaccurately identifying and addressing bottlenecks Incontrast network-agnostic approaches use up to 2X asmany MBs as Stratos yet they have severely back-logged request queues

bull All three network-aware components of Stratos are cru-cial to extracting the ideal overall benefits of Stratos

bull Even without intrinsic support for placement Stratoscan elastically meet the demands of applications in EC2Stratosrsquos fine-grained load distribution plays a crucialrole in sustaining application performance despite chang-ing network conditions

9 DISCUSSION

Integration of Stratos with MBs Stratos can be improvedoverall by having it be aware of MB functions For exampleif Stratos knows the duplication patterns in specific trafficflows then it can use this to more carefully decide whichflows to send to specific replicas of a redundancy eliminationMB MBs can benefit from knowing about Stratos too ega server load balancer can use the network load distributionpatterns imposed by Stratos together with server load indeciding how to balance requests across servers

Failure ResilienceOur placement hueristics are performance-centered and hence they impose rack-aware allocations How-ever this may not be desirable for tenants who want theirdeployments to be highly available Our placement heuris-tics can be adapted for such tenants to distribute VMs acrossracks for availability reasons while also minimizing net-work footprint The simplest extension is to modify the mapof available VM slots such that there is at most one slot avail-able per machine or one per rack for a given tenant

Zero Downtime As mentioned in Section 3 when a col-lection VMs are ready to be migrated re-placement maybe invoked across several tenant deployments (even thosewho VMs are not among the set being migrated) to findnew globally-optimal allocations There is a concern thatthis may impose down-time on tenant deployments becausetheir active traffic flows may either have to be suspended orthey may be lost in the transition To minimize such networkdowntime we can leverage support mechanisms available toclouds today eg VMWarersquos VDirector that tunnels pack-ets to the VMsrsquo old locations to be either buffered temorarilyor forwarded along to the new locations (when the VMs areready to receive traffic but before network routing changes

have kicked in)

10 RELATED WORK

Networked Services in the CloudRecent proposals [9 514 19] and third party middleware [14] have begun to incor-porate limited support for middleboxes CloudNaaS [19]CloudSwitch [5] and VPNCubed [14] aim to provide flex-ible composition of virtual topologies however they donrsquothave the mechanisms for scaling of networked services Em-brane [9] uses a proprietary framework that allows for theflexible scaling of networked services However it is limitedto provider-offered middleboxes and does not allow com-posing them with each other or with third-party MBs

Studies have looked at the properties of clouds that impactapplication performance [37 26] and that affect applicationreliability [36] Others have sought to enrich the networkinglayer of the cloud by adding frameworks that provide controlover bandwidth [17 23] security [20 29] and performanceof virtual migration [38] These are largely complementaryto Stratos

SplitMerge explores techniques that allow control overMB state so that MBs can be scaled up or down for elasticexecution [30] However they do not consider MB composi-tion the issue of what triggers scaling and how to managethe network interactions of the MBs during and after scalingwhich form the focus of our work That said SplitMergeand Stratos are complimentary to each other

Middleboxes in Enterprises and Datacenters Issues indeployment and management of middleboxes have been ex-amined in the context of enterprise [33] and data-center [24]networks But the focus is on composition in physical in-frastructures and thus performance challenges introducedbythe lack of tight control in clouds are not addressed

VM Placement Oversubscription within current data cen-ter networks and its impact on application performance andlink utilizations have been widely studied [37 26 18] Re-cent works [19 28] have explored using VM placement as asolution to this problem In comparison with prior schemeswhich focuses on placing individual VMs in isolation wefocus on discovering groups of related VMs with dense com-munication patterns and colocating them

ScalingRecent studies have considered the problem of scal-ing the number of virtual machines in each tier of a tenantrsquoshierarchy [34 2 11] All of them rely on CPU utilizationwhich we have shown to be insufficient

11 CONCLUSIONSEnhancing application deployments in todayrsquos clouds us-

ing virtual middleboxes is challenging due to the lack of net-work control and the inherent difficulty in intelligently scal-ing middleboxes while taking network effects into accountOvercoming the challenges in a systematic way requires anew ground-up framework that explicitly manages the net-work configuration and network interactions of MBs To this

13

end we presented the design implementation and evalua-tion of a network-aware orchestration layer for MBs calledStratos Stratos allows tenants to specific complex deploy-ments using a simple logical topology abstraction Then thekey components of Stratosndash an application-aware scheme forscaling rack-aware placement and network-aware flow dis-tribution ndash work in concert to carefully manage network re-sources at various time scales while elastically scaling en-tire tenant MB deployments to meet application demandsWe conduct a thorough evaluation using a testbed deploy-ment based on EC2 and large scale simulations to show thatStratos helps tenants make more efficient scaling decisionsthat all three network-aware components of Stratos are es-sential tenant applications make more effective us of MBsand providersrsquo infrastructures are more effectively used

12 REFERENCES[1] 2012 Cloud Networking Reporthttpwebtorialscom

content2012112012-cloud-networking-reporthtml[2] Amazon web serviceshttpawsamazoncom[3] Aryaka WAN Optimizationhttpwwwaryakacom[4] AWS Auto Scaling

httpawsamazoncomautoscaling[5] CloudSwitchhttpwwwcloudswitchcom[6] Floodlight openflow controller

httpfloodlightopenflowhuborg[7] httperfhttphplhpcomresearchlinuxhttperf[8] Open vSwitchhttpopenvswitchorg[9] Powering virtual network serviceshttpembranecom

[10] Rackspace cloudhttprackspacecomcloud[11] Right Scalehttpwwwrightscalecom[12] Silverpeak wan optimizationhttpcomputerworldcom

sarticle9217298Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance

[13] Suricatahttpopeninfosecfoundationorg[14] VPN-Cubedhttpcohesiveftcomvpncubed[15] Xenhttpxenorg[16] A Anand C Muthukrishnan A Akella and R Ramjee

Redundancy in network traffic Findings and implications InSIGMETRICS 2009

[17] H Ballani P Costa T Karagiannis and A Rowstron TowardsPredictable Datacenter Networks InSIGCOMM 2011

[18] T Benson A Akella and D A Maltz Network trafficcharacteristics of data centers in the wild InIMC 2010

[19] T Benson A Akella A Shaikh and S Sahu CloudNaaSA cloudnetworking platform for enterprise applications InSoCC 2011

[20] C Dixon H Uppal V Brajkovic D Brandon T Anderson andA Krishnamurthy ETTM a scalable fault tolerant network managerIn NSDI rsquo11

[21] A Ghodsi V Sekar M Zaharia and I Stoica Multi-ResourceScheduling for Packet Processing InProc SIGCOMM 2012

[22] G Gibb H Zeng and N McKeown Outsourcing networkfunctionality InHotSDN 2012

[23] C Guo G Lu H J Wang S Yang C Kong P Sun W Wu andY Zhang Secondnet a data center network virtualizationarchitecture with bandwidth guarantees InCo-NEXT rsquo10

[24] D A Joseph A Tavakoli and I Stoica A policy-awareswitchinglayer for data centers InSIGCOMM 2008

[25] E Kohler R Morris B Chen J Jannotti and M F Kaashoek TheClick modular routerTOCS 18263ndash297 2000

[26] A Li X Yang and S K M Zhang Cloudcmp Comparing publiccloud providers InIMC rsquo10 Melborne Australia 2010

[27] N McKeown T Anderson H Balakrishnan G ParulkarL Peterson J Rexford S Shenker and J Turner OpenFlowEnabling innovation in campus networksACM SIGCOMMComputer Communication Review 38(2)69ndash74 2008

[28] X Meng V Pappas and L Zhang Improving the scalability of datacenter networks with traffic-aware virtual machine placement InINFOCOM 2010

[29] L Popa M Yu S Y Ko S Ratnasamy and I Stoica CloudPolicetaking access control out of the network InHotNets rsquo10

[30] S Rajagopalan D Williams H Jamjoom and A WarfieldSplitmerge System support for elastic execution in virtualmiddleboxes InNSDI 2013

[31] V Ribeiro R Riedi R Baraniuk J Navratil and L Cottrellpathchirp Efficient available bandwidth estimation for networkpaths InPassive and Active Measurement Workshop 2003

[32] T Ristenpart E Tromer H Shacham and S Savage Hey you getoff of my cloud exploring information leakage in third-partycompute clouds InCCS 2009

[33] V Sekar S Ratnasamy M K Reiter N Egi and G Shi Themiddlebox manifesto enabling innovation in middlebox deploymentIn HotNets 2011

[34] Z Shen S Subbiah X Gu and J Wilkes Cloudscale elasticresource scaling for multi-tenant cloud systems InSoCC 2011

[35] J Sherry S Hasan C Scott A Krishnamurthy S Ratnasamy andV Sekar Making middleboxes someone elsersquos problem Networkprocessing as a cloud service InProc SIGCOMM 2012

[36] K V Vishwanath and N Nagappan Characterizing cloudcomputinghardware reliability InSoCC 2010

[37] G Wang and T S E Ng The impact of virtualization on networkperformance in Amazon EC2 data center InINFOCOM 2010

[38] T Wood K K Ramakrishnan P Shenoy and J van der MerweCloudNet dynamic pooling of cloud resources by live WANmigration of virtual machines InVEE rsquo11

14

  • 1 Introduction
  • 2 Background
  • 3 Stratos Overview
    • 31 Stratos tenant interface
    • 32 Stratos internals
    • 33 Interacting with other Provider Functions
      • 4 Elastic Scaling
        • 41 Strawman approaches
        • 42 Application-Aware Scaling Heuristic
          • 5 Rack-aware Placement
            • 51 Initial Placement
            • 52 Placing New Middlebox Instances
              • 6 Network-Aware Flow Distribution
              • 7 Implementation
              • 8 Evaluation
                • 81 Controlled Testbed Experiments
                • 82 (Restricted) Stratos in a Dynamic Scenario
                • 83 Simulations Stratos at Scale
                • 84 Summary of Key Results
                  • 9 Discussion
                  • 10 Related Work
                  • 11 Conclusions
                  • 12 References
Page 8: Stratos: A Network-Aware Orchestration Layer for

gain factors for different MBs in a chain as a result of work-load changes for a given tenant Furthermore this logic eas-ily extends to the multi-tenant scenario with multiple chainsper tenant we simply consider the union of all chains acrossall tenants

Notation Let c denote a specific chain andVc be the to-tal volume (flows) of traffic that require processing via thischain There may be different types of MBs (ie IDS RE)within a chain|c| is the number of MBs in a given chainc Let c[j ] be the type of the middlebox that is at positionj in the chainc (eg IDS RE) Letk denote the type of amiddlebox andMk be the set of MB instances of typek thatthe scaling module has launched ThusMc[j ] is the set ofMB instances of typec[j ] we usei isin Mc[j ] to specify thata MB instancei belongs to this type Figure 8 gives a quickoverview of the different entities involved in this formula-tion

LP Formulation Our goal is to split the traffic across theinstances of each type such that (a) the processing respon-sibilities are distributed roughly equally across them and(b)the aggregatenetwork footprintis minimized Thus we needto determine how the traffic is routed between different MBsLet f (c i i

prime

) denote the volume of traffic in chainc beingrouted from middleboxi to the instancei

prime

(see Figure 8)As a special casef (c i) denotes traffic routed to the firstmiddlebox in a chain from a source element4

Suppose each unit traffic of flowing between a pair of in-stances incurs some network-level costCost(i rarr i prime) de-notes the network-level cost between two instances In thesimplest case this is a binary variablemdash1 if the two MBsare in different racks and 0 otherwise (We can use more ad-vanced measures to capture latency or available bandwidthas well)

Given this setup Figure 9 formalizes the network-awareflow distribution problem that Stratos solves Here Eq (1)captures the network-wide footprint of routing traffic be-tween potential instances of thej th MB in a chain to thej + 1th MB in that chain For completeness we consider allpossible combinations of routing traffic from one instance toanother In practice the optimization will prefer only com-binations that have low footprints

Eq. (2) models a flow conservation principle: for each chain, and for each position in the chain, the volume of traffic entering a middlebox has to be equal to the volume exiting it to the next middlebox type in the sequence. Since middleboxes may change the aggregate volume (e.g., a firewall may drop traffic, or RE may compress traffic), we consider a generalized notion of conservation that also takes into account the expected gain/drop factor γ(c, j), which is the ratio of incoming-to-outgoing traffic at position j of chain c. For initial placement, we expect the tenant to provide these factors as annotations to the logical topology specification; the tenant could derive these based on expected traffic patterns or history. Stratos periodically recomputes these gain factors based on the observed input-output ratios for each chain.

Minimize
\[
\sum_{c} \sum_{j=1}^{|c|-1} \sum_{\substack{i \in M_{c[j]} \\ i' \in M_{c[j+1]}}} Cost(i, i') \times f(c, i, i') \qquad (1)
\]
subject to
\[
\forall i,\ \forall c \ \text{s.t.}\ i \in M_{c[j]},\ j > 1:\quad \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;=\; \sum_{i' \in M_{c[j+1]}} f(c, i, i') \times \gamma(c, j) \qquad (2)
\]
\[
\forall c:\quad \sum_{i \in M_{c[1]}} f(c, i) \;=\; V_c \qquad (3)
\]
\[
\forall i:\quad \sum_{c:\, i \in M_{c[j]},\, j \neq 1}\ \sum_{i' \in M_{c[j-1]}} f(c, i', i) \;+\; \sum_{c:\, i \in M_{c[1]}} f(c, i) \;\approx\; \sum_{c:\, i \in M_{c[j]}} \frac{V_c}{|M_{c[j]}| \times \prod_{l=1}^{j} \gamma(c, l)} \qquad (4)
\]

Figure 9: LP formulation for the network-aware flow distribution problem. The ≈ in the last equation simply represents that we have some leeway in allowing the load to be within 10–20% of the mean.

In addition to this flow conservation, we also need to ensure that each chain's aggregate traffic will be processed; thus, we also model this coverage constraint in Eq. (3). Finally, we want to ensure that within each middlebox type the load is roughly evenly distributed across the instances of that type, in Eq. (4). Here, we use a general notion of load balancing where we can allow for some leeway, say within 10–20% of the targeted average load.
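To make the formulation concrete, the sketch below shows how the objective in Eq. (1) and the coverage constraint in Eq. (3) could be written down with an off-the-shelf LP solver (PuLP). It is a minimal illustration rather than the controller's actual code: the chain description, instance sets, rack map, and cost function are hypothetical stand-ins for the inputs defined above, and constraints (2) and (4) are elided for brevity.

```python
from itertools import product
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Hypothetical inputs mirroring the notation in the text:
# one chain with positions 0..|c|-1, instance sets M[j], volume V_c,
# and a binary rack-level cost between instances.
M = {0: ["re1", "re2"], 1: ["ips1", "ips2", "ips3"]}   # M_{c[j]}
V_c = 300.0                                            # total chain volume (Mbps)
rack = {"re1": 0, "re2": 1, "ips1": 0, "ips2": 1, "ips3": 1}

def cost(i, ip):
    return 0 if rack[i] == rack[ip] else 1             # Cost(i -> i')

prob = LpProblem("network_aware_flow_distribution", LpMinimize)

# f[(j, i, ip)]: traffic routed from instance i at position j
# to instance ip at position j+1.
f = {(j, i, ip): LpVariable(f"f_{j}_{i}_{ip}", lowBound=0)
     for j in range(len(M) - 1) for i, ip in product(M[j], M[j + 1])}
# f0[i]: traffic entering first-hop instance i from the source element.
f0 = {i: LpVariable(f"f0_{i}", lowBound=0) for i in M[0]}

# Objective (1): minimize the total inter-rack footprint.
prob += lpSum(cost(i, ip) * var for (j, i, ip), var in f.items())

# Coverage constraint (3): all of the chain's traffic is admitted.
prob += lpSum(f0.values()) == V_c

# (Constraints (2) and (4), per-instance conservation with gain factors
# and approximate load balance, would be added here in the same style.)

prob.solve()
print({name: var.value() for name, var in f0.items()})
```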

We must ensure that the periodic flow redistributions and the flow distribution accompanying scaling do not enter into race conditions. We take two steps for this. First, any scaling attempt in a chain is preceded by a redistribution; only if redistribution does not suffice does Stratos initiate scaling trials. Second, Stratos suspends all redistributions while scaling trials are being run across a given tenant's deployment.
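A minimal sketch of this coordination, assuming hypothetical redistribute(), chain_satisfied(), and run_scaling_trials() hooks on a per-tenant controller object, is shown below; the paper does not describe the controller's actual concurrency machinery, so this is only illustrative.

```python
import threading

class TenantScalingCoordinator:
    """Serializes flow redistribution and scaling trials for one tenant."""

    def __init__(self, controller):
        self.controller = controller          # hypothetical per-tenant controller
        self.lock = threading.Lock()          # guards redistribution vs. scaling
        self.scaling_in_progress = False

    def periodic_redistribution(self):
        # Rule 2: skip redistributions while scaling trials are running.
        with self.lock:
            if self.scaling_in_progress:
                return
            self.controller.redistribute()

    def handle_bottleneck(self, chain):
        # Rule 1: try redistribution first; scale only if it does not suffice.
        with self.lock:
            self.controller.redistribute()
            if self.controller.chain_satisfied(chain):
                return
            self.scaling_in_progress = True
        try:
            self.controller.run_scaling_trials(chain)
        finally:
            with self.lock:
                self.scaling_in_progress = False
```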

7. IMPLEMENTATION

We have implemented a full-featured Stratos prototype capable of running on commodity x86-64 hardware. Figure 10 shows an overview of the components involved.

Figure 10: Stratos prototype implementation.

Stratos Data Plane: The Stratos data plane is a configurable overlay network realized through packet encapsulation and programmable software switches. Each tenant VM has a pair of virtual interfaces that tap one of two Open vSwitches within the host's privileged domain. Packets sent to one of the virtual interfaces are transmitted via a GRE tunnel to the software switch on the host of the destination VM, from whence they are bridged to the appropriate destination interface. The other interface is reserved for management traffic. Open vSwitch holds the responsibility for encapsulating packets for transmission across the network.

Traffic is directed between the local host and the correct destination server using Open vSwitch. A single bridge (i.e., switch) on each privileged domain contains a virtual interface per tenant VM. Forwarding rules are matched based on the switch port on which a packet arrived, the final destination of the packet, and a tag stored in the IP Type of Service (TOS) field. Using tags reduces the number of flow entries in the switches, providing an important performance boost. Forwarding rules are installed by the central Stratos controller.
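The sketch below illustrates how such tag-based rules might be enumerated for one chain. The FlowRule structure and port maps are hypothetical simplifications of the controller's state, not its actual rule-compilation code; the point is that a single (in_port, destination, tag) match per hop replaces many per-flow entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRule:
    """One Open vSwitch match/action entry as installed by the controller."""
    in_port: int          # switch port the packet arrived on
    dst_ip: str           # final destination VM of the packet
    tos_tag: int          # chain-position tag carried in the IP TOS field
    out_port: int         # port leading toward the next hop (tunnel or local VM)

def rules_for_chain(chain_hops, port_of, next_hop_port):
    """chain_hops: ordered list of (vm_ip, position) pairs for one tenant chain.

    Each hop matches on (in_port, final destination, tag), so one tag
    distinguishes the chain position instead of per-flow 5-tuples.
    """
    rules = []
    final_dst = chain_hops[-1][0]
    for (vm_ip, pos), (next_ip, _) in zip(chain_hops, chain_hops[1:]):
        rules.append(FlowRule(in_port=port_of[vm_ip],
                              dst_ip=final_dst,
                              tos_tag=pos,
                              out_port=next_hop_port[next_ip]))
    return rules

# Example: client -> RE -> IPS -> server, with hypothetical port maps.
hops = [("10.0.0.1", 0), ("10.0.0.2", 1), ("10.0.0.3", 2), ("10.0.0.4", 3)]
ports = {ip: 10 + i for i, (ip, _) in enumerate(hops)}
tunnels = {ip: 20 + i for i, (ip, _) in enumerate(hops)}
for rule in rules_for_chain(hops, ports, tunnels):
    print(rule)
```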

Stratos Controller: The Stratos controller is implemented as an application running atop Floodlight [6] and interfaces with the Open vSwitch instances using the OpenFlow protocol [27]. The controller application takes a logical topology as input, which defines the tenant's chains and the VM instances of each client/server/MB in the chains. The controller transforms this topology into a set of forwarding rules, which are installed in the Open vSwitch instances in each physical host. The controller also gathers performance metrics from network switches, application end-points, and MBs using SNMP. These inputs are used by the rest of the modules in the controller, namely those for scaling, placement, and flow distribution. Our controller launches and terminates VMs using Xen [15].

8. EVALUATION

We evaluate Stratos in three different ways. First, we conduct controlled testbed experiments using our prototype to examine in detail the benefits of the different components of Stratos: application-aware scaling, placement, and load distribution. Second, we run a modified version of our prototype on EC2 to understand the performance of Stratos in a dynamic scenario; since EC2 does not provide control over placement, this prototype can only perform network-aware scaling and load distribution. Finally, we simulate Stratos to understand its benefits at scale.

There are three dimensions in our evaluation: (1) choice of scaling approach: leveraging CPU and memory utilization at a MB to determine if it is a bottleneck (threshold) vs. using application-aware scaling (aware); (2) placement: randomly selecting a rack (rand) or using our network-aware placement (aware); (3) flow distribution: either uniform or network-aware flow distribution. We assume that both initial and scaled instance deployment use identical placement and load distribution schemes.

We study a variety of metrics: the effectiveness of scaling decisions, both in terms of when they are triggered and how many MBs are used; the throughput of tenant applications; unmet demand; and utilization of MBs and the provider's infrastructure.

8.1 Controlled Testbed Experiments

Our testbed consists of 24 machines with 3 VM slots each, deployed uniformly across 8 racks. The Stratos controller runs on a separate, purpose-specific machine. Unless otherwise specified, we consider a single tenant whose logical topology is a single chain consisting of clients, an RE MB, an IPS MB (standalone throughputs of 240 and 80 Mbps, respectively), and servers. The RE and IPS MBs use Click [16] and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works between a client-server pair in the following manner: the threads running at a client share a (sufficiently large) token bucket that fills at a rate specified by a workload pattern (e.g., steady, increasing, or sine-wave). A client thread draws a single token from the bucket prior to initiating a connection to the server; if none are available, it blocks. New connections are issued by a client only after the previous connection finishes and another credit has been obtained. The number of outstanding tokens indicates the unmet demand, and each token corresponds to a request of 100KB.
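A minimal sketch of such a generator is shown below, assuming a hypothetical fetch() callback that performs one 100KB request; the actual tool is not described beyond this paragraph, so details such as the fill-loop granularity are illustrative.

```python
import threading
import time

class TokenBucketWorkload:
    """Shared token bucket driving client threads at a configurable rate."""

    def __init__(self, rate_fn, num_threads, fetch):
        self.rate_fn = rate_fn            # tokens/sec as a function of elapsed time
        self.fetch = fetch                # hypothetical: performs one 100KB request
        self.tokens = 0.0
        self.cond = threading.Condition()
        self.threads = [threading.Thread(target=self._client, daemon=True)
                        for _ in range(num_threads)]

    def _fill(self, duration, step=0.1):
        start = time.time()
        while time.time() - start < duration:
            with self.cond:
                self.tokens += self.rate_fn(time.time() - start) * step
                self.cond.notify_all()
            time.sleep(step)

    def _client(self):
        while True:
            with self.cond:
                # Block until a token (one request credit) is available.
                while self.tokens < 1.0:
                    self.cond.wait()
                self.tokens -= 1.0
            self.fetch()                  # next request only after this one finishes

    def unmet_demand(self):
        with self.cond:
            return self.tokens            # outstanding tokens = unserved requests

    def run(self, duration):
        for t in self.threads:
            t.start()
        self._fill(duration)

# Example: linearly increasing demand, 8 client threads, dummy fetch.
gen = TokenBucketWorkload(rate_fn=lambda t: 5 + t / 10.0, num_threads=8,
                          fetch=lambda: time.sleep(0.05))
gen.run(duration=5)
print("unmet demand (tokens):", gen.unmet_demand())
```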

We impose background traffic in our experiments by running our workload generator ("steady" pattern) across specific pairs of MBs in our testbed. We experiment both with fixed and variable background traffic patterns; we focus largely on results for the former for brevity.

Overall benefits: We ran Stratos atop the testbed using a linearly increasing workload pattern. Background traffic was fixed at such a rate that utilization of the aggregation links in our topology varied from 25% to 50%. Figure 11 shows an execution of Stratos, which we describe as aware/aware/aware, meaning that scaling is initiated in response to application demand and that MB placement and flow distribution are both network-aware. We first compare it against a completely network-agnostic approach, labeled threshold/rand/uniform, wherein scaling decisions are entirely based on CPU load exceeding 80 percent for a period of five seconds. From Figure 11(a), we note that the naive approach's throughput starts to drop at around 300s, when the unmet demand skyrockets. In contrast, Stratos sustains high throughput (measured in requests per second per process, while nine processes execute concurrently) and no significant unmet demand. Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. However, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom). [Plots omitted: requests per second served (solid) and unserved (dashed) over time, and middlebox allocations over time, for aware/aware/aware, thresh/aware/aware, aware/rand/uni, and thresh/rand/uni.]

Next, we attempt to tease apart the relative contribution of the three network-aware components in Stratos.

Application-aware scaling benefits: Figure 11(b) also shows the number of MB instances used by two other schemes, threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account together, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement: We first examine the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions (a - top) on throughput and unmet demand (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines. [Plots omitted: middlebox allocations and requests per second over time for aware/aware/aware and aware/rand/aware.]

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown by Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often, and that when it does, those attempts are significantly more likely to be successful.

Flow Distribution: We next examine the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload so as to ascertain their behavioral differences.

We see that in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see that Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests quicker, with less scaling and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom) with fixed background traffic. Unmet demand is shown using dashed lines. [Plots omitted: middlebox allocations and requests per second over time for aware/aware/aware and aware/aware/uni.]

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details: Our EC2 prototype is similar to our full-fledged prototype minus network-aware placement; instead, we rely on EC2 to place any and all MBs, as this is something we cannot control. To enable network-aware load distribution, we periodically collect the available bandwidth between adjacent MBs in a tenant's deployment using a packet-pair-based measurement tool [31].
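A rough sketch of this measurement loop is shown below. The measure_bandwidth() hook stands in for the packet-pair tool invoked between a pair of VMs, and the polling interval is an assumed value rather than one reported here.

```python
import itertools
import threading

def poll_available_bandwidth(chain_instances, measure_bandwidth,
                             bw_table, interval_s=30.0):
    """Periodically refresh pairwise available bandwidth between adjacent MBs.

    chain_instances:   list of per-position instance lists for one chain
    measure_bandwidth: hypothetical hook wrapping the packet-pair tool
    bw_table:          shared dict {(src, dst): Mbps} consumed by flow distribution
    """
    def once():
        for upstream, downstream in zip(chain_instances, chain_instances[1:]):
            for src, dst in itertools.product(upstream, downstream):
                bw_table[(src, dst)] = measure_bandwidth(src, dst)

    def loop():
        once()
        timer = threading.Timer(interval_s, loop)   # schedule the next round
        timer.daemon = True
        timer.start()

    loop()

# Example with a dummy measurement hook.
table = {}
poll_available_bandwidth([["client"], ["re1", "re2"], ["ips1"], ["server"]],
                         measure_bandwidth=lambda a, b: 100.0, bw_table=table)
print(table)
```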

Multi-chain tenant deployment: Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus, client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the clients, servers, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as being bottlenecked if there is a sustained unmet demand of 2.8 Mbps for a period of at least 20 seconds. We use a 25-second gap between scaling trials, and we use a 2 Mbps improvement threshold to retain an instance.
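These scaling-trial parameters can be summarized in a small monitoring sketch. The helper names and the one-second sampling period are hypothetical, but the thresholds mirror the numbers just described (2.8 Mbps of unmet demand sustained for 20 s flags a bottleneck, trials are separated by 25 s, and a new instance is retained only if it improves throughput by at least 2 Mbps).

```python
import time

BOTTLENECK_MBPS = 2.8        # sustained unmet demand that marks a chain as bottlenecked
SUSTAIN_SECONDS = 20         # how long the unmet demand must persist
TRIAL_GAP_SECONDS = 25       # cool-down between scaling trials
RETAIN_THRESHOLD_MBPS = 2.0  # improvement needed to keep a newly added instance

def monitor_chain(sample_unmet_demand, run_scaling_trial, poll_s=1.0):
    """Flag a bottleneck and launch scaling trials for one chain.

    sample_unmet_demand(): hypothetical hook returning current unmet demand (Mbps)
    run_scaling_trial():   hypothetical hook that adds an instance and returns
                           the throughput improvement (Mbps) it produced
    """
    over_since = None
    last_trial = -TRIAL_GAP_SECONDS
    while True:
        now = time.monotonic()
        unmet = sample_unmet_demand()
        if unmet >= BOTTLENECK_MBPS:
            over_since = over_since if over_since is not None else now
            bottlenecked = (now - over_since) >= SUSTAIN_SECONDS
        else:
            over_since, bottlenecked = None, False

        if bottlenecked and (now - last_trial) >= TRIAL_GAP_SECONDS:
            improvement = run_scaling_trial()
            retained = improvement >= RETAIN_THRESHOLD_MBPS
            print(f"trial done: +{improvement:.1f} Mbps, retained={retained}")
            last_trial = time.monotonic()
        time.sleep(poll_s)
```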

EC2 Setup Latency: We first measure the setup overhead associated with Stratos. The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is ≈12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Table 1: Stratos setup latency.
  Task                                                 Time
  Logical-to-Physical                                  5 ms
  Data Plane Setup (Create Tunnels)                    2.4 s per VM
  Data Plane Config (Install Rules in Open vSwitch)    3 ms per VM

Figure 14: Multiple chain scaling. (a) MB instance counts over time for MBs W, X, Y, and Z; (b) application performance: demand and served throughput (Mbps) for chains C1 and C2.

Effectiveness of Scaling: To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16 Mbps on each chain, increasing demand by 4 Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36 Mbps on each chain, W becomes a bottleneck for chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time; no gap between demand and served traffic persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution: We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60 Mbps and increases by 10 Mbps every 2 minutes.

Figure 15: Application goodput with uniform and network-aware flow distribution at a fixed level of scaling. [Plot omitted: percent of demand served over time for uniform and network-aware distribution.]

Figure 15 compares the percent of application demand served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100 Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach to simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup: We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches; (2) the number of tenants; (3) a chain with elements and initial instances (all tenants use the same deployment pattern); and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1 Gbps on each network link. All tenants use the same deployment (a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4)), which initially consists of 12 VMs; thus, every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110 Mbps, respectively. The application demand between each client and server pair is 100 Mbps, for a total traffic demand of 300 Mbps. We assume intra-rack links are very high capacity.
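For concreteness, the simulation inputs listed above could be captured in a configuration structure like the one below; this merely restates the stated parameters in code form, with hypothetical field names.

```python
from dataclasses import dataclass, field

@dataclass
class SimulationConfig:
    """Inputs to the large-scale Stratos simulator described above."""
    num_racks: int = 500
    vm_slots_per_rack: int = 10
    link_capacity_mbps: float = 1000.0          # tree topology, 1 Gbps per link
    num_tenants: int = 200
    # Per-tenant chain: element name -> (initial instances, per-instance Mbps).
    chain: dict = field(default_factory=lambda: {
        "client":   (3, None),
        "MB-type1": (2, 60.0),
        "MB-type2": (1, 50.0),
        "MB-type3": (2, 110.0),
        "server":   (4, None),
    })
    demand_per_client_server_pair_mbps: float = 100.0
    total_demand_mbps: float = 300.0

cfg = SimulationConfig()
initial_vms = sum(n for n, _ in cfg.chain.values())
assert initial_vms == 12   # forces every tenant to span multiple racks
print(cfg)
```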

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance as expected, with rand/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains initially are placed in a network-aware fashion).

Performance per $: Tenants are currently charged based on the number of instances provisioned. Thus, it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate the benefits of this next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8 MBps with aware placement and 2 MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a - top) and traffic served divided by number of instances (b - bottom). [CDFs omitted: fraction of tenants vs. percent of demand served, and vs. MBps per instance, for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.]

Provider view: Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percentage of the data center's network to be utilized with aware placement and load distribution. This is because when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

8.4 Summary of Key Results

Our key findings are that:

Figure 17: Inter-rack tenant traffic. [CDF omitted: fraction of tenants vs. amount of inter-rack traffic (in MB) for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.]

• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting its ideal overall benefits.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs: Stratos can be improved overall by having it be aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.

Failure Resilience: Our placement heuristics are performance-centered, and hence they impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability reasons while also minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.

Zero Downtime: As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally-optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMware's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud: Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPN-Cubed [14] aim to provide flexible composition of virtual topologies; however, they don't have mechanisms for scaling networked services. Embrane [9] uses a proprietary framework that allows for the flexible scaling of networked services; however, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and performance of virtual migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus there is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilizations have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In comparison with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html.
[2] Amazon web services. http://aws.amazon.com.
[3] Aryaka WAN Optimization. http://www.aryaka.com.
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling.
[5] CloudSwitch. http://www.cloudswitch.com.
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org.
[7] httperf. http://hpl.hp.com/research/linux/httperf.
[8] Open vSwitch. http://openvswitch.org.
[9] Powering virtual network services. http://embrane.com.
[10] Rackspace cloud. http://rackspace.com/cloud.
[11] RightScale. http://www.rightscale.com.
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance.
[13] Suricata. http://openinfosecfoundation.org.
[14] VPN-Cubed. http://cohesiveft.com/vpncubed.
[15] Xen. http://xen.org.
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: a scalable fault tolerant network manager. In NSDI '11.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In Proc. SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: a data center network virtualization architecture with bandwidth guarantees. In Co-NEXT '10.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263-297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC '10, Melbourne, Australia, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69-74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: taking access control out of the network. In HotNets '10.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In Proc. SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE '11.


[38] T Wood K K Ramakrishnan P Shenoy and J van der MerweCloudNet dynamic pooling of cloud resources by live WANmigration of virtual machines InVEE rsquo11

14

  • 1 Introduction
  • 2 Background
  • 3 Stratos Overview
    • 31 Stratos tenant interface
    • 32 Stratos internals
    • 33 Interacting with other Provider Functions
      • 4 Elastic Scaling
        • 41 Strawman approaches
        • 42 Application-Aware Scaling Heuristic
          • 5 Rack-aware Placement
            • 51 Initial Placement
            • 52 Placing New Middlebox Instances
              • 6 Network-Aware Flow Distribution
              • 7 Implementation
              • 8 Evaluation
                • 81 Controlled Testbed Experiments
                • 82 (Restricted) Stratos in a Dynamic Scenario
                • 83 Simulations Stratos at Scale
                • 84 Summary of Key Results
                  • 9 Discussion
                  • 10 Related Work
                  • 11 Conclusions
                  • 12 References
Page 10: Stratos: A Network-Aware Orchestration Layer for

Figure 11: Number of MBs used (a - top) and throughput and unmet demand (b - bottom). Solid lines show served requests and dashed lines show unserved requests; the schemes compared are aware/aware/aware and threshold/uniform/random.

while nine processes execute concurrently) and no significant unmet demand. Figure 11(b) shows the corresponding scaling decisions. We see that Stratos uses 2X fewer instances than the naive threshold/rand/uniform approach, yet it offers better throughput. Moreover, comparing the figures describing Stratos's scaling behavior with the corresponding demand graphs, it is apparent that Stratos's ability to scale to meet increasing demand is unhindered by its initial economy of MB allocation.

Next, we attempt to tease apart the relative contribution of the three network-aware components in Stratos.

Application-aware scaling benefits. Figure 11(b) also shows the number of MB instances used by two other schemes: threshold/aware/aware and aware/rand/uniform. Taking all four schemes into account, we notice that the application-aware scaling heuristic outperforms naive scaling (aware versus threshold), using nearly 2X fewer instances. In terms of throughput, we noticed that aware/aware/aware is about 10% better than threshold/aware/aware, whereas aware/rand/uniform is actually about 10% lower in throughput than threshold/rand/uniform (results omitted for brevity).

Taken together, these results indicate that while the application-aware scaling heuristic helps scale the appropriate MBs, resulting in fewer MBs being used, it critically relies on placement and load balancing being network-aware in order to make effective use of MB capacity and to offer optimal application-level performance. We explore the role of placement and load balancing in more detail next.

Placement. We first understand the impact of network-aware placement decisions in Stratos. We run Stratos and aware/rand/aware against the same fixed background traffic and workload.

Figure 12: Effect of placement decisions on scaling (a - top) and on throughput and unmet demand (b - bottom), with fixed background traffic; unmet demand is shown using dashed lines. Schemes compared: aware/aware/aware and aware/rand/aware.

We compare the two schemes' performance against this workload. The results are shown in Figure 12(a). We immediately see that aware/rand/aware attempts to scale significantly more frequently than Stratos, and that those attempts usually fail. As shown in Figure 12(b), these attempts to scale up are the result of spikes in unsatisfied demand, which require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that Stratos needs to attempt to scale much less often and that, when it does, those attempts are significantly more likely to be successful.

Flow Distribution. We next understand the impact of network-aware flow distribution in Stratos. As before, we run Stratos and aware/aware/uniform against the same background traffic and workload so as to ascertain their behavioral differences.

We see that, in order to satisfy the same demand, aware/aware/uniform requires more middlebox instances than Stratos. More significantly, though, we see that Stratos is nonetheless better situated to respond to surges in demand: it is able to satisfy queued requests more quickly, with less scaling and with less turbulence in subsequent traffic.

Although these results employ a small-scale testbed with synthetic traffic patterns, they serve to highlight the importance of the individual components of Stratos. Specifically, making any one component network-agnostic results in using more MBs than necessary, poor throughput, and substantial buildup of unmet demand. We also experimented with variable background traffic and different workload patterns, and found the above observations to hold qualitatively. We provide further evidence using our EC2 prototype and simulations.

Figure 13: Effect of flow distribution decisions on scaling (a - top) and on demand satisfaction (b - bottom), with fixed background traffic; unmet demand is shown using dashed lines. Schemes compared: aware/aware/aware and aware/aware/uniform.

8.2 (Restricted) Stratos in a Dynamic Scenario

Prototype details. Our EC2 prototype is similar to our full-fledged prototype minus network-aware placement; instead, we rely on EC2 to place any and all MBs, since this is something we cannot control. To enable network-aware load distribution, we periodically collect the available bandwidth between adjacent MBs in a tenant's deployment using a packet-pair-based measurement tool [31].
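To make the role of these measurements concrete, here is a minimal sketch (in Python, with illustrative names) of how measured per-replica available bandwidth could drive flow-distribution weights; the proportional-weighting rule shown is an assumption for illustration, not the exact policy of Section 6:

from typing import Dict

def distribution_weights(avail_bw_mbps: Dict[str, float]) -> Dict[str, float]:
    """avail_bw_mbps maps a downstream replica id to the available bandwidth
    measured from the upstream instance toward it (e.g., via pathChirp [31])."""
    if not avail_bw_mbps:
        return {}
    total = sum(v for v in avail_bw_mbps.values() if v > 0)
    if total <= 0:
        # No usable measurements: fall back to a uniform split.
        n = len(avail_bw_mbps)
        return {r: 1.0 / n for r in avail_bw_mbps}
    return {r: max(v, 0.0) / total for r, v in avail_bw_mbps.items()}

# Example: replica y2 sits behind a congested link, so it gets a smaller share.
weights = distribution_weights({"y1": 80.0, "y2": 20.0, "y3": 60.0})
# weights == {"y1": 0.5, "y2": 0.125, "y3": 0.375}

Under such a rule, a replica reachable only over a congested path receives a proportionally smaller share of new flows.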

Multi-chain tenant deployment. Whereas the previous experiments used a simple chain, we now have the tenant deploy the multi-chain setup shown in Figure 5. Each client VM runs httperf [7] to request a 50KB file from a corresponding server VM running Apache (thus client A requests from server A). We deploy each MB as a small EC2 instance to emulate bottlenecks; the clients, servers, and tagger are large instances, and the controller runs on a micro instance. We mark a chain as bottlenecked if there is sustained unmet demand of 28 Mbps for a period of at least 20 seconds. We use a 25 second gap between scaling trials, and a 2 Mbps improvement threshold to retain an instance.
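As a rough illustration of how these parameters fit together, the following sketch (Python; the names and structure are ours, not taken from the Stratos implementation) encodes the bottleneck test and the keep-or-discard rule for a scaling trial, using the thresholds quoted above:

from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class ScalingPolicy:
    unmet_demand_mbps: float = 28.0   # sustained unmet demand that marks a bottleneck
    sustain_secs: float = 20.0        # how long the unmet demand must persist
    trial_gap_secs: float = 25.0      # pause between successive scaling trials
    min_gain_mbps: float = 2.0        # throughput gain needed to keep a new instance

    def is_bottlenecked(self, unmet_samples: Iterable[Tuple[float, float]]) -> bool:
        """unmet_samples: (timestamp_secs, unmet_demand_mbps) pairs in time order."""
        start = None
        for t, unmet in unmet_samples:
            if unmet >= self.unmet_demand_mbps:
                start = t if start is None else start
                if t - start >= self.sustain_secs:
                    return True
            else:
                start = None
        return False

    def retain_instance(self, tput_before_mbps: float, tput_after_mbps: float) -> bool:
        """Keep the instance added in a trial only if it improved throughput enough."""
        return (tput_after_mbps - tput_before_mbps) >= self.min_gain_mbps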

EC2 Setup Latency. We first measure the setup overhead associated with Stratos. The setup cost includes the time required to launch the data plane components (taps and switch) on each VM, transform the logical chains into per-VM configurations, and configure each VM's data plane components (Table 1). The total setup time for our example chain (with one instance of each MB) is approximately 12s (high because EC2 does not allow parallel deployment/setup of VMs). Relative to the time to launch a VM (on the order of a few tens of seconds), this represents a small overhead.

Table 1: Stratos setup latency
  Task                                                 Time
  Logical-to-Physical                                  5 ms
  Data Plane Setup (Create Tunnels)                    2.4 s per VM
  Data Plane Config (Install Rules in Open vSwitch)    3 ms per VM

Figure 14: Multiple chain scaling. (a - top) MB instance counts over time for W, X, Y, and Z; (b - bottom) application performance: demand and served throughput (Mbps) for chains C1 and C2.

Effectiveness of Scaling. To emulate induced bottlenecks in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5), we use artificial Click [25] MBs that rate-limit packets at 5.5K, 9K, 7K, and 10K packets/second for instances of W, X, Y, and Z, respectively. We impose an initial demand of 16Mbps on each chain, increasing demand by 4Mbps every 2 minutes. Figure 14 shows the scaling result and the application performance. The shared MBs become bottlenecked first because they incur load from both clients. Our heuristic accurately attempts to scale these MBs first; it does not attempt to scale the unshared MBs, because the bottleneck is eliminated by first adding two instances of Y and then an instance of X. When demand increases to 36Mbps on each chain, W becomes a bottleneck for Chain 1, which our heuristic rightly scales without conducting unnecessary scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely served most of the time: no gap between demand and served throughput persists for longer than 60 seconds. Without our extension, chains would need to be scaled sequentially, increasing the duration of these gaps. For example, the gap at 240s would persist for an additional 25s while an unnecessary scaling trial was conducted with W prior to scaling trials with X and Y.

Effectiveness of Flow Distribution. We now evaluate the benefits of network-aware flow distribution. We compare uniform and network-aware flow distribution for a single point in the scaling space (3 RE and 4 IPS instances) for the single chain. The MB instances are clustered into two groups, limiting the flow of traffic between the groups to 12K packets per second. Application demand starts at 60Mbps and increases by 10Mbps every 2 minutes.

Figure 15: Application goodput (percent of demand served over time) with uniform and network-aware flow distribution at a fixed level of scaling.

Figure 15 compares the percent of application demand

served under the two distribution mechanisms. We observe that the same set of MBs is able to serve higher demand when network-aware flow distribution is employed: with a demand of 100Mbps, 90% is served under network-aware distribution versus only about 75% with uniform distribution. (The consistent 5% of unserved demand with network-aware distribution is a result of EC2 network variability between our runs, which further highlights the need for a Stratos-like approach for simplifying MB management.)

8.3 Simulations: Stratos at Scale

Simulation setup. We developed a simulator to evaluate the macroscopic benefits of Stratos at large scales. While we examined complex scenarios using the simulator, we present results using somewhat restrictive setups for clarity. Specifically, for the scenarios below, the simulator takes as input: (1) a data center topology consisting of racks and switches; (2) the number of tenants; (3) a chain with its elements and initial instances (all tenants use the same deployment pattern); and (4) a fixed application demand (in Mbps) common across tenants.

We run our simulator to place 200 tenants within a 500-rack data center. The network-aware scaling heuristic for each tenant runs until the tenant's full demand is satisfied or no further performance improvement can be achieved. The data center is arranged in a tree topology with 10 VM slots per rack and a capacity of 1Gbps on each network link. All tenants use the same deployment, a simple chain containing clients (3 instances), MB-type1 (2), MB-type2 (1), MB-type3 (2), and servers (4), which initially consists of 12 VMs; thus every tenant is forced to spread her VMs across racks. The capacity of each instance of MB-type1, type2, and type3 is fixed at 60, 50, and 110Mbps, respectively. The application demand between each client and server pair is 100Mbps, for a total traffic demand of 300Mbps. We assume intra-rack links have very high capacity.
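For concreteness, the simulated setup described above can be captured in a small configuration sketch (Python; the names are illustrative and not taken from our simulator's code):

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class SimConfig:
    num_racks: int = 500
    vm_slots_per_rack: int = 10
    link_capacity_mbps: float = 1000.0      # each network link in the tree topology
    num_tenants: int = 200
    # element name -> (initial instances, per-instance capacity in Mbps);
    # clients and servers are treated as unconstrained endpoints here.
    chain: Dict[str, Tuple[int, Optional[float]]] = field(default_factory=lambda: {
        "client":   (3, None),
        "MB-type1": (2, 60.0),
        "MB-type2": (1, 50.0),
        "MB-type3": (2, 110.0),
        "server":   (4, None),
    })
    demand_per_pair_mbps: float = 100.0     # each client-server pair
    total_demand_mbps: float = 300.0

cfg = SimConfig()
initial_vms = sum(n for n, _ in cfg.chain.values())  # 12 VMs > 10 slots per rack,
# so every tenant must span at least two racks from the start.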

First, we look at the tenant demand that can be served under different combinations of placement and flow distribution during scaling (Figure 16(a)); we assume all tenant deployments are initially placed in a network-aware fashion. We observe immediately that aware placement with aware distribution is the best, in that a greater fraction of the demand can be served across all tenants than with the remaining combinations. At the other extreme, random placement coupled with uniform distribution results in less than 30% of demand served across all tenants. The other possibilities offer intermediate performance, as expected, with rand/aware outperforming aware/uniform; this indicates the relative importance of network-aware load distribution compared to network-aware placement of scaling instances (note that all chains are initially placed in a network-aware fashion).

Performance per $. Tenants are currently charged based on the number of instances provisioned, so it is crucial that tenants maximally utilize their MB instances. Because Stratos actively manages MB interactions, it helps improve the bandwidth available between successive MBs in a deployment, thereby helping MB resources to be used more effectively. We illustrate these benefits next. Figure 16(b) presents a CDF of the amount of traffic served for each tenant relative to the number of instances deployed. Aware distribution results in a significant increase in the amount of traffic served per instance for the median tenant with both placement algorithms: 8MBps with aware placement and 2MBps with rand. As before, we again see the greater importance of network-aware load distribution relative to placement.

Figure 16: Tenant load served (a - top; CDF of the percent of demand served) and traffic served divided by number of instances (b - bottom; CDF of MBps per instance), for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.

Provider view. Figure 17 presents a CDF of the amount of inter-rack traffic generated by each tenant's chain. Interestingly, tenants cause a high percentage of the data center's network to be utilized with aware placement and load distribution. This is because, when both network-aware placement and load distribution are used, tenants are able to scale out more and more closely match their demand, thereby pushing more bytes out into the data center network. On the whole, the data center infrastructure is more effectively utilized.

Figure 17: Inter-rack tenant traffic (CDF of the amount of inter-rack traffic, in MB, per tenant) for Rand/Uniform, Rand/Aware, Aware/Uniform, and Aware/Aware.

8.4 Summary of Key Results

Our key findings are that:



• Stratos helps optimally meet application demand by accurately identifying and addressing bottlenecks. In contrast, network-agnostic approaches use up to 2X as many MBs as Stratos, yet they have severely backlogged request queues.

• All three network-aware components of Stratos are crucial to extracting the ideal overall benefits of Stratos.

• Even without intrinsic support for placement, Stratos can elastically meet the demands of applications in EC2. Stratos's fine-grained load distribution plays a crucial role in sustaining application performance despite changing network conditions.

9. DISCUSSION

Integration of Stratos with MBs. Stratos can be improved by making it aware of MB functions. For example, if Stratos knows the duplication patterns in specific traffic flows, then it can use this information to more carefully decide which flows to send to specific replicas of a redundancy elimination MB. MBs can benefit from knowing about Stratos too; e.g., a server load balancer can use the network load distribution patterns imposed by Stratos, together with server load, in deciding how to balance requests across servers.
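One way such duplication-awareness could be realized is sketched below (Python; the per-flow content signature and the hashing scheme are illustrative assumptions, not part of Stratos):

import hashlib
from typing import List

def pick_re_replica(content_signature: str, replicas: List[str]) -> str:
    """Steer flows that share a dominant content signature to the same
    redundancy-elimination replica, so its cache sees the duplication."""
    digest = hashlib.sha1(content_signature.encode()).digest()
    return replicas[int.from_bytes(digest[:4], "big") % len(replicas)]

# Flows carrying the same signature land on the same replica, improving the
# RE hit rate; unrelated flows still spread across all replicas.
replica = pick_re_replica("sig:video-chunk-123", ["re1", "re2", "re3"])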

Failure Resilience. Our placement heuristics are performance-centered and hence impose rack-aware allocations. However, this may not be desirable for tenants who want their deployments to be highly available. Our placement heuristics can be adapted for such tenants to distribute VMs across racks for availability while still minimizing network footprint. The simplest extension is to modify the map of available VM slots such that there is at most one slot available per machine, or one per rack, for a given tenant.
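A minimal sketch of this extension (Python; the slot-map layout and the function name are assumptions for illustration) follows:

from typing import Dict

def availability_view(slots: Dict[str, Dict[str, int]],
                      level: str = "machine") -> Dict[str, Dict[str, int]]:
    """slots: rack id -> {machine id: free VM slots}.
    Returns a per-tenant view exposing at most one free slot per machine
    ("machine" level) or one per rack ("rack" level)."""
    view: Dict[str, Dict[str, int]] = {}
    for rack, machines in slots.items():
        if level == "rack":
            # Expose a single slot on some machine in the rack, if any is free.
            free_machine = next((m for m, free in machines.items() if free > 0), None)
            view[rack] = {free_machine: 1} if free_machine else {}
        else:
            view[rack] = {m: min(free, 1) for m, free in machines.items()}
    return view

# Example: even though m1 has 4 free slots, the tenant sees only one there,
# so its replicas end up on distinct machines (or racks).
view = availability_view({"rack1": {"m1": 4, "m2": 0}, "rack2": {"m3": 2}})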

Zero Downtime. As mentioned in Section 3, when a collection of VMs is ready to be migrated, re-placement may be invoked across several tenant deployments (even those whose VMs are not among the set being migrated) to find new globally optimal allocations. There is a concern that this may impose downtime on tenant deployments, because their active traffic flows may either have to be suspended or may be lost in the transition. To minimize such network downtime, we can leverage support mechanisms available to clouds today, e.g., VMware's VDirector, which tunnels packets to the VMs' old locations to be either buffered temporarily or forwarded along to the new locations (when the VMs are ready to receive traffic but before network routing changes have kicked in).

10. RELATED WORK

Networked Services in the Cloud. Recent proposals [9, 5, 14, 19] and third-party middleware [14] have begun to incorporate limited support for middleboxes. CloudNaaS [19], CloudSwitch [5], and VPN-Cubed [14] aim to provide flexible composition of virtual topologies; however, they do not have mechanisms for scaling networked services. Embrane [9] uses a proprietary framework that allows for flexible scaling of networked services; however, it is limited to provider-offered middleboxes and does not allow composing them with each other or with third-party MBs.

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and the performance of virtual machine migration [38]. These are largely complementary to Stratos.

Split/Merge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, Split/Merge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters. Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus there is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement. Oversubscription within current data center networks and its impact on application performance and link utilization have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In contrast with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling. Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. The key components of Stratos (an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution) then work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation, using a testbed, a deployment on EC2, and large-scale simulations, to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html
[2] Amazon Web Services. http://aws.amazon.com
[3] Aryaka WAN Optimization. http://www.aryaka.com
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling
[5] CloudSwitch. http://www.cloudswitch.com
[6] Floodlight OpenFlow Controller. http://floodlight.openflowhub.org
[7] httperf. http://hpl.hp.com/research/linux/httperf
[8] Open vSwitch. http://openvswitch.org
[9] Powering virtual network services. http://embrane.com
[10] Rackspace Cloud. http://rackspace.com/cloud
[11] RightScale. http://www.rightscale.com
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance
[13] Suricata. http://openinfosecfoundation.org
[14] VPN-Cubed. http://cohesiveft.com/vpncubed
[15] Xen. http://xen.org
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263–297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, Melbourne, Australia, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.

14

  • 1 Introduction
  • 2 Background
  • 3 Stratos Overview
    • 31 Stratos tenant interface
    • 32 Stratos internals
    • 33 Interacting with other Provider Functions
      • 4 Elastic Scaling
        • 41 Strawman approaches
        • 42 Application-Aware Scaling Heuristic
          • 5 Rack-aware Placement
            • 51 Initial Placement
            • 52 Placing New Middlebox Instances
              • 6 Network-Aware Flow Distribution
              • 7 Implementation
              • 8 Evaluation
                • 81 Controlled Testbed Experiments
                • 82 (Restricted) Stratos in a Dynamic Scenario
                • 83 Simulations Stratos at Scale
                • 84 Summary of Key Results
                  • 9 Discussion
                  • 10 Related Work
                  • 11 Conclusions
                  • 12 References
Page 11: Stratos: A Network-Aware Orchestration Layer for

0

5

10

15

20

200 250 300 350 400 450 500 550 600

Mid

dleb

ox a

lloca

tions

Time (s)

AwareAwareAwareAwareAwareUni

0

10

20

30

40

50

60

200 250 300 350 400 450 500 550 600

Req

uest

s pe

r se

cond

Time (s)

AwareAwareAwareAwareAwareUni

Figure 13 Effect of flow distribution decisions on scaling(a - top) and on demand satisfaction (b - bottom) with fixedbackground traffic Unmet demand is shown using dashedlines

variable background traffic different workload patterns andfound the above observations to hold qualitatively We pro-vide further evidence using our EC2 prototype and simula-tions

82 (Restricted) Stratos in a Dynamic Scenario

Prototype detailsOur EC2 prototype is similar to our full-fledged prototype minus network-aware placement Insteadwe rely on EC2 to place any and all MBs this is somethingwe cannot control To enable network-aware load distri-bution we periodically collect available bandwidth usingapacket-pair-based measurement tool [31] between adjacentMBs in a tenantrsquos deployment

Multi-chain tenant deployment Whereas the previous ex-periments used a simple chain we now have the tenant de-ploy the multi-chain setup shown in Figure 5 Each clientVM runs httperf [7] to request a 50KB file from a corre-sponding server VM running Apache (thus client A requestsfrom server A) We deploy each MB as a small EC2 instanceto emulate bottlenecks client server and tagger are largeinstances the controller runs on a micro instance A clientrequests a 50KB file from a server running Apache each is alarge EC2 instance We mark a chain as being bottleneckedif there is a sustained unmet demand of 28 Mbps for a pe-riod of at least 20 seconds We use a 25 second gap betweenscaling trials and we use a 2 Mbps improvement thresholdto retain an instance

EC2 Setup Latency We first measure the setup overhead

Task TimeLogical-to-Physical 5msData Plane Setup (Create Tunnels) 24s per VMData Plane Config (Install Rules in Open vSwitch)3ms per VM

Table 1Stratos setup latency

Time (s)0 120 240 360 480 600 720 840

1234

1234

1234

123

MB

W

MB

X

MB

Y

MB

Z

(a) MB Instance CountsTime (s)

0 120 240 360 480 600 720 840

Thr

ough

put (

Mbp

s)

0

10

20

30

40

50

C1 DemandC1 ServedC2 DemandC2 Served

(b) Application PerformanceFigure 14Multiple chain scaling

associated with Stratos The setup cost includes the time re-quired to launch the data plane components (taps and switch)on each VM transform the logical chains into per-VM con-figurations and configure each VMrsquos data plane components(Table 1) The total setup time for our example chain (withone instance of each MB) isasymp12s (high because EC2 doesnot allow parallel deploymentsetup of VMs) Relative to thetime to launch a VM (on the order of few tens of seconds)this represents a small overhead

Effectiveness of ScalingTo emulate induced bottlenecksin the shared (X Y) or unshared (W Z) MBs (See Figure 5)we use artificial Click [25] MBs that rate limit packets at55K 9K 7K and 10K packetssecond for instances of WX Y and Z respectively We impose an initial demand of16Mbps on each chain increasing demand by 4Mbps ev-ery 2 minutes Figure 14 shows the scaling result and theapplication performance The shared MBs become bottle-necked first because they incur load from both clients Ourheuristic accurately attempts to scale these MBs first it doesnot attempt to scale the unshared MBs because the bottle-neck is eliminated by first adding two instances of Y andthen an instance of X When demand increases to 36Mbpson each chain W becomes a bottleneck for Chain 1 whichour heuristic rightly scales without conducting unnecessaryscaling trials for X Y or Z

Our approach ensures that application demand is entirelyserved most of the time No gap between demand and servedpersists for longer than 60 seconds Without our extensionchains would need to be scaled sequentially increasing theduration of these gaps For example the gap at 240s wouldpersist for an additional 25s while an unnecessary scalingtrial was conducted with W prior to scaling trials with X andY

Effectiveness of Flow Distribution We now evaluate thebenefits of network-aware flow distribution We compareuniform and network-aware flow distribution for a singlepoint in the scaling spacemdash3 RE and 4 IPSmdashfor the sin-gle chain The MB instances are clustered into two groupslimiting the flow of traffic between the groups to 12K pack-ets per second Application demand starts at 60Mbps and

11

Time (s)0 120 240 360 480

S

erve

d60

70

80

90

100

UniformNetwork

Figure 15Application goodput with uniform and network-aware flow distribution at a fixed level of scaling

increases by 10Mbps every 2 minutesFigure 15 compares the percent of application demand

served under the two distribution mechanisms We observethat the same set of MBs is able to serve higher demandwhen network-aware flow distribution is employed with ademand of 100Mbps 90 is served under network-awaredistribution versus only about 75 with uniform distribu-tion (The consistent 5 of unserved demand with network-aware distribution is a result of EC2 network variability be-tween our runs which further highlights the need for a Stratos-like approach for simplifying MB management)

83 Simulations Stratos at Scale

Simulation setupWe developed a simulator to evaluate themacroscopic benefits of Stratos at large scales While weexamined complex scenarios using the simulator we presentresults using somewhat restrictive setups for clarity Specif-ically for the scenarios below the simulator takes as input(1) a data center topology consisting of racks and switches(2) the number of tenants (3) chain with elements and initialinstances (all tenant use the same deployment pattern) and(4) a fixed application demand (in Mbps) common acrosstenants

We run our simulator to place 200 tenants within a 500-rack data center We run the network-aware scaling heuristicfor each tenant runs until the tenantrsquos full demand is satisfiedor no further performance improvement can be achievedThe data center is arranged in a tree topology with 10 VMslots per rack and a capacity of 1Gbps on each network linkAll tenants use the same deploymentmdasha simple chain con-taining clients (3 instances) MB-type1 (2) MB-type2 (1)MB-type3 (2) and servers (4)mdashwhich initially consists of12VMs thus every tenant is forced to spread her VMs acrossracks The capacity of each instance of the MB-type1 type2and type3 is fixed at 60 50 and 110Mbps respectively Theapplication demand between each client and server pair is100Mbps for a total traffic demand of 300Mbps We as-sume intra-rack links are very high capacity

First we look at the tenant demand that can be servedunder different combinations of placement and flow distri-bution during scaling (Figure 16(a)) we assume all tenantdeployments are initially placed in a network-aware fashionWe observe immediately thatawareplacementawaredistri-bution is the best in that a greater fraction of the demandcan be served across all tenants than then remaining com-binations At the other extreme random placement coupled

with uniform distribution results in less than 30 of demandserved across all tenants The other possibilities offer inter-mediate performance as expected with randomaware out-performing awareuniform this indicates the relative impor-tant of network-aware load distribution compared to networkaware placement of scaling instances (note that all chainsinitially are placed in a network-aware fashion)

Performance per $ Tenants are currently charged basedon the number of instances provisioned Thus it is crucialthat tenants maximally utilize their MB instances BecauseStratos actively managed MB interactions it helps improvethe bandwidth available between successive MBs in a de-ployment thereby helping MB resources to be used more ef-fectively We illustrate the benefits of this next Figure 16(b)presents a CDF of the amount of traffic served for each ten-ant relative to the number of instances deployedAwaredis-tribution results in a significant increase in the amount oftraffic served per-instance for the median tenant with bothplacement algorithms 8MBps withaware placement and2MBps with rand As before we again see the greater im-protance of network-aware load distribution relative to place-ment

Percent of demand served30 40 50 60 70 80 90 100

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

MBps num Instances0 10

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

Figure 16 Tenant load served (a - top) and Traffic serveddivided by number of instances (b - bottom)

Provider view Figure 17 presents a CDF of the amount ofinter-rack traffic generated by each tenantrsquos chain Interest-ingly tenants cause a high percent of the data centerrsquos net-work to be utilized with theawareplacement and load distri-bution This is because when both network aware placementand load distribution are used tenants are able to scale outmore and more closely match their demand thereby pushingmore bytes out into the data center network One the wholethe data center infrastructure is more effectively utilized

84 Summary of Key ResultsOur key findings are that

12

Amount of Interminusrack Traffic (in MB)0 100 200 300 400 500

Fra

ctio

n of

tena

nts

0

02

04

06

08

1RandUniformRandAwareAwareUniformAwareAware

Figure 17Inter-rack tenant traffic

bull Stratos helps optimally meet application demand byaccurately identifying and addressing bottlenecks Incontrast network-agnostic approaches use up to 2X asmany MBs as Stratos yet they have severely back-logged request queues

bull All three network-aware components of Stratos are cru-cial to extracting the ideal overall benefits of Stratos

bull Even without intrinsic support for placement Stratoscan elastically meet the demands of applications in EC2Stratosrsquos fine-grained load distribution plays a crucialrole in sustaining application performance despite chang-ing network conditions

9 DISCUSSION

Integration of Stratos with MBs Stratos can be improvedoverall by having it be aware of MB functions For exampleif Stratos knows the duplication patterns in specific trafficflows then it can use this to more carefully decide whichflows to send to specific replicas of a redundancy eliminationMB MBs can benefit from knowing about Stratos too ega server load balancer can use the network load distributionpatterns imposed by Stratos together with server load indeciding how to balance requests across servers

Failure ResilienceOur placement hueristics are performance-centered and hence they impose rack-aware allocations How-ever this may not be desirable for tenants who want theirdeployments to be highly available Our placement heuris-tics can be adapted for such tenants to distribute VMs acrossracks for availability reasons while also minimizing net-work footprint The simplest extension is to modify the mapof available VM slots such that there is at most one slot avail-able per machine or one per rack for a given tenant

Zero Downtime As mentioned in Section 3 when a col-lection VMs are ready to be migrated re-placement maybe invoked across several tenant deployments (even thosewho VMs are not among the set being migrated) to findnew globally-optimal allocations There is a concern thatthis may impose down-time on tenant deployments becausetheir active traffic flows may either have to be suspended orthey may be lost in the transition To minimize such networkdowntime we can leverage support mechanisms available toclouds today eg VMWarersquos VDirector that tunnels pack-ets to the VMsrsquo old locations to be either buffered temorarilyor forwarded along to the new locations (when the VMs areready to receive traffic but before network routing changes

have kicked in)

10 RELATED WORK

Networked Services in the CloudRecent proposals [9 514 19] and third party middleware [14] have begun to incor-porate limited support for middleboxes CloudNaaS [19]CloudSwitch [5] and VPNCubed [14] aim to provide flex-ible composition of virtual topologies however they donrsquothave the mechanisms for scaling of networked services Em-brane [9] uses a proprietary framework that allows for theflexible scaling of networked services However it is limitedto provider-offered middleboxes and does not allow com-posing them with each other or with third-party MBs

Studies have looked at the properties of clouds that impactapplication performance [37 26] and that affect applicationreliability [36] Others have sought to enrich the networkinglayer of the cloud by adding frameworks that provide controlover bandwidth [17 23] security [20 29] and performanceof virtual migration [38] These are largely complementaryto Stratos

SplitMerge explores techniques that allow control overMB state so that MBs can be scaled up or down for elasticexecution [30] However they do not consider MB composi-tion the issue of what triggers scaling and how to managethe network interactions of the MBs during and after scalingwhich form the focus of our work That said SplitMergeand Stratos are complimentary to each other

Middleboxes in Enterprises and Datacenters Issues indeployment and management of middleboxes have been ex-amined in the context of enterprise [33] and data-center [24]networks But the focus is on composition in physical in-frastructures and thus performance challenges introducedbythe lack of tight control in clouds are not addressed

VM Placement Oversubscription within current data cen-ter networks and its impact on application performance andlink utilizations have been widely studied [37 26 18] Re-cent works [19 28] have explored using VM placement as asolution to this problem In comparison with prior schemeswhich focuses on placing individual VMs in isolation wefocus on discovering groups of related VMs with dense com-munication patterns and colocating them

ScalingRecent studies have considered the problem of scal-ing the number of virtual machines in each tier of a tenantrsquoshierarchy [34 2 11] All of them rely on CPU utilizationwhich we have shown to be insufficient

11 CONCLUSIONSEnhancing application deployments in todayrsquos clouds us-

ing virtual middleboxes is challenging due to the lack of net-work control and the inherent difficulty in intelligently scal-ing middleboxes while taking network effects into accountOvercoming the challenges in a systematic way requires anew ground-up framework that explicitly manages the net-work configuration and network interactions of MBs To this

13

end we presented the design implementation and evalua-tion of a network-aware orchestration layer for MBs calledStratos Stratos allows tenants to specific complex deploy-ments using a simple logical topology abstraction Then thekey components of Stratosndash an application-aware scheme forscaling rack-aware placement and network-aware flow dis-tribution ndash work in concert to carefully manage network re-sources at various time scales while elastically scaling en-tire tenant MB deployments to meet application demandsWe conduct a thorough evaluation using a testbed deploy-ment based on EC2 and large scale simulations to show thatStratos helps tenants make more efficient scaling decisionsthat all three network-aware components of Stratos are es-sential tenant applications make more effective us of MBsand providersrsquo infrastructures are more effectively used

12 REFERENCES[1] 2012 Cloud Networking Reporthttpwebtorialscom

content2012112012-cloud-networking-reporthtml[2] Amazon web serviceshttpawsamazoncom[3] Aryaka WAN Optimizationhttpwwwaryakacom[4] AWS Auto Scaling

httpawsamazoncomautoscaling[5] CloudSwitchhttpwwwcloudswitchcom[6] Floodlight openflow controller

httpfloodlightopenflowhuborg[7] httperfhttphplhpcomresearchlinuxhttperf[8] Open vSwitchhttpopenvswitchorg[9] Powering virtual network serviceshttpembranecom

[10] Rackspace cloudhttprackspacecomcloud[11] Right Scalehttpwwwrightscalecom[12] Silverpeak wan optimizationhttpcomputerworldcom

sarticle9217298Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance

[13] Suricatahttpopeninfosecfoundationorg[14] VPN-Cubedhttpcohesiveftcomvpncubed[15] Xenhttpxenorg[16] A Anand C Muthukrishnan A Akella and R Ramjee

Redundancy in network traffic Findings and implications InSIGMETRICS 2009

[17] H Ballani P Costa T Karagiannis and A Rowstron TowardsPredictable Datacenter Networks InSIGCOMM 2011

[18] T Benson A Akella and D A Maltz Network trafficcharacteristics of data centers in the wild InIMC 2010

[19] T Benson A Akella A Shaikh and S Sahu CloudNaaSA cloudnetworking platform for enterprise applications InSoCC 2011

[20] C Dixon H Uppal V Brajkovic D Brandon T Anderson andA Krishnamurthy ETTM a scalable fault tolerant network managerIn NSDI rsquo11

[21] A Ghodsi V Sekar M Zaharia and I Stoica Multi-ResourceScheduling for Packet Processing InProc SIGCOMM 2012

[22] G Gibb H Zeng and N McKeown Outsourcing networkfunctionality InHotSDN 2012

[23] C Guo G Lu H J Wang S Yang C Kong P Sun W Wu andY Zhang Secondnet a data center network virtualizationarchitecture with bandwidth guarantees InCo-NEXT rsquo10

[24] D A Joseph A Tavakoli and I Stoica A policy-awareswitchinglayer for data centers InSIGCOMM 2008

[25] E Kohler R Morris B Chen J Jannotti and M F Kaashoek TheClick modular routerTOCS 18263ndash297 2000

[26] A Li X Yang and S K M Zhang Cloudcmp Comparing publiccloud providers InIMC rsquo10 Melborne Australia 2010

[27] N McKeown T Anderson H Balakrishnan G ParulkarL Peterson J Rexford S Shenker and J Turner OpenFlowEnabling innovation in campus networksACM SIGCOMMComputer Communication Review 38(2)69ndash74 2008

[28] X Meng V Pappas and L Zhang Improving the scalability of datacenter networks with traffic-aware virtual machine placement InINFOCOM 2010

[29] L Popa M Yu S Y Ko S Ratnasamy and I Stoica CloudPolicetaking access control out of the network InHotNets rsquo10

[30] S Rajagopalan D Williams H Jamjoom and A WarfieldSplitmerge System support for elastic execution in virtualmiddleboxes InNSDI 2013

[31] V Ribeiro R Riedi R Baraniuk J Navratil and L Cottrellpathchirp Efficient available bandwidth estimation for networkpaths InPassive and Active Measurement Workshop 2003

[32] T Ristenpart E Tromer H Shacham and S Savage Hey you getoff of my cloud exploring information leakage in third-partycompute clouds InCCS 2009

[33] V Sekar S Ratnasamy M K Reiter N Egi and G Shi Themiddlebox manifesto enabling innovation in middlebox deploymentIn HotNets 2011

[34] Z Shen S Subbiah X Gu and J Wilkes Cloudscale elasticresource scaling for multi-tenant cloud systems InSoCC 2011

[35] J Sherry S Hasan C Scott A Krishnamurthy S Ratnasamy andV Sekar Making middleboxes someone elsersquos problem Networkprocessing as a cloud service InProc SIGCOMM 2012

[36] K V Vishwanath and N Nagappan Characterizing cloudcomputinghardware reliability InSoCC 2010

[37] G Wang and T S E Ng The impact of virtualization on networkperformance in Amazon EC2 data center InINFOCOM 2010

[38] T Wood K K Ramakrishnan P Shenoy and J van der MerweCloudNet dynamic pooling of cloud resources by live WANmigration of virtual machines InVEE rsquo11

14

  • 1 Introduction
  • 2 Background
  • 3 Stratos Overview
    • 31 Stratos tenant interface
    • 32 Stratos internals
    • 33 Interacting with other Provider Functions
      • 4 Elastic Scaling
        • 41 Strawman approaches
        • 42 Application-Aware Scaling Heuristic
          • 5 Rack-aware Placement
            • 51 Initial Placement
            • 52 Placing New Middlebox Instances
              • 6 Network-Aware Flow Distribution
              • 7 Implementation
              • 8 Evaluation
                • 81 Controlled Testbed Experiments
                • 82 (Restricted) Stratos in a Dynamic Scenario
                • 83 Simulations Stratos at Scale
                • 84 Summary of Key Results
                  • 9 Discussion
                  • 10 Related Work
                  • 11 Conclusions
                  • 12 References
Page 12: Stratos: A Network-Aware Orchestration Layer for

Time (s)0 120 240 360 480

S

erve

d60

70

80

90

100

UniformNetwork

Figure 15Application goodput with uniform and network-aware flow distribution at a fixed level of scaling

increases by 10Mbps every 2 minutesFigure 15 compares the percent of application demand

served under the two distribution mechanisms We observethat the same set of MBs is able to serve higher demandwhen network-aware flow distribution is employed with ademand of 100Mbps 90 is served under network-awaredistribution versus only about 75 with uniform distribu-tion (The consistent 5 of unserved demand with network-aware distribution is a result of EC2 network variability be-tween our runs which further highlights the need for a Stratos-like approach for simplifying MB management)

83 Simulations Stratos at Scale

Simulation setupWe developed a simulator to evaluate themacroscopic benefits of Stratos at large scales While weexamined complex scenarios using the simulator we presentresults using somewhat restrictive setups for clarity Specif-ically for the scenarios below the simulator takes as input(1) a data center topology consisting of racks and switches(2) the number of tenants (3) chain with elements and initialinstances (all tenant use the same deployment pattern) and(4) a fixed application demand (in Mbps) common acrosstenants

We run our simulator to place 200 tenants within a 500-rack data center We run the network-aware scaling heuristicfor each tenant runs until the tenantrsquos full demand is satisfiedor no further performance improvement can be achievedThe data center is arranged in a tree topology with 10 VMslots per rack and a capacity of 1Gbps on each network linkAll tenants use the same deploymentmdasha simple chain con-taining clients (3 instances) MB-type1 (2) MB-type2 (1)MB-type3 (2) and servers (4)mdashwhich initially consists of12VMs thus every tenant is forced to spread her VMs acrossracks The capacity of each instance of the MB-type1 type2and type3 is fixed at 60 50 and 110Mbps respectively Theapplication demand between each client and server pair is100Mbps for a total traffic demand of 300Mbps We as-sume intra-rack links are very high capacity

First we look at the tenant demand that can be servedunder different combinations of placement and flow distri-bution during scaling (Figure 16(a)) we assume all tenantdeployments are initially placed in a network-aware fashionWe observe immediately thatawareplacementawaredistri-bution is the best in that a greater fraction of the demandcan be served across all tenants than then remaining com-binations At the other extreme random placement coupled

with uniform distribution results in less than 30 of demandserved across all tenants The other possibilities offer inter-mediate performance as expected with randomaware out-performing awareuniform this indicates the relative impor-tant of network-aware load distribution compared to networkaware placement of scaling instances (note that all chainsinitially are placed in a network-aware fashion)

Performance per $ Tenants are currently charged basedon the number of instances provisioned Thus it is crucialthat tenants maximally utilize their MB instances BecauseStratos actively managed MB interactions it helps improvethe bandwidth available between successive MBs in a de-ployment thereby helping MB resources to be used more ef-fectively We illustrate the benefits of this next Figure 16(b)presents a CDF of the amount of traffic served for each ten-ant relative to the number of instances deployedAwaredis-tribution results in a significant increase in the amount oftraffic served per-instance for the median tenant with bothplacement algorithms 8MBps withaware placement and2MBps with rand As before we again see the greater im-protance of network-aware load distribution relative to place-ment

Percent of demand served30 40 50 60 70 80 90 100

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

MBps num Instances0 10

Fra

ctio

n of

tena

nts

0

02

04

06

08

1

RandUniformRandAwareAwareUniformAwareAware

Figure 16 Tenant load served (a - top) and Traffic serveddivided by number of instances (b - bottom)

Provider view Figure 17 presents a CDF of the amount ofinter-rack traffic generated by each tenantrsquos chain Interest-ingly tenants cause a high percent of the data centerrsquos net-work to be utilized with theawareplacement and load distri-bution This is because when both network aware placementand load distribution are used tenants are able to scale outmore and more closely match their demand thereby pushingmore bytes out into the data center network One the wholethe data center infrastructure is more effectively utilized

84 Summary of Key ResultsOur key findings are that

12

Amount of Interminusrack Traffic (in MB)0 100 200 300 400 500

Fra

ctio

n of

tena

nts

0

02

04

06

08

1RandUniformRandAwareAwareUniformAwareAware

Figure 17Inter-rack tenant traffic

bull Stratos helps optimally meet application demand byaccurately identifying and addressing bottlenecks Incontrast network-agnostic approaches use up to 2X asmany MBs as Stratos yet they have severely back-logged request queues

bull All three network-aware components of Stratos are cru-cial to extracting the ideal overall benefits of Stratos

bull Even without intrinsic support for placement Stratoscan elastically meet the demands of applications in EC2Stratosrsquos fine-grained load distribution plays a crucialrole in sustaining application performance despite chang-ing network conditions

9 DISCUSSION

Integration of Stratos with MBs Stratos can be improvedoverall by having it be aware of MB functions For exampleif Stratos knows the duplication patterns in specific trafficflows then it can use this to more carefully decide whichflows to send to specific replicas of a redundancy eliminationMB MBs can benefit from knowing about Stratos too ega server load balancer can use the network load distributionpatterns imposed by Stratos together with server load indeciding how to balance requests across servers

Failure ResilienceOur placement hueristics are performance-centered and hence they impose rack-aware allocations How-ever this may not be desirable for tenants who want theirdeployments to be highly available Our placement heuris-tics can be adapted for such tenants to distribute VMs acrossracks for availability reasons while also minimizing net-work footprint The simplest extension is to modify the mapof available VM slots such that there is at most one slot avail-able per machine or one per rack for a given tenant

Zero Downtime As mentioned in Section 3 when a col-lection VMs are ready to be migrated re-placement maybe invoked across several tenant deployments (even thosewho VMs are not among the set being migrated) to findnew globally-optimal allocations There is a concern thatthis may impose down-time on tenant deployments becausetheir active traffic flows may either have to be suspended orthey may be lost in the transition To minimize such networkdowntime we can leverage support mechanisms available toclouds today eg VMWarersquos VDirector that tunnels pack-ets to the VMsrsquo old locations to be either buffered temorarilyor forwarded along to the new locations (when the VMs areready to receive traffic but before network routing changes

have kicked in)

10 RELATED WORK

Networked Services in the CloudRecent proposals [9 514 19] and third party middleware [14] have begun to incor-porate limited support for middleboxes CloudNaaS [19]CloudSwitch [5] and VPNCubed [14] aim to provide flex-ible composition of virtual topologies however they donrsquothave the mechanisms for scaling of networked services Em-brane [9] uses a proprietary framework that allows for theflexible scaling of networked services However it is limitedto provider-offered middleboxes and does not allow com-posing them with each other or with third-party MBs

Studies have looked at the properties of clouds that impact application performance [37, 26] and that affect application reliability [36]. Others have sought to enrich the networking layer of the cloud by adding frameworks that provide control over bandwidth [17, 23], security [20, 29], and the performance of virtual machine migration [38]. These are largely complementary to Stratos.

SplitMerge explores techniques that allow control over MB state so that MBs can be scaled up or down for elastic execution [30]. However, it does not consider MB composition, the issue of what triggers scaling, or how to manage the network interactions of the MBs during and after scaling, which form the focus of our work. That said, SplitMerge and Stratos are complementary to each other.

Middleboxes in Enterprises and Datacenters: Issues in the deployment and management of middleboxes have been examined in the context of enterprise [33] and data-center [24] networks. But the focus is on composition in physical infrastructures, and thus the performance challenges introduced by the lack of tight control in clouds are not addressed.

VM Placement: Oversubscription within current data center networks and its impact on application performance and link utilization have been widely studied [37, 26, 18]. Recent works [19, 28] have explored using VM placement as a solution to this problem. In contrast with prior schemes, which focus on placing individual VMs in isolation, we focus on discovering groups of related VMs with dense communication patterns and colocating them.

Scaling: Recent studies have considered the problem of scaling the number of virtual machines in each tier of a tenant's hierarchy [34, 2, 11]. All of them rely on CPU utilization, which we have shown to be insufficient.

11. CONCLUSIONS

Enhancing application deployments in today's clouds using virtual middleboxes is challenging due to the lack of network control and the inherent difficulty in intelligently scaling middleboxes while taking network effects into account. Overcoming these challenges in a systematic way requires a new ground-up framework that explicitly manages the network configuration and network interactions of MBs. To this end, we presented the design, implementation, and evaluation of a network-aware orchestration layer for MBs called Stratos. Stratos allows tenants to specify complex deployments using a simple logical topology abstraction. Then, the key components of Stratos – an application-aware scheme for scaling, rack-aware placement, and network-aware flow distribution – work in concert to carefully manage network resources at various time scales while elastically scaling entire tenant MB deployments to meet application demands. We conducted a thorough evaluation using a testbed, a deployment based on EC2, and large-scale simulations to show that Stratos helps tenants make more efficient scaling decisions, that all three network-aware components of Stratos are essential, that tenant applications make more effective use of MBs, and that providers' infrastructures are more effectively used.

12. REFERENCES

[1] 2012 Cloud Networking Report. http://webtorials.com/content/2012/11/2012-cloud-networking-report.html
[2] Amazon Web Services. http://aws.amazon.com
[3] Aryaka WAN Optimization. http://www.aryaka.com
[4] AWS Auto Scaling. http://aws.amazon.com/autoscaling
[5] CloudSwitch. http://www.cloudswitch.com
[6] Floodlight OpenFlow controller. http://floodlight.openflowhub.org
[7] httperf. http://hpl.hp.com/research/linux/httperf
[8] Open vSwitch. http://openvswitch.org
[9] Powering virtual network services. http://embrane.com
[10] Rackspace Cloud. http://rackspace.com/cloud
[11] RightScale. http://www.rightscale.com
[12] Silver Peak WAN optimization. http://computerworld.com/s/article/9217298/Silver_Peak_unveils_multi_gigabit_WAN_optimization_appliance
[13] Suricata. http://openinfosecfoundation.org
[14] VPN-Cubed. http://cohesiveft.com/vpncubed
[15] Xen. http://xen.org
[16] A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: Findings and implications. In SIGMETRICS, 2009.
[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards predictable datacenter networks. In SIGCOMM, 2011.
[18] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[19] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A cloud networking platform for enterprise applications. In SoCC, 2011.
[20] C. Dixon, H. Uppal, V. Brajkovic, D. Brandon, T. Anderson, and A. Krishnamurthy. ETTM: A scalable fault tolerant network manager. In NSDI, 2011.
[21] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource scheduling for packet processing. In SIGCOMM, 2012.
[22] G. Gibb, H. Zeng, and N. McKeown. Outsourcing network functionality. In HotSDN, 2012.
[23] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In Co-NEXT, 2010.
[24] D. A. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[25] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. TOCS, 18:263–297, 2000.
[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Comparing public cloud providers. In IMC, 2010.
[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
[28] X. Meng, V. Pappas, and L. Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In INFOCOM, 2010.
[29] L. Popa, M. Yu, S. Y. Ko, S. Ratnasamy, and I. Stoica. CloudPolice: Taking access control out of the network. In HotNets, 2010.
[30] S. Rajagopalan, D. Williams, H. Jamjoom, and A. Warfield. SplitMerge: System support for elastic execution in virtual middleboxes. In NSDI, 2013.
[31] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, 2003.
[32] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, 2009.
[33] V. Sekar, S. Ratnasamy, M. K. Reiter, N. Egi, and G. Shi. The middlebox manifesto: Enabling innovation in middlebox deployment. In HotNets, 2011.
[34] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In SoCC, 2011.
[35] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making middleboxes someone else's problem: Network processing as a cloud service. In SIGCOMM, 2012.
[36] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In SoCC, 2010.
[37] G. Wang and T. S. E. Ng. The impact of virtualization on network performance in Amazon EC2 data center. In INFOCOM, 2010.
[38] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe. CloudNet: Dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE, 2011.
