DeepConf: Automating Data Center Network Topologies Management with Machine Learning

Christopher Streiffer∗, Huan Chen∗†, Theophilus Benson+, Asim Kadav‡

Duke University∗ UESTC, China† Brown University+ NEC Labs‡

ABSTRACT

In recent years, many techniques have been developed to improve the performance and efficiency of data center networks. While these techniques provide high accuracy, they are often designed using heuristics that leverage domain-specific properties of the workload or hardware.

In this vision paper, we argue that many data center networking techniques, e.g., routing, topology augmentation, and energy savings, with diverse goals actually share design and architectural similarity. We present a design for developing general intermediate representations of network topologies using deep learning that is amenable to solving classes of data center problems. We develop a framework, DeepConf, that simplifies the process of configuring and training deep learning agents that use the intermediate representation to learn different tasks. To illustrate the strength of our approach, we configured, implemented, and evaluated a DeepConf-agent that tackles the data center topology augmentation problem. Our initial results are promising: DeepConf performs comparably to the optimal.

1. INTRODUCTION

Data center networks (DCNs) are a crucial part of the Internet's ecosystem. The performance of these DCNs impacts a wide variety of services, from web browsing and videos to the Internet of Things. Poor DCN performance can result in as much as $4 million in lost revenue [1].

Motivated by the importance of these networks, the networking community has explored techniques for improving and managing the performance of the data center network topology by: (1) designing better routing or traffic engineering algorithms [6, 8, 13, 10], (2) improving the performance of a fixed topology by adding a limited number of flexible links [33, 11, 17, 16], and (3) removing corrupted and underutilized links from the topology to save energy and improve performance [18, 14, 35].

Regardless of the approach, these topology-oriented techniques have three things in common: (1) each is formalized as an optimization problem; (2) because solving these optimizations scalably is impractical, greedy heuristics are employed to create approximate solutions; and (3) each heuristic is intricately tied to the application patterns and does not generalize across novel patterns. Existing domain-specific heuristics provide sub-optimal performance and are often limited to specific scenarios. Thus, as a community, we are forced to revisit and redesign these heuristics whenever the application pattern or network details change, even slightly. For example, while c-Through [33] and FireFly [17] solve broadly identical problems, they leverage different heuristics to account for low-level differences.

In this paper, we articulate our vision for replacing domain-specific rule-based heuristics for topology management with a more general machine-learning-based (ML) model that quickly learns optimal solutions to a class of problems while adapting to changes in the application patterns, network dynamics, and low-level network details. Unlike recent attempts that employ ML to learn point solutions, e.g., cluster scheduling [23] or routing [9], in this paper we present a general framework, called DeepConf, that simplifies the process of designing ML models for a broad range of DCN topology problems and eliminates the challenges associated with efficiently training new models.

The key challenges in designing DeepConf are: (1) tackling the dichotomy between deep learning's requirement for large amounts of supervised data and the unavailability of such datasets, and (2) designing a general, but highly accurate, deep learning model that efficiently generalizes to a broad array of data center problems, ranging from topology management and routing to energy savings.

The key insight underlying DeepConf is that intermediate features generated from parallel convolutional layers applied to network data, e.g., the traffic matrix, allow us to generate an intermediate representation of the network's state that enables learning a broad range of data center problems. Moreover, while the labeled production data crucial for supervised machine learning is unavailable, empirical studies [21] show that modern data center traffic is highly predictable and thus amenable to offline learning with network simulators and historical traces.

DeepConf builds on this insight by using reinforcement learning (RL), an unsupervised technique that learns through experience and makes no assumptions about how the network works. RL agents are trained through a reward signal that "guides" them toward an optimal solution; they therefore do not require real-world data and can instead be trained using simulators.

The DeepConf framework provides a predefined RL model with the intermediate representation, a specific design for configuring this model to address different problems, an optimized simulator to enable efficient learning, and an SDN-based platform for capturing network data and reconfiguring the network.

In this paper, we make the following contributions:

• We present a novel RL-based SDN architecture for developing and training deep ML models for a broad range of DCN tasks.

• We design a novel input feature extraction for DCNs, enabling different ML models to be developed over this intermediate representation of network state.

• We implement a DeepConf-agent tackling the topology augmentation problem and evaluate it on representative topologies [15, 4] and traces [3], showing that our approach performs comparably to the optimal.

2. RELATED WORK

Our work is motivated by the recent success of applying machine learning and RL algorithms to computer games and robotic planning [25, 31, 32]. The most closely related work [9] applies RL to packet routing. Unlike [9], DeepConf tackles the topology augmentation problem and explores the use of deep networks as function approximators for RL. Existing applications of machine learning to data centers focus on improving cluster scheduling [23] and, more recently, on optimizing Google's Power Usage Effectiveness (PUE) [2]. In this vision paper, we take a different stance: we identify a class of equivalent data center management operations, namely topology management and configuration, that are amenable to a common machine learning approach, and we design a modular system that enables different agents to interoperate over a network.

3. BACKGROUND

This section provides an overview of data center networking challenges and solutions, and provides background on our reinforcement learning methods.

Figure 1: One optical switch with a k=4 Fat-Tree topology (Ethernet fabric augmented with optical circuits).

3.1 Data Center Networks

Data center networks introduce a range of challenges, from topology design and routing algorithms to VM placement and energy-saving techniques.

Data centers support a large variety of workloads and applications with time-varying bandwidth requirements. This variance leads to hotspots at varying locations and at different points in time. To support these arbitrary bandwidth requirements, data center operators could employ non-blocking topologies; however, non-blocking topologies are prohibitively costly. Instead, operators employ a range of techniques, including hybrid architectures [33, 11, 17, 16], traffic engineering algorithms [6, 8, 13, 10], and energy-saving techniques [18, 28]. Below, we describe these techniques and illustrate common designs.

Augmented Architectures: This class of approaches builds on the intuition that at any given point in time there are only a small number of hotspots (or congested links). Thus, there is no need to build an expensive topology that supports full bisection bandwidth (eliminating all potential points of congestion). Instead, the existing topology can be augmented with a small number of links that can be added on demand and moved to the location of the hotspots or congested links. These approaches augment the data center's Ethernet network with a small number of optical [33, 11], wireless [16], or free-space optics [17] links.¹ For example, Figure 1 shows a traditional Fat-Tree topology augmented with an optical switch, as proposed by Helios [11].

These proposals argue for monitoring the traffic and using an Integer Linear Program (ILP) or heuristic to detect the hotspots and place the flexible links at their locations. Unfortunately, moving these flexible links incurs a large switching time during which the links are not operational, so intelligent algorithms are needed to effectively detect hotspots and efficiently place links.

Traffic Engineering: Orthogonal approaches [6, 8, 13, 10] focus on routing. Instead of changing the topology, these approaches change the mapping of flows to paths within a fixed topology. These proposals also argue for monitoring traffic and detecting hotspots; rather than changing the topology, however, they move a subset of flows from congested links to un-congested links.

¹The number of augmented links is significantly smaller than the number of the data center's permanent links.

Energy Savings: Data centers are notorious for their energy usage [14]. To address this, researchers have proposed techniques that improve energy efficiency by detecting periods of low utilization and selectively turning off links [18, 28]. These proposals argue for monitoring traffic, detecting low utilization, and powering down links in portions of the data center with low utilization. A key challenge with these techniques is to turn the powered-down links back on before demand rises.

Taking a step back, these techniques roughly follow the same design and operate in three steps: (1) gather the network traffic matrix, (2) run an ILP to predict heavy (or low) usage, and (3) perform a specific action on a subset of the network. The actions range from augmenting flexible links and turning off links to moving traffic. In all cases, the ILP does not scale to large networks, and a domain-specific heuristic is often used in its place.

3.2 Reinforcement Learning

Reinforcement learning (RL) algorithms learn through experience, with the goal of maximizing rewards. Unlike supervised learning, where algorithms train over labels, RL algorithms learn by interacting with an environment such as a game or a network simulator.

In traditional RL, an agent interacts with an environment over a number of discrete time steps. At each time step t, the agent observes a state s_t and selects an action a_t from a possible action set A. The agent is guided by a policy π, a function that maps states s_t to actions a_t. The agent receives a reward r_t for each action and transitions to the next state s_{t+1}. The goal of the agent is to maximize the total reward. This process continues until the agent reaches a final state or a time limit, after which the environment is reset and a new training episode is played. After a number of training episodes, the agent learns to pick actions that maximize the rewards and can learn to handle unexpected states. RL is effective and has been successfully used to model robotics, game bots, and more.
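To make this loop concrete, the sketch below shows the generic agent-environment interaction described above; the `env` and `agent` interfaces are hypothetical placeholders for illustration, not part of DeepConf.

```python
# Minimal sketch of the RL interaction loop, under assumed interfaces:
# env.reset() returns an initial state, env.step(a) returns
# (s_{t+1}, r_t, done), and agent.policy(s) samples a_t from pi(a|s).
def run_episodes(agent, env, n_episodes):
    for _ in range(n_episodes):
        state, total_reward, done = env.reset(), 0.0, False
        while not done:
            action = agent.policy(state)            # a_t ~ pi(a | s_t)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward)     # learn from r_t
            state = next_state                      # move to s_{t+1}
            total_reward += reward                  # quantity to maximize
```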

The goal of commonly used policy-based RL is to find a policy π that maximizes the cumulative reward and converges to a theoretically optimal policy. In deep policy-based methods, a neural network computes a policy distribution π(a_t|s_t; θ), where θ represents the parameters of the function. Using deep networks as function approximators is a recent development, and other learning methods can be used. We now describe the REINFORCE and actor-critic policy methods, which represent different ways to score the policy J(θ). REINFORCE methods [34] use gradient ascent on E[R_t], where R_t = Σ_{i=0}^{∞} γ^i r_{t+i} is the accumulated reward starting from time step t, discounted at each step by the discount factor γ ∈ (0, 1]. The REINFORCE method, a Monte-Carlo method, updates θ using the gradient ∇_θ log π(a_t|s_t; θ) R_t, which is an unbiased estimator of ∇_θ E[R_t]. The value function V^π(s_t) = E[R_t | s_t] is the expected return from following the policy π in state s_t. This method yields actions with high returns but suffers from high variance in its gradient estimates.

Asynchronous Advantage Actor-Critic (A3C): A3C [24] improves on REINFORCE by operating asynchronously and by using a deep network to approximate both the policy and the value function. A3C uses the actor-critic method, which additionally computes a critic function that approximates the value function. As used by us, A3C uses a network with two convolutional layers followed by a fully connected layer. Each hidden layer is followed by a nonlinearity (ReLU). A softmax layer that approximates the policy function and a linear layer that outputs an estimate of the value function V(s_t; θ) together constitute the output of this network. Asynchronous gradient descent with multiple agents is used to train the network, which improves training speed. A central server (similar to a parameter server) coordinates the parallel agents: each agent calculates gradients and sends updates to the server after a fixed number of steps, or when a final state is reached. Following each update, the central server propagates new weights to the agents to achieve consensus on the policy values. Each deep network (policy and value) has its own cost function; using two loss functions has been found to improve convergence and produce better-regularized models. The policy cost function is:

f_π(θ) = log π(a_t | s_t; θ) (R_t − V(s_t; θ_t)) + β H(π(s_t; θ))    (3.1)

where θ_t represents the values of the parameters θ at time t, R_t = Σ_{i=0}^{k−1} γ^i r_{t+i} + γ^k V(s_{t+k}; θ_t) is the estimated discounted reward, and H(π(s_t; θ)) is an entropy term that favors exploration, whose strength is controlled by the factor β.

The cost function for the estimated value function is:

f_v(θ) = (R_t − V(s_t; θ))²    (3.2)

Additionally, we augment our A3C model with Generalized Advantage Estimation (GAE) [30] so that it learns the states themselves, not just the accumulated rewards for good configurations. The deep network, which replaces the transition matrix as the function approximator, learns both the value and the policy of a given state. With GAE, the model is rewarded not only for the given policy decision but also for accurately estimating the value of the state, which guides it to learn the states instead of merely maximizing rewards.
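As a concrete reading of Equations 3.1 and 3.2, the sketch below computes the two losses over one logged rollout. The helper name `a3c_losses`, the tensor layout, and the default coefficients are our own illustrative assumptions, not the paper's implementation.

```python
import torch

def a3c_losses(log_probs, values, rewards, entropies,
               gamma=0.99, beta=0.01, bootstrap=0.0):
    """Sketch of the A3C losses (Eqs. 3.1 and 3.2) over one rollout.
    log_probs/values/entropies: per-step scalar tensors; rewards: floats;
    bootstrap: V(s_{t+k}; theta_t) when the rollout ends mid-episode."""
    R = bootstrap
    policy_loss = torch.zeros(())
    value_loss = torch.zeros(())
    # Walk the rollout backwards, accumulating R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        advantage = R - values[t]                   # R_t - V(s_t; theta_t)
        # Eq. 3.1 (negated for gradient descent), with entropy bonus beta*H.
        policy_loss = policy_loss \
            - log_probs[t] * advantage.detach() - beta * entropies[t]
        value_loss = value_loss + advantage.pow(2)  # Eq. 3.2
    return policy_loss, value_loss
```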

4. VISION


Figure 2: DeepConf architecture. DeepConf-agents run atop a DeepConf abstraction layer and SDN controller, with a network simulator and training harness used for offline training.

Our vision is to automate a subset of data center management and operational tasks by leveraging DeepRL. At a high level, we anticipate the existence of several DeepRL agents, each trained for a specific set of tasks, e.g., traffic engineering, energy savings, or topology augmentation. Each agent will run as an application atop an SDN controller. The use of SDN provides the agents with an interface for gathering their required network state and a mechanism for enforcing their actions. For example, DeepConf should be able to assemble the traffic matrix by polling the different devices within the network, compute a decision for how the optical switch should be configured to best accommodate the current network load, and reconfigure the network.

At a high level, DeepConf's architecture consists of three components (Figure 2): the network simulator, which enables offline training of the DeepRL agents; the DeepConf abstraction layer, which facilitates communication between the DeepRL agents and the network; and the DeepRL agents themselves, called DeepConf-agents, which encapsulate data center functionality.

Applying learning: The primary challenges in applying machine learning to network problems are (1) the scarcity of training data capturing operator and network behavior, and (2) the lack of models and loss functions that can accurately capture the problem and generalize to unexpected situations. This shortage presents a roadblock for supervised approaches to training ML models. To address this issue, DeepConf uses RL, where the model is trained by exploring different network states and environments generated by the many simulators available in the networking community. Coupled with the wide availability of network job traces, this allows DeepConf to learn a highly generalizable policy.

DeepConf Abstraction Layer: Today's SDN controllers expose a primitive interface with low-level information. DeepConf applications will instead require high-level models and interfaces; for example, our agents require interfaces that provide control over paths rather than over flow-table entries. While emerging approaches [19] argue for similar interfaces, they do not provide a sufficiently rich set of interfaces for the broad range of agents we expect to support, nor do they provide composition primitives for safely combining the output of different agents. Moreover, existing composition operators [27, 20, 12] assume that the different SDN applications (DeepConf-agents, in our case) generate non-conflicting actions, and hence cannot handle conflicting actions. SDN composition approaches [29, 7, 26] that do tackle conflicting actions require significant rewrites of the SDN application, which we are unable to do because DeepConf-agents are written within the DeepRL paradigm.

More concretely, we require higher-level SDN abstractions that enable RL agents to more easily learn about and act on the network, and novel composition operators that can reason about and resolve conflicting actions generated by RL agents.

Domain-specific Simulators (low-hanging fruit): Existing SDN research leverages a popular emulation platform, Mininet, which fails to scale to large experiments. A key requirement for employing DeepRL is efficient and scalable simulators that replay traces and enable learning from them. We extend flow-based simulators to model the various dimensions required to train our models. To improve efficiency, we explore techniques that partition the simulation and enable reuse of results, in essence enabling incremental simulations.

In addressing our high-level vision and developing solutions to the above challenges, a production-scale system must meet several goals: (1) our techniques must generalize across topologies, traffic matrices, and a range of operational settings, e.g., link failures; (2) our techniques must be as accurate and efficient as existing state-of-the-art techniques; and (3) our solutions must incur low operational overheads, e.g., minimizing optical switching time or TCAM utilization.

5. DESIGN

In this section, we provide a broad description of how to define and structure existing data center network management techniques as RL tasks, then describe the methods for training the resulting DeepConf-agents.

5.1 DeepConf Agents

In defining each new DeepConf-agent, there are four main functions that a developer must specify: state space, action space, learning model, and reward function. The action space and reward are both specific to the management task being performed and are, in turn, unique to the agent; respectively, they express the set of actions an agent can take during each step and the reward for the actions taken. The state space and learning models are more general and can be shared and reused across different DeepConf-agents. This is because of the fundamental similarities shared between the data center management problems, and because the agents are specifically designed for data centers.

In defining a DeepConf-agent for the topology augmentation problem, we (1) define state spaces specific to the topology, (2) design a reward function based on application-level metrics, and (3) define actions that correspond to activating or de-activating links.

State Space: In general, the state space consists of two types of data, each reflecting the state of the network at a given point in time. First is the general network state that all DeepConf-agents require: the network's traffic matrix (TM), which contains information on the flows executed during the last t seconds of the simulation.

Second is a DeepConf-agent-specific state space that captures the impact of the agent's actions on the network. For the topology augmentation problem, this is the network topology (note that the actions change the topology); for the traffic engineering problem, it is the mapping of flows to paths (note that the actions change a flow's routes).
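As an illustration, the sketch below assembles these two state matrices for the augmentation problem from a flow log; the flow record fields and the byte-count aggregation are illustrative assumptions rather than DeepConf's actual schema.

```python
import numpy as np

# Sketch of building the two state spaces above: a traffic matrix
# aggregated from recent flows and a 0/1 matrix of active links.
# Flow fields (src, dst, nbytes) are hypothetical.
def build_state(flows, active_links, n_switches):
    tm = np.zeros((n_switches, n_switches), dtype=np.float32)
    for f in flows:                        # flows from the last t seconds
        tm[f.src, f.dst] += f.nbytes       # bytes sent between switch pairs
    topo = np.zeros((n_switches, n_switches), dtype=np.float32)
    for u, v in active_links:              # entries mark active links
        topo[u, v] = topo[v, u] = 1.0
    return tm, topo
```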

Learning Model: Our learning model utilizes a Convolutional Neural Network (CNN) to compute policy decisions.

The exact model for a DeepConf-agent depends on the number of state spaces used as input: in general, the model has one CNN block for each state space. The outputs of these blocks are concatenated together and fed into two fully connected layers, followed by a softmax output layer. For example, for the topology augmentation problem, as shown in Figure 4, our DeepConf-agent has two CNN blocks that operate on the topology and TM state spaces in parallel. This allows the lower CNN layers to perform feature extraction on the input spatial data, and the fully connected layers to assemble these features in a meaningful way.
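A minimal PyTorch sketch of this two-block design follows; the layer counts, channel widths, and hidden sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DeepConfNet(nn.Module):
    """Sketch of the model above: one CNN block per state space
    (topology, traffic matrix), concatenated into two fully connected
    layers, with a softmax policy head and a linear value head."""
    def __init__(self, n_switches, n_actions):
        super().__init__()
        def cnn_block():
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Flatten())
        self.topo_block = cnn_block()   # operates on the topology matrix
        self.tm_block = cnn_block()     # operates on the traffic matrix
        feat = 2 * 32 * n_switches * n_switches
        self.fc = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(),
                                nn.Linear(256, 128), nn.ReLU())
        self.policy = nn.Linear(128, n_actions)  # softmax policy head
        self.value = nn.Linear(128, 1)           # value estimate V(s)

    def forward(self, topo, tm):
        # topo and tm have shape (batch, 1, n_switches, n_switches).
        h = torch.cat([self.topo_block(topo), self.tm_block(tm)], dim=1)
        h = self.fc(h)
        return torch.softmax(self.policy(h), dim=1), self.value(h)
```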

5.2 Network Training

To train a DeepConf-agent, we run the agent against a network simulator. The interaction between the simulator (described in Section 7) and the RL agent proceeds as follows (Figure 3): (1) The DeepConf-agent receives state s_i from the simulator at training step i. (2) The DeepConf-agent uses the state information to make a policy decision about the network and returns the selected actions to the simulator. For the DeepConf-agent for the topology augmentation problem, called the Augmentation-Agent, the actions are the links to activate, and hence a new topology. (3) If the topology changes, the simulator re-computes the paths for the active flows. (4) The simulator executes the flows for t seconds. (5) The simulator returns the reward r_i and state s_{i+1} to the DeepConf-agent, and the process restarts.

Figure 3: DeepConf-agent model training.
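The sketch below restates this five-step protocol as code, assuming a hypothetical simulator exposing reset/add_links/step methods and an agent exposing act/observe; these interfaces are illustrative, not DeepConf's actual APIs.

```python
# Sketch of one training episode against the simulator (steps 1-5 above).
def run_episode(agent, sim, max_steps):
    state = sim.reset()                        # initial TM + topology
    for i in range(max_steps):
        links = agent.act(state)               # (1)-(2) read s_i, pick links
        sim.add_links(links)                   # (3) reconfigure; reroute flows
        reward, next_state, done = sim.step()  # (4)-(5) run flows for t seconds
        agent.observe(state, links, reward)    # log for experience replay
        state = next_state
        if done:
            break
```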

Initialization: During the initial training phase, we force the model to explore the environment by randomizing the selection process: the probability that an action is picked corresponds to its value in the model's output distribution. For instance, with the Augmentation-Agent, link i has probability w_i of being selected. As the model becomes more familiar with the states and their corresponding values, it better formulates its policy decisions, associating higher probabilities with the links it believes to have higher rewards, which causes these links to be selected more frequently. This methodology allows the model to reinforce its decisions about links, while the randomization helps it avoid local minima.

Learning Optimization: To improve the efficiency of learning, the RL agent maintains a log containing the state, policy decision, and corresponding reward. The RL agent performs experience replay after n simulation steps. During replay, the log is unrolled to compute the policy loss across the specified number of steps using Equation 3.1. The agent is trained using Adam stochastic optimization [22], with an initial learning rate of 10^-4 and a learning-rate decay factor of 0.95. We found that a smaller learning rate and low decay helped the model better explore the environment and form a more optimal policy.
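A sketch of this replay step is shown below, reusing the `a3c_losses` helper sketched in Section 3.2; the log layout and the 0.5 weight on the value loss are our own assumptions.

```python
import torch

def make_optimizer(model):
    # Adam with the settings from the text: initial lr 1e-4, decay 0.95.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)
    return opt, sched

def replay_update(log, opt, sched):
    # Unroll the per-step (log_prob, value, reward, entropy) log after
    # n simulation steps and apply the Eq. 3.1/3.2 losses.
    log_probs, values, rewards, entropies = zip(*log)
    policy_loss, value_loss = a3c_losses(list(log_probs), list(values),
                                         list(rewards), list(entropies))
    opt.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    opt.step()
    sched.step()   # decay the learning rate
    log.clear()
```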

6. USE CASE: AUGMENTATION AGENT

More formally, in the topology augmentation problem the data center consists of a fixed hierarchical topology and an optical switch that connects all the top-of-rack (ToR) switches. While the optical switch is physically connected to all ToR switches, it can only support a limited number of active links. Given this limitation, the rules for the topology problem are defined as:

• The model must select n links to activate at a given step during the simulation.

• The model receives a reward based on link utilization and flow duration.

• The model collects the reward on a per-link basis after t seconds of simulation.

• All flows are routed using equal-cost multi-path routing (ECMP).

Figure 4: The CNN model utilized by the RL agent.

State Space: The agent-specific state space is the network topology, represented by a sparse matrix whose entries correspond to active links within the network.

Action Space: The RL agent interacts with the environment by adding links between edge switches. The action space for the model therefore corresponds to the different possible link combinations and is represented as an n-dimensional vector. The values within the vector form a probability distribution, where w_i is the probability that link i is the optimal pick for the given input state s. The model selects the n highest values from this distribution as the links to add to the network topology.
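The sketch below illustrates this selection, together with the randomized exploration of Section 5.2; the sampling scheme and the explore flag are our own illustrative framing.

```python
import numpy as np

# Sketch of picking n links from the policy's output distribution.
# link_probs: 1-D array of the w_i values (softmax output, sums to 1).
def select_links(link_probs, n, explore=True):
    if explore:
        # Training-time exploration: sample links with probability w_i.
        return np.random.choice(len(link_probs), size=n,
                                replace=False, p=link_probs)
    # Evaluation: take the indices of the n highest w_i values.
    return np.argsort(link_probs)[-n:]
```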

Reward: The goal of the model can be summarized as: (1) maximize link utilization, and (2) minimize the average flow-completion time.

With this in mind, we formulate our reward function as:

R(Θ, s, i) = Σ_{f∈F} Σ_{l∈f} b_f / t_f    (6.1)

where F represents all flows that were active or completed during the previous iteration step, l ranges over the links used by flow f, b_f represents the number of bytes transferred during the step time, and t_f represents the total duration of the flow. This reward function rewards high link utilization but penalizes long-lasting flows, guiding the model toward completing large flows within a smaller period of time.
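As a direct reading of Equation 6.1, the sketch below computes the reward from a set of flow records; the field names (links, bytes_step, duration) are illustrative assumptions.

```python
# Sketch of Eq. 6.1: sum b_f / t_f over every link used by every flow.
def reward(flows):
    total = 0.0
    for f in flows:                # active and completed flows, F
        for _link in f.links:      # sum over the links l used by f
            total += f.bytes_step / f.duration   # b_f / t_f
    return total
```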

7. EVALUATION

In this section, we analyze DeepConf under realistic workloads with representative topologies.

Experiment Setup: We evaluate DeepConf on a trace-driven flow-level simulator using large-scale map-reduce traces from Facebook [3].

Figure 5: Median flow completion time (seconds) for DeepConf versus optimal on the Fat-Tree and VL2 topologies.

We evaluate two state-of-the-art Clos-style data center topologies: a k=4 Fat-Tree [5] and VL2 [15]. In our analysis, we focus on flow completion time (FCT), a metric that captures the duration between the first and last packet of a flow. We augment both topologies by adding an optical switch with four links. We compare DeepConf against the optimal solution derived from a linear program; note that this optimal solution cannot be computed for larger topologies [8].

Learning Results: The training results demonstrate that the RL agent learns to optimize its policy decisions to increase the total reward received across each episode.

We observed that the loss decreases as training progresses, with the largest decrease occurring during the initial training episodes, a result consistent with the learning-rate decay factor employed during training.

Performance Results: The results (Figure 5) show that DeepConf performs comparably to the optimal [33, 11] across representative topologies and workloads. Thus, our system is able to learn a solution that is close to the optimal across a range of topologies.

Takeaway: We believe these initial results are promising, and that more work is required to understand and improve the performance of DeepConf.

8. DISCUSSION

We now discuss open questions.

Learning to generalize: To avoid over-fitting to a specific topology, we train our agent over a large number of simulator configurations. DeepRL agents need to be trained and evaluated on many different platforms to avoid being overly specific to a few networks and to correctly handle unexpected scenarios. Solutions that employ machine learning to address network problems using simulators need to be cognizant of these issues when selecting the training data.

Learning new reward functions: DeepRL methods need appropriate reward functions to ensure that they optimize for the correct goals. For some network problems, like topology configuration, this may be straightforward; other problems, like routing, may require a weighted combination of network parameters that must be correctly designed for the agent to operate the network correctly.

Learning other data center problems: In this paper, we focused on problems that center around learning to adjust the topology and routing. Yet the space of data center problems is much larger. As part of ongoing work, we are investigating intermediate representations and models for capturing high-level tasks.

9. CONCLUSION

Our high-level goal is to develop ML-based systems that replace existing heuristic-based approaches to tackling data center networking challenges. This shift from heuristics to ML will enable us to design solutions that adapt to changes in patterns by consuming data and relearning, an automated task. In this paper, we take the first steps toward achieving these goals by designing a reinforcement-learning-based framework, called DeepConf, for automatically learning and implementing a range of data center networking techniques.

10. REFERENCES

[1] Amazon found every 100ms of latency cost them 1% in sales. https://blog.gigaspaces.com/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/.

[2] DeepMind AI reduces Google data centre cooling bill by 40%. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/.

[3] Statistical Workload Injector for MapReduce. https://github.com/SWIMProjectUCB/SWIM/wiki.

[4] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In Proceedings of ACM SIGCOMM 2008.

[5] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In Proceedings of ACM SIGCOMM 2008.

[6] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: Dynamic flow scheduling for data center networks. In Proceedings of USENIX NSDI 2010.

[7] A. AuYoung, Y. Ma, S. Banerjee, J. Lee, P. Sharma, Y. Turner, C. Liang, and J. C. Mogul. Democratic resolution of resource conflicts between SDN control programs. In Proceedings of ACM CoNEXT 2014.

[8] T. Benson, A. Anand, A. Akella, and M. Zhang. MicroTE: The case for fine-grained traffic engineering in data centers. In Proceedings of ACM CoNEXT 2011.

[9] J. A. Boyan, M. L. Littman, et al. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems, 1994.

[10] A. Das, C. Lumezanu, Y. Zhang, V. K. Singh, G. Jiang, and C. Yu. Transparent and flexible network management for big data processing in the cloud. In Proceedings of ACM HotCloud 2013.

[11] N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat. Helios: A hybrid electrical/optical switch architecture for modular data centers. In Proceedings of ACM SIGCOMM 2010.

[12] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker. Frenetic: A network programming language. SIGPLAN Not., 46(9):279–291, Sept. 2011.

[13] S. Ghorbani, B. Godfrey, Y. Ganjali, and A. Firoozshahian. Micro load balancing in data centers with DRILL. In Proceedings of ACM HotNets 2015.

[14] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: Research problems in data center networks. SIGCOMM Comput. Commun. Rev., 39(1):68–73, Dec. 2008.

[15] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In Proceedings of ACM SIGCOMM 2009.

[16] D. Halperin, S. Kandula, J. Padhye, P. Bahl, and D. Wetherall. Augmenting data center networks with multi-gigabit wireless links. In Proceedings of ACM SIGCOMM 2011.

[17] N. Hamedazimi, Z. Qazi, H. Gupta, V. Sekar, S. R. Das, J. P. Longtin, H. Shah, and A. Tanwer. FireFly: A reconfigurable wireless data center fabric using free-space optics. In Proceedings of ACM SIGCOMM 2014.

[18] B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, S. Banerjee, and N. McKeown. ElasticTree: Saving energy in data center networks. In Proceedings of USENIX NSDI 2010.

[19] V. Heorhiadi, M. K. Reiter, and V. Sekar. Simplifying software-defined network optimization using SOL. In Proceedings of USENIX NSDI 2016.

[20] X. Jin, J. Gossels, J. Rexford, and D. Walker. CoVisor: A compositional hypervisor for software-defined networks. In Proceedings of USENIX NSDI 2015, pages 87–101.

[21] S. A. Jyothi, C. Curino, I. Menache, S. M. Narayanamurthy, A. Tumanov, J. Yaniv, R. Mavlyutov, I. Goiri, S. Krishnan, J. Kulkarni, and S. Rao. Morpheus: Towards automated SLOs for enterprise clusters. In Proceedings of USENIX OSDI 2016, pages 117–134.

[22] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.

[23] H. Mao, M. Alizadeh, I. Menache, and S. Kandula. Resource management with deep reinforcement learning. In Proceedings of ACM HotNets 2016, pages 50–56.

[24] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 2016.

[25] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[26] J. C. Mogul, A. AuYoung, S. Banerjee, L. Popa, J. Lee, J. Mudigonda, P. Sharma, and Y. Turner. Corybantic: Towards the modular composition of SDN control programs. In Proceedings of ACM HotNets 2013.

[27] C. Monsanto, N. Foster, R. Harrison, and D. Walker. A compiler and run-time system for network programming languages. In Proceedings of ACM POPL 2012.

[28] S. Nedevschi, L. Popa, G. Iannaccone, S. Ratnasamy, and D. Wetherall. Reducing network energy consumption via sleeping and rate-adaptation. In Proceedings of USENIX NSDI 2008, pages 323–336.

[29] S. Prabhu, M. Dong, T. Meng, P. B. Godfrey, and M. Caesar. Let me rephrase that: Transparent optimization in SDNs. In Proceedings of ACM SOSR 2017.

[30] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.

[31] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.

[32] N. Usunier, G. Synnaeve, Z. Lin, and S. Chintala. Episodic exploration for deep deterministic policies: An application to StarCraft micromanagement tasks. arXiv preprint arXiv:1609.02993, 2016.

[33] G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. Ng, M. Kozuch, and M. Ryan. c-Through: Part-time optics in data centers. In Proceedings of ACM SIGCOMM 2010.

[34] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.

[35] D. Zhuo, M. Ghobadi, R. Mahajan, K.-T. Forster, A. Krishnamurthy, and T. Anderson. Understanding and mitigating packet corruption in data center networks. In Proceedings of ACM SIGCOMM 2017.