©Nemertes Research 2011 • www.nemertes.com • 888-241-2685 • DN1420


Containing Chaos: How Networks Must Reduce Complexity to Adapt to the Demands of Next-Generation Data Centers

By Johna Till Johnson, President & Sr. Founding Partner, Nemertes Research

Executive Summary

Data centers’ network requirements are changing dramatically, driven by new applications, ongoing data-center consolidation, and the increasing workload dynamism and volatility introduced by virtualization. At the same time, traditional requirements for high speed, low latency, and high reliability continue to ratchet inexorably upwards. In response to these demands, data-center architects need to consider designs that limit complexity and reduce the possibility of chaotic behavior.

The Issue

Data-center consolidation, server virtualization, and an increase in real-time, high-bandwidth applications (such as video) and performance-sensitive applications (such as Voice over IP and desktop virtualization) are driving a paradigm shift in network architecture. A few years back, servers and users shared the same campus network, had similar requirements, and therefore relied on the same networking technologies. These days, servers are increasingly virtualized and consolidated in data centers. Users, in contrast, are distributed across branches and administrative offices.

In other words, yesterday’s one-size-fits-all campus LAN has bifurcated into two LANs: An access network that primarily interconnects users, and a data-center network that primarily interconnects virtualized servers and connected storage.

Servers have very different network usage characteristics than users. Typically they require orders-of-magnitude greater bandwidths coupled with very low latency. That means data-center networks are under intense pressure to scale up performance and reduce latency, and scale out to handle increased interconnections and bandwidth.


And that’s not all. Virtualization introduces two new challenges: dynamism and complexity. With virtualization, it’s no longer possible to predict which workloads need to communicate with which users—or more importantly, which other workloads, or where either end of the conversation will be located. It used to be possible to engineer a network based on the expected traffic flows, giving well-traveled paths (either between users and an application, or between applications) higher bandwidth and lower latency. Now the physical location of the virtualized workload is unknown, and in fact usually varies with time. IT staffs can launch applications anywhere in the data center, and those applications even can jump to other data centers. This any-to-any behavior doesn’t work well across a traditional hierarchical data-center network.

The second challenge virtualization introduces is increased complexity—both operational and architectural. If you think of each virtualized workload as an end-node, the number of end-nodes connected through each network device goes up by at least an order of magnitude in a virtualized environment versus a physical one. And complexity increases geometrically with scale: because potential pairwise interactions grow roughly with the square of the number of end-nodes, a network designed to handle virtualized workloads isn’t 10 times more complex than one handling physical ones, but closer to 50-100 times. And an increase in complexity translates directly into an increase in management overhead (including costs)—and a decrease in reliability. Complexity also limits agility, particularly in a virtualized environment in which the goal is to build dynamic pools of compute resources. A complex network is harder to modify rapidly—meaning that the network gets in the way of rapidly provisioning resources.

The challenge facing data-center architects, therefore, lies in designing a network that scales performance while simultaneously reducing complexity.

Key Technology Trends and Business Drivers

To better understand these challenging data-center requirements, it helps to take a closer look at some of the critical technology trends and business drivers that produced them.

Data-Center Consolidation

First and foremost is data-center consolidation: Over the past few years, IT organizations have increasingly consolidated from dozens of data centers down to a handful (typically three), with the goal of optimizing costs by reducing real-estate footprint. This means that the remaining data centers are housing an order of magnitude more computing and storage resources—and that networks need to scale accordingly. That’s even before the added impact of server virtualization (see below).

Server Virtualization

As noted, server virtualization is also a critical trend. Nearly every organization (97%) has adopted some degree of server virtualization. (Please see Figure 1.) For organizations that have fully deployed virtualization, 78% of workloads are virtualized. But most companies are still in the process of virtualizing: Just 68% of workloads, on average, are fully virtualized, meaning this is a trend that’s ongoing. And as noted, virtualization injects specific challenges into an architecture, increasing performance requirements and complexity by an order of magnitude or more.

Bandwidth Increases

As all this is going on, bandwidth requirements are increasing dramatically, driven by increases in application density and type. 10 Gbit Ethernet has become the de facto standard in data-center networks. (Please see Figure 2.) And major router manufacturers have announced 100-Gbit interfaces. The bottom line? Get ready for yet another step-function increase in data-center bandwidth.

Figure 1: Server Virtualization Adoption

Emerging Real-time Applications

Along with the structural changes in the data center, IT organizations are coping with a dramatic influx of real-time applications. These include growing use of video, both conferencing and streaming. (Approximately 74% of companies are deploying, planning to deploy, or evaluating streaming video.)


Another major application is desktop virtualization, deployed by 51% of companies in 2010 and projected to rise to 74% of companies by 2012.

These applications drive the need for both high bandwidth and extremely low latency in the data center core, since the servers for these applications are increasingly instantiated as virtual machines located in the data center.

Figure 2: Growth in 10-G Ethernet

The Impact of Technology Trends on Network Design

Overall, the impact of these technology trends is to shift the fundamental job of the data center. With virtualization, the major challenge of data-center networking is to provide an interconnection capability across which administrators can create virtual machines and manage them dynamically.

These virtual machines have two main problematic characteristics. First, they’re dynamic: They appear and disappear unpredictably as servers and applications are provisioned (increasingly, by the users themselves), and they move. Second, they increasingly generate traffic flows that are device-to-device, rather than client-to-server. “We’re seeing a 20% increase in any-to-any traffic in the data center,” says the CIO of a midsize university, who notes that a driving factor is the increased use of video streaming applications. Users (including but not limited to students) often store videos on one server, then move them server-to-server to process them.

Yet another trend is the emergence of SOA, Web 2.0, and collaboration applications, which also require real-time performance and drive server-to-server traffic flows.

In other words, data-center traffic flows are changing from statically defined, top-down (client-to-server) towards dynamic server-to-server—and the data-center architecture must change along with them.

At the same time, performance and reliability requirements continue to scale up. As the sheer volume of data-center traffic increases, driven by data-center consolidation and the emergence of high-bandwidth applications, capacity is also a design factor. And applications are increasingly intolerant of delay, making latency another design factor. Finally, since data centers increasingly are consolidated, failure is no longer an option, meaning that data-center networks need to get bigger, faster, and more able to handle unpredictable workloads while becoming even more reliable than before.

The Complexity Challenge

The challenge in re-architecting the data-center core is fundamentally this: To support the design requirements of dynamic any-to-any traffic flows and high performance, while also reducing complexity.

Why worry about complexity? In any large-scale system (such as a network), increasing complexity tends to do two things: Increase the cost of managing the system and decrease reliability. The catch is that to reduce complexity, one first has to understand and define it. Although there’s an entire science devoted to complexity theory, there’s no fixed definition of complexity, or of complex systems. A good definition of a complex system is the following: Complex systems are built out of a myriad of simple components which interact, and exhibit behavior that is not a simple consequence of pairwise interactions, but rather emerges from the combination of interactions at some scale.

For networked systems, one can begin to think about complexity in terms of the number of devices or agents in the system and the potential interconnections between them. If there are N agents in a system, it takes N*(N-1)/2 interconnections to interlink these agents directly to each other, meaning that the number of interconnections scales geometrically with N.

In a data-center network, “agents” are switching and routing elements, and “connections” are the logical paths between them. Controlling complexity therefore involves minimizing the number of interactions between agents, which, as we’ll see, is easier said than done.
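To make the scaling concrete, here is a minimal Python sketch (an illustration added here, not taken from the report; the function name is arbitrary) that evaluates the N*(N-1)/2 formula for a few values of N. A tenfold increase in agents produces roughly a hundredfold increase in potential interconnections, which is the geometric growth the text describes.

```python
# Minimal sketch: potential pairwise interconnections among N agents.
# Illustrates why complexity grows roughly with the square of N.

def pairwise_links(n: int) -> int:
    """Number of direct links needed to interconnect n agents: n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(f"{n:5d} agents -> {pairwise_links(n):8,d} potential interconnections")

# Prints 45 links for 10 agents, 4,950 for 100, and 499,500 for 1,000:
# each 10x increase in agents yields roughly a 100x increase in links.
```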


Complexity, Chaos, and Dynamism

One consequence of complexity is that it can generate chaotic behavior. Mathematically speaking, chaotic behavior is behavior that’s neither predictable nor random—infinitesimally small changes in a starting state can produce arbitrarily large changes in a later state. Obviously this is undesirable in a system (or network) that’s designed to consistently deliver a specific function predictably and manageably. For example, in a networked environment, a minor difference in configuration could trigger a downstream failure that’s unpredictable and thus unpreventable. These types of problems arise in virtually any complex environment (including nuclear power plants and airplanes in flight). Interestingly, chaotic behavior often arises from very simple relationships. In other words, a complex system that is constructed of simple, deterministic building blocks can nonetheless display chaotic behavior. (Surprisingly, this mathematical conception of chaos was accurately captured back in 1945 by the poet Edna St. Vincent Millay, who described it as “something simple not yet understood.”)
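The point that simple, deterministic rules can produce chaotic behavior is easy to demonstrate. Below is a minimal Python sketch (a generic illustration, not an example from the report) using the logistic map: two starting states that differ by one part in a billion diverge completely within a few dozen iterations.

```python
# Illustrative sketch: the logistic map, a textbook example of a simple
# deterministic rule that exhibits chaotic behavior at r = 4.

def logistic_step(x, r=4.0):
    """One iteration of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)

def trajectory(x0, steps=50):
    """Iterate the map from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_step(xs[-1]))
    return xs

a = trajectory(0.400000000)   # baseline starting state
b = trajectory(0.400000001)   # starting state perturbed by one part in a billion

for step in (0, 10, 25, 50):
    print(f"step {step:2d}: |difference| = {abs(a[step] - b[step]):.2e}")
# The difference starts at 1e-9 and grows to order 1 within a few dozen
# steps: small causes, arbitrarily large effects.
```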

The challenge of reducing complexity therefore becomes, in essence, the challenge of containing chaos. Some approaches to doing so can be drawn from other fields in which chaotic behavior arises; others are specific to networks.

As noted, the complexity—and propensity for chaotic behavior—increases dramatically with a system’s “dynamism,” the need to change quickly from state to state. As Duncan Watts, applied mathematician and principal research scientist at Yahoo! Research, puts it more eloquently: “Next to the mysteries of dynamics on a network…the problems of networks we have encountered up to now are just pebbles on the seashore.” In other words, dynamic behavior dramatically changes the game when it comes to chaos and complexity.

How does all this apply to data-center networks? In a nutshell, they need to be architected in a way that limits complexity and reduces, or eliminates, the possibility of chaotic behavior—particularly in light of the non-deterministic nature of virtualized servers and applications.

Architecting to Control Complexity

As noted, controlling complexity in a network boils down to minimizing N, the number of networking elements. With traditional architectures, that’s not exactly easy. Traditional architectures are built around a core-distribution-edge design. (Please see Figure 3.) With such an architecture, connecting from a virtual workload (“V”) executing in server farm A to a virtual workload in server farm C requires traversal up and down the hierarchy (six hops). Similarly, a virtual workload in server farm B is three hops away from server farm C, and eight hops away from server farm A.



This poses three challenges. First, to scale to support an increasing number of virtual workloads, this architecture must add network elements, exactly the opposite of the goal. Second, injecting multiple switching elements between virtual machines increases the path length, and therefore the latency, between them, which can adversely affect performance. Finally, a hierarchical network design is ill-equipped to handle highly dynamic endpoints, such as virtual workloads.

Figure 3: Traditional Network Architecture

The solution is to “collapse the core,” or flatten the traditional hierarchical structure as much as possible, ideally to provide a single hop between every site. That reduces complexity and also reduces latency. This means, for example, that if a user is processing a video-streaming application, he or she can dynamically provision a video-processing server across the data-center network from the video server with the confidence that the data-center network will inject no more than a single hop’s delay.
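To illustrate why flattening helps, here is a toy Python sketch (the topology is an assumption made for illustration, not the one shown in Figure 3) that counts switch-to-switch hops between edge switches in a three-tier hierarchy versus a collapsed, single-fabric design.

```python
# Toy illustration: hop counts in a three-tier core/distribution/edge design
# versus a collapsed core. Topology and counts are illustrative only.
from collections import deque

def hops(adjacency, src, dst):
    """Breadth-first search returning the number of switch-to-switch hops."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Three-tier hierarchy: edge switches uplink to distribution, distribution to core.
three_tier = {
    "core":  ["dist1", "dist2"],
    "dist1": ["core", "edgeA", "edgeB"],
    "dist2": ["core", "edgeC"],
    "edgeA": ["dist1"], "edgeB": ["dist1"], "edgeC": ["dist2"],
}

# Collapsed core: every edge switch hangs off one consistently managed fabric.
flat = {
    "fabric": ["edgeA", "edgeB", "edgeC"],
    "edgeA": ["fabric"], "edgeB": ["fabric"], "edgeC": ["fabric"],
}

print("A->C, three-tier:", hops(three_tier, "edgeA", "edgeC"))  # 4 hops via dist1, core, dist2
print("A->B, three-tier:", hops(three_tier, "edgeA", "edgeB"))  # 2 hops via dist1
print("A->C, collapsed: ", hops(flat, "edgeA", "edgeC"))        # 2 hops via the fabric
```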

There’s an additional step that can reduce complexity even more dramatically. In other complex systems, such as aerospace engineering, the solution to untrammeled complexity is to create “black boxes” that present a simple and predictable set of inputs and outputs to the rest of the system, thereby bounding the complexity (and curtailing potential chaos). That is, the possible number of interactions between elements within the black box and the rest of the system is reduced, since the black box appears to the outside system as a single element.


In the case of network complexity, the corresponding approach is to reduce the entire network to a single, consistently managed switching element. In other words, replacing N independently managed switches with a single switch drives inherent complexity as low as possible (N = 1). In some ways, this is analogous to grid computing, in which interconnected server farms can be managed and controlled as if they were a single server. Such approaches have succeeded in reducing cost and complexity while dramatically increasing reliability (for example, in the data centers of Google and other major Internet sites).

  Figure 4: Flattening the Core

The Impact on Network Evolution

Collapsing the core in data-center network design means data-center routers and switches are no longer simply larger versions of the same network devices that exist in the rest of the campus network. Data-center networks are becoming larger and more centralized, scaling to the transport of terabit/s traffic, with nanosecond latencies. Additionally, these networks are seeing the convergence of Ethernet and Fibre Channel, interconnecting server and storage systems rather than users. Finally, they’re supporting an increased density of virtual workloads, which migrate dynamically across servers.

Access LANs (the networks in campus and branch offices), in contrast, increasingly feature centralized management, policy, and provisioning for a high density of (relatively) low-bandwidth users; support the convergence of wired and wireless infrastructure; interconnect users (rather than servers and storage); and accommodate user (rather than server) density and dynamism. (Please see Figure 5.)


The bottom line, again, is that today’s one-size-fits-all network architecture is segmenting into multiple, specialized networks.

Figure 5: Network Evolution

Conclusions and Recommendations

Given the tectonic shifts in requirements, IT managers should be thinking differently about how to architect data-center networks. Thanks to consolidation, data centers are handling more traffic, and applications like video and desktop virtualization are driving bandwidth and latency performance requirements. Finally, with the widespread advent of server virtualization, data centers today are increasingly interconnecting a mobile, dynamic population of virtual workloads. These changes collectively are driving the need for an architecture that delivers performance and scalability without increasing cost or complexity, which means, in turn, that IT managers should think in terms of bounding complexity and reducing the potential for chaotic behavior.

That means reassessing the traditional core, distribution, and edge architecture with an eye to minimizing switch count and maximizing the ability to manage multiple network components as a whole. Data-center network architects should take the following steps:

• Recognize that application volatility and dynamism are just beginning. As more and more data centers begin supporting autoprovisioning, virtual machines will start popping up all over the data center (and disappearing just as quickly).


• Plan for any-to-any. Even if the predominant application flows today are client-to-server, anticipate a rapid rise in peer-to-peer traffic in the data center.

• As much as possible, collapse the core (flatten the number of networking tiers). Even a partial “collapse” is better than none. If the current environment has the canonical three tiers, strive to reduce them to two.

• Seek an integrated approach to managing switching elements. Managing and provisioning routers and switches as a single entity reduces complexity (and the possibility of chaotic behavior).

About Nemertes Research: Nemertes Research is a research-advisory firm that specializes in analyzing and quantifying the business value of emerging technologies. You can learn more about Nemertes Research at our Website, www.nemertes.com, or contact us directly at [email protected].

I shall put Chaos into fourteen lines
And keep him there; and let him thence escape
If he be lucky; let him twist, and ape
Flood, fire, and demon—his adroit designs
Will strain to nothing in the strict confines
Of this sweet order, where, in pious rape
I hold his essence and amorphous shape,
Till he with order mingles and combines.
Past are the hours, the years of our duress,
His arrogance, our awful servitude:
I have him. He is nothing more nor less
Than something simple not yet understood;
I shall not even force him to confess;
Or answer. I will only make him good.

--Edna St. Vincent Millay, Mine the Harvest, a collection of new poems
