Network Provisioning in IaaS Clouds
Dimitrios Theodorou
September 2012
Network Provisioning in IaaS Clouds: A Network Resource Management System
Eindhoven University of Technology
Stan Ackermans Institute / Software Technology
Partners: Nikhef, Eindhoven University of Technology
Steering Group: J. Templon, J.J. Keijser, T. Suerink, R.H. Mak
Date: September 2012
Contact address: Eindhoven University of Technology, Department of Mathematics and Computer Science, HG 6.57, P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands, +31402474334
Published by: Eindhoven University of Technology, Stan Ackermans Institute
Printed by: Eindhoven University of Technology, UniversiteitsDrukkerij
ISBN 978-90-444-1169-0
Abstract A network provisioning system for cloud infrastructures has been developed at Nikhef. The
system operates on the network topology of the cloud site and can create Virtual Networks
on top of it, by connecting network interfaces in isolated Ethernet segments. The system
can also extend Virtual Networks over data-center interconnections, or bridge them with
external networks (the Internet). This functionality is achieved through the system’s ability
to: 1) maintain a logical network topology graph and map connectivity requests onto it, 2)
configure network hardware, 3) negotiate cross-data-center connections with a duplicate of
itself running on the remote site, and 4) manage network isolation mechanisms (such as
802.1Q VLANs). The system was built mainly as a service to cloud platforms, and a plugin
that integrates it with OpenNebula was created for this purpose. The system also provides
administrative access that allows monitoring and modifying the topology and VLAN
allocation. Furthermore, 1) and 2) provide a foundation for further development to support
more advanced network configurations (e.g., QoS).
Keywords Cloud Computing, Infrastructure-as-a-Service, IaaS, Cloud platform, Network
Virtualization, Network Resources, Virtual Networks, VM-aware networking, Software-
defined Networking, SDN, OpenNebula, OpenStack, Data-center interconnection, Network
topology modeling, virtual switches, VLAN, 802.1Q, Inter-cloud, NRS, Netwerk Regel
Systeem, Network Resource Scheduler
Preferred reference: Dimitrios Theodorou, Network Provisioning in IaaS Clouds: A Network Resource Management System. Eindhoven University of Technology, SAI Technical Report, September 2012.
A catalogue record is available from the Eindhoven University of Technology Library.
ISBN: 978-90-444-1169-0 (Eindverslagen Stan Ackermans Instituut ; 2012/067)
Partnership This project was supported by Eindhoven University of Technology and Nikhef.
Disclaimer
Endorsement
Reference herein to any specific commercial products, process, or service by trade name,
trademark, manufacturer, or otherwise, does not necessarily constitute or imply its
endorsement, recommendation, or favoring by the Eindhoven University of Technology or
Nikhef. The views and opinions of authors expressed herein do not necessarily state or
reflect those of the Eindhoven University of Technology or Nikhef, and shall not be used
for advertising or product endorsement purposes.
Disclaimer
Liability
While every effort will be made to ensure that the information contained within this report
is accurate and up to date, Eindhoven University of Technology makes no warranty,
representation or undertaking whether expressed or implied, nor does it assume any legal
liability, whether direct or indirect, or responsibility for the accuracy, completeness, or
usefulness of any information.
Trademarks Product and company names mentioned herein may be trademarks and/or service marks of
their respective owners. We use these names without implying any particular endorsement and without any intent to infringe on the rights of their respective owners.
Copyright Copyright © 2012. Eindhoven University of Technology. All rights reserved.
No part of the material protected by this copyright notice may be reproduced, modified, or
redistributed in any form or by any means, electronic or mechanical, including
photocopying, recording, or by any information storage or retrieval system, without the
prior written permission of the Eindhoven University of Technology and Nikhef.
Foreword The title of this report, “Network Provisioning in IaaS clouds”, may seem a bit dry at
first, but it does cover the subject very nicely. When we started this project, we posed
a research question on automatically provisioning networks in a cloud environment.
Little did we know that this topic would become one of the "hottest" in the cloud space
over the course of the project. Thus, even though the title may appear dry, the
contents of this report are, to date, far from dry.
At a grid & cloud conference held in September 2012, it became very clear that
network provisioning is a major topic for computer science research and of great
business interest.
The cloud environment is a continually shifting landscape, with many players.
Dimitris managed this shifting landscape very well, and developed a network
provisioning platform that is scalable, pluggable and cloud-platform independent.
This ensures that his efforts remain usable over the next years. During the project we
have had many fruitful discussions about the architecture of the system, as well as
brief periods of mild panic as another new cloud provisioning tool appeared on the
internet, threatening to make Dimitris' work obsolete. But as Dimitris progressed
further and further it became clear that the NRS has unique features that no other
virtual network provisioning tool currently can provide.
The result of Dimitris' research is a working prototype of the network provisioning
tool that we originally envisaged. The only question remaining is whether this report
is the end of a project or rather the start of a new, interesting, and very exciting
future.
Jan Just Keijser
September 25, 2012
Preface As of the writing of this report, Network Virtualization is a domain undergoing
innovation and development. It is a domain concerned with the networking aspect
of cloud computing, whose popularity has skyrocketed in recent years. The
Grid computing community is investigating cloud computing services, adding
their own priorities in what they expect from network virtualization.
This report describes the design and implementation of a solution for network
provisioning in cloud infrastructures, which tries to combine open-source cloud
platforms with network topology management and hardware configuration. The
project was the final project for the Stan Ackermans Institute Software
Technology program, Eindhoven University of Technology, and was performed
on behalf of Nikhef, a Grid infrastructure provider.
This report is of a technical nature and targets an audience familiar with object-
oriented software and networking. For an elaboration of the needs the system
fulfills, readers should refer to Chapters 2-3. The design and deployment of the
system is described in Chapters 4-6. Lastly, for results and conclusions, the reader
should refer to Chapter 8.
Dimitrios Theodorou
September 29, 2012
Acknowledgements Several people assisted me in the realization of this project. First and foremost, I
would like to thank Jan Just Keijser and Tristan Suerink, my supervisors at
Nikhef. They provided me with technical advice, participated in brainstorming
sessions, gave me access to Nikhef hardware, and in general made sure I was
provided with anything I needed for the completion of the project. Tristan Suerink in
particular was the ‘brain’ behind the system’s initial concept.
I would also like to thank Rudolf Mak, my TU/e supervisor, for his precise and
constructive contributions in the design, presentation and documentation of the
project.
There are plenty of other people who made contributions to the project. I would
like to thank David Groep for helping with the formation of the system’s concept
and initial design, and Jeff Templon for helping me to quickly become familiar
with the Grid ecosystem and providing a nice working atmosphere at Nikhef.
I would also like to thank Mischa Sallé and Oscar Koeroo for their technical
guidance and contributions in matters of system deployment and development
throughout the duration of the project. Paco Bernabé also helped with deploying
and testing the system.
I also want to thank the OOTI program facilitators, Ad Aerts and Maggy de Wert,
for doing their best to assist with any administrative and other practical issues that
occurred during the project as well as during the whole 2-year OOTI period.
Lastly, I would like to thank Hurng-Chun Lee for providing relaxing
conversational moments during office hours.
Dimitrios Theodorou
September 29, 2012
Executive Summary Currently, several open-source cloud platforms implement IaaS Cloud services.
However, they provide Virtual Networks to their users using rudimentary network
configurations that have the following limitations:
• VLAN management is either not possible or not easy
• Static VLAN configurations are provided, which do not scale
• Network QoS is not supported
• Data-center interconnections are not supported
This report describes a project to design a network provisioning system for cloud
infrastructures that overcomes the limitations mentioned above, called the
Network Resource Management System (NRS).
NRS is a Network Virtualization platform. It is a stand-alone system for
allocating network resources by controlling and configuring both physical and
virtual network devices using plugins. The system seamlessly integrates with
cloud platforms, such as OpenNebula and OpenStack. Its core functionality
consists of network topology management, network hardware device
configuration, and the ability to create inter-cloud connections by negotiating with a
remote system.
The system has the following strong points:
• It provisions VLANs only in the parts of the network where they are required. This greatly increases scalability compared to static VLAN configurations.
• It is very extensible. The system configures network hardware with plugins, and supporting additional networking hardware is just a matter of creating a suitable plugin.
• It can connect resources residing in different cloud sites.
The system is not a complete solution on its own, but rather a foundation that can
easily be extended with features. This is achieved through the system’s modularity; it supports extensions such as:
• lightpath inter-cloud connections
• configuration of advanced switches
• transition from 802.1Q to Q-in-Q or other network isolation mechanisms
• algorithms that manipulate the topology and provide connections in ways tailored to the needs of the cloud infrastructure architect
Table of Contents
Foreword ..... i
Preface ..... iii
Acknowledgements ..... v
Executive Summary ..... vii
Table of Contents ..... ix
List of Figures ..... xi
List of Tables ..... 13
1 Introduction ..... 1
1.1 Purpose ..... 1
1.2 Context ..... 1
1.3 Stakeholders ..... 3
1.4 Outline ..... 3
2 Domain Analysis ..... 5
2.1 IaaS Clouds ..... 5
2.2 Grid to cloud ..... 10
2.3 Network Provisioning in IaaS Clouds ..... 12
2.4 Open-source cloud platforms ..... 14
2.5 Issues with network provisioning in IaaS Clouds ..... 22
2.6 Data-center inter-connect technologies ..... 24
2.7 Recent developments in cloud networking ..... 25
3 System Requirements ..... 27
3.1 The Network Resource Management System ..... 27
3.2 Use Cases ..... 28
3.3 NRS Logic and Interfaces ..... 32
3.4 Requirements List ..... 35
4 System architecture ..... 37
4.1 Prototype Overview ..... 37
4.2 Service Interface ..... 38
4.3 Topology ..... 41
4.4 Graph representation of the network topology ..... 45
4.5 Network Isolation ..... 51
4.6 Algorithms for operations on the topology ..... 52
4.7 Device Plugins ..... 57
4.8 Request Manager ..... 60
4.9 External networks and inter-cloud ..... 64
4.10 Administrator Access ..... 70
4.11 Conclusion ..... 75
5 Implementation ..... 76
5.1 Python ..... 76
5.2 System components ..... 76
5.3 Code analysis ..... 78
5.4 Concurrency ..... 80
6 Deployment ..... 83
6.1 NRS cloud deployment ..... 83
6.2 Device plugins ..... 84
6.3 Inter-cloud with OpenVPN ..... 85
6.4 Deployment at the Nikhef cluster ..... 87
7 Verification and Validation ..... 90
7.1 Functional Validation ..... 90
7.2 Non-functional validation ..... 96
7.3 Verification ..... 98
8 Conclusions ..... 99
8.1 Functional Results ..... 99
8.2 Design criteria ..... 101
8.3 Future development ..... 102
9 Project Management ..... 104
9.1 Management Process ..... 104
9.2 Project Milestones ..... 104
9.3 Risk Management ..... 107
Bibliography ..... 109
Appendix A. Glossary ..... 111
Appendix B. NRS and VM scheduling ..... 116
Appendix C. NRS CLI and configuration ..... 120
About the Author ..... 123
List of Figures
Figure 1: Visualization of Higgs candidate event ..... 1
Figure 2: LHC experiment dataflow ..... 2
Figure 3: Virtual Machine Host ..... 6
Figure 4: Stacked Avaya switches ..... 7
Figure 5: Virtual and physical switches ..... 8
Figure 6: The Grid service ..... 11
Figure 7: Virtual Network cloud resource ..... 12
Figure 8: OpenNebula components ..... 15
Figure 9: OpenNebula VNM actions ..... 17
Figure 10: VM connectivity with Linux bridge ..... 18
Figure 11: VM connectivity with Open vSwitch ..... 18
Figure 12: OpenStack Nova architecture ..... 20
Figure 13: OpenStack Nova networking ..... 21
Figure 14: OpenStack Quantum functionality ..... 22
Figure 15: OpenStack Quantum Virtual Network ..... 22
Figure 16: Physical and logical connectivity of VMs ..... 28
Figure 17: NRS basic request usage scenario ..... 29
Figure 18: NRS inter-cloud scenario ..... 31
Figure 19: NRS in a cloud context ..... 32
Figure 20: NRS layered architecture ..... 38
Figure 21: NRS Service interface operations ..... 39
Figure 22: NML object model ..... 42
Figure 23: Basic network components ..... 43
Figure 24: Network Node Builders ..... 45
Figure 25: Switch graph model ..... 46
Figure 26: VM host graph model ..... 46
Figure 27: Topology graph ..... 47
Figure 28: Object diagram of the topology ..... 48
Figure 29: L2Network graph ..... 49
Figure 30: Graphs in the class diagram ..... 50
Figure 31: Observer pattern ..... 50
Figure 32: Network Isolation with VLAN ids ..... 52
Figure 33: Growing algorithm operation ..... 54
Figure 34: Algorithm modules ..... 56
Figure 35: Network consistency ..... 57
Figure 36: Device Plugins class diagram ..... 59
Figure 37: Device Plugin instantiation ..... 59
Figure 38: Reservation sequence diagram ..... 61
Figure 39: Request manager class diagram ..... 62
Figure 40: Allocation sequence diagram ..... 63
Figure 41: Release sequence diagram ..... 64
Figure 42: Connect sequence diagram ..... 64
Figure 43: Connecting to a gateway sequence diagram ..... 65
Figure 44: Inter-cloud negotiation class diagram ..... 67
Figure 45: Inter-cloud sequence diagram ..... 69
Figure 46: Operations to modify the topology ..... 72
Figure 47: Administrator CLI over Telnet class diagram ..... 74
Figure 48: NRS system state ..... 75
Figure 49: NRS components ..... 77
Figure 50: NRS OpenNebula network driver ..... 78
Figure 51: Overview of source packages ..... 78
Figure 52: Sharing of request manager ..... 81
Figure 53: NRS initialization ..... 82
Figure 54: NRS deployment in a cloud site ..... 84
Figure 55: NRS basic configuration ..... 84
Figure 56: NRS Open vSwitch plugin operation ..... 85
Figure 57: NRS inter-cloud with OpenVPN ..... 86
Figure 58: NRS inter-cloud configuration options ..... 87
Figure 59: Private cloud deployment at Nikhef ..... 88
Figure 60: Topology snapshot of the deployed NRS system ..... 89
Figure 61: NRS in the Grid ecosystem ..... 101
Figure 62: Initial project timeline ..... 106
Figure 63: Milestone trend analysis ..... 107
Figure 64: Actual project timeline ..... 107
Figure 65: Three VM hosts connected to each other ..... 117
Figure 66: Network capacity inside a VM host ..... 117
List of Tables
Table 1: Stakeholders ..... 3
Table 2: OCCI Cloud resources ..... 9
Table 3: OCCI resource links ..... 10
Table 4: OpenNebula cloud resources ..... 16
Table 5: OpenNebula cloud aggregation features ..... 19
Table 6: NRS target users ..... 29
Table 7: Requirements List ..... 35
Table 8: Starting an inter-cloud negotiation ..... 68
Table 9: Cancelling an inter-cloud negotiation ..... 68
Table 10: Source code analysis ..... 79
Table 11: Cyclomatic complexity ..... 79
Table 12: Network connectivity tests ..... 90
Table 13: Network isolation test ..... 91
Table 14: VLAN device restrictions test ..... 92
Table 15: Administrator static topology modification test ..... 93
Table 16: Administrator L2 network modification test ..... 93
Table 17: VLAN administrator restrictions test ..... 94
Table 18: OpenVPN inter-cloud test ..... 95
Table 19: NRS project milestones ..... 105
Table 20: Risk management ..... 108
Table 21: IaaS Cloud terminology ..... 111
Table 22: Virtualization terminology ..... 112
Table 23: Networking terminology ..... 113
Table 24: Links between VMs compared to 1Gb requested capacity ..... 118
Table 25: Rank based on highest value among pairs ..... 118
Table 26: Rank based on lowest value among pairs ..... 118
Table 27: Rank based on choosing one among the pairs that works ..... 118
1 Introduction This chapter introduces the project for the Network Resource Management System. It
presents the background for the project, the project’s stakeholders, and an outline for
the rest of the document.
1.1 Purpose
This document describes the project for the creation of the Network Resource
Management System (NRS), which is software for network provisioning in cloud
infrastructures. It first discusses the needs that the system covers and its envisioned
functionality. It then describes the design decisions that led to the software prototype,
both on an architectural and on an implementation level. The process that led to the
project’s outcome is documented as well.
The project for the NRS system was carried out as the author’s final project in the
Software Technology Program of the Technical University of Eindhoven.
1.2 Context
The project is done on behalf of Nikhef1, the Dutch national institute for subatomic
physics. Nikhef is a contributor and collaborator in the Large Hadron Collider (LHC).
The LHC is the world’s largest particle accelerator, located at CERN, Geneva, and
started its operation in September 2008. Several experiments are being conducted at
LHC that aim to provide answers to fundamental open questions in Physics. Such
experiments typically consist of the acceleration and collision of particles; special
detectors pick up the collision results as raw data, which are analyzed to provide
meaningful results. The results, collected over long periods of time, are interpreted by
physicists to gain insight on the nature of the collision phenomena. The visualization
of a collision event that occurred in the experiment for the discovery of the Higgs
particle is shown in Figure 1.
Nikhef has contributed to constructing LHC detectors and participates in LHC
experiments. LHC experiments produce huge amounts of data (in the order of 20
petabytes per year) that need to be stored and analyzed. Figure 2 shows the dataflow
of an LHC experiment. The storage and analysis of such volumes of data is achieved
with the use of the Grid computing platform.
1 Dutch National Institute for Subatomic Physics, http://www.nikhef.nl
2 Image source: http://www.atlas.ch/photos/events-collision-proton.html
Figure 1: Visualization of Higgs candidate event
Grid computing refers to large-scale distributed data processing across different data-
centers. The computing platform that implements this model and makes it accessible is called the
grid platform. Scientific grid providers are organizations, usually research institutes,
which provide a grid platform for scientific research of various backgrounds. The
biggest grid platform is the Worldwide LHC Computing Grid (WLCG)3, centered
around CERN. The WLCG participants, which are the organizations that provide
infrastructure to the platform, are distributed across various countries. Grid
computing basically evolved during the last 10 years around the WLCG. Nikhef is a
scientific grid provider and a WLCG participant.
Lately, there has been interest among grid providers in exploring whether and how to
incorporate virtualization and cloud computing elements in the grid platform. Cloud
computing refers to the delivery of computing resources as a service; specifically, the
resources are virtual machines, storage, and network services between them. These
resources are provisioned from the cloud site, which is the cluster where the cloud
computing platform is installed. Grid providers are looking into re-using existing
open cloud computing platforms and integrating them into their workflow. Among the
available implementations, one of the areas that has not been adequately developed is
the delivery of network resources. Specifically, the managing of network resources
within a cloud site is only rudimentary, while the ability to connect cloud resources
located in different sites is absent. This is of particular importance to grid providers
and Nikhef, and this is where the project for the NRS system comes in. The NRS
system attempts to provide a basic solution that aids cloud platforms in managing
network resources, which includes dynamic resource provisioning and providing
network services across different data-centers. Cloud computing is elaborated in
Chapter 2.
3 CERN - Worldwide LHC Computing Grid, http://public.web.cern.ch/public/en/lhc/Computing-en.html
4 Image source: Overview of the Computing Fabric at CERN [28]
Figure 2: LHC experiment dataflow
1.3 Stakeholders
Nikhef is the main stakeholder for the project. Nikhef’s Physics Data Processing
Group (PDP) is active in grid research, deployment and software development. This
project was conducted as an activity of the PDP group. Nikhef is also a participant in
the BiG Grid project, a Dutch national initiative to improve grid infrastructure and
assist scientists of all backgrounds to use it. This project resides under the umbrella of
BiG Grid activities. All the stakeholders are shown in Table 1.
Table 1: Stakeholders
Name: Nikhef
Role: Customer, Grid infrastructure provider, BiG Grid participant
Description: Main stakeholder. The project is done for Nikhef. Its interest is to investigate how to incorporate cloud services into its infrastructure.

Name: SARA
Role: Grid/Cloud infrastructure provider, BiG Grid participant
Description: Secondary stakeholder. Has expressed interest in the outcome of a network provisioning project and the prospect of evaluating it for its cloud infrastructure.

Name: BiG Grid
Role: Dutch e-science infrastructure project
Description: Nikhef and SARA are collaborators in the BiG Grid project, whose interests are expressed through them.

Name: Technical University of Eindhoven (TU/e)
Role: Employer
Description: The project is the final project of the two-year Software Technology post-M.Sc. program of TU/e.
Although they are not stakeholders per se, there has been communication with developers of
two popular open-source cloud platforms, namely OpenNebula5 and OpenStack6.
Cloud networking is a developing area, and care was taken to ensure that the project was in
line with how these cloud platforms operate, without providing overlapping
functionality or features that already exist or are under development in the platforms.
In addition, the developers have expressed an interest in seeing the outcome of a
network provisioning project, especially if it is provided in a form that inter-operates
with their software stack.
1.4 Outline
The rest of the document’s chapters are outlined as follows.
Chapter 2: Domain Analysis provides information on the technology domains that are
relevant to the project, namely Infrastructure-as-a-Service (IaaS) clouds and network
provisioning in IaaS clouds. It also presents limitations with current implementations,
which led to the creation of this project and determined its requirements.
Chapter 3: System Requirements describes how the high level needs of the
stakeholders are translated to capabilities of a software system. To show how these
are fulfilled, it gives a high level description of the system and its key functionality
with the aid of use-cases.
5 http://opennebula.org
6 http://www.openstack.org
Chapter 4: System architecture discusses the prototype of the system, its architecture,
the elements that compose it, and its business logic. It explains various design
decisions and choices based on the system requirements.
Chapter 5: Implementation presents information on the implementation of the system
prototype, which includes the outline of the system’s components and code analysis
results. It also describes the different communicating tasks of the system.
Chapter 6: Deployment describes the possible deployment options of the system
within a cloud site, how it interacts with the site’s components, and configuration
options. It also presents the deployed infrastructure for the needs of the project.
Chapter 7: Verification and Validation presents the testing procedures that determine
to what extent the prototype’s functionality and behavior fulfill the system
requirements.
Chapter 8: Conclusions summarizes the outcome of the project, compares it against
the initial project goals, and tries to place it among global developments in cloud
networking. It also discusses recommendations for further development of the
system.
Chapter 9: Project Management presents the progress of the project throughout its
duration and the management processes used to monitor and control it.
This project lies in the technology domain of networking and Infrastructure-as-a-
Service (IaaS) cloud computing, which encompass a variety of domain specific
concepts and terms. Appendix A provides a glossary with definitions for relevant
terms used throughout the document.
2 Domain Analysis The Network Resource Management System deals with network provisioning in
Infrastructure-as-a-Service clouds. This chapter describes the technology domains
related to the system, and an analysis of issues with the current state of available
technologies.
2.1 IaaS Clouds
Cloud computing in general refers to the delivery of computing resources as a
service, provided over a network (e.g., the Internet). The most basic model of cloud
computing is Infrastructure-as-a-Service (IaaS), where the resources offered are
computers, storage, and network services between them. Such a service is called an
IaaS cloud. IaaS clouds utilize hardware virtualization technology, which allows
creation of efficient virtual hardware platforms (virtual machines). The computers
offered by the IaaS service are virtual machines.
2.1.1 Virtualization
Hardware virtualization refers to the virtualization of a computer or an operating
system, i.e. the creation of an abstract platform (the virtual machine or VM) that acts
like a real computer with an operating system, hiding the physical characteristics of
the actual computing platform. Full virtualization enables the virtual machine to
simulate enough hardware, so as to be able to run unmodified “guest” operating
systems as they would run in a real computer. In addition, the guest operating system
is run in isolation from the “host” operating system, which is the operating system
running on the actual computing platform. The software that controls virtualization is
called the hypervisor and runs on physical computers, which are called the virtual
machine hosts (see a rough sketch of a VM host in Figure 3).
There are two types of hypervisors:
1) Bare-metal or native hypervisors, which run directly on the host’s hardware
to control the hardware and manage guest operating systems (KVM, Xen,
VMware ESX, Microsoft Hyper-V)
2) Hosted hypervisors, which run on top of an existing operating system
(VirtualBox, VMware Workstation)
Hardware-assisted virtualization was introduced to the x86 processors in 2005-06
(Intel VT-x and AMD-V). This refers to instruction set extensions that aim to assist
with virtualization, and enable efficient full virtualization with the help of hardware.
Bare-metal hypervisors running on hosts that support hardware virtualization exhibit
the best performance, and are the ones chosen for cloud infrastructures.
2.1.2 Network switches
An IaaS Cloud deploys multiple virtual machine hosts, which are hosted in some
data-center. To offer network services between virtual machines, the hosts need to be
connected to each other. They also need to be connected to a central management
point in order to be controlled and monitored.
Data-centers typically deploy network switches to connect hosts. Network switches
are network devices that connect different devices or network segments. They have
multiple ports (50-port switches are common) and forward network traffic in the link
layer of the TCP/IP stack. A switch can have ports that support different link layer
technologies (Ethernet, FibreChannel). Switches offer more sophisticated
functionality than simple network hubs. Hubs do not manage the traffic that enters
their ports, but repeat traffic coming into one of their ports to all their other ports.
Traffic is transferred over a single medium, which creates packet collisions and thus
lowers performance. Switches on the other hand do not repeat traffic to all ports, but
maintain tables that associate destination addresses with switch ports, and are able to
inspect incoming packets’ destination address. This makes it possible to avoid blindly repeating
traffic and to forward packets only to the switch ports towards the packet’s
destination. Switches are also implemented in a manner that provides a separate
medium for the traffic that occurs among different ports. They also provide full-duplex
links, which means that there are no collisions for traffic going in opposite
directions between the same pair of ports. Thus, packet collisions are non-existent.
Overall, network switches provide the highest performance in interconnecting
different devices.
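To make the table-based forwarding described above concrete, the following minimal Python sketch (illustrative only, and not part of the NRS code) shows how a switch learns which port each MAC address is reachable through and floods a frame only when the destination is still unknown:

class LearningSwitch:
    """Toy model of the MAC-learning and forwarding behavior of a switch."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.mac_table = {}  # MAC address -> port behind which it was last seen

    def handle_frame(self, in_port, src_mac, dst_mac):
        # Learn: the source address is reachable via the ingress port.
        self.mac_table[src_mac] = in_port
        # Forward: use the table if the destination is known, otherwise flood.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in self.ports if p != in_port]

switch = LearningSwitch(num_ports=4)
print(switch.handle_frame(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # unknown destination: flood to [1, 2, 3]
print(switch.handle_frame(1, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # learned destination: forward to [0]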
7 Image source: http://en.wikipedia.org/wiki/File:Hardware_Virtualization_(copy).svg
Figure 3: Virtual Machine Host
Switches forward traffic using specialized hardware, which is called the switch
backplane. The forwarding rate of a switch is determined by the implementation of
the switch backplane. Some switch models can be “stacked”, i.e. connected to each
other via a special configuration which effectively makes them behave as one unit
with increased backplane bandwidth. An image with multiple stacked switches is
shown in Figure 4.
Apart from forwarding traffic, different switch models can have different and
advanced functionalities. Switches typically support various protocols and/or
methods for management and configuration (e.g., web interface, CLI, SNMP [1]),
monitoring (e.g., Netflow, sFlow), as well as advanced configuration options that
provide Quality of Service (QoS) guarantees.
QoS in computer networks refers to the transport of traffic with special requirements.
It aims to address two network issues: limited network bandwidth and time sensitivity. Bit-rate
and latency are two common QoS guarantees. QoS is primarily
achieved with the ability to identify and discriminate among data traffic from
different applications that have different QoS requirements. Various policies can be
defined and applied on the different types of traffic (traffic classes). Applying a
policy means that the traffic may be shaped (i.e., bit-rate limited) or given
priority in processing and forwarding. In general, support for QoS varies considerably
among switch models and vendors.
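As an aside on what "shaping" means in practice, the sketch below shows the token-bucket idea that underlies most bit-rate limiting. It is a generic, hedged illustration in plain Python, not tied to any particular switch or to the NRS code, and the rate and burst values are hypothetical:

import time

class TokenBucket:
    """Toy traffic shaper: allows bursts up to `burst` bytes, `rate` bytes per second on average."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, packet_size):
        now = time.monotonic()
        # Refill tokens according to the configured rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True      # forward the packet now
        return False         # queue or drop: sending now would exceed the policy

shaper = TokenBucket(rate=125000, burst=10000)     # roughly 1 Mbit/s with 10 kB bursts
print(shaper.allow(1500), shaper.allow(20000))     # True False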
2.1.3 Virtual switches
Virtual Machines get network connectivity through their virtual network interfaces
(VIFs), which are seen as regular network interfaces from the guest OS as well as the
host OS. The VM can see only its own VIFs, while the host can see all VIFs that
belong to VMs that reside on the host. In order for VMs to get access to any external
network, their VIF(s) need to be connected to the VM host’s physical network
interface (NIC). This is achieved with the use of virtual switches.
Virtual (or software) switches are the equivalent of hardware network switches, but
they operate on the interfaces seen within the VM host, which include the host’s
NIC(s) and any existing VIFs. Typically the host has one NIC assigned to provide
connectivity to the “production” network. VIFs are connected to that NIC, which is
often referred to as the VM gateway. A virtual switch is depicted conceptually in
Figure 5, where eth0 is the VM host’s interface acting as the VM gateway and vnet0-
2 are the VIFs.
8 Image source: http://en.wikipedia.org/wiki/File:Avaya_ERS-2500_Stack.jpg
Figure 4: Stacked Avaya switches
The simplest implementation of a virtual switch is the Linux bridge [2]. It is a
software bridge that can connect two or more network interfaces (and thus bridge two
or more Ethernet segments). Its functionality is simple; it forwards incoming
Ethernet frames among the interfaces that are attached to it. It is available in all Linux
distributions.
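As an illustration of the mechanism (and of the kind of work a device plugin performs), the sketch below configures a Linux bridge from Python by shelling out to the standard iproute2 tools. It must run as root on the VM host; it is a hedged example rather than the actual NRS plugin code, and the bridge and interface names are hypothetical:

import subprocess

def create_bridge(bridge, interfaces):
    """Create a Linux bridge and attach the given interfaces to it."""
    subprocess.check_call(["ip", "link", "add", "name", bridge, "type", "bridge"])
    subprocess.check_call(["ip", "link", "set", bridge, "up"])
    for iface in interfaces:
        # Attaching an interface bridges its Ethernet segment with the others.
        subprocess.check_call(["ip", "link", "set", iface, "master", bridge])

# Example: bridge the host NIC eth0 with a VM's virtual interface vnet0.
# create_bridge("br0", ["eth0", "vnet0"])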
Apart from the Linux bridge, other virtual switches exist that may provide
functionality equivalent to hardware switches, including support for management and
monitoring protocols, QoS, etc. Well known virtual switches are Cisco Nexus
1000V9, VMware vDS10, and the open-source Open vSwitch [3]. Section 2.4 shows
typical usage of the Linux bridge and Open vSwitch as deployed and used by cloud
platforms.
2.1.4 Cloud deployment models
The infrastructure of an IaaS Cloud consists of virtual machine hosts, switches that
connect the hosts, optionally specialized storage, and some computers that run the
cloud software platform. The cloud platform is the software that implements the IaaS
service and orchestrates the entire operation: It controls and provisions the resources
of the cloud infrastructure, it provides one or more interfaces and APIs through which
it services requests from the cloud users, and it provides tools for monitoring and
administration. Typically, the cloud platforms have different classes of users that
determine which resources the user is allowed to request. There are various open-
source cloud platform implementations; see section 2.4 for more.
IaaS Clouds come in different deployment models. Although the cloud users do not
experience different behavior, the different deployment models matter to the cloud
infrastructure maintainer.
Public clouds: These are cloud services offered to the general public by some
provider, typically with a pay-per-use model. Well known public clouds are Amazon
Elastic Compute Cloud (EC2)11 and Rackspace public cloud12. The internal
implementation of these cloud platforms is closed-source. These services have
become very popular for web application deployment due to the speed, flexibility and
potential for scalability they provide. The popularity of cloud computing skyrocketed through
such public cloud services.
9 http://www.cisco.com/en/US/products/ps9902/index.html
10 http://www.vmware.com/products/vnetwork-distributed-switch/overview.html
11 http://aws.amazon.com/ec2/
12 http://www.rackspace.com/cloud/public/
Figure 5: Virtual (or software) and physical (or hardware) switches
Private clouds: Private clouds refer to cloud services built and operated for exclusive
use by a company or organization to support its business operation. The cloud
infrastructure may be hosted internally or by a third party. The big difference from
public clouds is that the organization using a private cloud has to build and manage
the cloud infrastructure, but also has full control over how it works.
Community clouds: Community clouds are similar to private clouds, with the difference
that they are shared among various organizations that have a common goal.
Deployment and maintenance are shared as well, incurring lower costs than if each
organization had to maintain its own private cloud.
Hybrid clouds: A hybrid cloud aggregates private with public clouds, offering
resources that may originate from either cloud. Each cloud is still a separate domain
of deployment and administration. A hybrid cloud needs certain mechanisms that can
seamlessly integrate the different types of clouds it aggregates.
2.1.5 Cloud resources
In general, different cloud platform implementations name the resources they provide
to their users differently, and there is no standardization of APIs or interfaces.
Virtually every cloud platform offers its own platform-specific API. An API that has
become a de-facto standard is the Amazon EC2 query API [4], due to the popularity of the
Amazon service. Despite API and interface differences, in the end the resources
offered are the same: compute, storage, and network.
There has been an attempt to create a standard specification for requesting IaaS cloud
resources, called the Open Cloud Computing Interface (OCCI) [5]. OCCI has not
been widely adopted by cloud platforms as of the writing of this report, but it is
essentially the sole effort to unify cloud resource interfaces. OCCI specifies three
types of resources, and two types of links that connect resources to each other. The
OCCI resources are shown in Table 2, together with the Amazon Web Services
(AWS13) resource equivalent to indicate how naming can change between different
cloud services. Amazon was chosen because it is the most popular cloud service.
Table 2: OCCI Cloud resources
Resource name: Compute
AWS equivalent: EC2 Instance
Details: Information processing resource. Typically, a virtual machine. Has CPU speed, memory and other attributes.

Resource name: Storage
AWS equivalent: EBS14 Volume or S315 object
Details: Information recording resource. Used for block storage or object storage service. The storage can be persistent, i.e. it will remain after a VM has been deleted.

Resource name: Network
AWS equivalent: VPC16 subnet
Details: A link layer networking entity, or a virtual switch. Represents a single broadcast domain for resources attached to it. Has a 802.1Q VLAN tag, and a name attribute.
The Network and Storage cloud resources need to be attached to a Compute resource
to become usable. The connections are modeled by Links in OCCI, as shown in Table
3.
13 Amazon Web Services, umbrella for all Amazon cloud services (http://aws.amazon.com/)
14 Elastic Block Storage, Amazon cloud block storage (http://aws.amazon.com/ebs/)
15 Simple Storage Service, Amazon specialized storage service (http://aws.amazon.com/s3/)
16 Virtual Private Cloud, private networks with AWS resources (http://aws.amazon.com/vpc/)
Table 3: OCCI resource links
Link name: NetworkInterface
Details: Aptly named, the NetworkInterface connects a Compute resource to a Network. Has a MAC address and optionally an IP address as attributes.

Link name: StorageLink
Details: Mounts a Storage resource to a Compute resource. One attribute is the mount point in the guest OS.
The three OCCI resources aim to fully describe what is provided by cloud platforms,
and they seem to be sufficient for the cloud platforms analyzed for the needs of this
project (section 2.4). The resource of interest is the Network resource, which is
further explained in section 2.3.1.
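As a schematic illustration of how these resources and links relate, the fragment below models them as plain Python data structures. This is only a sketch for readability, not the OCCI rendering or an actual cloud platform API, and all identifiers and attribute names are hypothetical:

compute = {"kind": "compute", "id": "vm-1", "cores": 2, "memory_gb": 4}
network = {"kind": "network", "id": "net-1", "label": "private", "vlan": 42}
storage = {"kind": "storage", "id": "disk-1", "size_gb": 20}

# Links attach the Network and Storage resources to a Compute resource.
links = [
    {"kind": "networkinterface", "source": compute["id"], "target": network["id"],
     "mac": "02:00:00:00:00:01"},
    {"kind": "storagelink", "source": compute["id"], "target": storage["id"],
     "mountpoint": "/dev/vdb"},
]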
2.2 Grid to cloud
The technologies referred to as grid computing were developed a decade prior to
cloud computing. Both computing models fulfill approximately the same goal, i.e.
deliver computing as a service. The key missing ingredient in grid implementations is
virtualization, which brings benefits such as improved reliability and usability
(discussed in section 2.2.2).
2.2.1 The Grid
A grid computing platform enables aggregation of heterogeneous and geographically
dispersed computing resources and provides transparent access to them. It is a form
of distributed computing composed of loosely coupled resources across different
administrative domains. Its operation is facilitated by grid middleware, software that
is responsible for managing and distributing workload across computing resources.
Most production grids focus on assisting scientific applications. The Grid refers to the
Worldwide LHC Computing Grid, which is the largest grid platform, is distributed
internationally, and is focused on storing and analyzing data produced by LHC
experiments.
The Grid computing service can be described as a Platform-as-a-Service computing
model. Such a service typically provides its users with a platform which they can use
to execute software. Using the grid platform, scientists are able to access and analyze
data by submitting data analysis “jobs.” A scientist can access the grid
platform from anywhere; both the storage location of the data he or she requests and
the decision where to execute the jobs are determined by the grid platform (see Figure
6 for the overview). After the jobs are distributed to the selected destinations, they are
executed using technologies specific to the local deployment (job schedulers etc.).
2.2.2 Transition to cloud
There has been interest among grid providers in exploring a transition from the grid
service model to a model that involves IaaS clouds (or at the very least hardware
virtualization) and the implications of such a transition. In the current grid service,
jobs are scheduled for execution on physical machines using various job schedulers.
With the introduction of cloud technologies, a “job” would be run inside virtual
machines. There are several perceived benefits for such a move: The virtualization
layer that is added provides an exclusive environment (the virtual machine) for a job
to be executed in. This means that the environment can be configured extensively: a
job may require specific privileges (e.g., root access) or a specific OS to run in (e.g.,
Windows). These are not possible to achieve with the current grid infrastructure, and
this is considered a portability barrier for using the grid as utility computing.
Apart from configurability, using virtualization creates isolated environments where
the user is confined, which improves security and reliability; in case something goes
wrong inside the VM, the host is unaffected. In addition, VMs can be easily moved to
different hosts while running without disrupting their operation (an action called VM
live migration). Upgrading the hardware and software of VM hosts can be performed
without worrying about potential incompatibilities for the VM user (since guest OSes
are unaffected) or disrupting the operation of running VMs (since they can be
migrated).
17 Image source: Overview of the Computing Fabric at CERN [28]
Figure 6: The Grid service
Moving to a cloud service brings a drawback: An administrator no longer has control
over what a user is running in his or her VM, and more importantly over what
network traffic the user can generate. This poses a concern when it comes to
performance and security, since the machine running the VM is possibly connected to
dozens of other machines, and uncontrolled network traffic can disrupt or damage the
operation of other users’ VMs. Therefore, there is a need to be able to restrict and
fully control the network provided to VMs, and to be able to map network traffic to
specific users for administrative and control purposes.
2.3 Network Provisioning in IaaS Clouds
Part of the IaaS cloud service is to offer network services between the requested
virtual machines. The basic network service is connectivity in local-area networks,
but can also include wide-area connectivity, both to the Internet or to virtual private
networks (VPN). Virtual machines are hosted in computers that run other users'
virtual machines, and these computers are connected to several other computers
running virtual machines as well. It is mandatory that the networks offered between
virtual machines are isolated from the rest on a per-user basis. That is, a private and
isolated Ethernet segment is required for each different cloud user that requests a
private network. This network is modeled and presented to the user as the virtual
network cloud resource (the OCCI Network resource). The term virtual network will
be used to refer to this resource.
2.3.1 The Virtual Network cloud resource
Virtual networks are resources available to cloud users. From the user perspective,
they represent a private LAN, which is equivalent to a network switch, as seen in
Figure 7. The user is able to connect his or her VMs to a virtual network, an action that is often referred to as attaching or plugging a VM into the network. The expected behavior
is that all attached VMs have connectivity to each other, and that the network is
private. The implementation details that achieve this behavior are not relevant for the
user.
2.3.2 Virtual Network implementation
To isolate the network traffic inside a Virtual Network, certain configuration is
required at some points along the network path that connects the different virtual
machines that belong to the network. The path includes both the VM hosts and
hardware switches: the VM gateway on the hosts (the virtual switch that connects the
VM to its host) and the ports of the switches that connect the hosts' physical
interfaces. The straightforward way to segment Ethernet uses 802.1Q VLANs [6],
which is a networking standard that supports virtual local area networks.
Figure 7: Virtual Network cloud resource
2.3.3 VLANs
In general, a Virtual Local Area Network (VLAN) refers to the partitioning of a
physical network to create logically separated broadcast domains. IEEE 802.1Q is the
networking standard that supports VLANs on Ethernet networks, with up to 4094
different VLANs. Each VLAN has a unique identifier (from 1 to 4094).
The VLAN partitioning is performed at the data link layer (L2) by switch
devices. Each switch port can be assigned to be a member of a specific VLAN. The
switch will forward traffic only among ports that are members of the same VLAN,
effectively separating the traffic at the link layer level. These ports are called access
ports to the specific VLAN.
In case more than one VLAN needs to be transferred over a single port, the concept
of a trunk port is used. A trunk port can carry multiple VLANs, and differentiates
among them with the aid of VLAN headers. 802.1Q specifies an extra field in the
Ethernet frame that includes the VLAN identifier; that field is commonly called the
VLAN tag. A trunk port may carry traffic tagged for multiple VLANs and at most one untagged VLAN.
802.1Q VLANs are supported universally by all switch vendors, although some
switch models support fewer than 4094 VLANs. This limited identifier space creates a hard scalability limit. Going above 4094 VLANs (which can be
equivalent to 4094 different cloud users, if each user has a virtual network) requires
the adoption of a different technology.
802.1Q Alternatives
Various 802.1Q alternatives are emerging that may solve the VLAN scalability issue.
A protocol similar in nature to 802.1Q is 802.1ad, also known as Q-in-Q [7], which works
by stacking VLANs. It allows for more than one VLAN header to be inserted in the
Ethernet frame, effectively creating nested VLANs. This allows for many more than
4094 VLAN combinations.
Multiprotocol Label Switching (MPLS) [8] is a different mechanism that can be used
to create VLANs. MPLS is a traffic routing protocol that uses special short labels to direct data from one network node to the next. It can be used to carry Ethernet frames or IP packets, thus replacing the traditional mechanism of IP routing tables determining how a packet will be forwarded. The MPLS labels form a virtual path
between distant nodes, thus creating the equivalent of a VLAN.
Virtual Extensible LAN (VXLAN) [9] [10] is a method that tunnels L2 frames over
IP to create VLANs. It uses a 24-bit network identifier, the VXLAN id, which allows
for many more than 4094 VLANs. VXLAN requires special VXLAN gateways that
encapsulate L2 frames and forward them to their destination using the existing
common network infrastructure. VXLAN also requires IP multicast support in order
to work. VXLAN is a new protocol and is supported by various companies (VMware,
Cisco, Arista), but at the moment the only implemented VXLAN gateways are a few
Cisco virtual switches. As for performance, VXLAN requires slightly higher bandwidth than 802.1Q VLANs due to the larger size of its encapsulated packets, and
higher computing capacity to perform the encapsulation at the VXLAN gateways.
At this point in time, 802.1Q is the only universally deployed and working method
that achieves the goal of network isolation, and a substitution of it does not seem to
be emerging in the immediate future. The alternatives mentioned above all require
hardware and software support, and none of them have been widely adopted or have
even started to become so. However, if large-scale cloud services are the future,
802.1Q VLAN technology will have to be replaced somehow.
2.3.4 Beyond L2 networks
Apart from local and Internet connectivity, more advanced network services are
considered for cloud services, for example firewall or load-balancer services. The
umbrella term Network-as-a-Service (NaaS) refers to all such services. As of the
writing of this report, this is an area of active development, but nothing concrete has
reached available cloud platforms. See section 2.7 for more on recent developments
in cloud network services.
2.4 Open-source cloud platforms
There are various cloud platforms that implement a cloud service in a computer
cluster. Typically, they require a few machines to run the cloud platform software,
and a number of machines to act as VM hosts. Popular open-source cloud platforms
are OpenStack [11], OpenNebula [12], and Eucalyptus [13].
2.4.1 OpenNebula
OpenNebula is a fairly mature (dating from 2008) open-source cloud software platform that can deploy IaaS services. It was created and is still developed by the DSA (Distributed Systems Architecture) Research Group18 at Complutense University of Madrid, with contributions from various EU projects and organizations19. It has spawned the C12G Labs20 private company, which provides a commercial version of OpenNebula, named OpenNebulaPro, and commercial support for cloud deployments. OpenNebula is very modular, easily extensible, and amply documented21. In addition, it is used to power SARA's HPC cloud22, which was accessible for the needs of evaluating OpenNebula. These are the main reasons it was chosen as the main cloud platform to experiment with and to base the NRS system on.
Architecture
The OpenNebula system components and interfaces are shown in Figure 8. The
OpenNebula software is installed on one machine, the OpenNebula front-end, from
which it orchestrates the cloud operation. The front-end controls the set of VM hosts
via SSH communication; for that purpose, a special Unix user is required. The front-end processes are run under ownership of this user, and the VM hosts must allow this
user to connect to them passwordlessly over SSH and control their hypervisor.
18 http://dsa-research.org
19 http://opennebula.org/about:sponsors
20 www.c12g.com
21 http://opennebula.org/documentation:documentation
22 https://www.cloud.sara.nl/
The front-end software core consists of two C++ binaries:
1) the VM scheduler, which is responsible for choosing the hosts where the
VMs will be instantiated. VMs may have certain requirements (CPU and
memory) which are taken into account by the scheduler.
2) the main program, which accepts requests and manages VMs as they
proceed through their possible states.
An OpenNebula VM’s status is represented by various states. The basic ones are
Pending (right after being requested), Prolog (VM images being transferred to the
VM host), Boot (the hypervisor has just launched the VM), Running (guest OS ready
to use), and Done (VM no longer available). As a VM changes states, certain actions
need to be performed; for example copying VM images to the VM host, or triggering
certain hypervisor actions. This functionality is realized by OpenNebula’s drivers,
which are sets of scripts that perform the required actions. For example, the TM
(Transfer Manager) driver is responsible for moving and copying VM images and the
VM (Virtual Machine Manager) driver makes calls to the hypervisor to launch or
destroy a VM.
Most drivers are directly assigned to VM hosts, and different hosts may have
different drivers. That allows seamless integration of hosts that use different
technologies, for example to have one host using the KVM hypervisor with the KVM
driver, and a different host using VMware with the VMware driver. New drivers are
very easy to create; all that needs to be done is creation of a new set of scripts that
comprise the driver. The drivers that OpenNebula comes out of the box with are
Ruby modules.
Resources
Figure 8: OpenNebula components (Image source: http://opennebula.org/documentation:rel3.6:introapis)
OpenNebula provides various resources to its users (and calls them Virtual Resources), and every resource is owned by a specific user. Depending on configuration and permissions, users are free to create new resources of their own, or re-use existing ones.
re-use existing ones. OpenNebula resources are represented by text files that use an
OpenNebula specific template. The resources are shown in Table 4 with their OCCI
equivalents.
Table 4: OpenNebula cloud resources

OpenNebula Virtual Resource | OCCI equivalent | Details
Virtual Machine Template | Compute | A VM template contains all the information related to a VM, including CPU architecture, memory size, and links to all other resources the VM uses (Images and Networks). A VM template can be instantiated a number of times to create Virtual Machine Instances.
Virtual Machine Instance | Compute | A usable Virtual Machine that can be in any valid VM state, and can receive actions to trigger state transitions (shutdown, start, and others)24.
Virtual Machine Image | Storage | A file that can be mounted to the VM. Can be attached to the VM as an OS image from which the VM will boot, a persistent data block, or a cd-rom.
Virtual Network | Network | An L2 network that VMs can be connected to.
Interfaces
OpenNebula supports Amazon EC2, OCCI and an OpenNebula-specific XML-RPC
interface to provision resources.
Networking
OpenNebula Virtual Networks can be created by a user or admin, and can be attached
to a VM (the attachment takes place in the VM template). Each attached network
corresponds to a new VIF in the VM. When a Virtual Network is first created, it
receives a unique identifier from OpenNebula which is used to identify it throughout its existence. This network id is not related to the VLAN id or any other such id. A Virtual Network template25 may look as follows.
#OpenNebula virtual network template
NAME = vlan6
BRIDGE = br0
TYPE = RANGED
VLAN = YES
VLAN_ID = 6
#PHYDEV = eth0
NETWORK_ADDRESS = 10.10.17.0/24
IP_START = 10.10.17.2
24 A state machine model describing the VM states is available at http://opennebula.org/documentation:rel3.6:vm_guide_2
25 http://opennebula.org/documentation:documentation:vnet_template
There are a few interesting options: the BRIDGE attribute must be specified, and it is the name of the bridge that the VM has to be connected to in order to gain access to that
network. There is a one-to-one correspondence between a Virtual Network and a
bridge name, and all VM hosts must have a bridge of the same name in order to be
able to connect VMs to the specific network. The VLAN and VLAN_ID options are
related to the Virtual Network Manager driver (see next paragraph). The Virtual
Network also provides options to set the IP address range or fixed IP addresses of the
VMs in this network.
Virtual Networks are implemented by the OpenNebula Virtual Network Manager
(VNM) driver, similar in concept to the rest of the drivers. The VNM driver is a set of scripts that are run on specific VM state transitions to provide network configuration. The transitions that trigger a VNM configuration action are
shown in Figure 9. Each host can have its own VNM driver. When VMs are launched
on the host, the VNM driver will be used to provide network configuration only if the
VLAN option in the VM’s networks is set to YES. The VNM driver may also use the
VLAN_ID option, depending on its implementation.
OpenNebula comes with two VNM drivers that implement 802.1Q VLANs, the
802.1Q VLAN driver and the OpenvSwitch driver.
The 802.1Q VLAN driver uses linux bridges and the PHYDEV option in the network
template. PHYDEV specifies the host NIC that acts as the VM gateway. For each
Virtual Network, the driver creates a VLAN device26
on top of the PHYDEV and a
bridge that connects the VLAN device to any VIFs that need to be connected to the
network. This functionality is shown in Figure 10, where three VMs are connected to
two networks, and each network is accessible through its bridge. The VLAN id is
either taken from the Virtual Network template, or chosen with a predefined method
in the drivers’ scripts. The driver can create bridges, but it does not remove them or
do any garbage collection whatsoever. It also requires all VM hosts to use a NIC with
the same name (e.g., eth0) as a VM gateway. The driver only performs configuration
on VM hosts; in order to work, it requires that the switches connecting the hosts
allow all possible VLANs to go through.
26 A VLAN device is a logical network interface that can be created on top of an existing NIC.
It receives tagged VLAN traffic and removes the tag, acting as an access port to the VLAN. It
is often named as <nic>.<vlan-id>. For example, eth0.15 will receive all tagged VLAN traffic
from VLAN 15 that reaches eth0 and will remove the tag.
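The host-side configuration performed by such a driver can be pictured with a short sketch. The Python fragment below is not part of OpenNebula; the interface names, bridge name, and VLAN id are made-up illustrations. It creates a VLAN device on top of the physical NIC (the PHYDEV) and bridges it together with a VM's VIF, which is roughly what the 802.1Q VLAN driver scripts do with Linux bridges.

import subprocess

def run(cmd):
    """Run a shell command and fail loudly; helper for illustration only."""
    subprocess.run(cmd, shell=True, check=True)

def attach_vif_to_vlan(phydev, vlan_id, vif):
    """Create <phydev>.<vlan_id> and a bridge, then plug the VIF into it.

    Hypothetical example values: phydev='eth0', vlan_id=15, vif='vnet0'.
    """
    vlan_dev = f"{phydev}.{vlan_id}"
    bridge = f"onebr{vlan_id}"

    # VLAN device acting as an access port for the tagged traffic of vlan_id
    run(f"ip link add link {phydev} name {vlan_dev} type vlan id {vlan_id}")
    run(f"ip link set {vlan_dev} up")

    # Linux bridge that connects the VLAN device and the VM's virtual interface
    run(f"ip link add name {bridge} type bridge")
    run(f"ip link set {bridge} up")
    run(f"ip link set {vlan_dev} master {bridge}")
    run(f"ip link set {vif} master {bridge}")

if __name__ == "__main__":
    attach_vif_to_vlan("eth0", 15, "vnet0")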
Figure 9: OpenNebula VNM actions
The OpenvSwitch driver uses an Open vSwitch bridge, which is a much more
sophisticated version of the linux bridge. The name of the Open vSwitch bridge is
specified in the network template in the BRIDGE option. Interfaces can be attached
to the bridge as trunks or access ports, as in regular hardware switches, and no VLAN
devices are required. The Linux bridge scenario from Figure 10 is shown in Figure 11
with the Open vSwitch driver. Similar to the 802.1Q driver, the Open vSwitch driver requires all switches to allow all possible VLAN ids to go through, and the Open vSwitch bridge has to have the same name on all hosts.
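For comparison, the equivalent host-side configuration with Open vSwitch amounts to adding the VIF to the Open vSwitch bridge as an access port carrying a VLAN tag. The sketch below is illustrative only (the bridge, interface, and VLAN id are assumptions) and uses the standard ovs-vsctl commands.

import subprocess

def plug_vif_ovs(bridge, vif, vlan_id):
    """Plug a VIF into an Open vSwitch bridge as an access port of vlan_id.

    Illustrative values: bridge='ovsbr0', vif='vnet0', vlan_id=15.
    """
    # Create the bridge if it does not exist yet (idempotent thanks to --may-exist)
    subprocess.run(["ovs-vsctl", "--may-exist", "add-br", bridge], check=True)
    # Adding a port with tag=<id> makes it an access port for that VLAN
    subprocess.run(["ovs-vsctl", "add-port", bridge, vif, f"tag={vlan_id}"],
                   check=True)

plug_vif_ovs("ovsbr0", "vnet0", 15)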
The OpenNebula mechanism that deals with network configuration, which is the
VNM drivers, has a few limitations. One is that the scope of driver actions is VM
specific, with no global state information being kept related to the status of network
resources (bridges and VLAN ids, for example). This is why bridges cannot be
cleaned up; there is no information on whether more VMs are using a bridge or not.
In addition, there is no actual managing of the 802.1Q VLAN ids, as in keeping a
state of used and unused VLAN ids, and being able to select and release VLAN ids
from and to that pool. At best, VLAN ids can be statically assigned to Virtual
Networks based on a calculation involving the network id, but there is no control over
the network id. Lastly, the VNM drivers require the switches to trunk all possible
VM-supporting VLANs.
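To illustrate the kind of VLAN id bookkeeping that is missing, the following small sketch (not taken from any cloud platform) keeps a pool of usable VLAN ids, hands out a free id when a Virtual Network is created, and releases it when the network is torn down.

class VlanIdPool:
    """Minimal illustration of VLAN id management: allocate and release ids."""

    def __init__(self, usable_ids):
        # usable_ids could be restricted by the administrator, e.g. range(100, 200)
        self._free = set(usable_ids)
        self._in_use = {}          # network id -> VLAN id

    def allocate(self, network_id):
        if network_id in self._in_use:
            return self._in_use[network_id]
        if not self._free:
            raise RuntimeError("no free VLAN ids left")
        vlan_id = min(self._free)
        self._free.remove(vlan_id)
        self._in_use[network_id] = vlan_id
        return vlan_id

    def release(self, network_id):
        vlan_id = self._in_use.pop(network_id)
        self._free.add(vlan_id)

pool = VlanIdPool(range(100, 200))   # administrator-defined usable range
vid = pool.allocate("user-net-42")   # e.g. 100
pool.release("user-net-42")          # 100 becomes available again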
Chapter 6 describes how the VNM mechanism was used to interface with
OpenNebula.
Figure 10: VM connectivity with Linux bridge
Figure 11: VM connectivity with Open vSwitch
Cloud Aggregation
OpenNebula supports methods for aggregating different cloud sites and presenting
their resources in a uniform manner to their users. They are listed in Table 5.
Table 5: OpenNebula cloud aggregation features

Feature Name | Purpose | Details
oZones | Aggregate remote OpenNebula cloud sites | Provides centralized management and provisioning of Virtual Resources located in different cloud sites.
Hybrid Cloud | Aggregate an OpenNebula cloud with different cloud technologies | Allows to add and use Amazon EC2 and deltacloud instances from an OpenNebula cloud.
Public Cloud | Expose an OpenNebula cloud using common public cloud interfaces | Allows to use OpenNebula resources using the EC2 and/or OCCI interfaces.
oZones is the most interesting feature, as it allows the oZones user to access and use
Virtual Resources that are available in different and remote OpenNebula installations
that may belong to different administrative domains. The oZones feature uses
abstractions called zones and Virtual Data Centers, which group together resources
belonging to the same OpenNebula installation. The user may have access to multiple
zones and Virtual Data Centers, which means that he or she has access to resources
that belong to different OpenNebula installations. The resources available through
oZones are the typical OpenNebula resources; this includes Virtual Networks, but a
user is not able to connect Virtual Machines belonging to different zones (i.e. to
remote OpenNebula installations) to a common Virtual Network (something that
would require an inter-cloud connection, which is a concern of the project’s
stakeholders).
2.4.2 OpenStack
OpenStack is a cloud platform that came into existence with the merging of NASA’s
Nebula project, an IaaS cloud platform, with Rackspace’s Cloud Files project, a cloud
storage service similar to Amazon S3. OpenStack is a rather new project, launched in
2010, and has been joined by more than 150 companies (including AMD, Intel, all commercial Linux support companies and distributors, Cisco, Yahoo, and others). Compared to
OpenNebula, it is less stable and less mature, harder to extend, with more obscure
documentation, and with no cloud aggregation features. However, it is under rapid development and promotion, and with the amount of support it receives, it seems likely to dominate at some point in the future.
OpenStack has two main components (and a few lesser ones). They are named Nova (or
Compute) and Swift (or Storage). The Nova component provides the IaaS service
similar to OpenNebula and is of interest for the project, while the Swift component is
not relevant.
Architecture
OpenStack Nova's architecture is based on shared-nothing message passing. Nova consists of multiple components, called Controllers, that can be run on different machines and are each assigned a specific task. There are Controllers for providing
Virtual Machines (Compute Controller), block storage (Volume Controller), and
networking between VMs (Network Controller). There is a Cloud Controller that
keeps the global state and interacts with all the other components. All components
communicate with each other using RabbitMQ, an implementation of the Advanced Message Queuing Protocol (AMQP)27. AMQP is a protocol for message-oriented middleware that implements the message broker architecture. OpenStack Nova
architecture is shown in Figure 12. Compared to OpenNebula, Nova’s architecture
can scale better due to the use of distributed Controllers.
Interfaces
OpenStack supports Amazon EC2 and an OpenStack-specific RESTful interface.
Networking
Instead of Virtual Networks, Nova has the concept of Projects, which are isolated
resource containers that have VM instances, a VLAN id, and an IP range. Although
the semantics seem to be different (a Project contains VMs, instead of a VM being
attached to a Virtual Network), Projects are equivalent to Virtual Networks.
Nova uses a “VLAN DHCP networking mode” to implement VLANs. For each
Project, a bridge and a VLAN device are created on every host that contains VM
instances of the project, in a similar fashion to the 802.1Q OpenNebula driver. Nova
provides some additional utilities compared to OpenNebula; it automatically creates a
VPN box instance (a VM running VPN) inside the Project, so that the user can
connect to his VMs through the VPN. It also supports providing Internet connectivity
and floating IPs to the Project’s VMs. These features are illustrated in Figure 13.
27 http://www.amqp.org
28 Image source: http://docs.openstack.org/developer/nova/nova.concepts.html
Figure 12: OpenStack Nova architecture28
Apart from providing Internet connectivity, which is a welcome addition, Nova's networking is even less flexible than OpenNebula's when it comes to VLAN provisioning, and it has the same limitations. However, as a testament to the rapid
development of OpenStack, a new way of dealing with networking will have been
introduced to OpenStack by the release of this report: OpenStack Quantum29
will be a
new component available with the 27-9-12 release of the next version of OpenStack,
and will replace the previous networking modes.
OpenStack Quantum
Quantum is, in essence, an API specification that needs to be implemented by
aspiring network services that will implement Network Virtualization. When Virtual
Machines need to be connected to a network, Nova sends a request to the Quantum
Network Controller containing information on what needs to be connected, which is
a Virtual Network id and a set of VIFs (with the introduction of Quantum, the
terminology apparently changes from Project to Virtual Network). Quantum
translates the request to various API calls that manipulate the Virtual Network object
as needed (for example, attaching or detaching VIFs to it). These API calls have to be
implemented by a specific network service, called a Quantum plugin, that deals with
how the network provisioning is achieved. This provides a clean interface against
which third parties can write their network services to easily interoperate with the
OpenStack IaaS service. Quantum’s functionality is shown in Figure 14.
Quantum enhances the Virtual Network concept with Virtual Ports, which are logical ports that belong to the Virtual Network object itself and can be set administratively UP or DOWN (enabled or disabled). The VIFs that need to be
connected to the Virtual Network are connected to the Virtual Network’s Virtual
Ports. The Quantum API calls revolve around manipulating these three objects. The
Virtual Network as envisioned by the Quantum team is shown in Figure 15.
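The plugin contract can be pictured roughly as follows. This is a simplified, hypothetical sketch; the method names below are illustrative and do not reproduce the actual Quantum plugin API. A plugin implements network and port operations, and Quantum calls them when Nova asks for a VIF to be connected.

class QuantumPluginSketch:
    """Hypothetical outline of a Quantum-style plugin: networks, ports, VIFs."""

    def create_network(self, tenant_id, name):
        """Create a Virtual Network and return its id."""
        raise NotImplementedError

    def create_port(self, network_id):
        """Create a Virtual Port on the network; a port can be set UP or DOWN."""
        raise NotImplementedError

    def plug_interface(self, network_id, port_id, vif_id):
        """Attach a VIF to a port and perform the underlying network configuration."""
        raise NotImplementedError

    def unplug_interface(self, network_id, port_id):
        """Detach whatever VIF is on the port and clean up the configuration."""
        raise NotImplementedError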
29 http://wiki.openstack.org/Quantum
Figure 13: OpenStack Nova networking28
OpenStack Quantum comes with a few implemented Quantum plugins. Two of them
are a Linux bridge plugin, and an Open vSwitch plugin, which essentially have the
same functionality as the two respective OpenNebula VNM drivers.
2.4.3 Eucalyptus
Eucalyptus31
is the oldest open-source cloud platform, dating from 2008. It is
developed and sponsored by Eucalyptus Systems. Its architecture is similar to
OpenStack Nova, using distributed Controllers to manage Virtual Machines and
storage. Its networking support is similar to the previous two cloud platforms. It
supports 802.1Q VLANs using linux bridges (called “Host-managed mode”), and
carries the same limitations as the other two platforms. Eucalyptus supports Amazon
EC2.
Eucalyptus was not explored further, since it neither exhibits any particular differences from the rest of the cloud platforms nor is it friendlier to extension.
2.5 Issues with network provisioning in IaaS
Clouds
As seen in section 2.4, all major open-source cloud platforms implement network
provisioning with similar limitations when it comes to network configuration and
VLAN management.
30 Image source: http://wiki.openstack.org/Quantum
31 http://www.eucalyptus.com/
Figure 14: OpenStack Quantum functionality30
Figure 15: OpenStack Quantum Virtual Network30
The most important limitation is that cloud platforms do not have any control over the network switches. They only control the VM host environment, which is the
virtual switches, the host NICs and the VIFs, and this is the only point where Ethernet
traffic is segregated (with 802.1Q VLANs). To make networking possible in this
manner, cloud platforms require that the entire switching infrastructure is configured
as “VLAN-clean,” which means that all switch ports should be configured as trunk
ports that allow all VLAN ids (or at least all VLAN ids that will be used to support
VM connections). This allows all generated traffic to go through the switches and
reach the VM gateways and virtual switches; every VLAN is one large broadcast
domain that contains all the switches and all the virtual switches. Network isolation
and partitioning is provided only at the VM gateway endpoint. This approach is far
from optimal, due to the following issues:
1. VLAN management
Network hardware (NICs, switches) has limitations on the number of VLANs it supports. The limit can be static and/or dynamic (a limit on the maximum number of different concurrent VLANs at runtime). This knowledge is absent from existing implementations.
In addition, the network administrator may want to apply arbitrary
limitations to the VLANs available for use, perhaps because he wants to
partition his network using different VLANs.
Cloud platforms do not support dealing with such limitations, especially
dynamic ones. The network administrator would have to manually configure
complex VLAN setups. In addition, cloud platforms do not provide a
mechanism that can optimally manage available VLANs.
2. Performance and Scalability
Since every VLAN is a large broadcast domain containing all the switches
and VM hosts, broadcasts done within a VLAN will reach all these devices.
From a scalability perspective, the whole network operates as one large
broadcast domain. This creates issues such as MAC address exhaustion in
the switch MAC address tables, and performance problems due to the high
volume of broadcast traffic. (A broadcast is the transmission of packets to all reachable destinations.) These are significant scalability problems. They can
be alleviated by administrators if they manually partition their network, but
that leads back to problem one.
3. Security
If a VM host is compromised, the attacker can gain access to other users’
private network traffic.
4. Ability to use network features (QoS, ACLs)
Network features such as Quality of Service and access lists cannot be
provided, since they would require configuration of the network hardware.
Although the situation is not optimal, private networks within a cluster are still possible. The scalability potential, however, is kept to a minimum. Another important shortcoming is the absence of QoS support, which keeps cloud solutions from offering a whole range of network services.
The issues mentioned above refer to conditions within the same cluster. There is one
more issue that greatly increases the need for a network provisioning system, and it
lies elsewhere: There is currently no support for private networks that span different
cloud sites, a setup that would require an “inter-cloud” connection. Such a connection
would connect private networks located in different cloud sites in one network
segment. The management of such an inter-cloud connection is not trivial: There are
many different technologies that could implement it and they are managed in
different ways (see section 2.6).
These issues indicate the need for a network provisioning service that can provide
VLAN id management, QoS, and cross-site connections, or to phrase it more broadly,
that can manage network resources.
2.6 Data-center inter-connect technologies
As seen in section 2.4, providing inter-connections between different cloud sites is not supported by any of the explored cloud platforms. Such a connection is
referred to as an “inter-cloud” connection. If the cloud sites are located in different
data-centers, an inter-cloud connection requires traffic to be sent out of the local
computer cluster into an external network (possibly the Internet) and reach a remote
cluster in a safe manner. This can be achieved by various technologies.
2.6.1 Virtual Private Networks
Virtual Private Network (VPN) technology allows local networks to be extended over an intermediate network, e.g., to connect computers to a remote isolated network over the Internet. The traffic that goes over the intermediate network is secured so that it stays isolated within the intermediate network. A VPN allows remote access to resources that are located in an otherwise unreachable remote network, and is often deployed in companies' private networks so that employees can access the company's network remotely, e.g., from home.
There are three types of VPN implementations, based on the mechanism they use to implement security:
SSL/TLS32, a set of cryptographic protocols that provide encryption of packets at the Application layer before they are passed to the Transport layer.
IPsec33, a security extension of the IP protocol that provides encryption at the Network layer.
PPTP34, which encapsulates packets in a GRE tunnel and uses a control channel over TCP. GRE35 is a tunneling protocol that can encapsulate various network protocols over IP. PPTP is available mainly in the Microsoft Windows PPTP stack, which includes various encryption methods.
An open-source implementation is OpenVPN36, which creates secure remote connections with SSL/TLS. OpenVPN offers two types of networking using TUN
and TAP devices, which are virtual network interfaces. A TUN device is a network-
layer device with an IP address, and can be used for routing. When OpenVPN is used
with a TUN device, it creates an IP tunnel that can route between two different IP
subnets (L3 networking). A TAP device is a link-layer Ethernet device. When used with OpenVPN, it allows remote LANs to be bridged into a single broadcast domain (L2 networking). OpenVPN is one of the few SSL/TLS-capable VPN products that provide L2 networking capability.
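As an illustration of L2 bridging with a TAP device, the following sketch starts an OpenVPN point-to-point tunnel on a TAP interface and adds that interface to a local Linux bridge, so the remote LAN becomes part of the local broadcast domain. The hostname, key file, and interface names are placeholders, and a real deployment would normally use SSL/TLS certificates rather than a static pre-shared key.

import subprocess

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

def bridge_remote_lan(bridge="br0", tap="tap0",
                      remote="cloudsite2.example.org", key="static.key"):
    """Bridge a remote LAN into the local bridge via an OpenVPN TAP tunnel."""
    # Create a persistent TAP device that OpenVPN will use (L2 tunnelling)
    run(f"openvpn --mktun --dev {tap}")
    run(f"ip link set {tap} up")
    # Make the TAP device a member of the local bridge
    run(f"ip link set {tap} master {bridge}")
    # Start the tunnel itself; --secret uses a pre-shared static key for brevity
    run(f"openvpn --dev {tap} --remote {remote} --secret {key} --daemon")

bridge_remote_lan()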
2.6.2 Lightpaths
Lightpaths are optical end-to-end connections implemented via Wavelength-division
multiplexing fiber links. It is possible to transfer various link layer protocols through
them, for example Ethernet. They are independent from external networks (such as
the Internet) and thus provide security, reliability and guaranteed bandwidth capacity.
They also have high bandwidth capacity and minimal latency due to the nature of the
technology.
Lightpaths come in two types: fixed and dynamic. Fixed lightpaths are permanent connections with a guaranteed bandwidth, and are assigned to a "permanent" customer. Dynamic lightpaths can be set up on demand by users, with bandwidth customized to their needs.
32 Transport Layer Security Protocol, http://tools.ietf.org/html/rfc5246
33 Internet Protocol Security, http://tools.ietf.org/html/rfc4301
34 Point-to-Point Tunneling Protocol, http://tools.ietf.org/html/rfc2637
35 Generic Routing Encapsulation, http://tools.ietf.org/html/rfc1701
36 http://openvpn.net/
Lightpaths are not generally available in public networks, but they are available to
BiG Grid providers via SURFnet37
, a collaborating institute.
2.7 Recent developments in cloud networking
Cloud networking is a domain that is undergoing innovation and development. To
illustrate, this section shows some recent developments in networking, most of which
took place while this project was underway.
Software-Defined Networking (SDN) is a networking paradigm that has been
receiving attention while cloud services are becoming more widespread. SDN in
general refers to the control of the routing of network traffic by software. In
traditional networking, the traffic routing is determined by the routing tables
maintained by each router. Routing tables are populated by each router individually,
using protocols such as Open Shortest Path First (OSPF) and the Border Gateway Protocol (BGP) for intra-domain and inter-domain routing respectively. This makes traditional routing fully decentralized. The routing tables determine how the switch forwarding tables are populated, which in turn determine how traffic is forwarded. SDN, on the other hand, influences routing and/or forwarding with software. SDN is a broad term that encompasses all protocols or software that attempt to influence
traffic forwarding.
The leading SDN architecture is OpenFlow38
. OpenFlow is a communications
protocol that can modify the switch forwarding tables using a remote software
controller; the switch forwarding tables are no longer populated based on the routing
tables. This is referred to as separating the control plane (the decision on how traffic
will be routed) from the forwarding plane (the act of forwarding). OpenFlow routes
traffic by defining network flows. A packet may be assigned to a flow using arbitrary rules based on packet inspection, including source/destination MAC and IP addresses, IP subnets, and TCP ports. Each flow is assigned a specific action (e.g.,
forward packet to port 3, drop packet) that is performed on the packets of the flow.
This information is used to determine how to populate the switch’s forwarding tables
in order to achieve the desired behavior, which could be creating a specific
connection path between two endpoints (another way of creating VLANs). SDN with
OpenFlow allows for great flexibility in defining network flows. In order to work, OpenFlow requires support from the switches. When it comes to performance, the existence of the software controller means that the processing performance of the controller must be exceptional in order to compare to traditional, fully distributed routing.
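The flow abstraction can be illustrated with plain data structures. The sketch below is not tied to any specific OpenFlow controller; it merely shows the match/action shape of flow entries that would, for example, forward traffic between two VM MAC addresses over a dedicated path. The addresses and port numbers are invented.

# Illustrative flow entries: match fields plus an action, as a controller
# might push them to a switch's forwarding table via OpenFlow.
flow_table = [
    {
        "match": {"eth_src": "52:54:00:aa:bb:01", "eth_dst": "52:54:00:aa:bb:02"},
        "action": {"output_port": 3},        # forward towards the peer VM
    },
    {
        "match": {"eth_src": "52:54:00:aa:bb:02", "eth_dst": "52:54:00:aa:bb:01"},
        "action": {"output_port": 7},        # reverse direction
    },
    {
        "match": {},                          # default: anything else
        "action": "drop",                     # keep the path isolated
    },
]

def lookup(flow_table, packet):
    """Return the action of the first flow whose match fields all agree with the packet."""
    for flow in flow_table:
        if all(packet.get(k) == v for k, v in flow["match"].items()):
            return flow["action"]
    return "drop"

print(lookup(flow_table, {"eth_src": "52:54:00:aa:bb:01",
                          "eth_dst": "52:54:00:aa:bb:02"}))  # {'output_port': 3}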
Nicira is a new network virtualization start-up company that launched its Network
Virtualization Platform (NVP39
) in February 2012. NVP claims to be able to create
Virtual Networks on top of an existing network infrastructure. NVP shares similar
concepts with VXLAN (section 2.3.3), but uses Open vSwitch extensively. Ethernet
frames are tunneled over IP and transferred between the Open vSwitches, which are
essentially the gateways that perform the encapsulation and transmission of the
packets. The main (and practically sole) contributor to the Open vSwitch project is
Nicira. The main difference with VXLAN is that NVP also has a central software
controller that centrally manages and controls the Open vSwitches using OpenFlow.
This allows for flexibility in forming and provisioning Virtual Networks, and makes
NVP an SDN platform. Nicira was bought by VMware in July 2012, which may be an indication of the value of the NVP approach.
37 http://www.surfnet.nl/
38 http://www.openflow.org/
39 http://nicira.com/en/network-virtualization-platform
OpenStack Quantum is a brand new OpenStack component that was introduced in September 2012. Quantum aims to create a clean interface that opens the OpenStack platform to aspiring network services. Quantum's major contributor is also Nicira,
who aims to use OpenStack as the vehicle for its flagship NVP product.
Embrane is another network virtualization start-up company that announced its
heleos40
platform in December 2011. heleos provides L4-7 network services (such as
load balancing, firewall) to cloud resources. The services themselves are
implemented in Virtual Machines and are exposed using the Distributed Virtual
Appliance logical container, which may aggregate multiple Virtual Machines. When the services become too CPU-intensive, the number of VMs implementing the service can increase on demand, thus making the service scalable. This is a problem in the traditional implementation of such services, which uses single Virtual Machines that effectively cannot scale.
OpenNaaS41
is a newly announced (2012) research project that aims at providing an
open-source NaaS platform. The platform is conceived as a service to different
network domains, allowing them to provide their network infrastructure to the cloud.
It gives them the ability to abstract arbitrary network infrastructure resources (routers,
switches, more complex systems) and provide them as virtualized resources to end-
users. The scope of the OpenNaaS project is quite large; it aims to serve everyone from simple cloud users to telecom operators. OpenNaaS is at a very early stage of implementation.
Edge Virtual Bridging (IEEE 802.1Qbg, EVB) [14] and the Multiple VLAN Registration Protocol (IEEE 802.1ak, MVRP) [15] are IEEE standards that may help overcome the "VLAN-clean" requirement imposed on a cloud site's switching infrastructure. These protocols deal with automatic configuration of switch trunks so that they allow only the VLANs that are needed for the nearby VMs, and not the whole range of possible VLANs. This is achieved by providing switches with the ability to be informed of which VLANs are required; this information is provided
by the hypervisors. The protocols are work-in-progress, and need software support
from switches and hypervisors. They are currently not available in implementations.
All these developments indicate that network virtualization and NaaS form an area under active development, at both the research and the commercial level. It is important to note that all the presented solutions are either closed-source, in an early conceptual or implementation stage, and/or require support from the switches' software or hardware.
40 heleos white paper available at http://www.embrane.com/resources/documents/embrane-architecture-white-paper
41 http://www.opennaas.org/
3 System Requirements
This chapter describes the purpose and the scope of the system, as well as its key
functionality in the form of use cases. From these, a list of system requirements is
extracted.
3.1 The Network Resource Management System
To address the issues raised in section 2.5, a service is needed that is able to provide
network services by managing all the network “resources”, which are the network
switches, the various means for inter-cloud connections, and the VM gateways. For
this purpose, the Network Resource Management System (NRS) has been envisioned.
NRS is designed to be a service to a cloud platform, responsible for accommodating
networking among virtual machines, either in the same cluster or between different
clusters. However, NRS is not restricted to serving a cloud platform. Any user or
service that needs to provide network connections should find the NRS service
useful.
To accomplish its purpose, NRS has three main functionalities. It can:
1. accept a connection request and correctly map it onto the existing network
infrastructure.
The network request comes from a cloud platform (but is not limited to that).
The request involves one or more (virtual) network interfaces that should be
put in a specific VLAN. NRS should be able to find how the given interfaces
are connected to each other over the physical network.
2. configure the network hardware involved to accommodate the desired
connection.
Once the request is mapped to the physical network, the switches that
connect the network interfaces to each other should be configured to allow
traffic from the VLAN of the request.
3. negotiate and manage inter-cloud connections by talking to an NRS located
in a different cloud site.
The envisioned usage of NRS is illustrated by the following example: A user has
requested four VMs and one local network that the VMs should be connected to. The
VMs reside in three different physical machines (VM hosts), which are connected via
switches as shown on the right side of Figure 16. The left side of the figure represents
the logical isolated network that the user should experience (connected with dashed
lines).
NRS will:
1. Receive a request from the cloud platform, asking to connect the four VIFs (one for each VM) to an isolated network segment. The VMs are shown in blue in Figure 16.
2. Find out how the underlying physical network connects the interfaces with
each other. In this example, the connection among all VIFs includes all the
switches shown in Figure 16.
3. Configure the devices along the connection, so that an isolated network is created (when using 802.1Q VLANs, select an unused VLAN id and allow it through the network ports in the connection path). A minimal sketch of steps 2 and 3 is given below.
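The sketch below assumes the network topology is kept as a graph (here using the networkx library) and uses hypothetical node names: the connecting subgraph is taken as the union of shortest paths between the VM hosts, and a free VLAN id is then allowed on every switch along it.

import networkx as nx

# Hypothetical topology: three VM hosts behind two edge switches and a core switch
topology = nx.Graph()
topology.add_edges_from([
    ("host1", "switch-A"), ("host2", "switch-A"),
    ("host3", "switch-B"),
    ("switch-A", "switch-core"), ("switch-B", "switch-core"),
])

def connecting_subgraph(graph, terminals):
    """Union of shortest paths between all pairs of terminals (hosts carrying VIFs)."""
    nodes = set(terminals)
    for i, src in enumerate(terminals):
        for dst in terminals[i + 1:]:
            nodes.update(nx.shortest_path(graph, src, dst))
    return graph.subgraph(nodes)

def pick_vlan(used_ids, usable=range(2, 4095)):
    """Select the first unused VLAN id from the administrator-defined usable range."""
    for vid in usable:
        if vid not in used_ids:
            return vid
    raise RuntimeError("no free VLAN ids")

hosts = ["host1", "host2", "host3"]           # hosts carrying the requested VIFs
path = connecting_subgraph(topology, hosts)
vlan_id = pick_vlan(used_ids={2, 3})          # e.g. 4
switches = [n for n in path.nodes if n.startswith("switch")]
# For each switch in `switches`, the respective switch plugin would now be asked
# to allow `vlan_id` (tagged) on the ports that lie on the connection path.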
3.2 Use Cases
The basic NRS functionality within a cloud environment can be sufficiently
demonstrated by a few use-case scenarios. They are described in the following
section. Each scenario has an entity sending a request to the NRS. As the NRS is
mainly a service for a cloud platform, that entity would be the cloud platform. That
means that a user of the cloud platform has requested resources from the cloud beforehand. These resources include virtual machines and networks between them.
The cloud platform has accepted the user request and has created its own request to
the NRS to realize the connection between the requested VMs.
3.2.1 Target users
The NRS service is intended primarily as a service to a cloud platform, but it is not
necessarily limited to that. As a network service, it should be usable by any program
or individual that needs to provision networks. Besides the user of the provisioning service, NRS has an additional user, the network administrator. The administrator should be able to control the behavior of NRS and to request information about the current state of the network(s).
Figure 16: Physical and logical connectivity of VMs
Table 6: NRS target users

Name | Description
Cloud platform | Requests network connections between network interfaces of Virtual Machines that the cloud platform is serving.
Arbitrary NRS service user | Requests network connections between any network interfaces that the user needs to connect.
Network administrator | Configures and monitors the NRS service.
3.2.2 Basic request scenario
The simplest usage scenario is that in which simple network connectivity is requested
between virtual machines located in the local cloud site. In that case NRS has to deal
exclusively with configuring the internal network of the cloud site. A sequence diagram of the request is shown in Figure 17.
For this scenario I have also included the cloud user in order to portray the bigger
picture of the usage scenario. The sequence of steps goes roughly as follows:
1. The cloud user requests a set of VMs, which he would like to be in the same
local area network.
2. The cloud platform picks VM hosts to instantiate the VMs requested and
then provides the NRS with the hosts and the network interfaces that should
be connected in the same private network.
3. NRS finds how the connection can be mapped onto the internal network
infrastructure, i.e. finds a path that connects all VIFs to each other.
4. NRS configures switches along the path (found in step 3) to allow a new private network (allow a new VLAN on the switch ports).
5. NRS informs the cloud platform whether its request was accommodated successfully.
6. The cloud platform can continue its normal operation (launch the VM, inform the user, etc.).
Figure 17: NRS basic request usage scenario
Once this sequence of actions is done, the user’s VMs should be connected to a
private and isolated network.
This concludes the basic request, and the concept behind it is the basic functional
block of NRS. It is about creating isolated networks and connecting network
interfaces to them. A request can be enriched with various additions, as described in
the following sections.
3.2.3 Internet Request scenario
It is desired that a network request can include Internet connectivity, which means
that the user would want the virtual machines on his network to have access to the
Internet. There are various possible ways to expose a private network to the Internet.
A simple one would be for the network administrator to provide an Internet gateway,
a network node (perhaps a router) that provides Internet connectivity. If then a user
needs Internet connectivity, all that needs to be done is to connect his private network
to the Internet gateway and configure the gateway appropriately. For NRS to
accommodate this request, it needs to be able to connect an existing private network
to a specific node in the network (the gateway). This is similar to the basic request
scenario in section 3.2.2 above.
3.2.4 Quality of Service Request scenario
Another type of request would be one that includes certain network Quality of
Service (QoS) features. The QoS feature that is mostly of interest is bandwidth
guarantees.
To satisfy any QoS request between two VMs, the network path chosen to connect
the VMs must consist of switches that support this specific QoS functionality. This
means that for NRS to accommodate QoS requests, it needs to know what kind of QoS
is supported by the switches of the internal network, to find a network path that
consists of switches that can support the QoS request, and to be able to configure the
QoS features of the switches appropriately.
3.2.5 Inter-cloud Request scenario
A more complicated scenario is when the network requested needs to contain VMs
that span more than one cloud site, which means that the virtual machines are hosted
in different sites but have to be in the same private network for the end-user. This
scenario can occur in two different cases:
when the original site cannot service the request (possibly due to lack of
resources), and it is desired that it will “fetch” resources from other sites, or
when the user can actually specify on which cloud site he wants each of his
requested resources to run.
Those scenarios require certain negotiation to take place between the cloud platforms
that run on each cloud site. Such negotiation will possibly include resource-related
information and has to include which cloud site will be the initiator of the inter-cloud
connection. It can also be that there is a different mechanism that decides there is a
need for an inter-cloud connection. The NRS’s part in it is to accept the inter-cloud
request from one cloud site that chooses to initiate the inter-cloud connection and
make the connection happen. This is the assumption of the inter-cloud scenario. The
negotiation mentioned here is not part of the NRS project.
After the request is received from a cloud platform, NRS needs to be able to create
and manage a connection that bridges two different local networks (one for each site)
and to negotiate with the opposite side’s NRS on the connection specifics. There are
different types of connections that can be used for inter-cloud (VPNs, lightpaths), and
NRS should not be restricted to one technology. For more see section 2.6 above.
The sequence of actions that would accomplish the inter-cloud scenario is shown in Figure 18.
In this scenario, the cloud platform will request the creation of an inter-cloud
connection to a specific site (site #2). The connection is directly related to a local private network: this network is to be bridged to a network in site #2 over the inter-
cloud connection. The sequence of actions to make this happen would be:
1. Cloud platform #1 requests a connection to site #2 and specifies the type of
connection.
2. NRS #1 connects to NRS #2 and negotiates the connection: This includes
determining whether both sites support the chosen connection type and
exchanging configuration details.
3. Each NRS configures its end-point for the inter-cloud connection. After this
the connection is live.
4. A response is sent to each cloud platform (or to the requester).
Once the sequence is successfully complete, the two networks are bridged. That
means that they forward traffic to each other as one Ethernet domain. The networks
may still be managed separately by their respective cloud site’s NRS and cloud
platform.
This scenario introduces some security concerns: the communication between the two NRS's and the traffic sent over the connection need to be secure.
Figure 18: NRS inter-cloud scenario
3.3 NRS Logic and Interfaces
To accomplish the above scenarios, we can see that the NRS system has certain
requirements, which can be categorized into three main functionalities. NRS needs to
be able to
1. Find (or decide) how a network request can be fulfilled. To do that, NRS
needs a software representation of the internal network topology and the
ability to map a network request on it by applying logic rules or algorithms.
2. Know how to configure internal network hardware devices to create isolated
networks.
3. Negotiate an inter-cloud connection with a different cloud site's NRS and
manage the connection.
To perform the above, NRS needs to interface with various distinct components. It
needs to
1. Provide an interface to the requester of the NRS service. In the case of a
cloud platform being the requester, a modification or extension of the cloud
platform will be needed that allows it to use the NRS service.
2. Provide an interface to the network administrator of the cloud site, through
which the network topology can be inserted or modified, and status
information given. (This interface can also be used for a network discovery
mechanism)
3. Have administrative access to the internal network, in order to configure
devices.
4. Start or listen to connections from different sites' NRS's and control the
means through which the connection is achieved (for example in the case of
VPN, manage a VPN box)
The components with which NRS communicates in a cloud context are shown in Figure 19. Each part is discussed in the following sections.
The logic and interfaces mentioned in this section comprise the key functionality of
the NRS system. Each one is discussed in the following sections, in the order they are
encountered when a request comes into the system.
Figure 19: NRS in a cloud context
3.3.1 Requests and resources in cloud platforms
As mentioned in section 2.4, each cloud platform provides several interfaces and
APIs through which it provides cloud services, and there is hardly any standardization among them.
However, all APIs request the same types of resources: VMs and networks. All cloud platforms use the same concepts. Since the resources are modeled in a similar fashion, the lack of API standardization is of no concern for the purposes of NRS.
3.3.2 NRS interface for requests
After a cloud platform receives a user request, the information relevant to the network
has to be sent to the NRS. The cloud platform will choose which physical machines
are going to host the virtual machines and knows which of their network interfaces
are to be connected to the requested network. The cloud platform needs to pass this
information to the NRS. To receive the request, NRS has to expose an interface for it.
Apart from a cloud stack, someone else may want to use the NRS service, which
could be another application or a human directly requesting a network connection.
These are to be taken into account when designing the interface and deciding on its
form.
3.3.3 Representation of the network
Software model of the network topology
Once NRS receives a network request, certain processing is required for NRS to
determine how to accommodate the request. Essentially, the request needs to be
mapped on a representation of the existing network (the network available to the
cloud service). In order to do that, NRS needs to maintain and operate on a software
model of the topology of the network. This topology needs to contain static
information about switches and hosts, their inter-connections and special features of
the devices (e.g., switch capabilities). In addition, it needs to contain dynamic
information to represent which private networks have been formed at any time, which
VLANs are in use, and other relevant information. See section 4.3 below for more on
the topology model.
Topology acquisition
NRS needs to provide an interface through which it can receive the network topology
and accept modifications to it. The simplest way to receive the topology is the
network administrator inserting and modifying it. Automatic network discovery can
also be an option, but one that is 100% complete is not easily achieved, especially
when the network contains devices from different vendors, which often pose
incompatibilities when it comes to auto-discovery. Investigating auto-discovery
further was considered out of the scope of this project.
It should be possible to modify the topology inserted in NRS. For example, one may
want to remove some nodes from the network, or to add new ones. The nodes may or
may not be used by existing private networks. Such modifications should not disturb
ongoing operation or affect existing connections whenever possible.
Apart from the topology, the administrator also needs to influence NRS decisions
when it comes to mapping a request and choosing network isolation parameters. For
example the admin may want to restrict how VLANs are chosen or to influence an
algorithm that maps the request (if an algorithm is used that allows it). The admin
interface should provide these options as well.
3.3.4 Network Isolation
One of the requirements is that the private networks created for each user must be
isolated from the rest of the network. As discussed in section 2.3.3, this is trivially
achieved with 802.1Q VLANs. This protocol is enforced in switches and guarantees
isolation in the link layer of the TCP/IP protocol stack. Despite its hard limit on
scalability, it is the only universally working method that achieves the goal of
network isolation. As such, NRS will work with 802.1Q VLANs as the main method
of isolating networks. However, its architecture should allow substituting the 802.1Q
VLAN isolation method with a different one (Q-in-Q for example).
3.3.5 Switch configuration
Another requirement is to make the service agnostic of the underlying hardware,
which means being able to configure network switches from a wide variety of
vendors. All switches come with a command-line interface (CLI), through which they
can be configured. It is not straightforward to achieve switch configuration in a
model-agnostic manner, since CLIs are largely vendor-specific, and are often model-
specific among different models from the same vendor. To cope with this variation,
the concept of a switch plugin can be used. Those plugins are associated with a set of
switches that have the same interface (for example same device model or same
vendor family), and have the knowledge of how to perform configuration on their
respective model. NRS can call the plugins to perform configuration. There are
several configuration actions that are required to satisfy network requests. Those
actions can be configuring VLANs (e.g., allow tagged VLAN 3 on Ethernet port 5),
providing QoS, etc. The set of actions will comprise an interface that needs to be
supported by each plugin. Creating a plugin for a specific switch should be as simple
as mapping the specific switch commands to this set of actions.
Switches do not need to be only physical, since the configuration may need to be
applied on virtual switches as well.
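A hedged sketch of what such a plugin interface might look like is given below. The action names and the CLI commands of the example plugin are illustrative assumptions (the commands merely resemble a typical switch CLI) and are not taken from a specific vendor.

from abc import ABC, abstractmethod

class SwitchPlugin(ABC):
    """Set of configuration actions every switch plugin has to support."""

    @abstractmethod
    def allow_tagged_vlan(self, port, vlan_id):
        """Allow tagged traffic of vlan_id on the given port (trunk configuration)."""

    @abstractmethod
    def remove_tagged_vlan(self, port, vlan_id):
        """Stop allowing tagged traffic of vlan_id on the given port."""

class ExampleCliPlugin(SwitchPlugin):
    """Toy plugin that maps the actions to made-up CLI commands sent over a session."""

    def __init__(self, cli_session):
        self.cli = cli_session          # e.g. an SSH or telnet session to the switch

    def allow_tagged_vlan(self, port, vlan_id):
        self.cli.send(f"interface {port}")
        self.cli.send(f"switchport trunk allowed vlan add {vlan_id}")

    def remove_tagged_vlan(self, port, vlan_id):
        self.cli.send(f"interface {port}")
        self.cli.send(f"switchport trunk allowed vlan remove {vlan_id}")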
CLI Alternatives
Since CLI commands are vendor-specific, there have been attempts at providing
universal device configuration mechanisms. A major such attempt is the Simple
Network Management Protocol (SNMP) [1], which was developed over a decade ago. This is a protocol aimed at allowing network management in a uniform manner across all network devices. The protocol is widely used for network monitoring, but it
has not proven to be successful for configuration. The reasons for that are
documented in RFC3535 [16]. There have been attempts at replacing SNMP with
different protocols, but as of today no such mechanism has been widely adopted.
Since using a method different from CLI leads to the same problems (no universal
vendor support, vendor-specific intricacies), configuration through the CLI is considered the straightforward way to configure switches.
3.3.6 Communication with a different cloud site
In order to initiate cross-site networks, negotiation between cloud managers is
required. This negotiation is about requesting resources, so it needs a common way to
model and request resources between (different) cloud managers. Currently, this
negotiation is not supported by all cloud managers. OpenNebula has the oZones
feature (see section 2.4.1) that can aggregate multiple OpenNebula clouds, which
could make use of the inter-cloud connection feature. In any case, cloud manager
negotiation is out of scope of the NRS project. We assume that the cloud managers
reach a decision on how many resources are going to be hosted on each site, and they
want to get a connection started. This is where NRS comes in.
NRS may receive a request to connect a local network to a network of a different
cloud site. A simple way to do that would be for the other site to have NRS running
as well, to which the NRS of site #1 needs to connect and negotiate connection
information. After the negotiation, both NRS services need to jointly manage the
connection. Using this method, there is no need to know any internal network
information of the opposite site.
It is required that the network connection can be hosted on various technologies (see the inter-cloud request scenario in section 3.2.5). The NRS is the one in charge of the connection
capabilities: it must have the knowledge of what kinds of connections are possible
and have the ability to manage them.
There is no guarantee that an NRS will be installed on the opposite cloud site. It may be that site #2 provisions its resources in its own chosen way, or that it does not want to use the NRS service's way of handling the internal network. In that case, what is needed is just the logic to negotiate and manage the inter-cloud connection: an end-point to which the NRS of site #1 can connect. For that reason NRS needs to be available in an "inter-cloud only" mode that has only the cross-site connection functionality, so that it can be installed on site #2 and perform that part only.
Security Considerations
There are two security concerns with the above suggestion. The first is how to verify the identity of the remote NRS: the NRS should make sure it connects to an actual opposite NRS service and not to a malicious entity claiming to be one. The second is to make sure that the information exchanged between the NRS services is secure, since it travels over insecure networks.
3.4 Requirements List
The previous sections described the components and the key functionality of the NRS service, and exposed the points of variability of the NRS system that have to be taken into account. These variabilities must be realized through modules with clear responsibilities and through the interfaces they offer; they are directly related to non-functional requirements and dictate the software architecture of the system. All requirements are summarized in Table 7 and associated with an identifier that is used to refer to them in the rest of the document.
Table 7: Requirements List
Basic functionality:
FR1 NRS should, for a given set of network interfaces, provision network
connectivity among them.
Connectivity within one cluster:
FR2 The network provisioned that way should be isolated from the rest of the
networks existing on the network infrastructure.
FR3 NRS should take into account VLAN id restrictions of network devices when
connecting network interfaces.
FR4
The network administrator should be able to insert the network topology. He
should be able to modify it while NRS is running without requiring a restart
or disturbing existing connectivity.
FR5 The network administrator should be able to set restrictions on which
isolation ids (e.g. VLAN ids) are available to the service.
NFR1
The network interfaces can be in the same computer cluster. In order to
connect them in the same private network, knowledge of the network
topology of the cluster is required. Any arbitrary topology should be
supported.
NFR2 To provision networks, NRS needs to be able to configure network switches.
Any switch model should be supported.
NFR3 The network isolation method should not be limited to one technology (e.g.,
VLAN only), but should be easily exchanged with any other.
NFR4
NRS should provide an interface that can be easily used with any existing
cloud manager (with a plugin), or even a different application that needs the
NRS service.
Connectivity outside the cluster:
FR6 NRS should be able to provide Internet connectivity to a network.
FR7 NRS should provide connectivity to networks that lie in a different cloud site
by negotiating with the NRS service installed there (inter-cloud connection).
FR8 NRS should be usable in a mode that only services inter-cloud connections.
NFR5 NRS should support inter-cloud connections over various means (e.g., VPN,
lightpath).
Advanced functionality:
FR9
Apart from basic network connectivity, NRS should be able to provide more
features over the requested network:
bandwidth
QoS
ACLs
Performance:
NFR6
The amount of time required from the moment of receiving a network request
until its allocation is complete should be deemed “small enough” by the user.
Requesting a virtual machine from the cloud can take from a few seconds to a
few minutes, depending on various configuration choices. The network
configuration overhead of the NRS service should not increase that time by a
substantial amount. As such, an upper limit of 10 seconds to service a request
is considered sufficient.
4 System architecture
A prototype of the NRS service was built for the purposes of this project. This chapter describes the architecture of the prototype, which consists of modules with different roles and responsibilities. Each module description is accompanied by class and sequence diagrams in UML (Unified Modeling Language [17], a modeling language for describing object-oriented software). The architecture is the result of several design choices, the motivation for which is documented as well.
4.1 Prototype Overview
The prototype implements the basic functionality, which provides network
connectivity for network interfaces in the same cluster. The networks provided in this
manner are isolated, i.e. separated from each other. The prototype also supports inter-
cloud connections. The prototype does not have support for requests for bandwidth or
for being able to map requests on the topology in “clever” ways, i.e., it employs
simple path-finding in order to map requests (see section 4.6). The extent to which
the requirements from section 3.4 are satisfied is discussed in Chapter 7.
The prototype was implemented in Python (implementation details in Chapter 5). Its design is kept as independent from the implementation language as possible, although a few Python features have influenced it: the dictionary data structure (an associative array) is used extensively, as are function objects and closures. In addition, because Python is not statically typed, there is less need for the (pure) abstract classes that are prevalent in designs of statically typed languages such as Java and C++; such classes have nevertheless been used to improve the clarity of the design.
The goal for the prototype was to show the feasibility of the basic functionality in a
manner that encompasses all the concepts that were envisioned in the NRS system,
among which are:
1) providing an interface suitable for cloud platforms
2) having knowledge of the network in the form of a topology and being able to
operate on it to infer whether and how connectivity is feasible
3) realizing a connection by configuration of devices
4) negotiating inter-cloud connections
While creating the above, the goal was to explore the design options and create an
architectural basis which can be further developed to add more features. The design
decisions taken for the prototype were based on the requirements.
The architecture is presented in Figure 20 and is arranged in layers which group
together different levels of abstraction. At the top lie the modules that provide the
interface and high level functionality to the service’s users, in the middle lie the
modules that provide the logical model of the network, and at the bottom lie the
modules for hardware configuration. In the diagram of Figure 20, we identify eight
modules with the following responsibilities:
NRS, which implements the interface that exposes the operations available
to NRS service users
inter_cloud, which deals with the inter-cloud negotiation with a remote
service
admin_CLI, which provides the administrator with access to the system at
run-time
request_manager, which deals with the high-level logic of reserving and allocating requests
network_isolation, which manages network isolation (such as VLAN
assignments)
topology, which contains the software model of the network topology
algorithms, which operate on the topology to infer how network interfaces can be connected
device_plugins, which deals with device plugins that perform device
configuration
4.2 Service Interface
The NRS interface was decided based on how cloud stacks deal with networking in
conjunction with what NRS wants to achieve. The interface provides operations on
two basic resources: virtual networks and network interfaces.
The virtual network resource is offered by all cloud stacks, and has a unique identifier
known to the cloud stack. The virtual network represents the isolated layer 2 network
segment that can be requested by users of the cloud. Specifically, the cloud users can
request attaching their virtual machines to such a network, which will result in a new
network interface created for the virtual machine, through which the virtual machine
will connect to the network (this is performed by the cloud platform). All the service
calls offered by NRS always contain an identifier that uniquely identifies the network
segment which the user requests connectivity to, as well as the name of the network
interface(s) that are involved in the operation.
The exposed interface operations are shown in Figure 21. The operations can be
conceptually grouped in four distinct functionalities. These are presented in the
following sections as sets of function declarations (omitting types).
Figure 20: NRS layered architecture
The interface is implemented by the NRS object, whose methods are invoked by the
TCPServer. The server listens to incoming connections for requests, and translates
them into the NRS operations. The implementation of these constitutes the ‘NRS
service’. Implementation details can be found in Chapter 5.
4.2.1 Connecting network interfaces
connect(network_id, network_interfaces, isolation_id=None)
disconnect(network_id, network_interfaces)
These operations will connect/disconnect the given network interfaces to/from the virtual
network identified by the network id. The network interfaces are expected as (‘host
name’, ‘interface name’) tuples that uniquely identify an interface in the network
topology (see 4.3.2). The outcome of these operations is that the interfaces will
receive all layer 2 traffic from the L2 broadcast domain that is represented by this
network segment. In other words, all network interfaces that are part of the specific
network have L2 connectivity to each other.
The precise way in which the interfaces are "put" in the network depends on the plugin
that performs the configuration for the device. In general, this operation is meant for
interfaces that can receive traffic from multiple virtual networks (trunk ports). For
example, if the network isolation method is 802.1Q VLANs and the interfaces are
switch ports, then connecting the interfaces to the network means that the interfaces
become “trunk ports” for the network’s VLAN id. If the interfaces belong to a host
using Linux bridges, the outcome is a creation of a VLAN device for the specific
VLAN id. For more on plugins see sections 4.7 and 6.2.
Isolation id is an optional argument that is related to how virtual networks are
deployed. If an isolation id is specified, NRS will attempt to associate it with the
virtual network. This argument is meant to be used the first time a network interface
is connected to a network, so that the virtual network is deployed with the given
isolation id. For instance, if the isolation method is 802.1Q VLANs, the isolation id
corresponds to the VLAN id that NRS will attempt to use for the specific network.
This may fail for several reasons (VLAN id already in use, network already
associated with a different VLAN id).
For the calls to be successful, the network interfaces passed to these calls must be part
of the network topology, which means that they must have already been inserted into it at an earlier point in time (more about the topology in section 4.3).
Figure 21: NRS Service interface operations
The connect call is internally implemented in two stages that happen in sequence:
First the connection request is reserved in the internal topology, which means that the
resources that make the connection possible have been “reserved” for a subsequent
allocation. Then the allocation will simply perform the configuration of the network
devices (see 4.8 for more on how each step works). In case there is a need for the
service user to reserve resources without performing the actual allocation at the same
time (but sometime later), two additional operations are provided that perform these
two steps:
reservation_id = reserve(network_id, network_interfaces,
isolation_id=None)
The reserve call returns an identifier for the reservation, which can be used later to
allocate it, using the operation
allocate(reservation_id)
To ensure network consistency, each reservation created this way has a dependency on the previous reservation made on the same network. Each allocate must be called on reservations sequentially, in the order in which the corresponding reserve calls created them; otherwise the allocation calls are not accepted.
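For illustration, a hypothetical sequence of calls is shown below; the network id and interface names are example values only:

#connect two interfaces to virtual network 5 in one step
connect(5, [("host 1", "eth0"), ("host 2", "eth0")])

#or reserve now and allocate later (e.g., when the VM is actually launched)
rid = reserve(5, [("host 3", "eth0")])
#... some time passes ...
allocate(rid)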
4.2.2 Creating virtual network interfaces
The operations in the previous section can be used to connect network interfaces that
are already part of the network topology, but this is not the case when cloud platforms
are launching or shutting down virtual machines. For each new VM launched, new
virtual network interfaces are created that need to be connected to a network. In order
for them to be connected, they need to be put in the network topology first. Similarly
when a VM is shut down, its network interface(s) needs to be removed from the
topology. Allowing any service user to modify the topology in order to insert new
interfaces is not desired, since the topology modification is an action restricted to the
administrator. To alleviate that, two new calls are provided that are to be used by
users needing to launch new virtual machines and connect them to a network.
vif_connect(network_id, host_interface, virtual_interface,
isolation_id=None)
vif_disconnect(network_id, host_interface, virtual_interface)
reservation_id = vif_reserve(network_id, host_interface,
virtual_interface, isolation_id=None)
The vif_connect call will first create a new virtual machine in the topology with the
virtual interface provided, and then connect it to the host interface. Then it will
connect it to the network with the provided id, as in 4.2.1 above. The
vif_disconnect call will first disconnect the virtual interface from the network and
then delete it from the topology. The vif_reserve is also provided and will return a
reservation id that can later be used to allocate it, identical to the reserve functionality
in 4.2.1.
These calls are meant to be used by cloud platforms or other software that creates and deletes virtual machines in hosts existing in the topology.
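A hypothetical usage sketch follows; the host interface uses the (host name, interface name) convention of section 4.2.1, while the exact shape of the virtual interface argument is an assumption made here for illustration:

#a cloud platform launches a VM whose interface is seen as vnet0 on host 3
#and attaches it to virtual network 5
vif_connect(5, ("host 3", "eth0"), ("vm1", "vnet0"))
#when the VM is shut down, its interface is removed again
vif_disconnect(5, ("host 3", "eth0"), ("vm1", "vnet0"))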
4.2.3 Inter-cloud connections
This family of operations is related to setting-up and tearing-down inter-cloud
connections.
start_inter_cloud(local_network_id, remote_network_id,
remote_service_address,
remote_service_port,
connection_type)
stop_inter_cloud(local_network_id, remote_network_id,
remote_service_address,
remote_service_port,
connection_type)
The inter-cloud connection bridges the local network with the remote network over a
connection of the provided type (e.g., VPN). To do that, the local NRS service
negotiates with the remote service that is listening for connections on the remote
address provided. This directly implies that there needs to be a service running on the
remote location that will accept and process the negotiation. In addition, both services
must support the connection method requested. See more on inter-cloud connections
in sections 4.9 and 6.3.
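As an illustration, a hypothetical invocation is shown below; the address, port, and connection-type name are placeholders:

#bridge local network 5 with network 12 of a remote cloud site over a VPN
start_inter_cloud(5, 12, "nrs.remote-site.example.org", 9000, "vpn")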
4.2.4 Connecting to a gateway
The last family of operations gives the possibility to connect virtual networks to
gateways, which are network nodes that provide a special service or functionality and
are a static part of the topology. These nodes are associated with unique names that
indicate their functionality. One such node could be an Internet gateway that would
provide a virtual network with Internet connectivity. If the gateway is identified by
the name “Internet”, in order for a virtual network to gain/lose Internet access, the
following operations must be issued.
connect_gateway(network_id, “Internet”)
disconnect_gateway(network_id, “Internet”)
4.3 Topology
The Topology is a software model of the cloud site’s network topology, i.e. the
network of the cluster hosting the cloud service. It contains all the information related
to the network devices and how they are connected to each other. This information
must be sufficient to be able to make decisions on how to allocate network
connections. The NRS administrator can insert or modify this information to reflect a
change in the cluster network.
The model of the topology implemented for the prototype has the basic concepts that
are needed to describe both the static and the dynamic nature of the topology. The
static part is the set of devices that are part of a cloud cluster and the static (physical)
network connections between them. The dynamic part consists of the (logical)
network segments that are formed on top of this static structure.
The modeling of the topology is done in two levels: The first one is the creation of
abstractions that represent the network components, and the second is the modeling
of the components as graphs, in order to facilitate finding connections between them.
This section describes the first level.
4.3.1 Network Markup Language
To describe a topology, the first thing needed is the creation of abstractions of the
basic components of the network devices that are of interest for networking. These
abstractions come in the form of classes and objects. The modeling of the topology
was heavily based on Network Mark-up Language (NML) [17] [18]. NML is an
“effort to define a schema for description of hybrid network topologies.” There have
been various efforts to create data models that describe networks and network paths,
and NML is trying to converge the data models and pick the best pieces from each
one. In NML, networks are described at the level of intra-network resources, which
means that NML is suitable to describe the network of a cloud cluster at a level that is
of interest to NRS. In addition, NML is an OGF44
standard and SARA, which is a
research institute proximate to Nikhef, employs NML collaborators which were
available to talk to. These are the reasons NML was chosen to use for the NRS
topology representation. NML’s data model, with elements and relations between
them, is shown in Figure 22.
Despite its suitability, NML is at an early development stage. Its model and
implementations were changing constantly for the duration of the project, and therefore there was no implementation of the model that could be used in a software
product; NML provides XML and RDF schemas, but no software implementations.
In addition, NML tries to capture any arbitrary network and therefore contains a lot of
details that were of no interest for the network description and functionality required
by NRS. Choosing to follow all NML conventions would lead to a more bloated and
harder to manage NRS model and implementation and would take away its
simplicity, which was essential to be able to produce the NRS prototype in time. As
such, the NML concepts deemed useful were borrowed to create the NRS network
model, and the rest of them were ignored. When reasonable, there is a direct
correspondence between NML objects and objects in the NRS network model. In
other cases, NRS has constructs that are not an explicit part of the NML data model
but can be described by a combination of its objects. In some cases, NML seems to be
missing descriptions for certain concepts (VLANs); these are expected to be added as NML develops further. Ultimately, it should be possible to describe the NRS network model in NML, but the NRS model cannot be used to describe an arbitrary NML network.

Figure 22: NML object model (image source: Network Markup Language [18])
4.3.2 Network components
The class model of the topology that contains all network components is shown in
Figure 23. The building block of the network topology in NRS is the Network Node,
which represents a device that can be connected to a network. In the cloud
infrastructure setting, these can be virtual machine Hosts and Switches. A Network
Node is a static member of the network topology and has a unique IP address through
which it can be reached by NRS. NRS needs to be able to reach all the nodes in order
to perform configuration, as will be seen in later sections. Each Network Node can
have Ports, an object which represents a network interface of a host, or a port of a
switch. Ports can have a connection to a single other Port, which is represented by the
Link object. The Link does not refer to a cross-connect in the same node, but to what
is commonly referred to as a network link, which is a physical medium (e.g., Ethernet
cable) connecting different devices. Ports that belong to the same Network Node have
unique names among each other (e.g., ‘eth0’). Therefore, a port can be uniquely
identified by its name combined with the name of its node owner.
A Host object can be specialized to be a Virtual Machine. This specialization is
meant to be used when a Virtual Machine needs to be a static part of the topology and
exhibit behavior similar to the rest of the Network Nodes, i.e. have a unique IP
address through which it can be reached and configured by NRS. The usefulness of
such static Virtual Machines can be seen when they are needed to provide a special
functionality such as being gateways to external networks (see a VPN gateway in
section 6.3). This specialization was not introduced to represent the virtual machines launched by cloud platforms; those virtual machines do not need to
be reachable by an IP address from NRS in the first place, since NRS only attempts to
provide layer 2 connectivity for them.
When virtual machines reside in hosts, the host sees the virtual machine’s network
interfaces with specific unique names (depending on the naming convention, often
seen as ‘vnet0’, ‘vnet1’). The same interfaces are seen with different names from
within the virtual machine (most likely as ‘eth0’, ‘eth1’, etc.). A correspondence of
these names is kept in the Host object with the vifs hash table. This is needed to be
able to connect the virtual machine to any network, as seen in section 4.8. When a
virtual machine is added to a host, its interfaces are connected to a selected host
interface. That interface will act as the VM gateway.
All the objects mentioned in the previous paragraphs are contained in a Topology
object, which represents the sum of network devices and links that comprise the cloud
cluster. These objects comprise all the static information of the network topology and
are a sufficient basis to describe cloud cluster networks.

Figure 23: Basic network components
Besides the static information there is also the Layer 2 Network object (L2Network),
a logically isolated private network that corresponds to the Virtual Network available
to cloud users. These objects are created and modified dynamically to reflect changes
coming in from user requests. An L2Network has a list of ports, which represents all
the interfaces that are connected to the network. Essentially, the L2 Network is a
‘star’ topology that connects all the ports that belong to it to each other, but without
consideration of its implementation. Ports can belong to multiple L2Networks at the
same time (trunk ports). Each network is uniquely identified by its uid number. The
information on how these L2Networks are logically isolated is noticeably absent from
the object itself, for example there is no 802.1Q VLAN id or something equivalent.
This allows the implementation of the logical isolation of the networks to be replaced
with different ones, without changing the L2Network object or its port membership.
The network isolation information is managed by the NetworkIsolationManager
(section 4.5). For information on how a network is created, assigned a VLAN id, and
populated with ports, see section 4.8.
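A minimal sketch of what such an object could look like is shown below; the class and attribute names are assumptions based on the description above:

class L2Network:
    #A logically isolated virtual network. It deliberately carries no isolation
    #information (e.g., no VLAN id); that mapping is kept by the
    #NetworkIsolationManager (section 4.5).
    def __init__(self, uid):
        self.uid = uid      #unique network identifier
        self.ports = []     #ports currently connected to this network

    def add_port(self, port):
        if port not in self.ports:
            self.ports.append(port)

    def remove_port(self, port):
        self.ports.remove(port)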
The objects of the topology have multiple attributes that need to be defined on their
creation, apart from their ports (the additional attributes are explained in the later
sections of this chapter). The need to specify multiple attributes at the moment of the
nodes’ creation makes it beneficial to have them created by Builder objects [19, pp.
97-106], shown in Figure 24. The NodeBuilder has various methods that can
incrementally create NetworkNode objects. In addition, the NodeBuilder’s build()
method is a Factory method [19, pp. 107-116] that defers the actual creation to its
subclasses, SwitchBuilder and HostBuilder, which are the ones that create the
concrete NetworkNode in the form of a Switch or a Host. The builder objects can be
useful to create duplicates of a complicated node, for example a switch with multiple
ports and other settings, greatly reducing code duplication. The NodeBuilder also
uses the “fluent interface” API style [20], which allows for method chaining until the
object is ready to be constructed. For example, one could do
cisco_cata_2960_builder = SwitchBuilder().\
    add_ports('0/0/', 1, 24).\
    add_ports('1/0/', 1, 4).\
    add_plugin(plugins.cisco_cata).\
    set_alloc_prio(5)
cisco_cata_2960_1 = cisco_cata_2960_builder.\
    set_name('cata_2960_1').set_ip('10.80.80.32').build()
cisco_cata_2960_2 = cisco_cata_2960_builder.\
    set_name('cata_2960_2').set_ip('10.80.80.33').build()
This would first specify a switch builder with two sets of ports (0/0/1-24 and 1/0/1-4)
and some additional information (explained in later sections). This builder object can
be used to instantiate multiple switches later. The actual switch object is not created
until the build() is called, which allows for flexibility in creating the objects.

Figure 24: Network Node Builders
4.4 Graph representation of the network topology
As described in section 4.3, the network topology contains a lot of information; this
information needs to be somehow operated upon to be able to find paths between
ports. Since the devices that are connected to each other form a graph, it is natural to
use graphs from graph theory to describe all these objects and their inter-connections,
and to use graph algorithms to perform any required operation on them. There are
various graph algorithms (shortest path, maximum flow) [21] that are perfectly suited
for the tasks of finding the right connection paths among the network nodes. This
brings us to the graph modeling of the network: Each network node (VM hosts and
switches) was modeled by a graph, in a way that captures how network traffic flows
through the node and its ports.
4.4.1 Graphs
A graph is an abstract representation of connected objects. It consists of vertices,
which represent the interconnected objects, and edges, which are the links that
represent the relation that connects the objects. Vertices are also called nodes. To
avoid confusion with the network nodes from the topology, we will use the term
vertex to refer to the graph nodes and the term nodes to refer to the network nodes.
An edge must always have two vertices as its endpoints (i.e., an edge cannot exist
with an unconnected end). An edge can be directed, which means that the relation
that connects the objects is asymmetric, and undirected, which means that the relation
is symmetric. An edge may have a weight associated with it, which is a number that
quantifies the connection that the edge represents in the context of a chosen semantic,
e.g., the weight can have the meaning of distance, or difficulty to ‘cross’ the edge.
Weights are useful when trying to find paths between vertices. A path between two
vertices exists if there is a sequence of edges, each edge starting from a vertex that
the previous edge left off, that starts from one vertex and ends at the other.
In networks, the unit through which network connections are achieved is the network
interface, or port. The goal is to find “connections” between the ports, which are the
endpoints of the connections, so it seems natural that ports should be represented by
graph vertices and “connections” by edges. More precisely, an edge between two
vertices implies that the ports can have layer 2 connectivity to each other, possibly
after a certain configuration takes place. This is used to find paths between ports; if
there is an edge between two ports, the “physical” link between the ports is already
there, and the ports can be put in the same network by just applying certain
configuration. The edges are undirected; this represents that we are dealing only with full-duplex traffic.
Switch graph
These choices are better illustrated if we look at how each network node is mapped to
graphs. Figure 25 shows the graph for a switch. A switch consists of a set of ports
that have connectivity to each other through the switch “backplane” (if they belong in
the same VLAN). The backplane is represented by an extra vertex, the “backplane”
vertex. There is an edge between every vertex that represents a switch port and the
backplane vertex. That way, all switch ports can be connected to each other through
the path that passes through the backplane vertex.

Figure 25: Switch graph model
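The construction of such a graph can be sketched with the networkX library used by the prototype; the function name and vertex convention below are assumptions for illustration:

import networkx as nx

def build_switch_graph(switch_name, port_names):
    #Model a switch as a star: every port vertex has an edge to a single
    #"backplane" vertex, so any two ports are connected through it.
    g = nx.Graph()
    backplane = (switch_name, "backplane")
    g.add_node(backplane)
    for port_name in port_names:
        port_vertex = (switch_name, port_name)  #a port is identified by (node, port)
        g.add_node(port_vertex)
        g.add_edge(port_vertex, backplane, weight=1)  #default edge weight of 1
    return g

switch_graph = build_switch_graph("switch 1", ["1/0/%d" % i for i in range(1, 25)])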
Host graph
The graph for the virtual machine host in Figure 26 looks similar to the switch graph,
although it represents slightly different concepts. A host can have a number of
network interfaces, unconnected to each other. The interfaces are not connected
following the assumption that if you need more than one host interface for network
provisioning, the second interface is most likely there to connect to a separate
network from the one that the other interface connects to. (This does not apply to interface bonding; bonded interfaces should appear as one in the graph model, see Chapter 7 for more.) Typically hosts are created with only one interface, which will be
used for providing network to virtual machines.
A host can also have virtual machines, which have their own network interfaces.
These are connected to the host’s interface which plays the role of the VM gateway
for the specific VM. The edges connecting the host to the VM interfaces represent the
connections that are possible through the virtual switch installed on the host. In
Figure 26 for example, the VMs’ interfaces are connected to the eth0, which is the
VM gateway for this host.

Figure 26: VM host graph model
The exact way that the VM’s interfaces are connected to the host’s interfaces depends
on the chosen virtual switch implementation for the specific host. That is why we
chose to model it as a simple direct edge and hide the actual connection method as an
implementation detail of the host configuration plugin. For more on host plugins see
section 6.2.
Topology graph
The switch and host graphs represent standalone network devices, unconnected to
others. To represent a device’s ports connected to each other, the device graphs are
contained in a larger graph, the graph of the Topology. This graph also includes edges
between different network devices, which represent a physical connection (commonly
an Ethernet cable) between the ports of different network devices. These edges are
created when the Topology object’s attach function is called. An illustrative
Topology graph is shown in Figure 27.
The topology in Figure 27 consists of two switches and three hosts. The switches are
identical. Each host has one interface, which is connected to a specific switch port,
and one host has a virtual machine. This topology corresponds to the object diagram
of Figure 28 and would be instantiated with the following code block:
#Create four identical hosts
host_b = HostBuilder().add_port("eth0")
host1 = host_b.set_name("host 1").build()
host2 = host_b.set_name("host 2").build()
host3 = host_b.set_name("host 3").build()
vm = host_b.set_name("vm").build()
#Attach the last as a vm and to host3's eth0
#host 3 should see vm's eth0 as vnet0
host3.add_vm(vm, 'eth0', [('vnet0', 'eth0')])
#Create two switches
switch_b = SwitchBuilder().add_ports('1/0/', 1, 24)
switch1 = switch_b.set_name("switch 1").build()
switch2 = switch_b.set_name("switch 2").build()
#Create and populate the topology
top = Topology()
top.add_host(host1).add_host(host2).add_host(host3).\
    add_switch(switch1).add_switch(switch2)
top.attach(host1.port("eth0"), switch1.port("1/0/1"))
top.attach(host2.port("eth0"), switch1.port("1/0/3"))
top.attach(host3.port("eth0"), switch2.port("1/0/1"))

Figure 27: Topology graph
It can be seen that, unlike a Link, the cross-connects in the same network nodes (e.g.,
switch ports being connected to each other, or host interfaces connected to virtual
machine ones) are not explicitly modeled. Rather, they are inferred from the graph
models. As the prototype was being developed, a pragmatic approach was used to
create the topology design, and during the iterations it went through, there was no
need to model cross-connects explicitly (see Chapter 9 for the project development
timeline). See Chapter 7 for a discussion on the link concept when it comes to
bandwidth provisioning.
L2 Network graph
The L2Network graph is populated dynamically as ports are added or removed from
the L2Network. This graph has no pre-defined static structure. It is a subgraph of the
topology, it can contain vertices that belong to different network nodes, and it is
always a tree, which means that it is acyclic (see more on network consistency in
section 4.6.4). In Figure 29 you can see an L2Network graph, overlaid on top of the
topology graph from Figure 27. Multiple L2Networks can coexist independently on
top of the topology.
Figure 28: Object diagram of the topology of Figure 27
4.4.2 Graph classes
Each network node graph is represented by its own class in the class model. This is
required because the graphs are constructed and populated with vertices in different
ways depending on what type of node they represent. Each time a network node is
instantiated, or a port is added to it, its corresponding graph object is properly
updated as well. The graphs were not implemented as part of the existing network
node classes to avoid strong coupling.
The library used to implement the graphs is networkX [22], a Python graph library. It
was chosen based on its popularity, its intuitive interface and the fact that it provides
a wealth of graph algorithms. The network node classes do not depend directly on the library, but on a graph object interface that implements a few basic graph functions, such as add_node() and add_edge(). This makes it possible to replace
networkX with a different library if required, by wrapping the new library’s graph
objects around these functions, although this would mean that the algorithms should
be replaced as well (see next section). The new graph classes are shown in Figure 30,
where you can also see a few details on how graphs are populated when the network
node methods are called.
There is one more addition to the graph classes, added to overcome what could be
considered a limitation of the chosen graph library. The library allows grouping a subset of vertices and edges of a graph into a new graph (a subgraph of the original), but the graph and the subgraph maintain no relationship to each other. They become two independent graphs, and changing the membership of vertices or edges in one of them is not reflected in the other. In the case of the topology, the
topology graph is a graph containing the smaller graphs of hosts and switches (which
are its subgraphs). Every time one of these network node graphs is modified (new
ports added, for example), the topology graph needs to be updated. Therefore, the
observer pattern [19, pp. 282-292] is used to notify and update the topology graph
each time one of its network nodes is modified. The pattern as applied to the graph
classes is shown in Figure 31.
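A minimal sketch of how this notification could look is shown below; the class and method names are assumptions:

class NodeGraph:
    #Graph of a single network node; it notifies its observers (the topology
    #graph) whenever a vertex is added.
    def __init__(self):
        self._observers = []

    def register_observer(self, observer):
        self._observers.append(observer)

    def add_port_vertex(self, vertex):
        #...update the node's own graph here...
        for observer in self._observers:
            observer.node_graph_changed(self, vertex)

class TopologyGraph:
    def node_graph_changed(self, node_graph, vertex):
        #mirror the change into the overall topology graph
        pass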
Figure 29: L2Network graph (overlaid with red)
The representation of network objects with graphs allows the straightforward use of graph algorithms to find paths among them. Before moving to the algorithms, however, there is an additional module that needs to be explained and that is used by the algorithms: the network isolation module, which manages how L2 networks are logically isolated. It is presented in the following section, and the algorithms are presented in the section after that.
Figure 30: Graphs in the class diagram
Figure 31: Observer pattern
4.5 Network Isolation
The L2 networks formed on top of the topology need to be logically separated from
each other, and the network isolation module represents the mechanism that
implements this isolation. To isolate networks, a piece of information needs to be
associated with each network, such that when it is used to configure devices, it results
in isolated networks. VLAN identifiers fit with this concept perfectly. In order to be
able to replace such an isolation mechanism with a different one (e.g., replace the
802.1Q VLAN ids with the Q-in-Q id tuples), the isolation information is not part of
the network object itself, or present anywhere else in the model, except in the
NetworkIsolationManager object. This object maps Network objects to specific
isolation information, and it is the only object with this knowledge. The
NetworkIsolationManager class is an abstract class; to support an isolation
mechanism, an implementation has to be created. The NetworkIsolationManager
exposes methods to reserve or release isolation information from networks.
Implementations of the abstract class should also provide methods for an
administrator to control its parameters or assignment mechanisms; these are
implementation specific.
The implementation of 802.1Q VLAN isolation is done in the VlanManager class,
where the isolation information being associated with each L2 network is an integer,
the VLAN id. The VlanManager deals with assigning unique VLAN ids to networks
and with managing the available VLAN id pool, where ids are reserved from or
released into. VLANs can be assigned either to network objects identified by their
network id, or assigned to a unique string that identifies that the VLAN is reserved
for a specific purpose (e.g., typically in computer clusters, a VLAN id is reserved for
a VLAN where management of computers and switches is performed. This is done
for security reasons and the network is commonly called the management network).
The string does not serve any purpose other than to remind the administrator of what
the VLAN is reserved for, and it does not occur anywhere else in the model. A few
illustrative VLAN manager operations follow:
reserve_isolation_id(2) #associate an unused VLAN with network with id 2
reserve_isolation_id(3, 5) #(try to) associate VLAN id 5 with network 3
reserve_named_isolation_id(“management”, 1) #associate VLAN 1 with “management”
The class diagram of the network isolation module is shown in Figure 32.
Figure 32: Network Isolation with VLAN ids
In Figure 32 you can see references to ‘restrictions’ and ‘restrictors’. This is the
mechanism used for imposing restrictions on available or allowed VLAN ids (or
isolation info in general) for the NRS service to use. This may be useful to an
administrator that wants to prevent NRS from using an arbitrary set of VLAN ids.
Since VLAN ids are integers, the restrictions are implemented as function objects that take one integer argument (the VLAN id) and return a Boolean, which conveys whether the specific VLAN id fulfills the restriction or not. To illustrate (Python lambdas are anonymous function objects), the following statements
restrictor1 = lambda i: i <= 1024
restrictor2 = lambda i: i != 6
vlan_manager.set_isolation_restrictions([restrictor1, restrictor2])
will make sure the VLAN id is at most 1024 and not equal to 6.
The same VLAN id restriction mechanism is used for NetworkNodes. It is often the
case that a legacy switch model will not support the full range of 4094 VLANs that
the standard supports, but a lesser number. This can be modeled by having
NetworkNodes keep a list of such VLAN id constraints, as seen in Figure 32. Since
NetworkNodes do not have a dependency on VLANs, they keep a dictionary of
isolation constraints that maps isolation types to a list of restrictors.
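For example, a hypothetical legacy switch that only supports VLAN ids up to 255 could carry a restriction like the following; the attribute name and dictionary key are assumptions:

#restrict the node to VLAN ids 1-255 for the 802.1Q isolation type
legacy_switch.isolation_constraints["802.1Q"] = [lambda vlan_id: vlan_id <= 255]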
The information provided by this module is used by the algorithms, the device
plugins and the request manager, as seen in sections 4.6, 4.7, and 4.8 respectively.
4.6 Algorithms for operations on the topology
The reason for maintaining a network topology is to be able to operate on it to infer
how network interfaces can be connected or disconnected from the L2 networks
overlaid on the topology. There are two main operations that are needed: One is
finding how a set of network ports can be connected to a network. This is requested
by a user of the service, and his request is mapped on the topology by finding the
proper path(s). The algorithm that finds the paths is called the growing algorithm,
since it “grows” the network. The second operation is removing ports from the
network segment when the user requests so. After their removal, it is desirable to
garbage collect all the network ports that are now redundant in the network segment.
This operation is performed by the shrinking algorithm.
4.6.1 Network growing algorithm
The growing algorithm operates on the topology graph to find paths among network
ports. It is used after a request arrives to connect a port (the source port) to a specific
L2 network; for example, using the calls from section 4.2.1 one such request would
be
connect(5, [("vm1", "eth0")]) #connect eth0 of vm1 to network with id 5
The network may already have ports, and the goal is to find one path that connects
the source port to one of the ports already in the network. The network graph is a tree,
so the algorithm will effectively look to add a new branch to the tree that ends at the source port. The branch added is the shortest possible. The length of a path is determined by the edge weights, a value attached to every edge of the topology graph with a default of 1. The NRS prototype does not utilize weights, so the paths found are the ones with the fewest edges.
The mapping operation needs to take two additional factors into account:
1) The static isolation constraints of the network nodes. The network has an isolation
identifier (a VLAN id) that may conflict with isolation constraints of nodes.
2) The maximum number of concurrent isolation ids (VLANs) that a network node can support at runtime. If this number has been reached, the network node must be made temporarily unreachable for the mapping operation (see section 4.8 for more on that).
This is why before the topology graph is processed to find paths, it is pruned of all the
nodes whose isolation constraints conflict with the chosen network segment’s
isolation id, as well as the nodes marked as unreachable. This makes sure that the
path found consists of nodes that can accommodate the isolation id.
The growing algorithm receives the topology graph, the source port, and the target
ports (the ports of the network segment) as input. It then performs the following
steps:
1. If the target ports are empty (network is not populated), add the source port
to the path and go to step 5.
2. Get the topology subgraph whose nodes do not conflict with the network’s
isolation id.
3. For each target port: find the shortest path from the source port to the target
port (uses the library implementation of Dijkstra shortest path algorithm).
4. Select the shortest among the paths found in step 3.
5. Collect the additional network isolation constraints from all the devices that
belong in the path.
6. Return the path and the constraints.
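A sketch of these steps is given below, using the networkX library. It assumes that vertices are (node name, port name) tuples and that the nodes conflicting with the network's isolation id (or marked unreachable) have already been collected; all names are illustrative, and steps 5 and 6 (collecting the isolation constraints of the devices on the path) are omitted:

import networkx as nx

def grow_network(topology_graph, source_port, target_ports, excluded_nodes):
    #Step 1: an empty network is started with just the source port.
    if not target_ports:
        return [source_port]
    #Step 2: prune nodes that conflict with the network's isolation id.
    allowed = [v for v in topology_graph.nodes() if v[0] not in excluded_nodes]
    pruned = topology_graph.subgraph(allowed)
    #Step 3: shortest (Dijkstra) path from the source port to each target port.
    candidates = []
    for target in target_ports:
        try:
            candidates.append(nx.dijkstra_path(pruned, source_port, target))
        except nx.NetworkXNoPath:
            continue
    if not candidates:
        return None  #the request cannot be mapped onto the topology
    #Step 4: keep the shortest candidate path.
    return min(candidates, key=len)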
A simple application of the growing algorithm on the topology from section 4.4.1 is shown in Figure 33. Starting with the target network overlaid in red on the left side of the figure, an application of the algorithm with the source port ("vm", "eth0") would extend the path as shown on the right side of the figure.

Figure 33: Growing algorithm operation
4.6.2 Network shrinking algorithm
The second algorithm is the shrinking algorithm, used when a port is to be removed
from a network. The purpose of this algorithm is to determine which network ports
are no longer required in a network and can be removed as well. The algorithm will
be called after such a request:
disconnect(5, [("vm", "eth0")]) #disconnect eth0 of vm from network 5
As we saw in the growing algorithm section above, to connect a port (the source port)
in a network, a path has to be found that places all the ports that lie along it into the
network. This means that, although only one source port was requested, a set of ports
was actually connected to the network. These ports have to be garbage collected
when the source port is removed from the network, as they serve no other purpose
than to connect the source port to the network.
To distinguish between these two types of network memberships, the concept of
explicit and implicit port membership is introduced. A port is an explicit member of a
network if it was directly requested to be part of the network (through a connect()
service call). The rest of the ports that belong to the network only to provide
connectivity to the explicit ports, are implicit members of the network. A user can
only ask to remove an explicit port from a network (one that he has previously asked
to be put in). Implicit ports should be removed automatically once they are no longer necessary.
The shrinking algorithm receives as input a port and the network that the port should
be removed from, and works as follows:
1. Mark the port as implicit, create empty garbage_ports list
2. Get the neighbors of the port in the network graph that are not already in
garbage_ports.
3. If number of neighbors is less than two (so the port has either one or no
neighbors), and
the port is not an explicit port, and
the port is not already in garbage_ports:
a. add port to garbage_ports
b. if there is a neighbor, run step 2 with the neighbor as input port
4. Return garbage_ports
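A sketch of this garbage-collection walk is shown below; the function and argument names are illustrative, and the network graph is the tree described above:

def shrink_network(network_graph, removed_port, explicit_ports):
    #Step 1: the removed port is demoted to an implicit member.
    explicit_ports = set(explicit_ports) - {removed_port}
    garbage_ports = []
    current = removed_port
    while True:
        #Step 2: neighbors in the network graph not already collected.
        neighbors = [n for n in network_graph.neighbors(current)
                     if n not in garbage_ports]
        #Step 3: collect the port if it is a dead end and not explicit.
        if (len(neighbors) < 2
                and current not in explicit_ports
                and current not in garbage_ports):
            garbage_ports.append(current)
            if neighbors:
                current = neighbors[0]  #continue pruning along the branch
                continue
        break
    #Step 4
    return garbage_ports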
The network graph that the algorithm operates on is a tree, and the input port to the shrinking algorithm may or may not be at the end of a branch. If it is, the algorithm will effectively prune the branch starting from the input port and ending either at an intersection, or at another explicit port. (This explicit port needs its own disconnect
operation to be removed). If the input port is not at the end of a branch, which means
that it is an intersection itself, it will only be marked as implicit. If the graph of the
network is a tree, the shrinking algorithm can never split the network graph into
unconnected subgraphs.
If the shrinking algorithm is applied to the graph on the right side of Figure 33 above, with the port ("vm", "eth0") and the network in red as input, it will produce the exact reverse of the growing algorithm's operation and will lead to the network on the
left side of the figure.
4.6.3 Algorithms in the class model
The two algorithms are provided as functions from the algorithms module. The
algorithms’ logic relies on functions provided by the graph library and on the
topology classes. Therefore, they are tightly coupled to both the graph library and the
topology classes, but the two couplings can be split into two different modules. The
algorithms_nx module uses the networkX library but has no knowledge of the
network node classes, and the algorithms module uses the topology classes to infer
the information to pass to the algorithms_nx module as graph information, in order
to obtain results. The modules containing the algorithm functions are shown in Figure
34 (“utility” is a UML stereotype that indicates that the class only has static attributes
and operations).
This grouping of the algorithm modules allows for interesting modular properties.
Firstly, the algorithms are exposed to the system through two interfaces, each
containing a function that invokes the corresponding algorithm. The two algorithm
implementations are simple in function, but the algorithm module can easily be
replaced with a more complicated one that allows for a twist in the algorithms' result
(the growing algorithm is mostly of interest for that), for example, using heuristics to
influence how paths are found. This modular behavior is known as the Strategy
design pattern [19, pp. 303-311].
Secondly, the top algorithm module itself relies on two provided functions of the
bottom module (algorithms_nx). The bottom module is tightly coupled with the
chosen graph library and graph representation. It could however be replaced if a
different graph library or representation is chosen, as long as its two exposed
functions are implemented.
The structure of the algorithm modules allows for flexibility in modifying and
replacing them, if one wishes to do so.

Figure 34: Algorithm modules
4.6.4 Network consistency
The previous sections describe operations on L2 networks. These operations are used
to manipulate the network graph, as described later in section 4.8. In addition,
administrator interference (section 4.10) may make changes to the L2 networks as
well. The network represents an L2 broadcast domain to all the ports that were
requested to be part of it; these are the explicitly requested ports of the network. NRS
must guarantee that the explicit ports are connected to each other. In turn, this
requires that the graph representing the network is always connected, which in graph
theory means that for every pair of vertices of the graph, there is always at least one
path that connects them. This effectively means that all the ports (the vertices) are
indeed in the network, and the network is not split into two or more disconnected
parts.
Apart from consistency semantics regarding connectivity, the networks need to be
consistent from the point of view of the algorithms that expect them as input. In
particular, the shrinking algorithm works by pruning tree branches from the graph,
and requires the leaves to be explicit ports. If a leaf in the graph is a non-explicit port,
it may never be selected for garbage collection by the use of the shrinking algorithm,
since it is not possible for a user to directly remove a non-explicit port. Therefore, the
leaves should be explicit ports.
The algorithms that are used to modify the networks operate in such a way that they
always create a tree graph, as long as the network graph was already a tree graph.
Tree graphs are by definition connected. However, if the administrator makes
changes, he might end up creating a disconnected or otherwise faulty graph. This is
why the network needs a consistency check after changes are made. There are two
consistency criteria:
1) The network graph should be a tree graph. This makes sure that all ports are
connected and that there are no cycles formed.
2) The leaves of the tree graph should be explicit ports. This makes sure that
the network can be fully removed using the shrinking algorithms.
The consistency check method is shown in Figure 35.

Figure 35: Network consistency
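A sketch of such a check with the networkX library; the function and argument names are assumptions:

import networkx as nx

def is_consistent(network_graph, explicit_ports):
    #Criterion 1: the network graph must be a tree (connected and acyclic).
    if network_graph.number_of_nodes() > 0 and not nx.is_tree(network_graph):
        return False
    #Criterion 2: every leaf (vertex of degree 1) must be an explicit port.
    for vertex in network_graph.nodes():
        if network_graph.degree(vertex) == 1 and vertex not in explicit_ports:
            return False
    return True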
4.7 Device Plugins
To apply changes in the logical topology to the physical one, NRS should be able to
configure devices, in particular network switches and virtual machine hosts. Each
switch vendor and model may accept different CLI commands, and integrating all of
the different CLIs into NRS would be impossible. The concept of the device plugin
alleviates this situation. Each plugin is associated with a specific device (or family of
devices with common CLI) and has the knowledge of how to perform certain
configuration actions on the device.
The required configuration actions have to do with what changes in the topology
NRS supports. Currently, the changes are adding or removing ports from L2
networks, so that the networks can be formed, grown, shrunk and removed (section
4.6). The configuration actions that implement adding and removing ports from L2
networks are entirely dependent on the network segregation mechanism that is used.
When using 802.1Q VLANs, the actions that implement these changes are adding and
removing VLAN ids from trunk ports, or adding and removing access ports to a
VLAN. Therefore the plugins must support a set of commands that can perform these
actions. From a device-agnostic point of view, an interface of supported commands
must be created that, if implemented by a device plugin for an arbitrary device, is
fully sufficient to configure VLANs on the device.
In order to identify the commands and abstract them into an interface, various
hardware and virtual switch models were explored. When it comes to hardware
switches, at least one switch model from each of the following companies had its CLI
explored: Cisco (2 switch models), Juniper (1), Dell (1), 3Com (1), Brocade/Foundry (3), Nortel (1). While the list is not exhaustive (Arista Networks is missing, for
example), these companies comprise the majority of switch manufacturers, and it
does not seem likely that another vendor’s CLI would have drastically different CLI
concepts. Besides hardware switches, two software switches’ interfaces were
explored: the Linux bridge and Open vSwitch. Both are open-source software, with
Linux bridge being included by default in most Linux distributions.
The commands identified for 802.1Q VLANs are:
setup (vlan_id, node)
perform setup operations for the node to host the VLAN
cleanup (vlan_id, node)
perform cleanup operations to remove the VLAN from the node
allow (vlan_id, node, port)
adds the VLAN id to the trunk port of the node
disallow (vlan_id, node, port)
removes the VLAN id from the trunk port of the node
Usage example:
#allow vlan id 5 at port 1/0/13 of cisco switch "cata_2960"
cisco_plugin.allow(5, "cata_2960", "1/0/13")
Out of these four total commands, two of them are for placing/removing a port in a
VLAN, and the other two are for creating(‘setup’) or removing(‘cleanup’) the VLAN.
The need for these extra two commands exists because most switches need some
setup/cleanup operations when a new VLAN is introduced. This is needed both in the
case of hardware switches (‘create vlan 5’ is a common CLI command that is needed
before ports can start to be added to the vlan) and of Linux bridges (to create and
remove bridges). More about specific plugin implementations can be read in section
6.2.
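To illustrate how small such a plugin can be, a sketch of a host plugin built around Linux bridge utilities follows. The class shape, the assumption that eth0 is the trunk interface, and the exact command strings are indicative only; the actual plugins are described in section 6.2:

class LinuxBridgePlugin:
    #Sketch: each method returns the shell commands a driver would send
    #(e.g., over SSH) to the host identified by 'node'.

    def setup(self, vlan_id, node):
        #create a VLAN sub-interface on the trunk and a bridge for this VLAN
        return ["ip link add link eth0 name eth0.%d type vlan id %d" % (vlan_id, vlan_id),
                "brctl addbr br%d" % vlan_id,
                "brctl addif br%d eth0.%d" % (vlan_id, vlan_id)]

    def cleanup(self, vlan_id, node):
        return ["brctl delbr br%d" % vlan_id,
                "ip link delete eth0.%d" % vlan_id]

    def allow(self, vlan_id, node, port):
        #attach a (virtual machine) interface to the VLAN's bridge
        return ["brctl addif br%d %s" % (vlan_id, port)]

    def disallow(self, vlan_id, node, port):
        return ["brctl delif br%d %s" % (vlan_id, port)]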
These commands are relevant in the context of 802.1Q VLANs, but they may need to
be changed to support different isolation mechanisms, depending on how different the
configuration of the other isolation mechanism is (Q-in-Q appears to be pretty
similar). Therefore, each plugin has a list of network isolation mechanisms it supports
associated with a list of commands.
The plugins’ classes are shown in Figure 36. The plugins can be made more modular
with the inclusion of drivers, which deal with the communication method to the
device. Each plugin has a set of drivers registered to it, which it can instantiate. The
chosen communication method for the implemented plugins was CLI over SSH, but it
is possible to use a different communication method and/or protocol, such as CLI
over Telnet, NETCONF, SNMP, OpenFlow or other. Each network device has one
DevicePlugin that uses one DeviceDriver which communicates the commands. Such
drivers can become very device- and protocol-specific. Changing the communication
protocol might even mean that the commands themselves have to be different,
essentially requiring a new plugin. Therefore, exploring further design abstractions
related to the drivers was not deemed useful.
Each network node must have a device plugin associated with it. For ease of use, the
plugins can be instantiated by the plugins module, which acts as a façade [19, pp.
174-183] for all the available plugins (shown in Figure 37).
As a proof-of-concept, the NRS prototype has two implemented hardware switch plugins (one for the Brocade FastIron switch family and another for the 3Com 4210 model) and two software switch plugins (Linux bridge and Open vSwitch). All of them use SSH drivers. You can read more on the implementation and deployment of the plugins in sections 5.2 and 6.2.
Figure 36: Device Plugins class diagram
Figure 37: Device Plugin instantiation
4.8 Request Manager
The request manager is responsible for receiving requests to connect or disconnect
interfaces, and uses all modules described in the previous sections to realize these
requests. It makes the appropriate calls to the network isolation module, the topology
and the algorithms to perform its operation.
A request to connect an interface to a network is implemented in two distinct steps, a
reservation and an allocation step. The two steps are different in nature: The
reservation is a logical operation on the topology; it reserves new nodes that are
needed for the request and guarantees that they are available to the requester; it must
be an atomic and fast operation on the logical topology, but performs no actions on
the devices. The allocation on the other hand is performed on already reserved
resources; the configuration of each device that corresponds to the reserved nodes
takes more time than the logical topology operation, but the allocation does not use
the logical topology and does not need a lock on it. Apart from this distinction,
having separate reservation and allocation steps can be useful to a cloud service that
performs these steps at different times (for example, the reservation step could be
done in the VM scheduling phase, while the allocation can happen later when VMs
are actually launched. These two moments could be fairly distant from each other,
depending on how the cloud platform is implemented and on the load that its service
is experiencing).
A request to remove an interface from a network is implemented in two steps as well,
but the operations are not provided separately to the interface. The process is
described in the following paragraphs.
4.8.1 Reservation
The reserve operation first calls the growing algorithm to add the requested network
interface to the network, and then reserves resources. The operation's steps are
outlined below (also shown as a sequence diagram in Figure 38):
reserve(network_id, network_interface)
1. Retrieve the isolation_type used and the isolation_id for the network from
the NetworkIsolationManager
2. Call the growing algorithm to find the path with the new ports to be added to
the network
3. If the network is new, ask the NetworkIsolationManager to associate an
isolation id with the new network
4. Reserve the ports of the path to the target network. The port given in the
reserve call is marked as “explicitly reserved” and the rest of the ports as
“implicitly reserved”
5. Increase the ‘l2_networks’ and ‘isolation_type’ resource usage on the nodes
that own the path’s ports
6. Create and return a new reservation
Figure 38: Reservation sequence diagram
When a reserve operation is successful, it guarantees that the network resources
reserved will be available for allocation. The outcome of the reserve operation needs
to be allocated so that the connections actually take place. The information that is
needed for this allocation is encapsulated in a Reservation object, which is used in the
allocation step. The class diagram is shown in Figure 39.
Apart from adding ports to the network, the reserve operation needs to take into
account a few additional pieces of information that concern network nodes. One is the
maximum number of concurrent VLAN ids that a network node can support
(requirement FR3). There needs to be a way to count every VLAN id added to the
node, so that if the maximum number is reached the node becomes unavailable for
further reservations. The second is the fact that every node needs to perform some
VLAN setup and cleanup actions (as shown in section 4.7) the first time a VLAN id
is added and when it is removed from the node.
This information is modeled as network ‘resources’ maintained in the nodes. Every
node has a resources dictionary that maps a unique string that identifies a resource to
an arbitrary value or object that quantifies it. In this case, the two resources are the
‘isolation_type’ and ‘l2_networks’, both of which use simple counters. The
‘isolation_type’ counts different isolation ids that exist on the node for the chosen
isolation type, thus counting the number of VLAN ids when 802.1Q VLANs are
chosen. When the node reaches the maximum, the node marks itself as unreachable,
which is used to temporarily exclude it from the topology graph in step 1 of the
growing algorithm (section 4.6.1). The ‘l2_networks’ counts the number of ports for
each different network that the node belongs to. When the port count for a network
first becomes non-zero, the 'setup' plugin command is marked to be called in the next
allocation, and when the count drops back to zero the 'cleanup' command is marked.
The resources concept is meant to be used for any values or information related to
network nodes that have reservation semantics and need to be taken into account
when the reservation operation takes place. It is a rather simple concept; it needs to be
extended if the ability to reserve bandwidth is introduced to the system (also see
Chapter 8).
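For illustration, such counters could be kept as follows; the attribute names and the VLAN limit are indicative only and do not necessarily match the prototype.

class Node(object):
    """Illustrative network node carrying reservation-related resource counters."""

    def __init__(self, name, max_vlans=4094):
        self.name = name
        self.max_vlans = max_vlans        # FR3: maximum number of concurrent VLAN ids
        self.unavailable = False          # excluded from the growing algorithm when True
        self.resources = {"isolation_type": {}, "l2_networks": {}}

    def add_isolation_id(self, isolation_id):
        counts = self.resources["isolation_type"]
        counts[isolation_id] = counts.get(isolation_id, 0) + 1
        if len(counts) >= self.max_vlans:
            self.unavailable = True

    def add_port_for_network(self, network_id):
        counts = self.resources["l2_networks"]
        counts[network_id] = counts.get(network_id, 0) + 1
        # True means this is the network's first port on the node, so the 'setup'
        # command must be marked for the next allocation
        return counts[network_id] == 1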
Each reserve operation calls the growing algorithm and increases the reserved ports
of the network. If multiple reservations are made, each reservation “builds” on top of
the previous one. If these reservations are not allocated in the order they were made,
the allocated network can end up inconsistent. Therefore, if there is a pending
(unallocated) reservation for a network, a new reservation will require the first one to
be allocated before it can be allocated itself. This creates dependency links between
pending reservations for the same network, with each one depending on its previous
one. This dependency is modeled in the ReservationManager, an object that
maintains the reservations in a queue, one for each network, and allows only the first
reservation in the queue to be allocated.
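In outline, such a per-network queue could look as follows; locking and error handling are omitted and the method names are illustrative.

from collections import defaultdict, deque

class ReservationManager(object):
    """Keeps pending reservations per network; only the head of a queue may be allocated."""

    def __init__(self):
        self._queues = defaultdict(deque)   # network_id -> pending reservations

    def add(self, network_id, reservation):
        self._queues[network_id].append(reservation)

    def next_allocatable(self, network_id):
        """Return the reservation that may be allocated next, if any."""
        queue = self._queues[network_id]
        return queue[0] if queue else None

    def mark_allocated(self, network_id, reservation):
        queue = self._queues[network_id]
        if not queue or queue[0] is not reservation:
            raise RuntimeError("a previous reservation must be allocated first")
        queue.popleft()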
4.8.2 Allocation
The allocate operation configures network devices to allow traffic from L2 networks
through specified ports. The operation iterates on the ports that belong to a
reservation, and invokes plugin commands as required to configure the network
devices. The allocation steps are the following (also seen as a sequence diagram in
Figure 40).
allocate(reservation_id)
1. Retrieve the reservation and the isolation id associated with the network
contained in the reservation
2. Sort all ports contained in the reservation according to their owners’
allocation priority
3. Iterate on the sorted list of ports to call their owner’s device plugin’s ‘allow’
command.
The allocation priority is an integer associated with each network node. When
compared with a second node, this number specifies which node will be allocated
first (the lower the number, the higher the priority). This is introduced to aid with
situations where certain network nodes need to be configured in a specific order for
the overall configuration to succeed.
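The following sketch shows how the allocation step could apply this ordering; the attributes ports, owner, allocation_priority, device_plugin and needs_setup are assumed names used only for this illustration.

def allocate(reservation, isolation_id):
    """Configure the devices behind all ports of a reservation, honoring node priorities."""
    # a lower allocation_priority means the node must be configured earlier
    ports = sorted(reservation.ports, key=lambda p: p.owner.allocation_priority)
    for port in ports:
        node = port.owner
        plugin = node.device_plugin
        if reservation.needs_setup(node):   # first VLAN id introduced on this node
            plugin.setup(isolation_id, node.name)
        plugin.allow(isolation_id, node.name, port.name)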
Figure 39: Request manager class diagram
After an allocation is complete, the Reservation object is deleted and the next
Reservation for this network (if any) becomes available to be allocated.
4.8.3 Release
When a port is requested to be removed from a network, first the shrinking algorithm
is used to determine how the network will change and collect the ports that can be
garbage collected. Then these ports are de-allocated to undo any configuration that
allows traffic from the network to reach the ports. The steps of the operation are
shown below (a sequence diagram is shown in Figure 41).
release(network_id, network_interface)
1. Call the shrinking algorithm to collect any ‘implicit’ member ports of the
network that can be garbage-collected.
2. Iterate on the ports to invoke the device plugins’ ‘disallow’ command.
3. Remove the garbage ports from the network.
4. Reduce the ‘l2_networks’ and ‘isolation_type’ resource usage of the port’s
nodes
5. Check if the network is empty. If it is, release its VLAN id to be usable by
new networks.
Figure 40: Allocation sequence diagram
4.8.4 Intra-cloud functionality
The reserve, allocate and release operations described in the previous sections
conclude the functionality of the request manager. All the modules described until
this section are used in the context of these three operations. In turn, the request
manager implements the functionality that the NRS system offers for network
connectivity within one cluster, also referred to as the intra-cloud functionality and
related to requirements FR1-FR5. The interface operations from
sections 4.2.1 and 4.2.2 are realized with calls to the request manager. Figure 42 has a
sequence diagram that shows how the connect operation uses the request manager.
This section concludes the intra-cloud functionality. The following sections describe
the support for connectivity to external networks, and the administrator access
interface.
4.9 External networks and inter-cloud
Moving outside of the local cloud site, there are two use-cases that connect local L2
networks to external ones. One is connecting to an external network which is ‘out
there’ and beyond local administrative control; the Internet fits this description. The
second one is bridging a local L2 network with a remote one belonging to a specific
remote site, over an external network. This inter-cloud connection is fully controlled
by the two sides that create it.
Figure 41: Release sequence diagram
Figure 42: Connect sequence diagram
From a local point of view, in both cases local L2 networks need to be connected to
external networks; the local end-point of an inter-cloud connection is a connection to
an external network as well. The NRS system requires each of these external
networks to be associated with a special gateway that is able to provide connectivity
to the external network. Connecting a local L2 network to an external network is then
reduced to first internally connecting the L2 network to the gateway, and then
configuring the gateway appropriately. Inter-cloud connections use the gateway
concept as well. Each type of inter-cloud connection needs a local gateway that is
able to set up and tear down the connection; the L2 networks will connect to that
gateway.
4.9.1 Gateway nodes
A gateway to an external network is just another network node contained in the
logical network topology; it can be whichever of the three node types (switch, host, or
static virtual machine) is best suited to represent it. Each gateway has a unique
name that identifies it; the topology maintains an association of the gateway names
with the ports of the gateway network node objects; each gateway is accessible
through a port.
The sequence of actions to connect a network to a gateway re-uses the mechanisms of
reservation and allocation that are used to connect any regular node to a network.
When a request such as
connect_gateway(2, “Internet”) #connect network 2 to the internet
is received, first the port of the ‘Internet’ gateway node is retrieved from the
topology, and then passed to a reserve operation to the request manager, followed by
an allocation. The reserve operation reserves a path from the gateway to the existing
L2 network, in exactly the same way as ports of regular nodes are added to a network.
The allocation corresponds to the gateway-specific configuration that needs to be
performed to connect the L2 network to the Internet (for example, modifying firewall
rules on the gateway). Gateway-specific plugins need to be associated with the
gateway nodes that can perform the configuration.
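A simplified sketch of how connect_gateway can be expressed in terms of the existing reserve and allocate operations; the get_gateway_port helper is hypothetical.

def connect_gateway(network_id, gateway_name, topology, request_manager):
    """Connect an existing L2 network to a named gateway."""
    gateway_port = topology.get_gateway_port(gateway_name)      # hypothetical lookup
    reservation = request_manager.reserve(network_id, gateway_port)
    # the allocation triggers the gateway-specific plugin (e.g. firewall rules)
    request_manager.allocate(reservation.id)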
A sequence diagram of connecting a network to a gateway is shown in Figure 43. The
operation fits well with the intra-cloud functionality of the system and is performed
with few actions, re-using the existing modules’ operations. This functionality
satisfies requirement FR6.
Figure 43: Connecting to a gateway sequence diagram
4.9.2 Inter-cloud connection
An inter-cloud connection’s purpose is to bridge two L2 Networks that reside on
(possibly) remote cloud sites, each of them managed by its own NRS service. Unlike
the functionality that has been described so far, which can be dealt with locally, an
inter-cloud connection is jointly created by two remote NRS services. An inter-cloud
connection request starts with one NRS service making a connection request to a
remote NRS. A negotiation ensues that determines whether there is an agreement on
the specifics of the requested connection. There are two things that need to be
ensured: 1) both NRS services need to support the same type of inter-cloud
connection (e.g., both need to have VPN gateways configured) and 2) both sites need
to allow the requested L2 networks to be connected over an inter-cloud. The last
point implies administrative control over which local networks can be used in inter-
cloud connections. After the negotiation is complete, some information may have to
be exchanged that is specific to the connection type (e.g., VPN configuration details).
The inter-cloud negotiation is implemented using message passing. A complete
negotiation is composed of a sequence of different message-exchanging operations,
each of them dealing with exchange of a specific piece of information. Each such
operation has two counterparts: One corresponds to the side of the initiator (or client)
and the other to the side of the receiver (or server). The message passing operations
have transactional semantics, i.e. information is exchanged, each side checks the
received information against internal constraints, and verification of the outcome is
exchanged. Both sides must complete their parts successfully for the negotiation to
have an effect on either side. The classes that represent these concepts are shown in
the class diagram of Figure 44.
In the upper part of the diagram of Figure 44 lie the classes that compose the building
blocks of the negotiation and deal with the implementation of message passing. The
NrsNegotiation class represents a complete negotiation. Its handle method is an
abstract method that needs to be implemented to perform the implementation-specific
exchange of information. The method returning successfully means that the
negotiation was successful. NrsNegotiation provides two methods for sending and
receiving messages. These methods are provided to NrsNegotiation by the
MessagePassingImpl interface, which must deal with the implementation of the
message exchange. The implementation for the NRS prototype is the NrsSocket,
which wraps over a low level TCP socket to synchronously exchange messages.
Subclasses of NrsNegotiation must be created to perform concrete negotiations.
Since message passing is not a symmetric relationship, but one side needs to start a
conversation (client) and the other to receive it (server), at least two classes need to
be created that implement the client’s and the server’s conversational behavior. These
subclasses need to implement their own conversational protocol.
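The following sketch illustrates the shape of these classes; the message contents and method names are simplified compared to the actual protocol and are indicative only.

class NegotiationFailed(Exception):
    pass

class NrsNegotiation(object):
    """One complete negotiation; message passing is delegated to an implementation."""

    def __init__(self, messaging):
        self._messaging = messaging          # e.g. an NrsSocket wrapping a TCP socket

    def send(self, message):
        self._messaging.send(message)

    def receive(self):
        return self._messaging.receive()

    def handle(self):
        """Carry out the negotiation; raise NegotiationFailed if it does not succeed."""
        raise NotImplementedError

class ClientStartInterCloud(NrsNegotiation):
    """Client side of the 'start inter-cloud' conversation (first rows of Table 8)."""

    def __init__(self, messaging, connection_type):
        NrsNegotiation.__init__(self, messaging)
        self.connection_type = connection_type

    def handle(self):
        self.send("StartInterCloud")                  # negotiation identifier
        if self.receive() != "OK":
            raise NegotiationFailed("negotiation type rejected by remote site")
        self.send(self.connection_type)               # e.g. 'OpenVPN'
        if self.receive() != "OK":
            raise NegotiationFailed("connection type not supported by remote site")
        # ... the exchange of network ids and VPN details would follow here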
The custom conversational protocol created for the inter-cloud negotiation is simple
and to the point. There are two different conversations that the initiator of the
conversation (the client) may start, which correspond to requesting a new inter-cloud
connection, or stopping an existing one. These are represented by the
ClientStartInterCloud and ClientCancelInterCloud classes. The first message
sent by the initiator must be the identifier of the imminent negotiation, which the
receiver must recognize before the chosen negotiation proceeds. The receiver of the
conversation (the server) needs to start a negotiation before knowing which
conversation will take place. This is modeled as a ServerNegotiation class that has
different states, with the current state determining the conversation taking place. This
behavior is known as the State design pattern [19, pp. 305-314], which allows an
object to alter its behavior at runtime by changing its internal state. The
ServerNegotiation class has a list
of two possible states, the start state and the cancel state, that correspond to the
ServerStartInterCloud and ServerCancelInterCloud classes. These two classes
implement the two different conversations from the side of the server. The
ServerNegotiation first receives the negotiation identifier, and then sets the state
accordingly and proceeds with the chosen negotiation. The actual information
exchange that takes place in the two different negotiations between client and server
is shown in Table 8 and Table 9.
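Continuing the sketch above, the server-side dispatch on the received negotiation identifier could be expressed as follows; the state objects are simply passed in, and all names are illustrative.

class ServerNegotiation(NrsNegotiation):
    """Receives the negotiation identifier first, then delegates to the matching state."""

    def __init__(self, messaging, states):
        NrsNegotiation.__init__(self, messaging)
        # states: negotiation identifier -> negotiation object, e.g.
        # {"StartInterCloud": ServerStartInterCloud(...), "CancelInterCloud": ServerCancelInterCloud(...)}
        self._states = states

    def handle(self):
        identifier = self.receive()
        state = self._states.get(identifier)
        if state is None:
            self.send("REJECT")
            raise NegotiationFailed("unknown negotiation %r" % identifier)
        self.send("OK")
        return state.handle()   # the selected state carries out the rest of the conversation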
Figure 44: Inter-cloud negotiation class diagram
Table 8: Starting an inter-cloud negotiation

Information context | Client actions | Server actions
Negotiation identifier | send negotiation identifier for StartInterCloud | receive negotiation identifier
 | receive confirmation/rejection | send confirmation/rejection
Connection type (e.g., OpenVPN) | send connection type identifier | receive connection type identifier
 | | check if connection type is supported
 | receive confirmation/rejection | send confirmation/rejection
L2 Network identifiers | send local network id | receive remote network id
 | send remote network id | receive local network id
 | | check if local network id is allowed for inter-cloud
 | receive confirmation/rejection | send confirmation/rejection
Connection-specific information (for OpenVPN):
VPN role (server or client) | send VPN server preferences | receive client's VPN preferences
 | | decide on who becomes the VPN server
 | receive VPN role | send client's VPN role
 | check if the assigned VPN role agrees with the VPN settings |
 | send confirmation/rejection | receive confirmation/rejection
VPN configuration | send local VPN configuration | receive remote VPN configuration
 | receive remote VPN configuration | send local VPN configuration

Table 9: Cancelling an inter-cloud negotiation

Information context | Client actions | Server actions
Negotiation identifier | send negotiation identifier for CancelInterCloud | receive negotiation identifier
 | receive confirmation/rejection | send confirmation/rejection
Identifier of the inter-cloud connection to be cancelled | send local network id | receive remote network id
 | send remote network id | receive local network id
The columns of actions in Table 8 and 9 are sequences of actions; the client (or the
server) executes them sequentially from top to bottom. The message passing is
synchronous and the client and server always synchronize at the send/receive action
pairs. Thus, server and client actions that are in the same row can be considered to
happen simultaneously. At the confirmation/rejection actions, if the outcome is a
rejection both sides will finish the negotiation with a failure indication. The actions
for each different information context are grouped in the operations of the negotiation
classes in Figure 44.
There are various configuration options that are associated with each side of the
negotiation, such as connection-specific preferences and available connection types.
These are passed to the negotiation classes when they are instantiated, which happens
when an inter-cloud request is initiated towards a remote NRS service, or when a
request is arriving from a remote service. The NRS object is the one that instantiates
the negotiation objects to start or stop inter-cloud connections, and maintains the
configuration settings. The negotiations are instantiated after a request such as
start_inter_cloud(2, 3, nrs2.nikhef.nl, 10003, ‘OpenVPN’)
#bridge local network 2 with remote network 3 located at the
#service at nrs2.nikhef.nl:10003. Connection type OpenVPN.
NRS also maintains a list of existing inter-cloud connections. When a request to
cancel an inter-cloud comes in, NRS checks the information against the list of
existing inter-cloud connections to identify which connection to tear down.
Once a negotiation is complete, all that is left to do is to connect the local network to
the inter-cloud gateway, as described in section 4.9.1. The configuration of the
gateway will be performed by the gateway's plugin, after which the inter-cloud connection
will be ‘live’. A sequence diagram showing the whole process of the creation of an
inter-cloud connection is shown in Figure 45.
The NRS instance on the left side of Figure 45 represents the ‘local’ NRS service,
while the one on the right side is the ‘remote’ service. It is worth noting that the only
expected functionality from the remote service is to converse correctly during the
negotiation. The manner by which the inter-cloud connection is actually realized on
the remote site is of no concern to the local NRS, as long as it works for the end-user
of course. This means that, if NRS is stripped of all functionality that is not necessary
for inter-cloud negotiations, it can be run in an 'inter-cloud only' mode that can
properly converse with any remote (full-fledged or not) NRS service.
Figure 45: Inter-cloud sequence diagram
4.9.3 Inter-cloud mode NRS
NRS can be configured to run in an ‘inter-cloud’ only mode (requirement FR8). In
this mode, the NRS service can perform only the inter-cloud start and stop family of
operations (from section 4.2.3), putting aside all other functionality related to internal
reservations and allocations. The inter-cloud operations themselves perform only the
inter-cloud negotiation part, without calling the connect_gateway operation (section
4.9.1). The cloud site running the ‘inter-cloud’ mode NRS needs to make use of the
information returned from the inter-cloud negotiation. This information is required to
perform the local configuration of creating an inter-cloud endpoint and connect the
local networks to it; the intra-cloud NRS functionality that can do that is missing. As
an aid to a site choosing ‘inter-cloud only mode’, NRS provides the optional
allocate_gateway and deallocate_gateway operations. These skip the normal
reservation and allocation steps done by connect_gateway (section 4.9.1), and
directly configure a gateway with the passed information. These optional operations
directly invoke the gateway’s device plugin to perform configuration, so that the site
using ‘inter-cloud only mode’ can choose to re-use the gateway plugin to perform
configuration exactly as performed in a full-fledged NRS service.
As seen in the inter-cloud information exchange and with ‘inter-cloud’ mode, there
are plenty of configuration options related to inter-cloud connections. These are
further discussed in Chapter 6. In addition, every NRS service needs to run an inter-
cloud server that listens for inter-cloud requests. More about implementation details
can be found in Chapter 5.
4.10 Administrator Access
The network topology on which NRS operates needs to be inserted in the NRS
somehow. Apart from being inserted, all parts of the topology need to be modifiable
at runtime by an administrator. Based on the topology model created for the NRS
prototype, the modifications that an administrator would make can be grouped into two
families of actions:
1) Modifications on static nodes of the topology: These are modifications such
as creating/deleting a node, adding a port to a node, or connecting and
disconnecting ports of different nodes.
2) Modifications on the L2 networks: An administrator may want to manually
change the port membership of an L2 network. A motivation for this can be
a need to re-arrange the implicit ports that belong to the network, so that the
network traffic no longer goes through a specific node and/or explicitly goes
through a desired node.
Such modifications may conflict with ongoing network operation. We can consider a
snapshot of the NRS topology while the system has been running for some time:
Typically, several L2 networks will have been formed on top of the topology, each of
them having a set of ports belonging to it, possibly overlapping with the ports of the
other L2 networks. If the administrator modifies one of these ports, either by
removing the port or disconnecting two nodes from each other, or if the administrator
adds new ports to a network that are not connected to its old ones, one or more
networks can be put in an inconsistent state (as defined in section 4.6.4). Therefore,
any modifications done must make sure that all networks remain consistent.
Thus, two things are required for administrative access: Firstly, a way to access and
modify the topology that is suitable for a human, and secondly, mechanisms that
make sure that the topology is kept consistent.
4.10.1 Topology access
Initially, a structured textual representation (XML or equivalent) of the network
topology was considered. The administrator would be able to create it and insert it to
the NRS service, or retrieve it from NRS while it is running. Such a feature would
serve two purposes:
1) a human-readable and editable topology
2) a way for the system to save its state offline
This would also lead to two different representations of the topology model: 1) the
software model, as kept in the system’s memory, and 2) the schema that models the
topology in the textual format (the XML schema). The topology model, however, was
not set in stone for the duration of the project; for most of the project it was being
modified as part of exploring different modeling options and design decisions. Using
both the software model and an XML schema would lead to maintenance overhead
not acceptable for the NRS prototype development; not only would the two
implementations of the model need to be modified, but also the parser that would
convert from one to the other.
There is an additional complication in modifying the topology. When an
administrator needs to modify it while the system is running, the system needs to stop
serving requests, or at least requests related to some parts of the topology. For
instance, if the administrator is changing a specific L2 Network, the system cannot be
growing or shrinking it in the background; the network needs to remain unchanged
from the moment the administrator receives the topology in human-readable form,
until he or she submits a modified one. That means that NRS should stop serving
requests, at least the ones concerning the specific network. This introduces the
concept of the topology being a shared resource or containing multiple shared
resources, contested between the administrative access and the NRS service.
An XML schema is not well suited to such functions, but more towards
'offline' viewing. Trying to use it in the scenario described above would require an
additional facility or tool that the administrator would use to indicate which parts of the
topology he will change. This tool would then have to inform NRS which parts of
the topology cannot be modified; the tool would have to be part of NRS itself or
perform inter-process communication to it. From a prototype’s perspective, it would
be much more convenient if the administrator can have direct access to a command
line tool that has access to the topology object as it is kept in memory, and can
present the administrator with options to modify it. If this tool is an extension of the
NRS service, access to shared resources is easily resolved.
This was the chosen mechanism for the prototype; NRS provides a CLI environment
that presents its user with the topology and options to modify it. The CLI tool also fits
well with the common practice of network devices and services to provide an
administrator with a CLI environment to make modifications.
Apart from the topology, the administrator is interested in changing the network
isolation manager at runtime as well. For example, he may need to mark a new set of
802.1Q VLAN ids as unusable. The network isolation manager object is provided to
the administrator through the same CLI tool.
4.10.2 Topology insertion
Apart from the administrator access tool, the topology may need to be inserted or
modified by an automatic network discovery tool or facility. Such a tool was out of
scope for the NRS project; inserting the topology, however, should not be coupled to
the administrator access tool.
The object that operates on the topology is the request manager, and this is where the
topology modifications are applied. The request manager provides an operation to
replace its topology object with a new one. Before the new topology is accepted, all
modified L2 networks are checked for consistency. The consistency checks guarantee
consistency of networks at a logical level of the topology (or as logical reservations).
However, the newly added or removed ports of the networks need to be allocated as
well, i.e. plugin calls need to be performed that will connect them to the L2 networks.
After the allocation step is complete, the new topology replaces the old one. The class
methods that perform these operations are shown in Figure 46.
Figure 46: Operations to modify the topology
4.10.3 Administrator CLI
Command line interfaces can be very elaborate; they provide their own custom
commands and functionality concepts, and issues such as usability and
comprehensiveness are important for a successful CLI. Creating an elaborate CLI tool
was not a goal of the NRS prototype. Its CLI tool is simply a means to achieve the
desired functionality of providing real-time access to the NRS topology.
The NRS system’s administrator CLI is based on the Python interactive interpreter;
an interactive prompt that accepts any Python code, prints expression results and
stores all variables that the user creates for the duration of the session. The NRS CLI
basic concept is that it provides the exact Python API that is used to create the
topology object internally. Thus, the administrator modifies the object in the same
way that the developer creates it programmatically. Using the Python standard
library, such a solution was very easy to create.
The CLI is based on the InteractiveConsole class, provided by the Python library’s
code module. This class directly provides the Python interpreter functionality; the
interpreter’s environment, i.e. the variables and functions that are available to the
user, can be modified before the interpreter is started. This allows the topology object
to be passed to the interpreter’s session, along with any other functions needed to
support the CLI’s functionality. The user can then call the available functions and
object methods directly. Therefore, passing the topology object to the interpreter
makes all of its methods available to the interpreter, which include adding and
modifying nodes and ports.
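A minimal sketch of an interpreter set up this way is shown below. It uses only the standard library code module; the output capturing relies on contextlib.redirect_stdout, which is a Python 3 facility (the prototype targeted Python 2), and the variable names are illustrative.

import code
import io
from contextlib import redirect_stdout

class CachingInterpreter(code.InteractiveConsole):
    """Interactive interpreter that caches its output instead of writing it to stdout."""

    def __init__(self, environment):
        code.InteractiveConsole.__init__(self, locals=environment)
        self.output = io.StringIO()

    def write(self, data):                   # error output from the interpreter
        self.output.write(data)

    def push_line(self, line):
        """Evaluate one line of input and return whatever it printed."""
        self.output = io.StringIO()
        with redirect_stdout(self.output):
            self.push(line)
        return self.output.getvalue()

# usage: expose a (copied) topology object to the administrator's session
# interpreter = CachingInterpreter({"topology": topology_copy})
# reply = interpreter.push_line("topology.add_node('switch-3')")   # hypothetical topology API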
The CLI uses the concept of committing modifications, inspired by Juniper's Junos OS
CLI⁴⁷. The user is free to make any modifications, but none of them are applied
until an explicit commit command is submitted. It is then that the modifications are
checked for consistency, and applied to the running system. This is why the topology
object passed to the CLI session is actually a copy of the topology object, and not the
original one used by the NRS service. The copy can be used by the CLI user to view
or modify, while the NRS service is simultaneously accepting requests on the real
topology object. If the administrator makes changes to the static topology, then the
service can keep accepting requests, since the user requests are not capable of making
changes to the static topology. If the administrator however needs to change the port
membership of an L2 network, then the service must stop until the administrator
commits his changes. This is signified by a special CLI command that needs to be
invoked, the edit_network. The handling of resource locking and the concurrent
tasks of the NRS system are discussed in Chapter 5.
The CLI is made available through a Telnet server; Telnet is a network protocol that
provides an interactive terminal facility. Telnet clients exist for almost every computing
platform. The server was created using the telnetsrvlib library, which is a third-
party Python library that implements a Telnet server. The library provides a
TelnetHandler class, which represents a single Telnet session that has methods to
handle received input. The class was extended to send the received input to the
interpreter class, receive the interpreter’s output and push it as a reply to the Telnet
client. The interpreter itself is implemented by extending the InteractiveConsole
class. The class provides a push method, which expects a line of input as a string,
evaluates it as a Python expression and then prints the outcome to the system’s
stdout. The class was extended to cache the output instead of printing it to stdout. The
cached output is used by the Telnet server as the reply. The class diagram of the
Telnet server and the interactive interpreter is shown in Figure 47.
Each Telnet session will instantiate an AdminShell, which deals with retrieving the
topology and network isolation manager, copying the objects and correctly
initializing the interpreter’s environment. It should be noted that the interpreter gives
the freedom to manipulate its objects in any possible way allowed by the Python
language. This can ruin the Telnet session, since someone can delete the topology
object for example. To alleviate this situation, a reset command is available that will
reload the interpreter’s environment with newly retrieved objects (although the reset
command can be ruined as well). Such improper usage will not affect the NRS
system, unless a commit with malformed arguments can pass the consistency checks.
In general, dealing with such behavior was out of scope for the project’s prototype.
Access to the CLI must be strictly limited to the administrator. This is achieved by
configuration and deployment restrictions, explained in Chapter 6.
47 http://www.juniper.net/techpubs/en_US/junos11.1/information-products/pathway-pages/junos-cli/junos-cli.html
Figure 47: Administrator CLI over Telnet class diagram
4.10.4 System state
Apart from letting the administrator view and modify the system’s topology, the
system needs a way to save its topology offline, so that it can boot or recover from it.
This does not pertain only to the topology, but to the overall system’s state. The NRS
system’s state consists of the state of its topology, its network isolation manager and
its reservation manager. These objects are saved by being serialized (using the Python
pickle module) and stored in files. The operation is shown in Figure 48. When the
system starts, it looks for the serialized objects to load, otherwise it starts with new
empty objects. An administrator also has the option to create and save these objects
‘offline’ using independent Python scripts.
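A sketch of how this serialization can be done with the pickle module; the file names and function names are illustrative only.

import pickle

STATE_FILES = {
    "topology": "topology.pkl",
    "isolation": "isolation_manager.pkl",
    "reservations": "reservation_manager.pkl",
}

def save_state(name, obj):
    """Serialize one of the state-carrying objects to disk."""
    with open(STATE_FILES[name], "wb") as handle:
        pickle.dump(obj, handle)

def load_state(name, default_factory):
    """Load one saved object, or create a fresh one if no file exists yet."""
    try:
        with open(STATE_FILES[name], "rb") as handle:
            return pickle.load(handle)
    except IOError:
        return default_factory()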
4.11 Conclusion
This chapter presented the overall architecture of the system, the modules that
compose it, and the elements and interactions in each module. This is referred to as
the logical view of the system and conveys the system’s functionality and business
logic. Besides the logical view, a description of the system is needed that maps the
system’s elements to active tasks (threads) that share resources and communicate
with each other. In addition, the modules are grouped in components and code source
archives. These depictions of the system are described in Chapter 5.
Figure 48: NRS system state
5 Implementation
This chapter presents implementation details and describes the system's components.
Quality attributes and source code analysis are provided as well. Moreover, this
chapter depicts the system’s concurrency and synchronization aspects.
5.1 Python
The programming language of choice was Python⁴⁸. Python is an interpreted
language that supports the imperative and object-oriented programming paradigms.
Python is dynamically typed and features automatic garbage-collection. Its use has
gained a lot of popularity during the last years, and it is being used extensively both
for scripting as well as for non-scripting purposes. Large projects have been using it
for production software, especially web applications (e.g., OpenStack, Google,
Dropbox, Reddit).
Python's flexibility and dynamic nature make it ideal for rapid prototyping. For the
needs of the NRS project, which glues together various different domains such as
network programming, scripts for controlling devices, and object-oriented design,
Python is a very good candidate. Its biggest asset is the standard library that comes
along with the language; 'batteries included' is an apt description of its
utilities. Various concepts of the prototype, such as the device plugins, the
administrator CLI, and the listening servers, were possible to implement in a short
period of time thanks to Python’s standard library.
5.2 System components
The system’s implementation consists of multiple libraries and extensions through
plugins. These belong to different components of the system, which are units of
replacement that compose the running system.
The system’s components are shown in Figure 49. The main component of the system
is NRS, which lies in the middle of the figure. It is the component whose
functionality was described extensively in Chapter 4. We can identify three Python
libraries that NRS is using directly:
networkX, the graph library
telnetsrvlib, the library that implements the Telnet server
Graphviz, the library that implements the graph visualization⁴⁹
NRS also employs device plugins. At the allocation step, plugins are triggered to
perform device configuration. The plugins are required to implement the plugin
interface, but they are separate components that can have any arbitrary form suiting the
specific device’s needs. For the needs of the NRS prototype, the implemented plugins
are scripted SSH sessions. They transfer and invoke the required CLI commands on
the devices using the pexpect Python library. This library provides the ability to
automate and control other programs, a functionality suited to automate SSH
sessions. The implemented plugins both for VM hosts and for switches use this
functionality.
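The core of such a scripted session might look like the sketch below; the prompts, credentials and CLI commands are indicative only (they follow the 3Com example of section 6.2) and will differ per device.

import pexpect

def allow_vlan_over_ssh(host, user, password, port, vlan_id):
    """Log in to a switch over SSH and permit a VLAN on a trunk port (indicative only)."""
    session = pexpect.spawn("ssh %s@%s" % (user, host), timeout=10)
    session.expect("assword:")                 # matches 'Password:' or 'password:'
    session.sendline(password)
    session.expect(">")                        # user-mode prompt, device-specific
    for command in ["system-view",
                    "interface Ethernet %s" % port,
                    "port link-type trunk",
                    "port trunk permit vlan %d" % vlan_id]:
        session.sendline(command)
        session.expect("]")                    # configuration-mode prompt
    session.sendline("quit")
    session.close()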
48 http://www.python.org/
49 Open-source graph visualization, http://www.graphviz.org/
The NRS service is accessible through the TCPserver component. This component
listens for incoming connections and forwards the requests using the ServiceInterface
that is provided by NRS.
The client for the NRS service is implemented in the NRS CLI component. This tool
provides a command line interface for the user of the service, and provides various
command line arguments that correspond to the ServiceInterface operations. When
the tool is invoked, it translates its arguments into server requests and sends them to
the server. The communication with the main service is done through a custom
messaging protocol.
A few command line usage examples follow:
nrs connect --host NAME --interface NAME --network_id ID --vlan_id ID
nrs gateway disconnect --network_id ID --gateway NAME
nrs allocate --reservation_id ID
The CLI tool is what a cloud platform invokes in order to make use of NRS. In the
case of OpenNebula, this is performed by the NRS OpenNebula
driver, which implements the OpenNebula virtual network driver functionality (as
described in section 2.4.1). The virtual network driver defines three actions that
perform network configuration related to one VM that changes states. The actions
are:
the pre-boot action, invoked before a VM is launched
the post-boot action, invoked after a VM finishes launching
the cleanup action, invoked after a VM is shut down or removed
These actions are implemented as scripts that receive all information related to the
VM as an argument. This includes the VM’s VIF and the id of the virtual network.
The NRS driver uses the post-boot and cleanup actions to call vif connect and vif
disconnect respectively, which result in the VM being connected to or disconnected
from the network as described in the NRS functionality. The required arguments for
these calls are provided by OpenNebula. The NRS driver functionality is shown in
Figure 50.
Figure 49: NRS components
Figure 50: NRS OpenNebula network driver
5.3 Code analysis
The NRS system implementation consists of source code artifacts. The project’s code
archive is similar in structure to the system’s modules as seen in the view of the
system’s architecture. A high-level view of the source code modules, as seen in the
project's root directory, is shown in Figure 51.
Figure 51: Overview of source packages
The source code was analyzed using pylint⁵⁰, which is a tool for static analysis of
Python code. The Python community has specified a very strict coding standard⁵¹,
and pylint can check Python code against the standard and grade the code based on
how well it adheres to it. This includes conventions such as variable and function
names, number of class methods, etc. The score related to the standard conventions is
the pylint convention score. Pylint can also check what may be programming
mistakes such as invoking objects that are not callable or assigning values from
functions that do not return a value. It can also check for indications for a need to
refactor code, mostly in the form of recognizing code duplication. These last two
properties are indicated by the pylint refactors/warnings score. These scores are
indications of the source code quality and in that regard are more important than the
convention score. The statistical data and results of the code analysis are shown in
Table 10.
50 http://www.logilab.org/857
51 PEP 8 -- Style Guide for Python Code, http://www.python.org/dev/peps/pep-0008/
Table 10: Source code analysis
Code | Readability
2507 Python statements | 26% docstrings
17 packages | 4% comments
63 *.py files | pylint category | pylint score
66 classes | refactors/warnings | 9.09/10
303 methods | convention | 7.1/10
The docstrings mentioned in Table 10 are comments that introduce a class or method
and explain its purpose and function (they are similar to Java's javadoc). A total of
30% documentation (docstrings plus comments) combined with a 7.1/10 standard
adherence means that the source code is quite readable by someone other than the
author. The refactors/warnings score of 9.09 is more significant; a low score would be
an indication that something may be wrong in the
program.
Apart from readability and code duplication, an indication of the code’s quality is its
cyclomatic complexity. This is a software metric that measures the number of
different paths in the control flow of a program. In general, a cyclomatic complexity
value below 10 is considered acceptable. A value much higher than 10 indicates that
the part of the code in question could possibly be split into smaller modules or
functions to reduce its perceived complexity. The cyclomatic complexity of the
source code was measured with the pygenie⁵² tool, and the results are shown in Table 11.
The tool measures the complexity of the system's functions and class methods.
On the left side of the table is the complexity and on the right the amount of times it
occurs in the total of ~300 methods of the system.
Table 11: Cyclomatic complexity
Cyclomatic complexity | Number of occurrences
11 | 1
10 | 1
8 | 1
7 | 6
6 | 6
5 | 14
<= 4 | rest (~270)
An aspect of code quality is its readability and complexity, which in turn determine
its maintainability. The metrics performed by the two tools mentioned in the previous
paragraphs show that the source code is in a good condition in this regard.
52 http://traceback.org/2008/03/31/measuring-cyclomatic-complexity-of-python-code/
5.4 Concurrency
In the description of the system’s functionality, it was specified that some of the
system’s elements are concurrent tasks that share resources. In addition, the NRS
system has to expose its functionality to its users. This is done by providing a socket
server implementation that is listening for incoming requests; to handle them, the
server needs to forward requests to the NRS object in the form of the service
operation calls, and receive a reply. Multiple such requests may happen at the same
time. In particular, there are three high-level functionalities that have parallel
execution semantics.
1) The main service operations, which correspond to the service interface
operations presented in section 4.2. These are requests performed by the user
of the service and handling them is the main functionality of the system.
Most of the requests result in changes in the topology’s L2 networks.
Multiple users can make simultaneous requests to the service.
2) The administrator CLI tool. This tool is invoked by the administrator and
runs in parallel with the main service. The tool can perform changes to the
topology.
3) The inter-cloud service. This refers to the server listening to requests from
remote NRS systems. In a similar fashion with the main service, it may
receive multiple simultaneous requests. The communication through this
channel uses its own special conversational protocol as described in section
4.9.2.
The above points show that there is a need for at least three concurrent tasks that
access the NRS service. This means that the service can be heavily contested. All
possible simultaneous actions can end up modifying the system’s state: the topology,
the reservations and the network isolation information. The functionality of the
system however can guarantee its integrity only if changes are atomic. The class
relationships that show the resource contention are shown in Figure 52.
There are various ways to resolve this resource contention. The simplest one is to
identify that all changes in the state of the system are initiated in the operations of one
object: the request manager’s reserve, allocate, release and replace_topology
operations. Making these operations atomic solves the resource sharing problem and
this is the solution chosen for the prototype.
The request manager’s operation atomicity is implemented by a Lock object, the
Python synchronization primitive.
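In outline, this looks as follows; the internal logic of each operation is elided.

import threading

class RequestManager(object):
    def __init__(self, topology):
        self._topology = topology
        self._lock = threading.Lock()      # guards every state-changing operation

    def reserve(self, network_id, network_interface):
        with self._lock:
            # ... growing algorithm, resource checks, Reservation creation ...
            pass

    def allocate(self, reservation_id):
        with self._lock:
            # ... plugin calls for the ports of the reservation ...
            pass

    def release(self, network_id, network_interface):
        with self._lock:
            # ... shrinking algorithm, de-allocation, garbage collection ...
            pass

    def replace_topology(self, new_topology):
        with self._lock:
            # ... consistency checks of the modified L2 networks ...
            self._topology = new_topology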
The atomicity of these operations makes all operations on the topology sequential.
This means that all requests wait for the previous one to be over; however the
prototype is not meant to be scalable in the context of serving multiple requests
simultaneously. Internally, it still has to process every request sequentially and
atomically, at least in the level of performing logical operations on the topology. The
ability to operate on the topology in parallel while ensuring consistency would
complicate the logic, was not relevant for the requirements, and would risk the
feasibility of the project.
Since these operations are atomic, there is no need to create elaborate server
implementations. A synchronous listening server that spawns no threads or processes
to handle requests is sufficient. Such a server processes all requests sequentially and
atomically, which is what the system that receives the requests also supports.
The functionality of the system is mapped to one process and three threads.
One process:
Thread 1: the ‘main’ thread running the server for the main service and the
inter-cloud server. These are multiplexed with the select() system call.
Thread 2: running the Telnet server for the administrator CLI
Thread 3: periodically saving the system’s state
All the threads are created when the system is initialized, and remain running until
the system is shut down. Their initialization is shown in Figure 53.
Figure 52: Sharing of request manager
Figure 53: NRS initialization
6 Deployment
This chapter describes how the NRS prototype can be deployed within a cloud site,
and with which machines and devices it needs to communicate and how. It also
describes configuration options.
6.1 NRS cloud deployment
A cloud site consists of virtual machine hosts, switches, and some machines that run
the cloud software platform. NRS needs to communicate with the cloud platform to
receive requests, and with the hosts and switches to configure them. In addition, NRS
needs to communicate with remote NRS systems, as well as be accessible to an
administrator. This means that the NRS system needs to be connected to several
different networks:
1) the “management network”, which has connectivity to all hosts and switches
of the cloud site for control and monitoring purposes
2) a “public network”, which is an external network over which communication
with a remote NRS system is possible
3) a network where an administrator has easy access to
The services that NRS offers are related to these networks as well. NRS has up to
three configurable network addresses, one for each of its servers:
1) the “main service” address, through which NRS receives requests. This may
or may not be in the management network, but should be reachable from the
machine running the cloud platform
2) the "inter-cloud service" address, through which NRS communicates with
remote NRS systems to set up inter-cloud connections. This address faces
the public network NRS is connected to.
3) the Telnet service address for administrator access. This is configured at the
administrator's discretion. It is important that it is not reachable from
outside the administrative network.
A deployment of NRS in a cloud site is shown in Figure 54. The figure shows
different machines with network interfaces attached to them. It is assumed that NRS
is running on a dedicated machine with a separate network interface for each network
it is connected to, although neither is strictly necessary. The different networks within a
cloud site are shown with different colors in Figure 54. The production network,
which hosts the users’ networks traffic, is a separate domain from the management
network.
The NRS system options such as the server network addresses are configured in the
NRS configuration text file. A sample is shown in Figure 55, where you can see basic
configuration options and the system’s initialization. It can be seen that the three
system’s functionalities are optional and can be selectively turned on or off. The
functionalities are
1) intra-cloud, which includes all internal reservation logic and device
configuration
2) inter-cloud, which includes inter-cloud negotiation with a remote site and
configuration of inter-cloud gateways
3) telnet, which is the Telnet server
Each of the system’s functionalities has several related configuration options that
need to be filled in if the functionality has been enabled. All functionalities being
enabled corresponds to a deployment with functionality similar to Figure 54.
Figure 54: NRS deployment in a cloud site
Figure 55: NRS basic configuration
6.2 Device plugins
Besides the NRS system, the device plugins may have their own deployment and
configuration aspects. These are specific to each device plugin. The plugins
developed for the prototype are scripted SSH sessions. Therefore, the only
configuration required was that the devices be pre-configured to allow SSH
connections with administrative rights.
There were four device plugins implemented and deployed for the NRS prototype,
two for physical switches and two for virtual switches. These configure the following
devices: the 3Com 4210 switch, the Brocade FastIron switch series, hosts with Linux
bridge, and hosts with Open vSwitch.
The switch plugins translate the plugin commands into CLI commands that are
specific for the switch’s CLI, and send them to the device over SSH. The calls
manipulate trunk ports. For example, a plugin call such as
allow(host="3com4210", port="1/0/2", vlan_id=5)
is converted to the following CLI commands, as seen in the 3com’s telnet interface
(including the prompt):
<4210> system-view
[4210] interface Ethernet 1/0/2
[4210-Ethernet1/0/2] port link-type trunk
[4210-Ethernet1/0/2] port trunk permit vlan 5
The host plugins operate in a similar fashion to send commands that manipulate the
virtual switch. The end-result is similar to how cloud platforms use them, as
described in section 2.4.1 and shown in Figure 10 and Figure 11. The difference is
that the VM gateway is not trunked to allow all possible VLANs, but only the ones
that dynamically have been configured by NRS. The Open vSwitch deployment in a
host that needs VMs connected to two networks with VLAN ids 6 and 15 is shown in
Figure 56.
Figure 56: NRS Open vSwitch plugin operation
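For illustration, the kind of commands such a host plugin could issue (shown here via a local subprocess rather than an SSH session) might look as follows; the bridge and interface names are hypothetical.

import subprocess

def ovs_allow(bridge, vm_interface, vlan_id):
    """Attach a VM interface to an Open vSwitch bridge as an access port on a VLAN."""
    subprocess.check_call(["ovs-vsctl", "add-port", bridge, vm_interface,
                           "tag=%d" % vlan_id])

def ovs_disallow(bridge, vm_interface):
    """Detach the VM interface from the bridge again."""
    subprocess.check_call(["ovs-vsctl", "del-port", bridge, vm_interface])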
6.3 Inter-cloud with OpenVPN
The inter-cloud connection type chosen for the NRS prototype uses
OpenVPN. The gateway for the OpenVPN service is a machine with the OpenVPN
software installed that can launch OpenVPN processes. This VPN gateway
(informally, the VPN box) needs to be connected to the external network over which
VPN connections will be created.
The two NRS systems also need to be able to talk to each other over an external
network, so that the inter-cloud negotiation can take place. For it to finish
successfully, both services need to agree that they provide the same inter-cloud
capabilities, as determined in their configurations. After the negotiation is complete,
each service needs to configure the VPN gateway. This is performed with calls to the
device plugin that is associated with the gateway. The VPN configuration is
exchanged during the inter-cloud negotiation.
The deployment of the OpenVPN inter-cloud connection is shown in Figure 57. The
plugin’s functionality for a single inter-cloud connection consists of the following
sequence:
1) launch an OpenVPN process with L2 networking, which creates a tap
network interface in the VPN gateway. The tap interface is the inter-cloud
connection end-point for each side.
2) bridge a local L2 Network to the tap network interface, which allows the L2
traffic to flow through, after which the inter-cloud connection is live and the
two networks are bridged.
Every L2 Network is associated with a bridge in the VPN gateway, so that it can be
connected to multiple tap interfaces which correspond to multiple remote L2
networks. In addition, the VPN gateway can host multiple local L2 networks and
multiple OpenVPN processes.
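A simplified sketch of these two steps as a gateway plugin might script them is given below; the OpenVPN options, key handling and interface names are illustrative only.

import subprocess

def start_inter_cloud(tap_name, bridge, remote_host, port, secret_file, local_iface):
    """Launch an L2 OpenVPN tunnel and bridge it with the local network's interface."""
    # 1) create the tunnel end-point (a tap interface) with an OpenVPN process
    subprocess.check_call(["openvpn", "--daemon", "--dev", tap_name,
                           "--remote", remote_host, "--port", str(port),
                           "--secret", secret_file])
    # 2) bridge the local L2 network with the tap interface
    subprocess.check_call(["brctl", "addbr", bridge])
    subprocess.check_call(["brctl", "addif", bridge, tap_name])
    subprocess.check_call(["brctl", "addif", bridge, local_iface])
    subprocess.check_call(["ip", "link", "set", bridge, "up"])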
The overall deployment for each cloud site is left to the local administrator. The NRS
system of site #1 only knows that the traffic coming through each VPN connection is
part of the private network. On the side of site #2, it is up to the local administrator to
expose the desired private network to the VPN box (either with the use of intra-cloud
NRS or with a method of his choice). The manner that this is done is of no concern to
site #1; the only thing required is that the private network in site #2 is bridged with
the tap interfaces of VPN gateway #2.
Inter-cloud configuration
The inter-cloud functionality has additional configuration options, shown in Figure 58.
These include connection-specific details and limitations on which networks can be
used for inter-cloud. The administrator can specify which local L2 networks are
allowed to take part in inter-cloud connections. In addition, SSL certificate and key
files can be specified to be used in the connection to remote NRS systems in order to
secure the communication.
Figure 57: NRS inter-cloud with OpenVPN
6.4 Deployment at the Nikhef cluster
The NRS system was deployed in a small private cloud site at Nikhef’s data-center.
The machinery that comprised the cloud site was provided by Nikhef for the needs of
the project. The cloud platform of choice was OpenNebula, and its installation and
deployment were performed in the first months of the project. The small cloud site
was used to continuously integrate the software product with the actual cloud
deployment.
The actual deployment is shown in Figure 59; the ‘local’ cloud site consisted of three
virtual machine hosts, three switches and one machine acting both as the OpenNebula
frontend and as the host machine of the NRS service. In addition, a static virtual
machine was used as the OpenVPN gateway for the OpenVPN inter-cloud
implementation.
The ‘remote’ cloud site for the inter-cloud was deployed as a single host that fulfilled
three functionalities at the same time:
1) run NRS in 'inter-cloud only' mode
2) run the KVM hypervisor
3) act as the OpenVPN gateway
This was sufficient to deploy the inter-cloud connection of VMs over OpenVPN, as
shown in the inter-cloud deployment of Figure 57.
If we compare the deployment of the system with the one described in section 6.1
(Figure 54), we can see that the prototype deployment has various machines and
networks overlapping. For example, there is one machine hosting both NRS and the
cloud platform, and the ‘external’ network is actually part of the same cluster. This
deployment is adequate for testing and verifying the prototype. In general, the
system’s deployment can be configured extensively; Figure 54 shows the most
elaborate deployment allowed by the system.
Figure 58: NRS inter-cloud configuration options
Part of the topology of Figure 59 corresponds to the logical topology maintained and
controlled by the full-fledged NRS system that resides in the ‘OpenNebula frontend’
host. A snapshot of the graph of that topology is shown in Figure 60. The graph is
rendered with the Graphviz tool and shows the topology exactly as it is kept in the
NRS system. Apart from the static parts, we can identify that the topology of Figure 60 also
contains parts that have been dynamically added to it at runtime: three virtual
machines that have been launched by OpenNebula, and a logical L2 network formed
between them.
The hosts, switches, and the OpenVPN gateway use the device plugins that have been
described in sections 6.2 and 6.3.
Figure 59: Private cloud deployment at Nikhef
Figure 60: Topology snapshot of the deployed NRS system
7 Verification and Validation
This chapter describes to what extent the system requirements are fulfilled by the
NRS system’s functionality. It also presents the testing procedures used to determine
the correctness of the produced prototype.
7.1 Functional Validation
The validation of the system’s functionality, i.e., determining if the functionality
meets the system requirements, was performed with acceptance tests. These tests
dictate certain user input and determine how the system functions compared to how it
is expected to function. All the tests were performed in the local deployment of the
system at the Nikhef cluster, as described in section 6.4. The tests are applied to a
high-level view of the system; it is the point of view of the user and of the
functionality he expects the system to exhibit. The tests presented in this section are
not exhaustive, but an indication of how the system’s functionality is verified.
The acceptance tests were performed manually; automating such tests would be
beneficial, but because the system integrates many different components, from cloud
platforms to network switches, automating them would have required considerable
development effort.
FR1: Network connectivity
The first requirement states that the NRS system should provide network connectivity
among a set of network interfaces. Network connectivity means that the interfaces are
able to reach each other over the network. This can easily be confirmed by giving the
interfaces IP addresses in the same IP subnet; the machines that own the interfaces
can then use the ping tool to ping each other's IP addresses. The test is shown in Table 12.
Table 12: Network connectivity tests
Summary
Verify that network interfaces ‘connected’ with the NRS connect operation can reach
each other, i.e. are actually connected to each other.
Preconditions
1. At least two virtual machines should have been launched in the same host,
and at least two virtual machines should be in different hosts that are
connected through one or more switches.
2. The virtual machines’ interfaces must have IP addresses in the same subnet
3. NRS must be running and the NRS topology must contain the devices that
will be used for the test.
Steps:
1. Pick an unused L2 network id and call vif connect with all the VMs' interfaces and the network id as arguments.
2. Log in to each virtual machine and ping all the others.
Expected results:
1. The pings should return ping responses.
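A small helper can approximate the ping step of this test. The sketch below is
illustrative only; the IP addresses are placeholders, and the script has to be run on
each VM in turn to cover the full mesh.

import subprocess

# Placeholder addresses of the interfaces connected to the same L2 network.
VM_IPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def ping(ip, count=3):
    # Return True if the address answers the ICMP echo requests.
    result = subprocess.run(["ping", "-c", str(count), ip],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == "__main__":
    for ip in VM_IPS:
        print(ip, "reachable" if ping(ip) else "UNREACHABLE")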
FR2: Isolation of L2 networks
This requirement refers to the fact that L2 networks formed by NRS must be isolated
from each other and from any other networks that may exist in the network
infrastructure. This can be verified with the use of tcpdump53, a tool that can monitor
a network interface for the packets that arrive at it. The tool can be used to show that
packets that belong to one L2 network never reach a network interface that belongs to
a different L2 network, and vice versa. This can be checked by monitoring packets of
protocols that send broadcast traffic, such as ARP. A broadcast always arrives at all
reachable destinations; i.e., if a broadcast packet does not reach a network interface,
the broadcast packet and the interface belong to separate L2 networks.
The implementation of network isolation depends on the isolation method chosen and
the devices that implement it. In the case of 802.1Q VLANs, isolation is enforced by
physical and virtual switches, and NRS uses these as specified in their usage manuals.
Essentially, testing whether the networks are isolated tests
1) whether the physical and virtual switches correctly implement VLAN
separation and
2) whether the NRS system integrates them correctly
The acceptance test of FR2 is shown in Table 13.
Table 13: Network isolation test
Summary
Verify that traffic belonging to different L2 networks is kept separate.
Preconditions
1. NRS must be running and the NRS topology must contain at least two
different L2 networks (networks A and B).
2. The devices that belong to the L2 networks need the tcpdump utility.
Steps:
1. Start tcpdump on all network interfaces that belong to the L2 networks and listen for ARP packets (i.e., tcpdump -i eth0 arp).
2. Perform a ping among machines that belong to network A.
3. Observe the tcpdump output.
4. Stop the ping.
5. Perform a ping among machines in a network external to the L2 networks formed by NRS, i.e., ping among IP addresses that belong to the existing management network between the machines.
6. Observe the tcpdump output.
Expected results:
1. First tcpdump output:
   i. All machines within network A should have received an ARP packet looking for the IP address of the ping recipient (Who has xxx.xxx.xxx.xxx? Tell xxx.xxx.xxx.xxx).
   ii. All machines within L2 network B should have received nothing.
2. Second tcpdump output: machines within both networks A and B should have received nothing.
53 http://www.tcpdump.org
FR3: VLAN id restrictions on devices
This requirement refers to the system’s ability to take into account VLAN id
limitations in network devices, so that it can correctly map connection requests to the
logical topology based on whether the devices can support it. If an L2 network is
associated with a VLAN id that a device does not support, it should not be possible to
add that device to the L2 network. The acceptance test is shown in Table 14.
Table 14: VLAN device restrictions test
Summary
Verify that device VLAN limitations are taken into account by the system when
connecting network interfaces to L2 networks.
Preconditions
1. NRS must be running with 802.1Q as the network isolation method.
2. The NRS topology should contain at least three network nodes that are
connected in a line.
3. The ‘middle’ network node should allow only one VLAN id (e.g., id=65)
4. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Connect a port of a network node without VLAN limitations to a new L2 network. Force the L2 network's association with a VLAN id different from 65.
3. Connect a port of the network node with the VLAN limitation to the L2 network created in the previous step.
4. Connect a port of the third network node to the L2 network of step 2.
5. Connect a port of the network node with the VLAN limitation to a new L2 network.
6. View the NRS VLAN status.
Expected results:
1. The first 'connect' operation should finish successfully.
2. The second 'connect' operation should fail with the error: 'Cannot accommodate chosen VLAN id'.
3. The third 'connect' operation should fail with the error: 'Cannot find path to L2 network'.
4. The new L2 network should be associated with VLAN id 65.
FR4: Administrator access to topology
This requirement refers to the ability of an administrator to insert or modify the
network topology, without restarting the NRS service or affecting existing
connections. This is possible through the telnet service, which allows direct access to
the topology. There are two administrator actions that involve modifying
the topology: one is related to static parts of the topology and modifies nodes and
ports, and the other modifies the L2 networks of the topology. The tests are shown in
Table 15 and Table 16.
Table 15: Administrator static topology modification test
Summary
Verify that an administrator can modify the static topology while the NRS service is
running.
Preconditions
1. NRS must be running and the NRS topology must contain
i. at least one L2 network that spans over 3 network nodes
ii. at least one network node that is not part of the L2 network
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. View the L2 network's member ports.
3. Select a node that belongs to the L2 network, and remove it from the topology.
4. Commit the change.
5. Select the node that is not used in the L2 network, and remove it from the topology.
6. Commit the change.
7. Compare the L2 network member ports with the ones before the commit.
8. Perform the 'network connectivity' test on the machines that belong to the L2 network.
Expected results:
1. The first commit should return an error: 'Inconsistent L2 network'.
2. The second commit should complete successfully.
3. The L2 network ports should remain identical.
4. The pings from the 'network connectivity' test should return ping responses.
Table 16: Administrator L2 network modification test
Summary
Verify that an administrator can modify an existing L2 network in the topology.
Preconditions
1. NRS must be running and the NRS topology must contain
i. at least one L2 network that spans over 3 network nodes in a line, so
that one of the nodes is the ‘middle’ node in the L2 network graph
ii. at least one network node that can replace the existing ‘middle’ node in
the network graph (and become the ‘new middle’ node)
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Remove the ports of the L2 network that belong to the 'middle' node.
3. Commit the change.
4. Identify which ports that belong to the 'new middle' node will make the L2 network consistent, and add them to the L2 network.
5. Commit the change.
6. View the new L2 network member ports.
7. Perform the 'network connectivity' test on the machines that belong to the L2 network.
Expected results:
1. The first commit should return an error: 'Inconsistent L2 network'.
2. The second commit should complete successfully.
3. The L2 network ports should now contain ports from the 'new middle' node.
4. The pings from the 'network connectivity' test should return ping responses.
FR5: Administrator access to network isolation
This requirement refers to the ability of an administrator to configure the network
isolation mechanism used, and more specifically to restrict the usage of a set of
isolation identifiers. These restrictions are easiest to understand when the network
isolation mechanism is 802.1Q VLANs and the administrator wants to prevent the
system from using a set of VLAN identifiers of his/her choice. The acceptance test is shown in
Table 17.
Table 17: VLAN administrator restrictions test
Summary
Verify that the administrator can prevent the system from using a set of VLAN ids of
his/her choice.
Preconditions
1. NRS must be running with 802.1Q as the network isolation method.
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Restrict the available VLAN ids to only one (e.g., id=76).
3. Commit the change.
4. Connect a network interface to a new L2 network.
5. View the VLAN status.
6. Connect a network interface to another new L2 network.
Expected results:
1. The VLAN status should show that the new L2 network is associated with VLAN id 76.
2. The second 'connect' operation should fail with the error: 'No available VLAN id found'.
FR6: Internet connectivity
This requirement refers to the ability of the NRS system to provide external network
connectivity to the L2 networks. One such network can be the Internet. This
functionality is supported by the system with the use of gateways: special network
nodes whose device plugins configure them to connect an external network to a local
L2 network (section 4.9). An Internet gateway was not
implemented for the prototype; instead the gateway concept is used by the OpenVPN
gateway that serves inter-cloud connections, discussed in the following section.
FR7: Inter-cloud connections
This requirement refers to the ability of the NRS system to negotiate and create inter-
cloud connections that connect L2 networks located at different cloud sites. As
described in section 4.9, this is supported by the system with a negotiation that takes
place between remote NRS systems, and the configuration of the inter-cloud
gateways in each site.
The inter-cloud connection was implemented with OpenVPN. The network
connectivity provided by the inter-cloud connection can be verified in a similar
fashion to verifying FR1. The acceptance test for OpenVPN inter-cloud is shown in
Table 18.
Table 18: OpenVPN inter-cloud test
Summary
Verify that once two L2 networks are connected over an inter-cloud, network
interfaces that belong to them can reach each other.
Preconditions
1. At least two NRS services with inter-cloud enabled need to be running and
operate on ‘remote’ network topologies. The NRS services should be able to
reach each other over a network.
2. Each NRS service should control a functional OpenVPN gateway.
3. Each topology should contain at least one L2 network, which should be
permitted to be connected to an inter-cloud connection in the NRS
configuration. Each L2 network should contain at least one network
interface.
Steps:
1. Pick one of the NRS services and ask it to connect the two L2 networks over an OpenVPN inter-cloud (nrs inter-cloud start -type OpenVPN ...).
2. Perform the 'network connectivity' test on at least two network interfaces, each belonging to a different L2 network (i.e., each belonging to a different cloud site).
Expected results:
1. The pings should return ping responses.
FR8: Inter-cloud mode
This requirement refers to the system being able to operate in an ‘inter-cloud only
mode’, so that it offers the inter-cloud functionality to a cloud site, but none of the
intra-cloud functionality (section 4.9). This mode can be specified in the NRS
system’s configuration file. In this mode, the system does not maintain any internal
topology but can perform the inter-cloud negotiation, and can optionally configure a
stand-alone gateway. The test for this case is similar to FR7, but with one service’s
intra-cloud functionality disabled. In addition, the ‘inter-cloud only mode’ system
should accept only inter-cloud CLI requests.
FR9: Advanced network features
This requirement refers to the system’s ability to provide network features beyond
simple connectivity, such as bandwidth guarantees, QoS and ACLs. This feature was
not implemented in the NRS prototype: it introduces complexity across all parts of the
system and could not have been implemented in the available time. However, the
system's architecture was built with this feature in mind, so that it can be provided as
a future extension (see the future work discussion in Chapter 8).
7.2 Non-functional validation
Apart from functional expectations, the system is required to have certain qualities
that are related to how the system operates. These are described by the non-functional
requirements of the system. Their fulfillment is determined by the system’s
architecture which is described in Chapter 4. The extent to which the non-functional
requirements are fulfilled is described in this section.
NFR1: Arbitrary network topologies
This requirement refers to the fact that NRS should support operating on any arbitrary
topology, i.e. not be limited to a specific type or form of topology. This is achieved
through the use of graphs; they allow the specification of any arbitrary topology as
long as it contains switches and hosts, with network links between them. The graph
algorithms likewise operate on any graph, so connections can be established on top of
arbitrary topologies.
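To illustrate the idea, the sketch below uses the networkX library (also used by the
prototype) to build a small, arbitrary topology and to find a path between two hosts.
The node names are illustrative and the graph shown is not the actual NRS data
model.

import networkx as nx

# An arbitrary topology: hosts and switches are plain graph nodes, links are edges.
topology = nx.Graph()
topology.add_edge("host1", "switch1")
topology.add_edge("host2", "switch1")
topology.add_edge("switch1", "switch2")
topology.add_edge("host3", "switch2")

# Mapping a connectivity request boils down to finding paths between hosts.
print(nx.shortest_path(topology, "host1", "host3"))
# -> ['host1', 'switch1', 'switch2', 'host3']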
NFR2: Support for configuration of any network device
This requirement refers to the ability to configure any arbitrary switch or network
device in general. This is achieved with the use of device plugins, which encapsulate
the knowledge needed to configure a specific device, as explained in section 4.7.
The creation of the device plugins is left to the cloud site maintainer, but they
allow integration of any device with the system. A few proof-of-concept plugins were
implemented, as described in section 6.2.
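Conceptually, a device plugin is a small adapter class per device type behind a
common interface. The sketch below only conveys this idea; the class and method
names are assumptions and do not necessarily match the actual DevicePlugin
interface of section 4.7.

from abc import ABC, abstractmethod

class SwitchPlugin(ABC):
    # Hypothetical base class: one concrete implementation per switch model.
    @abstractmethod
    def add_port_to_vlan(self, port: str, vlan_id: int) -> None: ...

    @abstractmethod
    def remove_port_from_vlan(self, port: str, vlan_id: int) -> None: ...

class LoggingSwitchPlugin(SwitchPlugin):
    # Stand-in implementation that only prints; a real plugin would drive
    # the device over SSH, telnet, SNMP, or a vendor API.
    def add_port_to_vlan(self, port, vlan_id):
        print(f"configure: add port {port} to VLAN {vlan_id}")

    def remove_port_from_vlan(self, port, vlan_id):
        print(f"configure: remove port {port} from VLAN {vlan_id}")

if __name__ == "__main__":
    LoggingSwitchPlugin().add_port_to_vlan("Gi0/1", 100)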
NFR3: Replaceable network isolation mechanisms
This requirement refers to the system’s support for replacing the existing network
isolation method with a different one. The chosen isolation is 802.1Q VLANs, but the
system should be extensible with new isolation mechanisms. The system supports
replacing 802.1Q VLANs, as long as the new isolation mechanism fits the concepts of
802.1Q: the new mechanism must use some kind of identifier to distinguish isolated
networks, and network separation must be enforced at the level of network devices
through the appropriate plugin calls.
To extend the system with such a new mechanism, a new module that handles the
isolation identifiers and implements the Network Isolation Manager interface
(section 4.5) needs to be created, and new plugin commands that enforce the new
isolation mechanism need to be implemented. These concepts are sufficient to
introduce VXLANs, Q-in-Q, and MAC-over-IP tunneling to the NRS system.
However, no proof-of-concept implementation of a mechanism other than 802.1Q
was made for the prototype.
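The shape of such a replacement can be pictured as a manager for a pool of isolation
identifiers. The following sketch is an assumption about that shape and does not
claim to match the Network Isolation Manager interface of section 4.5; a VXLAN or
Q-in-Q manager would look the same, only with a different identifier space.

class VlanIdManager:
    # Hands out and reclaims isolation identifiers from a configured range.
    def __init__(self, first=2, last=4094, reserved=()):
        self._free = set(range(first, last + 1)) - set(reserved)
        self._in_use = {}  # L2 network name -> identifier

    def allocate(self, l2_network, forced_id=None):
        if forced_id is None:
            if not self._free:
                raise RuntimeError("No available VLAN id found")
            vlan_id = min(self._free)
        elif forced_id in self._free:
            vlan_id = forced_id
        else:
            raise RuntimeError("Cannot accommodate chosen VLAN id")
        self._free.discard(vlan_id)
        self._in_use[l2_network] = vlan_id
        return vlan_id

    def release(self, l2_network):
        self._free.add(self._in_use.pop(l2_network))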
NFR4: Interface to cloud platforms
This requirement refers to NRS being accessible by any cloud platform without being
limited or tightly coupled to only one platform. An OpenNebula plugin was created
for the needs of the prototype, but the NRS interface is not tailored specifically to
OpenNebula. In the case of intra-cloud functionality, NRS only requires the minimal
information needed to provide network connectivity: the network interface names and
the hosts they belong to. As
discussed in section 2.4, both OpenNebula and OpenStack provide this information in
the interfaces they provide for network virtualization services, namely OpenNebula’s
virtual network manager and OpenStack Quantum. Since these two cloud platforms
are currently the only ones providing clean interfaces for this purpose, it is likely that
other cloud platforms will follow suit. Apart from cloud platforms, the NRS interface
can also be used directly by a human through the provided NRS CLI tool.
NFR5: Support for various inter-cloud connection types
This requirement refers to the system’s ability to be extended to support different
inter-cloud connection types. The plugin system of NRS applies to inter-cloud
gateways as well; a plugin configures the gateway. Therefore, extending NRS with
new connection types is reduced to creating the proper plugins. The NRS inter-cloud
negotiation supports exchanging configuration details and passing them to the inter-
cloud gateway plugin. This allows the exchange of connection-specific configuration
details for any connection type.
NFR6: Performance
This requirement refers to the amount of time the NRS system requires to fulfill a
connectivity request. Requesting a virtual machine from a cloud service can take
from a few seconds to a few minutes, depending on configuration choices. The
overhead of network configuration provided by the NRS system should not increase
that time by a substantial amount. Therefore, an upper limit of 10 seconds to service a
request is considered sufficient for the needs of a generic cloud user.
NRS deals with requests in two distinct steps: the reservation and allocation steps
(section 4.8). The reservation step’s execution time depends on the performance of
the algorithm used to operate on the topology. The logical decision of how to map a
connection request can be made very quickly if the algorithm used is simple. On the other hand,
coming up with complicated resource requests and expecting a “good” decision may
take some time (depending on the algorithm used, the complexity of the topology,
and especially if algorithms try to map bandwidth). Different classes of users may
have different needs, trading a fast decision for a better decision or vice versa.
Different algorithms can be used for different usage requirements, and this influences
the balance of response time to decision quality. In any case, introducing new
algorithms in the system is supported by its architecture. The simple path-finding
algorithm implemented for the NRS prototype (it uses Dijkstra's shortest path, which
has O((|V| + |E|) log |V|) complexity) is among the fastest path-finding algorithms, and
is not the bottleneck in the NRS performance; the time-consuming step is the allocation.
The performance of the allocation step, which actually deploys the connection (i.e.,
configures switches), depends on the device plugin implementation. More
specifically, the amount of time required for allocation is bounded from below by the
time it takes to configure a single device with the chosen communication method; e.g., it takes a few
seconds to complete an automated SSH session to a switch. The allocation step
performance can be improved in two ways:
1) optimize the invocation of device plugins per device, i.e., launch automated
sessions only once per device (to minimize session overhead such as
initialization and authentication), and launch sessions for different devices in
parallel (a sketch of this follows after this list)
2) optimize the performance of each separate device plugin. If a chosen
communication method is too slow, it is perhaps advisable to implement one
with less communication overhead (e.g., replace SSH sessions with SNMP
requests)
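The first optimization, one automated session per device with the sessions for
different devices running in parallel, could look roughly like the sketch below;
configure_device is a placeholder for whatever a device plugin does within a single
session and is purely illustrative.

from concurrent.futures import ThreadPoolExecutor

def configure_device(device, commands):
    # Placeholder for a device plugin call: open one session for the device
    # and apply all of its commands within that single session.
    print(f"{device}: applying {len(commands)} commands")

def allocate_in_parallel(per_device_commands):
    # One session per device; sessions for different devices run in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(configure_device, device, commands)
                   for device, commands in per_device_commands.items()]
        for future in futures:
            future.result()  # re-raise any configuration failure

if __name__ == "__main__":
    allocate_in_parallel({"switch1": ["vlan 100", "tag port 1"],
                          "switch2": ["vlan 100", "tag port 7"]})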
The NRS prototype is primarily a proof-of-concept prototype and therefore not
optimized for performance. The SSH device plugins are quite heavy-weight when it
comes to execution time. The prototype implementation still takes less than 10
seconds for requests that involve less than five network devices. With further
optimizations, the system can become much faster.
7.3 Verification
Apart from fulfilling requirements, the system also needs to be checked for
correctness, i.e. whether the system is built according to the design specifications and
whether it contains implementation faults.
The implementation language of choice, Python, uses automatic garbage collection
and does not leave room for common memory management errors associated with
languages such as C and C++. In addition, static analysis of the source code with the
pylint tool gave good scores with respect to possible programming errors
(shown in section 5.4).
A testing suite was developed along with the rest of the system. The suite contains
simple unit tests that test each class and method against their functional
specifications. In addition, the suite contains integration tests between the different
modules. The suite focuses mostly on the software logic related to the topology, the
algorithms, the consistency checks and the reserve/release logical operations. These
form the core of the system's logic, and their correctness is much more important
than that of peripheral components of the system, such as the device plugins. The
suite tests the latter much less, as they have a more pronounced 'proof-of-concept'
nature. The test coverage was measured with the coverage54 tool and was found to be
72%. This is considered a fairly good result for a production system, let alone for a
prototype.
54 http://pypi.python.org/pypi/coverage
8 Conclusions
This chapter presents the outcome of the project for the NRS system and compares it
against the initial project goals. In addition, it puts the NRS system in perspective
with regard to existing cloud networking solutions, as well as with regard to the Grid
software ecosystem. Moreover, it discusses recommendations for further
development of the system.
8.1 Functional Results
The system’s functionality envisioned at the beginning of the project was quite broad
and extended over different domains: cloud platforms, network discovery and
configuration, modeling of network topologies, system administration and
monitoring. A few main functionalities were identified as the pillars of the system,
but a wealth of possible extensions of its functionality was considered. The NRS
system is most correctly viewed as an architectural and functional basis that provides
the main functionalities and is built to be easily extensible to support more.
The project culminated in the design of the NRS system and the creation of its
software prototype. The system combines network topology knowledge and device
configuration ability to essentially create a platform for Network Virtualization. This
platform is the machinery that can dynamically apply network configurations both in
hardware devices and in virtual switches; these configurations are derived from
requests made by an external service or user. In the basic use-case supported by the
prototype, these are requests to create Virtual Networks. The external service
provides the information on what to provision; NRS knows how to provision it. The
services that benefit the most from the functionality provided by NRS are cloud
platforms; NRS is the glue between cloud platforms and network hardware.
8.1.1 A VM-aware cloud networking solution
NRS can be characterized in different ways based on the point of view of its users.
From the perspective of a cloud platform, NRS is a VM-aware networking solution to
creating Virtual Networks. This term refers to the ability of the system to provision
network resources dynamically as required by the network, i.e. as required by the
VMs that need to connect to each other. A VM-aware solution lies in contrast to static
configurations; it makes optimal usage of the limited resources (in this case 802.1Q
VLAN ids), which in turn greatly increases the VLAN scalability potential. RFC
555655 provides an estimate of Ethernet scalability limitations. Based on that
estimate, the NRS system can theoretically increase the scalability potential from
1,000 hosts, which corresponds to the single large broadcast domain of a static
VLAN configuration, to 100,000 hosts spread over 1,000 different VLANs. This is of
course not a precise number, and VM-aware solutions work best when VMs are
spread over a large number of hosts and switches, so that the VLAN distributions do
not overlap. It is however a strong indication of the system's scalability benefits.
Moreover, NRS uses 802.1Q VLANs which guarantee Virtual Network isolation and
privacy due to the nature of the protocol. Maximum performance based on the IP
stack is also achieved; 802.1Q uses regular IP packets inside L2 frames for network
traffic.
NRS, or an equivalent VM-aware solution, can utilize 802.1Q VLANs to the
maximum of their potential. However, there are different Network Virtualization
solutions that do not rely on this networking standard to provide Virtual Networks
(these have been described in sections 2.3.3 and 2.7). These solutions are mostly
based on L2-over-IP tunneling to implement Virtual Networks, and can theoretically
scale to much higher numbers than NRS (with 802.1Q) can provide. However, this
comes at a cost in performance, since the packet encapsulation required by tunneling
demands significant computing power, especially in high-speed networks. On the
other hand, this can be overcome with more advanced (and more expensive)
hardware. In general, the correct choice of network virtualization depends heavily on
the specific network hardware deployment of each cloud site.
55 http://tools.ietf.org/html/rfc5556#section-2.6
Apart from the functionality that NRS offers, the accessibility of its service is
important as well. NRS’s API concepts were based on the cloud interfaces that cloud
platforms provide specifically for such network services: OpenNebula’s virtual
network manager and OpenStack Quantum. This makes the system usable by the two
major cloud stacks that offer specialized networking interfaces. If cloud networking
becomes a prominent issue, it is very likely that such interfaces will become de-facto
standards and will be adopted by other cloud platforms.
8.1.2 A network resource management solution
From the point of view of a network administrator or a maintainer of cloud
infrastructure, NRS is aptly described as a Network Resource Management
solution. Its ability to maintain a representation of the network topology and
dynamically configure devices based on the limitations of each device (VLAN
limitations for example) takes away the burden of manual configuration. NRS will
work with any network hardware, as long as the right plugin is created for it. That
makes it a great choice for sites that use hardware considered 'legacy'. Its use of
802.1Q, a universal network standard, further ensures that all hardware is supported.
On the other hand, non-legacy devices usually do not have VLAN limitations.
In addition, NRS allows a potential cloud provider to install a cloud platform and
utilize part of its infrastructure for the cloud service with minimal re-arrangement and
re-organization of the infrastructure. NRS can make sure that only the proper part of
the infrastructure's network is utilized for the cloud service, leaving the rest
untouched, effectively providing an easy way of partitioning the infrastructure
network. This is not achievable without a system similar to NRS. It should be noted,
however, that not all infrastructure providers choose to mix
infrastructure that serves different purposes.
8.1.3 Beyond cloud platforms
The NRS system’s usefulness is not limited to cloud platforms. It can also play a part
in the general Grid software ecosystem for infrastructure, as the tool that can apply
network configurations that match with user credentials: Grid users are identified by
X.509 “Grid” certificates, which are mapped to hardware privileges and/or
configurations. They can also be mapped to network characteristics such as a network
or VLAN id, and internal or external connectivity which should be attached to the
user's VM(s). The policy engine, which determines what can be provided to the user,
needs a tool such as NRS to enforce the provisioning, and in a similar fashion a tool
such as a cloud platform to provide the actual VMs. This interaction is portrayed in
Figure 61. It should be noted that this is an envisioned interaction and not yet a
reality as of the writing of this report, as the EES56 policy engine system is still under
development.
Figure 61: NRS in the Grid ecosystem
56 http://wiki.nikhef.nl/grid/EES
8.2 Design criteria
There were several factors that influenced the design criteria for the creation of the
system. The development of the idea for the NRS system’s functionality was not a
significant aspect of the NRS project; the idea had already been developed at Nikhef,
and it had a fairly concrete form from the beginning of the project. However, the
development of the idea into a software system design that satisfies the functionality
was a major part of the project. The NRS system did not exist as a design or in any
software form when the project started; it was to be built from the ground up, and
converting the functional idea into the system's design and architectural choices
happened almost exclusively during the project. The NRS system is ambitious and
large in scope, which meant that the project's outcome would most likely be a basic
design intended to be extensible in the future. Therefore, laying
solid architectural foundations for the system was critical. For the project’s outcome
to be considered valuable, the newly developed system would have to be designed in
such a way that it satisfies all its envisioned future functionalities, as well as their
corresponding variabilities.
The major and recurring design factor was the system’s genericity. The system had to
interface with any network device to perform configuration without being limited by
the possible devices’ differences. It also had to expose its service in a manner that it
would be usable by any cloud platform or other service and/or user. In addition, the
system had to be able to easily substitute its network isolation mechanism with any
different one that could become prominent in the future. Lastly, the system involved
operations on the logical topology related to finding network routes (paths); the
algorithms that dictated the outcome of such operations had to be replaceable by ones
with different behavior to support more advanced functionalities. All these refer to
artifacts that could vary extensively. These variabilities were satisfied with extensive
use of interfaces, modules and object-oriented design patterns with the purpose of
abstracting and de-coupling system functionalities. These can be seen in the
DevicePlugin interface, the NetworkIsolationManager interface, the multiple uses of
design patterns, and other details described extensively in Chapter 4.
The inventiveness of the system was also an important aspect. Creating a working
system would entail combining and integrating very different systems. It would
involve setting up a private cloud installation, exploring various network hardware
models, and creating a system that communicates with cloud platforms and remote
versions of itself, operates on topology graphs and programmatically configures
network hardware. In addition, the system would be deployed in a data-center
infrastructure, taking into account intricacies and different configuration conditions
that arise in such environments due to their special security and reliability demands.
Although the realizability of such a system was not in question, successfully putting
everything together to create a functional and working system was important.
Lastly, documentation of the system was deemed important as well. This was for two
reasons: Firstly, the system’s design was brand new and the system was to be
extended in the future after the project’s end. Therefore, the design options had to be
extensively documented together with reasoning on the various design choices made
and what they accomplish. Developer documentation on the source code artifacts was
important as well. This would help with the continuation of the system’s development
and extension of its design and addition of new functionalities. Secondly, the system
is a rather complex technical solution that integrates several other systems and has
plenty of different deployment and configuration scenarios. A user manual that
describes these options is important for the system to be usable.
Realizability and impact were not considered important for the design of the system.
Realizability was not important since the technologies that would compose the
system were known to work. The feasibility of the system was not really in question.
Impact was not considered important as the system was not expected to have any
economic or societal impact or contribution.
8.3 Future development
The NRS system forms a basis that can be extended with various advanced
functionalities. The NRS core consists of its topology representation and the device
plugins concept; these can be viewed as a network resource management platform on
which more advanced or ‘clever’ operations can be automated.
There are several advanced functionalities included in the original inception of the
system, which are also mentioned in the Requirements chapter as factors to take into
account when designing the system. In addition, the system would benefit from
features that make it more ‘production’ oriented. These include:
Automatic network discovery:
NRS could make use of a network discovery tool or mechanism that would
initialize and update the topology automatically. This is an important
addition to make the NRS system more responsive to changes in its
environment, and therefore more appropriate for production environments.
QoS provisioning:
The NRS system can also be further developed to introduce the concept of
network features, such as bandwidth, and the reservation of those. Such an
addition would require extensions of the system in four different levels:
1. User requests
A new class of user requests that include bandwidth guarantees
should be available to the system
2. Topology graph and algorithms.
Bandwidth capacity would need to be associated with Links and
Network Nodes in the topology.
Algorithms would have to be modified to take the bandwidth
capacity of links and network nodes into account as they traverse
the topology graph to find paths (a sketch of such a bandwidth-aware
path search follows after this list).
3. Reservations
Reservation of requests should take into account and modify the
bandwidth capacity available to nodes and links, in a similar
fashion to how the runtime VLANs of devices are manipulated.
4. Device plugins
The device plugins need to be extended to be able to configure QoS
on network devices, such as bit-rate limiting. Proper invocation of
the device plugins would need to be introduced to the allocation
step, so that the bandwidth reserved in a previous reservation step is
correctly translated to device configuration.
Interface with a VM scheduler (also see Appendix B):
The NRS system can be used to influence VM scheduling decisions, with
network characteristics playing a decisive role in the choice of VM hosts by
the scheduler.
Elaborate algorithms:
The algorithms used for the NRS prototype are simple path-finding
algorithms. It may be desirable to replace them with ones that take the
realities of a specific topology into account, e.g., associate an L2 network
with a specific node in the topology and always force it to pass through that node.
As another example, more complex algorithms can implement load
balancing strategies on the local topology.
Administrator web interface:
As an alternative to the CLI admin shell, a web interface would be suitable
to convey visual information on the network topology, the L2 networks, and
their underlying graphs.
Scaling the server implementation:
The synchronous server communication may need to be replaced with an
asynchronous event-driven implementation that can provide scalability when
it comes to multiple simultaneous requests.
System consistency:
To safely recover from crashes, the system would benefit from the usage of
a database to save the NRS state. At the moment the system state is saved in
local storage in the form of serialized Python objects that contain state
information. The state of the system is not guaranteed if something goes
wrong while the state is being saved. Saving them in a database instead can
provide safe data recovery. A NoSQL database can be used that can directly
store XML files. The serialized objects would need to be converted to an
XML representation as well.
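As a rough illustration of the topology-level extension mentioned under QoS
provisioning above, the sketch below finds a path over a networkX topology while
ignoring links that do not have enough free bandwidth left. The attribute name
free_mbps and the request format are assumptions, not the NRS data model.

import networkx as nx

def path_with_bandwidth(topology, src, dst, required_mbps):
    # Keep only the links that still have enough free capacity,
    # then run the usual shortest-path search on what remains.
    usable = nx.Graph((u, v, data) for u, v, data in topology.edges(data=True)
                      if data["free_mbps"] >= required_mbps)
    return nx.shortest_path(usable, src, dst)

if __name__ == "__main__":
    g = nx.Graph()
    g.add_edge("host1", "switch1", free_mbps=1000)
    g.add_edge("switch1", "switch2", free_mbps=100)   # nearly saturated link
    g.add_edge("switch1", "switch3", free_mbps=1000)
    g.add_edge("switch3", "switch2", free_mbps=1000)
    g.add_edge("switch2", "host2", free_mbps=1000)
    print(path_with_bandwidth(g, "host1", "host2", 500))
    # -> ['host1', 'switch1', 'switch3', 'switch2', 'host2']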
9 Project Management
This chapter describes the software development process used for the development of
the system and the accompanying deliverables. It presents the project’s timeline,
major milestones and their completion, and risk assessment and mitigation.
9.1 Management Process
The NRS project management process is based on the Rational Unified Process
(RUP) [23] [24] framework. RUP is an iterative software development process that
divides the development effort into phases with iterations. It has similarities with
iterative waterfall models. The four phases that it defines are:
Inception phase: Understand what to build, identify key system
functionalities and identify risks.
Elaboration phase: Refine requirements, design and implement a skeleton
architecture
Construction phase: Iteratively develop a complete product.
Transition phase: Beta test and prepare for deployment.
The RUP process in general is a very detailed development process that defines a lot
of different artifacts, human roles, activities etc. The NRS project did not follow the
RUP process by the book, but adapted it to the needs of the development of the
prototype. The important phases for the development of a prototype are the Inception
and Elaboration phases. Construction is important as well in order to build a full
product, however parts of it may be omitted depending on the desired extent of the
prototype’s functionality. The Transition phase is not particularly relevant for
prototype development.
The development process and status of the project were managed and monitored for
the duration of the project with two tools: a project management document and a
project management web application57. The RUP phases were divided into
iterations. When the project started, the project timeline was divided among the
different perceived future iterations. For each upcoming iteration, an attempt was
made to detail it into tasks and accurately estimate its duration. The online tool was
used to keep track of detailed tasks of each iteration. After an iteration was over, the
Project Management document was updated with the results, and the next iteration
was detailed into separate tasks that were in turn monitored in detail online.
The project was assisted by Project Steering Group meetings, monthly meetings that
included the project’s supervisor. The meetings were most often arranged to coincide
with the completion of iterations, in order to present the outcome of the previous
iteration and to discuss the following iteration’s purpose and expected outcome.
9.2 Project Milestones
The completion of RUP phase iterations marked the milestones specified for this
project. Each iteration was related to a specific research or development activity that
produced tangible results in the form of deliverables, such as a design document, a
domain research document or source code that implements prototype functionality.
The implementation of the various system requirements was split among the different
iterations. The iterations specified for the project are shown in Table 19.
57 http://www.zoho.com/projects/
Table 19: NRS project milestones
Inception phase
Inception Iteration 1
  Activities: Produce documents: System Concept/Vision; Basic use-cases; Description of system logic and interfaces; Requirements.
  Motivation: Identify the system's key functionality, expressed in use-cases and requirements. Gain familiarity with domain entities such as network switches. Identify the main risks.
Elaboration phase
Elaboration Iteration 1: First intra-cloud prototype implementation
  Activities: Produce documents: Basic architecture; Refined requirements. Implement: Initial prototype with basic intra-cloud functionality.
  Motivation: Show the feasibility of the critical subset of the intra-cloud functionality of the system, as envisioned in the Inception phase. Identify the basic building blocks of the system's architecture and the basic interfaces of the system. Refine requirements. Identify additional risks.
Elaboration Iteration 2: First inter-cloud prototype implementation
  Activities: Extend the prototype with basic inter-cloud functionality. Produce documents: Refined architecture; Refined system interfaces.
  Motivation: Show the feasibility of the critical inter-cloud functionality of the system, as envisioned in the Inception phase. Refine the system's architecture and interfaces.
Construction phase
Construction Iteration 1: Full-fledged service implementation
  Activities: Turn the prototype into a full-fledged service: turn it into a daemon process; provide a CLI tool for controlling it; streamline logging. Produce usage documentation.
  Motivation: Turn the prototype into a real service, i.e., a daemon process with listening servers. Provide typical accessibility features for the service (CLI tool, logging).
Construction Iteration 2: Finalize Topology
  Activities: Extend the logical topology: finalize the form of the logical topology of the system; fully implement the grow and shrink algorithms and the reserve and release operations; check for consistency. Document the topology and algorithms.
  Motivation: Fully develop the topology so that it can support all use-cases (requests for reserving, allocating and releasing network interfaces from networks).
Construction Iteration 3: Device plugins
  Activities: Test several switch models. Finalize the device plugin programming interface.
  Motivation: Explore as many switch models from different vendors as possible, to ensure the validity of the device plugin interface.
Construction Iteration 4: Administrator access
  Activities: Extend the prototype with an administrator access tool to view and modify the topology.
  Motivation: Introduce the administrator actions to the system.
Construction Iteration 5: Network resources (bandwidth)
  Activities: Add bandwidth capacity to the topology nodes. Modify the algorithms to accommodate bandwidth requests. Add device plugin actions that can enforce bandwidth requirements.
  Motivation: Enrich the system with support for provisioning requests that contain bandwidth.
Initially, estimates were made of the amount of time each iteration would need.
These estimates were placed in the project timeline as shown in Figure 62.
The initial estimates were generous; the NRS project had an investigative nature, so
the estimates were expected to change after each milestone was reached.
The milestone trend analysis chart in Figure 63 shows how the completion of
each milestone actually varied throughout the duration of the project. We can identify
two main deviations from the initial plan:
1) It was chosen to complete the milestones of ‘Service Implementation’ and
‘Topology Iteration’ before the ‘first inter-cloud prototype’. This decision
was made because these two milestones would provide a fully working
intra-cloud prototype; without them, the prototype could not have been
considered functional. If something went wrong later in the project, a
functional prototype that does not fulfill all the requirements would be
better than a non-functional one that only has the prospect of fulfilling
them all.
2) The ‘network resources (bandwidth)’ milestone was dropped from the
project plan. As the project progressed and the topology, algorithms and the
system architecture became clearer, it became apparent that implementing
bandwidth requests could not be completed within the duration of the
project. The 'bandwidth' iteration was dropped from the
project in early May.
The real project timeline, which depicts the completion of the milestones as they
actually happened during the project, is shown in Figure 64.
Figure 62: Initial project timeline
Figure 63: Milestone trend analysis
9.3 Risk Management
The purpose of the first iterations of the project was, among other things, to identify
possible risks, assess their impact, and dictate courses of action to avoid them. The
risks identified in the first iterations of the project are outlined in Table 20. The left
column of the table describes the courses of action taken to avoid certain risks. In
general, these actions influenced the development approach of the project, e.g.,
modified the order of iterations as discussed in section 9.2
Figure 64: Actual project timeline
Table 20: Risk management
Inception phase identified risks
Risk: Feasibility of the technical design
  Description: NRS is close to the hardware. The technical design is constrained by the realities of the hardware, the protocols, the TCP/IP layer stack, existing cloud platform capabilities, etc. If the initial design fails to take one of these into account, it can be invalid.
  Mitigation strategy: Talk to experts. Create proofs of concept often and continuously.
Risk: Feasibility of implementing decision making on network topologies
  Description: There is no implementation of NML; it could be complicated or immature to use. The problem of matching network topologies will be hard to solve.
  Mitigation strategy: Create a modular architecture so that more complex algorithms/strategies can be inserted in the future. Start with simple network allocation use-cases that have simple logic. Find experts in graph theory.
Risk: Security risk
  Description: The software under development lies in the cloud networking domain, which is security-sensitive and attack-prone. The proposed solution may have unexpected security flaws.
  Mitigation strategy: Talk with experts early to avoid starting off with a fundamentally wrong solution.
Elaboration phase identified risks
Risk: A requirement may be left out of the final product
  Description: Bandwidth, QoS, etc. provisioning is pushed closer to the project's end.
  Mitigation strategy: Low-impact risk. Inter-cloud connectivity is more important than bandwidth.
Risk: The topology model is being changed in every phase
  Description: The code-base that relies on the topology is increasing. Topology changes will be needed for introducing inter-cloud connections, as well as later when introducing bandwidth and other network features.
  Mitigation strategy: Try to decouple the inter-cloud connection and other modules as much as possible from the topology, so that they remain re-usable even if the topology has to be created from scratch.
Bibliography
References
[1] "RFC 1157: A Simple Network Management Protocol (SNMP)," 1990.
[2] "bridge - The Linux Foundation," [Online]. Available:
http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge.
[3] "Open vSwitch - An Open Virtual Switch," [Online]. Available:
http://openvswitch.org/.
[4] "Amazon Elastic Compute Cloud," [Online]. Available:
http://aws.amazon.com/ec2/.
[5] "Open Cloud Computing Interface," [Online]. Available: http://occi-wg.org/.
[6] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridge Local Area
Networks," [Online]. Available:
http://standards.ieee.org/getieee802/download/802.1Q-2011.pdf.
[7] IEEE, "802.1ad - Provider Bridges," [Online]. Available:
http://www.ieee802.org/1/pages/802.1ad.html.
[8] "Multiprotocol Label Switching IETF Working Group," [Online]. Available:
http://datatracker.ietf.org/wg/mpls/charter/.
[9] "Virtual Extensible LAN Internet Draft," [Online]. Available:
https://datatracker.ietf.org/doc/draft-mahalingam-dutt-dcops-vxlan/.
[10] Arista Networks, "VXLAN: Scaling Data Center Capacity," [Online]. Available:
http://www.aristanetworks.com/en/solutions/whitepapers.
[11] "OpenStack," [Online]. Available: http://openstack.org.
[12] "OpenNebula," [Online]. Available: http://www.opennebula.org.
[13] "Eucalyptus," [Online]. Available: http://www.eucalyptus.com.
[14] IEEE, "802.1Qbg - Edge Virtual Bridging," [Online]. Available:
http://www.ieee802.org/1/files/private/bg-drafts/d2/802-1qbg-d2-2.pdf.
[15] IEEE, "802.1ak - Multiple Registration Protocol," [Online]. Available:
http://www.ieee802.org/1/pages/802.1ak.html.
[16] "RFC 3535: Overview of the 2002 IAB Network Management Workshop,"
2003.
[17] P. Grosso, A. Brown, A. Cedeyn, F. Dijkstra and J. v.d. Ham, "Network Markup
Language - Context," [Online]. Available: https://forge.ogf.org/sf/go/doc14679.
[18] P. Grosso, A. Brown, A. Cedeyn, F. Dijkstra and J. v.d. Ham, "Network Markup
Language - Base Schema," [Online]. Available:
https://forge.ogf.org/sf/go/doc15674.
[19] E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software, 1994.
[20] M. Fowler, "FluentInterface," [Online]. Available:
http://martinfowler.com/bliki/FluentInterface.html.
[21] R. Ahuja, T. Magnanti and J. Orlin, Network Flows: Theory, Algorithms and
Applications, Prentice Hall, 1993.
[22] "networkX graph library," [Online]. Available: http://networkx.lanl.gov/.
[23] P. Kroll and P. Kruchten, The Rational Unified Process Made Easy: A
Practitioner's Guide to the RUP.
[24] P. Kruchten, The Rational Unified Process.
[25] J. J. Keijser, OpenVPN 2 Cookbook, 2011.
[26] OMG, "Unified Modeling Language, Superstructure 2.4," [Online]. Available:
http://www.omg.org/spec/UML/2.4.1/.
[27] R. Buyya, J. Broberg and A. Goscinski, Cloud Computing - Principles and
Paradigms, 2011.
[28] B. Panzer-Steindel, "Overview of the Computing Fabric at CERN," [Online].
Available: http://lcg-computing-fabric.web.cern.ch/lcg-computing-
fabric/fabric_presentations/overview_docs/.
[29] Brocade, "Brocade Ip Primer," [Online]. Available:
http://community.brocade.com/docs/DOC-1826.
[30] SURFnet, "SURFlightpaths," [Online]. Available:
http://www.surfnet.nl/en/hybride_netwerk/surflichtpaden/pages/lichtpaden.aspx.
Appendix A. Glossary
NRS lies in the network domain of Infrastructure-as-a-Service cloud computing,
which encompasses a variety of domain-specific concepts. The definitions of the
concepts related to this project are contained in the tables of this appendix. The
information is presented as follows: the left column contains the concept's term and
the right column its description. If a term has synonyms, they are included in
parenthesis below the term that is chosen to represent the concept throughout the
document. You can find definitions for IaaS cloud concepts in Table 21, for some
virtualization concepts in Table 22, and for networking in Table 23.
Table 21: IaaS Cloud terminology
Term Description
Cloud Computing The delivery of computing as a service, provided over a
network (e.g., the Internet). Three different fundamental
cloud computing services can be provided:
an application: The service model then is called
Software-as-a-Service (SaaS)
a computing platform: Platform-as-a-Service (PaaS)
raw resources (cpu, block storage, network):
Infrastructure-as-a-Service (IaaS)
The term “Cloud computing” most often refers to the IaaS
service model of cloud computing. It is the service model
that the term will refer to in this document.
Infrastructure-as-a-
Service (IaaS)
The most basic cloud model service, where the resources
offered are computers (virtual machines), storage space and
network.
Platform-as-a-Service
(PaaS)
A cloud model service where the user is offered a
computing platform, typically an OS with a language
execution environment that can be used to run programs or
submit jobs.
Software-as-a-Service
(SaaS)
A cloud model service where access to an application is
provided over a thin client, often a web browser.
Network-as-a-Service
(NaaS)
The delivery of network features (such as firewalls, load
balancers) as a service over a network. Usually refers to
delivery of such network features to IaaS resources.
IaaS Cloud
(or simply cloud)
A cloud infrastructure, typically deployed in one or more
computer clusters, that provides IaaS. An IaaS cloud has
“users”, which request and receive virtual machines or other
resources that the cloud provides.
Public Cloud A cloud deployment where resources are available to the
general public, either for free or on a pay-per-use model.
The cloud infrastructure is of no concern to the user.
Private Cloud A cloud deployment where the cloud infrastructure is
operated for a specific organization and the cloud resources
are available to its users. The cloud infrastructure is set up
by/for this specific organization.
Cloud platform
(Cloud stack,
Cloud resource
manager)
A software platform that implements an IaaS cloud on a
computer cluster. It controls and provisions resources in an
IaaS cloud. It can provide several interfaces and APIs
through which it services requests.
The cloud platform typically requires one or more machines
to run its components, and a set of machines to act as virtual
machine hosts.
A few major open-source cloud platforms: OpenStack,
OpenNebula, Eucalyptus.
Cloud site Refers to the infrastructure that hosts an IaaS cloud and the
service itself as a whole. A cloud site is located in one data-center.
Computer cluster A group of computer servers that typically are used to offer
a specific functionality, e.g., to host an IaaS cloud. They are
connected to each other via switches.
Switching Infrastructure
(of a data-center or
computer cluster)
The network switches that interconnect the computers of a
data-center or computer cluster.
Data-center A facility that hosts one or more computer clusters.
Management Network A data-center typically has a separate network used to
manage network devices and computers. This network is
physically separated from the production network.
Production Network The network infrastructure used by cloud users, when they
initiate network traffic. Separated from the management
network for security reasons.
Multi-tenant Network A data center network that is logically divided into smaller,
isolated networks. They share the physical networking
infrastructure, but operate without any visibility to each
other.
Hardware Virtualization
Table 22: Virtualization terminology
Term Description
Hardware Virtualization A technology that allows virtual hardware platforms to be
created within an operating system. These act as a real
computer (a virtual machine): they can be used to install and
operate a new operating system.
Virtual Machine (VM) An isolated operating system that is hosted within another
operating system (guest OS and host OS respectively).
Hypervisor Software that implements hardware virtualization and can
run multiple operating systems (guests) on a host computer.
Virtual Machine Host
(Virtualization Host,
Hypervisor Host)
A computer that runs a hypervisor and is used as the host
operating system for VMs.
Networking
Table 23: Networking terminology
Term Description
Computer Network The interconnection of computers by communication
channels that allow exchange of data. The exchange may be
assisted by special hardware devices.
Network Host A computer connected to a computer network.
OSI Model A (theoretical) protocol stack that attempts to abstract a
communications system into different layers. Each layer is
responsible for handling a specific problem and exchanges
data only with its neighboring layers. Consists of:
1. Physical layer
2. Data link layer
3. Network layer
4. Transport layer
5. Session layer
6. Presentation layer
7. Application layer
TCP/IP Protocol Stack The most popular protocol stack to implement networking.
It shares many similarities with the OSI model, but it is not
identical. The TCP/IP stack consists of layers. Exchanged
information will go from the bottom layer to the top to reach
the user, and vice versa to be sent to a different machine.
Each layer will add additional information to the unit of
exchange.
The stack consists of:
Link layer: contains communication technologies for a
local network (Ethernet)
Internet layer: connects local networks (IP)
Transport layer: handles host-to-host communication
(TCP)
Application layer: handles process-to-process data
exchange (HTTP)
The link layer data is sent over a physical medium to its
destination (e.g., a cable). The physical medium (physical
layer) is decoupled from the TCP/IP stack and has many
different implementations.
Ethernet The most common technology for local area networks.
Divides a stream of data into frames, which contain source
and destination addresses.
Ethernet broadcast
domain
(L2 broadcast domain,
Ethernet segment)
When Ethernet frames need to be sent to a destination, they
will be broadcast over the physical medium (e.g., a cable) to
all reachable devices. That is, every device that is
“connected” to the medium will receive the frames. This
collection of reachable targets is called an Ethernet
broadcast domain (or layer 2 broadcast domain, from OSI
layers).
Ethernet bridging A frame forwarding technique, used by network devices. It
allows connecting multiple Ethernet broadcast domains into
one.
Local Area Network
(LAN)
Represents a network that connects computers in a limited
area. May be used to refer to an Ethernet segment.
Gateway A network point (host or other device) that acts as an
entrance to another network. In the context of Ethernet, a
gateway is an entrance to a different Ethernet segment, or in
the context of LANs, to a different LAN.
Network Interface (IF,
NIC)
(Network Interface
Controller,
Physical Network
Interface)
Computer hardware that connects a computer to an Ethernet
domain and allows exchange of Ethernet traffic. It may
implement processing of the TCP/IP stack in hardware.
Each network interface has a unique 48-bit number, the
MAC address.
The term physical network interface is used to differentiate
from a virtual network interface.
Network Switch
(switch)
A computer network device that connects network devices
or segments. All devices connected to its ports are put in the
same Ethernet broadcast domain.
Different switch models can offer very different and
advanced functionality (e.g., QoS). Virtually all production-grade
switches support 802.1Q VLANs.
Switch Port Switches have a set of ports, which have a similar function
to network interfaces: to exchange Ethernet traffic. Unlike
network interfaces, a switch has no need to further process
the TCP/IP stack.
Not to be confused with TCP ports.
Switch Backplane,
Switching Fabric
Internally, the switch forwards frames to different ports
using specialized hardware. This is called the switch
backplane, and its implementation determines the
forwarding rate of the switch.
802.1Q VLANs Virtual LAN is a networking standard that supports
separation of Ethernet broadcast domains, while using the
same physical medium. This is achieved using a special
header in the Ethernet frames. The header specifies to which
VLAN the frame belongs. The network devices that handle
the frames use this information to forward the frame
correctly, which effectively separates the LAN into VLANs.
The VLAN specification supports up to 4094 different
VLANs.
VLAN access port A port can be assigned to a single VLAN. It is then called
an “access port”, as connecting a device to it will give the
device access to that particular VLAN.
VLAN trunk port A port can be configured to allow multiple VLANs. This is
used when traffic from multiple VLANs has to be passed
over one port, for example when forwarding those VLANs
to a different switch over that specific port. These ports are
called trunk ports.
Quality of Service (QoS) Technology that allows different types of traffic to be
distinguished and transported with special requirements
(e.g., identifying the traffic of a real-time application and
guaranteeing low latency for it).
Network Access Control
List (ACL)
A list of rules that can be applied to ports of a network
device to control outbound and inbound traffic (similar to
firewalls in this context).
Virtual Network
Interface (VIF, vNIC)
Virtual representation of a network interface that
corresponds directly to a physical network interface. Used
by Virtual Machines to connect to networks.
Virtual Machine
Gateway
Term used to describe the device that operates as the
“gateway” of the Virtual Machines to a network lying
outside the VM host. The term may differ from the regular
“gateway” term, since the network up to the VMs may
already be a single Ethernet broadcast domain.
Linux Bridge Software that runs on a Linux host and can “bridge” two or
more network interfaces of the host. All network traffic
destined to one network interface that is a member of a
bridge will be copied to all the other interfaces that are
members of the bridge. This makes it possible to bridge
different Ethernet segments, if these segments are accessible
via different network interfaces of the host.
Virtual Switch Software that simulates the behavior of a hardware switch
for a host’s network interfaces (corresponding to the
hardware switch’s ports). Typically a software switch
supports much more than Ethernet bridging, with features
commonly found on hardware switches (e.g., QoS, access
lists).
Appendix B. NRS and VM scheduling

This appendix discusses Virtual Machine scheduling that takes networking capacity (also known as bandwidth) into account; in other words, the problem of choosing the proper host for a Virtual Machine so that the Virtual Machine’s constraints on CPU, memory, and network capacity are satisfied. The description of the NRS system’s functionality assumes that this decision is taken outside of NRS. However, a system like NRS that maintains a network topology is well suited to provide input to the VM scheduling process. This immediately raises the question of why the NRS prototype is not integrated with a VM scheduler, i.e., why it does not instruct the cloud platform on which hosts to deploy the requested Virtual Machines. This appendix describes the problem and shows why this was not feasible to tackle within the duration of the project.
OpenNebula’s VM scheduler
OpenNebula employs a custom VM scheduler. It allows the user to define
requirements for a VM in the form of a CPU capacity and a Memory capacity
(these are hard constraints). CPU is defined as a ratio, where 0.5 is half a
real CPU, and Memory is defined in MBytes. For each VM host, OpenNebula
maintains information on current CPU and Memory utilization using the above
metrics. In addition, the scheduler can have a Policy; this defines an order of
preference among the available hosts and can take many forms. The policy plays
the role of a soft constraint. For example, one policy is to prefer hosts whose
CPU utilization is below 0.4; another is to spread the VMs over the hosts as
much as possible. The policy associates a Rank with each host that specifies
how well the host suits the preference. The scheduler works roughly as
follows to satisfy VM requests:
Iterate over all Virtual Machines that are pending launch:
1) Satisfy hard constraint: Filter out hosts whose current CPU and Memory
utilization cannot satisfy the requirements.
2) Satisfy soft constraint: Among the remaining hosts, choose the one that has
the highest Rank.
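To make this filter-then-rank loop concrete, here is a minimal Python sketch of the idea; the Host and VMRequest records and the “spread” ranking policy are illustrative assumptions, not OpenNebula’s actual implementation.

    # Minimal sketch of a filter-then-rank scheduler; Host, VMRequest and the
    # ranking policy are illustrative, not OpenNebula's implementation.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Host:
        name: str
        cpu_free: float   # fraction of real CPUs still available (e.g., 0.5)
        mem_free: int     # MBytes still available

    @dataclass
    class VMRequest:
        cpu: float        # requested CPU ratio, 0.5 = half a real CPU
        mem: int          # requested memory in MBytes

    def rank(host: Host) -> float:
        # Example policy: prefer hosts with the most free CPU ("spread" VMs).
        return host.cpu_free

    def schedule(req: VMRequest, hosts: list[Host]) -> Optional[Host]:
        # 1) Hard constraints: filter out hosts that cannot satisfy CPU and memory.
        candidates = [h for h in hosts
                      if h.cpu_free >= req.cpu and h.mem_free >= req.mem]
        if not candidates:
            return None
        # 2) Soft constraint: among the remaining hosts, pick the highest Rank.
        chosen = max(candidates, key=rank)
        chosen.cpu_free -= req.cpu
        chosen.mem_free -= req.mem
        return chosen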
It is important to note that the Rank is a single number, associated with one
host only, that quantifies how well-suited the host is to the specific VM’s
requirements. Trying to apply this idea once network capacity is introduced
appears to be problematic.
A metric for network capacity?
Unlike CPU and memory capacity, which are numbers associated with a single host
and independent of other hosts, a metric for network capacity does not easily
make sense as a standalone, independent number for a host. Network capacity by
definition exists between two or more hosts, and the capacity of a host can
only be defined in relation to the other hosts it can be connected to.
From the scheduler’s perspective, however, it would be convenient if a single
number could be used as the Rank of a host with respect to how well it
satisfies a network capacity request, so that choosing the highest-ranking
hosts results in a network between them that comes as close to the requested
capacity as possible. We will try to come up with such a number in the
following example.
Network Metric Example
We assume three VM hosts available for VM scheduling (Figure 65). They are
connected to each other over a network whose details we do not care about (assume
unlimited bandwidth). All connections are full-duplex. Host 1 is connected to the
network with a 1 Gb link, Host 2 with 500 Mb and Host 3 with 100 Mb.
We also assume that the total bandwidth available to VMs within the same host is up
to 1Gb (Figure 66, the sum of the links is not allowed to exceed 1Gb). If there is 1
VM in the host, it can connect to ‘out there’ with 1Gb (1 link total). If there are 2
VMs and the bandwidth is split equally, they get 0.33 (1/3) Gb each (3 links total:
VM1 to VM2, VM1 to outer network, VM2 to outer network). If there are 3 VMs,
they get 0.17 (1/6) Gb each (6 links: VM1 to VM2, VM1 to VM3, VM2 to VM3 and
three links going out). The equation that calculates this ratio is
1 / (n(n+1)/2) = 2 / (n(n+1)) for n VMs, with n(n+1)/2 being the total number
of links.
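The relation can also be spelled out in a few lines of Python; the only assumptions are the 1Gb per-host total and the equal split described above.

    # Number of links and per-link share for n VMs on one host (1Gb total, equal split).
    def links(n_vms: int) -> int:
        # n_vms*(n_vms-1)/2 links among the VMs plus n_vms links to the outer network
        return n_vms * (n_vms + 1) // 2

    def per_link_gb(n_vms: int, host_total_gb: float = 1.0) -> float:
        return host_total_gb / links(n_vms)

    for n in (1, 2, 3):
        print(n, links(n), round(per_link_gb(n), 2))
    # prints: 1 1 1.0 / 2 3 0.33 / 3 6 0.17, matching the example above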
Now we assume the setup of Figure 65 with no VMs active. A user requests two VMs
and asks for 1 Gb capacity between them. The following table shows the possible
combinations and the percentage of the requested capacity that becomes available.
Figure 65: Three VM hosts connected to each other
Figure 66: Network capacity inside a VM host
Table 24: Links between VMs compared to 1Gb requested capacity

Chosen host pair for the VMs                        Link between the VMs   Percentage of bandwidth satisfied
(Host1, Host1), (Host2, Host2) or (Host3, Host3)    1Gb                    1
(Host1, Host2)                                      500Mb                  0.5
(Host1, Host3)                                      100Mb                  0.1
(Host2, Host3)                                      100Mb                  0.1
We will try to turn this information into a ‘Rank’ that is exclusive to each host by
giving each host the bandwidth percentage that is the highest among all pairs that the
host is involved in. The result of this is shown in Table 25.
Table 25: Rank based on highest value among pairs
Host Rank
Host1 1
Host2 1
Host3 1
Of course this is not correct, because the scheduler sees all hosts as equally
good and may choose the pair (Host1, Host3), which results in only 100Mb of
available capacity. The same problem appears if we rank hosts based on the
lowest percentage among all pairs the host is involved in, as shown in Table 26.
Table 26: Rank based on lowest value among pairs
Host Rank
Host1 0.1
Host2 0.1
Host3 0.1
It seems that an algorithm that ranks hosts will need to selectively rank some
hosts highly and others low, leaving some valid options out, as in the results
shown in Table 27.
Table 27: Rank based on choosing one among the pairs that works
Host Rank
Host1 1
Host2 0.5
Host3 0.1
This ranking leaves the option (Host3, Host3) out of consideration; however,
there is no apparent way to do this differently. (Host1, Host1) gets the
highest rank, while (Host1, Host2) is lower but still acceptable. Apart from
(Host3, Host3), the result of this ranking method corresponds to the network
reality. Such a network rank can be used in conjunction with the CPU and
Memory rankings for the scheduler to come to a balanced decision.
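As an illustration, the following sketch derives the max-based and min-based ranks of Tables 25 and 26 from the pairwise capacities of Table 24 (the capacity values are those of the example); it shows directly that neither variant distinguishes the hosts.

    # Sketch: derive per-host ranks from the pairwise capacities of the example.
    # The capacity values are the fractions of the requested 1Gb from Table 24.
    pair_capacity = {
        ("Host1", "Host1"): 1.0, ("Host2", "Host2"): 1.0, ("Host3", "Host3"): 1.0,
        ("Host1", "Host2"): 0.5, ("Host1", "Host3"): 0.1, ("Host2", "Host3"): 0.1,
    }
    hosts = ["Host1", "Host2", "Host3"]

    def capacities_involving(host: str) -> list[float]:
        return [cap for pair, cap in pair_capacity.items() if host in pair]

    max_rank = {h: max(capacities_involving(h)) for h in hosts}  # Table 25: all 1.0
    min_rank = {h: min(capacities_involving(h)) for h in hosts}  # Table 26: all 0.1
    print(max_rank)
    print(min_rank)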
Retrieving the network capacity information
Apart from trying to reduce the capacity values of host tuples into one scalar
value per host, the capacity associated with each grouping of VMs has to be
retrieved in the first place. With n VMs and m hosts there are m^n different
combinations of putting the VMs onto hosts. For each of these, some information
needs to be calculated that determines how well the VM-to-hosts combination
complies with the requested network capacity. A brute-force algorithm that goes
through all of them therefore has exponential time complexity.
However, the assumption that each VMs-to-hosts mapping is associated with a
single capacity value is naive. In the internal network topology, each of these
mappings can be implemented in not one but multiple different ways. This is
because there may be multiple ways to implement a connection between two hosts,
roughly as many as the number of internal nodes that exist between the two
hosts. So in reality, not only are there m^n different combinations, but each
of them has k^(n(n+1)/2) different network paths that implement it, each of
which may have a different capacity. Here k is the number of different internal
nodes (essentially paths) through which the hosts can be connected, and the
exponent is the number of different links between hosts that are distributed
over these nodes (the same link-count equation as used for the VM links
previously).
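To give a feel for the combinatorial explosion, the sketch below enumerates every VM-to-host placement and every routing of the resulting links over internal nodes; the host and node names are placeholders, and the capacity evaluation is left as a stub because it depends on the topology model.

    # Brute-force sketch: enumerate every VM-to-host placement and every routing of
    # the resulting links over internal nodes. Exponential, so only for tiny inputs.
    from itertools import product

    def enumerate_options(n_vms, hosts, internal_nodes):
        n_links = n_vms * (n_vms + 1) // 2                   # same link count as derived earlier
        for placement in product(hosts, repeat=n_vms):       # len(hosts)**n_vms placements
            for routing in product(internal_nodes, repeat=n_links):  # len(nodes)**n_links routings
                yield placement, routing                     # capacity of this option would be evaluated here

    options = list(enumerate_options(2, ["Host1", "Host2", "Host3"], ["NodeA", "NodeB"]))
    print(len(options))  # 3**2 placements * 2**3 routings = 72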
Open Questions
The situation described in the sections above raises the following topics:
1) Research on schedulers: Has a “distributed” resource like network capacity
been modeled before to use in scheduling? Is the OpenNebula scheduler a
representative sample?
2) Is the mapping of the capacity value of host pairs to a ‘Rank’ scalar value for
one host even necessary from the scheduler’s perspective, or is there another
way around it?
3) It appears that to be able to obtain network capacity information, you need a
graph representation of the network topology and the ability to operate on it
with algorithms. The NRS system, as created for this project, is of great
value for this, although the network topology representation would need
some further research.
4) Brute-forcing the retrieval of network information each time a VM is
requested may lead to combinatorial explosion. Suitable graph algorithms
may exist, or may have to be devised, to deal with the problem in a domain-
specific way (for example, heuristics that exploit the realities of the
networking infrastructure: you may want half of your connections to
always go through a specific internal node). A ‘dumb’ algorithm may
eventually prove to be the better solution.
5) The example used serves a simple request: 1Gb capacity among all VMs.
More intricate topologies may be requested (such as: 1 ‘server’ VM with
1Gb to 3 ‘client’ VMs, and 100Mb among the ‘client’ VMs), which
influences the algorithm for finding paths and ranking hosts.
6) All of the above deals with figuring out the best paths and translating them
into host rankings. Actually enforcing the network capacities (QoS) is a
different topic that involves the ability to configure devices (the NRS system,
as created for this project, is of great value for this as well).
Answers to these questions are well outside the scope of a 9-month project. In
addition, these questions deal mostly with the logic of representing networks
and operating on the logical representation. Dynamically enforcing the decisions
taken at the logical level, i.e., translating them into proper device
configuration, is a different venture altogether, and it was a big part of the
NRS project.
Appendix C. NRS CLI and configuration

This appendix includes parts of the NRS CLI tool documentation, and a sample
configuration file. A more detailed user manual is available at
http://wiki.nikhef.nl/grid/NRS
CLI main usage:
usage: nrs_client.py [-h] [--version]
{status,vif,inter-cloud,allocate,release,reserve,connect}
...
Send action requests to the NRS daemon.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
commands:
{status,vif,inter-cloud,gateway,allocate,release,reserve,connect}
command help:
connect connect a network interface to a network
reserve reserve a network connection to a network interface
allocate allocate a network reservation
release release an interface from a network
vif connect/disconnect a new virtual network interface
inter-cloud start/stop an inter-cloud connection
gateway connect/disconnect to a gateway
status print status information
CLI ‘connect’ command:
usage: nrs_client.py connect [-h] -host NAME -if NAME -nid ID [-vlan ID]
Connect a network interface to a network.
optional arguments:
-h, --help show this help message and exit
network interface:
-host NAME, --host NAME
IP resolvable name of the interface's host
-if NAME, --interface NAME
name of the network interface, e.g eth0
network info:
-nid ID, --network_id ID
network identifier for the network that the interface
will be connected to
-vlan ID, --vlan_id ID
VLAN identifier to associate the network with
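For illustration, a connect request could look as follows (the host name, interface name, and identifiers are placeholders, not values from an actual deployment):

    nrs_client.py connect --host node01.example.org --interface eth1 --network_id 5 --vlan_id 210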
CLI ‘inter-cloud start’ command:
usage: nrs_client.py inter-cloud start [-h] -addr ADDR -port PORT -type TYPE
-local_id ID -remote_id ID
Start an inter-cloud connection. Will attempt to bridge a local to a remote
network over an inter-cloud connection.
optional arguments:
-h, --help show this help message and exit
-addr ADDR, --remote_service_address ADDR
address of the remote service that the inter-cloud
connection will be negotiated with
-port PORT, --remote_service_port PORT
port of the remote service
-type TYPE, --connection_type TYPE
inter-cloud connection type
-local_id ID, --local_network_id ID
local network identifier
-remote_id ID, --remote_network_id ID
remote network identifier
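For illustration, a request to bridge local network 3 with remote network 7 could look as follows (the address and identifiers are placeholders; VPN as the connection type follows the sample configuration below):

    nrs_client.py inter-cloud start --remote_service_address 203.0.113.10 --remote_service_port 50009 --connection_type VPN --local_network_id 3 --remote_network_id 7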
CLI ‘status’ command:
usage: nrs_client.py status [-h]
{reservation,all,vlan,network,inter-cloud} ...
Print status information.
optional arguments:
-h, --help show this help message and exit
status arguments:
{reservation,all,vlan,network,inter-cloud}
status help:
all print all status information
reservation print reservation information
vlan print vlan information
network print network information
inter-cloud print inter-cloud information
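For example, all status information can be requested with:

    nrs_client.py status all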
NRS sample configuration file:
#nrsd.conf

[General]
#Whether to enable inter-cloud functionality.
#default = True
enable_inter-cloud = True
#Whether to enable intra-cloud functionality.
#default = True
enable_intra-cloud = False
#enable_telnet = True
#draw_graphs = True
#state_directory = .
#image_directory = .
#log_directory = .

#Telnet server for administrative access
[Telnet]
host = localhost
port = 50011

#main service requests
[Main Service]
port = 50008
host = 10.80.80.32

[Inter-cloud]
#host for inter-cloud communication with remote services
port = 50009
host = 82.180.120.132
#TLS/SSL
#priv_key = certs/keyfile
#cert = certs/certfile
#ca_certs = certs/ca_certs_file
#Default behavior: all networks allowed.
#specify if allow or deny has higher priority
#rule_priority = allow
#If allow networks is not specified, all networks are allowed.
#If it is specified, only the specified networks are allowed.
allow_networks = 1,2
#Deny inter-cloud to networks.
#If not specified, all networks are allowed.
#deny_networks = 1,2,3
connection method = VPN

[VPN]
#the public ip address of the vpn box
vpn-box_address = 82.180.120.133
#Whether this NRS can be the VPN server.
#Valid options: must, may, no
server capability = must
allow_networks = 3
#deny_networks = 1,2,3
About the Author
Dimitris Theodorou received his Diploma in Computer
Engineering and Informatics from the University of
Patras, Greece in 2009. During his studies he specialized
in Computer Science and Software Engineering,
developing interests in areas such as algorithm analysis
and implementation, mathematical logic, graph and game
theory. His diploma thesis “Extension of PNYKA e-voting
system to combat malicious attacks” was carried out at the
Greek research institute CTI59 and involved extending the
e-voting system’s client to enhance its security features.
In 2010, Dimitris joined the Software Technology PDEng
Program at the Eindhoven University of Technology,
stepping into architectural software design and object-
oriented analysis. His final project was performed at
Nikhef and entailed designing and implementing a system
for network provisioning in cloud infrastructures60. The
system integrates cloud platforms with network hardware
configuration and network topology modeling. Dimitris
received his PDEng degree in October 2012.
59 http://www.cti.gr/en/ , http://www.pnyka.cti.gr/indexEn.php
60 https://wiki.nikhef.nl/grid/NRS