Network Provisioning in IaaS Clouds
Dimitrios Theodorou
September 2012
Network Provisioning in IaaS Clouds: A Network Resource Management System
Eindhoven University of Technology
Stan Ackermans Institute / Software Technology
Partners: Nikhef, Eindhoven University of Technology
Steering Group: J. Templon, J.J. Keijser, T. Suerink, R.H. Mak
Date: September 2012
Contact address: Eindhoven University of Technology, Department of Mathematics and Computer Science, HG 6.57, P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands, +31402474334
Published by: Eindhoven University of Technology, Stan Ackermans Institute
Printed by: Eindhoven University of Technology, UniversiteitsDrukkerij
ISBN 978-90-444-1169-0
Abstract A network provisioning system for cloud infrastructures has been developed at Nikhef. The
system operates on the network topology of the cloud site and can create Virtual Networks
on top of it, by connecting network interfaces in isolated Ethernet segments. The system
can also extend Virtual Networks over data-center interconnections, or bridge them with
external networks (the Internet). This functionality is achieved through the system’s ability
to: 1) maintain a logical network topology graph and map connectivity requests onto it, 2)
configure network hardware, 3) negotiate cross-data-center connections with a duplicate of
itself running on the remote site, and 4) manage network isolation mechanisms (such as
802.1Q VLANs). The system was built mainly as a service to cloud platforms, and a plugin
that integrates it with OpenNebula was created for this purpose. The system also provides
administrative access that allows monitoring and modifying the topology and VLAN
allocation. Furthermore, 1) and 2) provide a foundation for further development to support
more advanced network configurations (e.g., QoS).
Keywords Cloud Computing, Infrastructure-as-a-Service, IaaS, Cloud platform, Network
Virtualization, Network Resources, Virtual Networks, VM-aware networking, Software-
defined Networking, SDN, OpenNebula, OpenStack, Data-center interconnection, Network
topology modeling, virtual switches, VLAN, 802.1Q, Inter-cloud, NRS, Netwerk Regel
Systeem, Network Resource Scheduler
Preferred reference: Dimitrios Theodorou, Network Provisioning in IaaS Clouds: A Network Resource Management System. Eindhoven University of Technology, SAI Technical Report, September 2012.
A catalogue record is available from the Eindhoven University of Technology Library.
ISBN: 978-90-444-1169-0 (Eindverslagen Stan Ackermans Instituut ; 2012/067)
Partnership This project was supported by Eindhoven University of Technology and Nikhef.
Disclaimer
Endorsement
Reference herein to any specific commercial products, process, or service by trade name,
trademark, manufacturer, or otherwise, does not necessarily constitute or imply its
endorsement, recommendation, or favoring by the Eindhoven University of Technology or
Nikhef. The views and opinions of authors expressed herein do not necessarily state or
reflect those of the Eindhoven University of Technology or Nikhef, and shall not be used
for advertising or product endorsement purposes.
Disclaimer
Liability
While every effort will be made to ensure that the information contained within this report
is accurate and up to date, Eindhoven University of Technology makes no warranty,
representation or undertaking whether expressed or implied, nor does it assume any legal
liability, whether direct or indirect, or responsibility for the accuracy, completeness, or
usefulness of any information.
Trademarks Product and company names mentioned herein may be trademarks and/or service marks of
their respective owners. We use these names without implying any particular endorsement and without any intent to infringe on the rights of their respective owners.
Copyright Copyright © 2012. Eindhoven University of Technology. All rights reserved.
No part of the material protected by this copyright notice may be reproduced, modified, or
redistributed in any form or by any means, electronic or mechanical, including
photocopying, recording, or by any information storage or retrieval system, without the
prior written permission of the Eindhoven University of Technology and Nikhef.
Foreword The title of this report, “Network Provisioning in IaaS clouds”, may seem a bit dry at
first, but it does cover the subject very nicely. When we started this project, we posed
a research question on automatically provisioning networks in a cloud environment.
Little did we know that this topic would become one of the "hottest" in the cloud space
over the course of the project. Thus, even though the title may appear dry, the
contents of this report are, to date, far from dry.
At a grid & cloud conference held in September 2012, it became very clear that
network provisioning is a major topic for computer science research and of great
business interest.
The cloud environment is a continually shifting landscape, with many players.
Dimitris managed this shifting landscape very well, and developed a network
provisioning platform that is scalable, pluggable and cloud-platform independent.
This ensures that his efforts remain usable over the next years. During the project we
have had many fruitful discussions about the architecture of the system, as well as
brief periods of mild panic as another new cloud provisioning tool appeared on the
internet, threatening to make Dimitris' work obsolete. But as Dimitris progressed
further and further it became clear that the NRS has unique features that no other
virtual network provisioning tool currently can provide.
The result of Dimitris' research is a working prototype of the network provisioning
tool that we originally envisaged. The only question remaining is whether this report
is the end of a project or rather the start of a new, interesting, and very exciting
future.
Jan Just Keijser
September 25, 2012
Preface As of the writing of this report, Network Virtualization is a domain undergoing
innovation and development. It is a domain concerned with the networking aspect
of cloud computing, whose popularity has skyrocketed in recent years. The
Grid computing community is investigating cloud computing services, adding
their own priorities in what they expect from network virtualization.
This report describes the design and implementation of a solution for network
provisioning in cloud infrastructures, which tries to combine open-source cloud
platforms with network topology management and hardware configuration. The
project was the final project for the Stan Ackermans Institute Software
Technology program, Eindhoven University of Technology, and was performed
on behalf of Nikhef, a Grid infrastructure provider.
This report is of a technical nature and targets an audience familiar with object-
oriented software and networking. For an elaboration of the needs the system
fulfills, readers should refer to Chapters 2-3. The design and deployment of the
system is described in Chapters 4-6. Lastly, for results and conclusions, the reader
should refer to Chapter 8.
Dimitrios Theodorou
September 29, 2012
Acknowledgements Several people assisted me in the realization of this project. First and foremost, I
would like to thank Jan Just Keijser and Tristan Suerink, my supervisors at
Nikhef. They provided me with technical advice, participated in brainstorming
sessions, gave me access to Nikhef hardware, and in general made sure I was
provided with anything I needed for the completion of the project. Tristan Suerink in
particular was the ‘brain’ behind the system’s initial concept.
I would also like to thank Rudolf Mak, my TU/e supervisor, for his precise and
constructive contributions in the design, presentation and documentation of the
project.
There are plenty of other people who made contributions to the project. I would
like to thank David Groep for helping with the formation of the system’s concept
and initial design, and Jeff Templon for helping me to quickly become familiar
with the Grid ecosystem and providing a nice working atmosphere at Nikhef.
I would also like to thank Mischa Sallé and Oscar Koeroo for their technical
guidance and contributions in matters of system deployment and development
throughout the duration of the project. Paco Bernabé also helped with deploying
and testing the system.
I also want to thank the OOTI program facilitators, Ad Aerts and Maggy de Wert,
for doing their best to assist with any administrative and other practical issues that
occurred during the project as well as during the whole 2-year OOTI period.
Lastly, I would like to thank Hurng-Chun Lee for providing relaxing
conversational moments during office hours.
Dimitrios Theodorou
September 29, 2012
Executive Summary Currently, several open-source cloud platforms implement IaaS Cloud services.
However, they provide Virtual Networks to their users using rudimentary network
configurations that have the following limitations:
• VLAN management is either not possible or not easy
• Static VLAN configurations are provided, which do not scale
• Network QoS is not supported
• Data-center interconnections are not supported
This report describes a project to design a network provisioning system for cloud
infrastructures that overcomes the limitations mentioned above, called the
Network Resource Management System (NRS).
NRS is a Network Virtualization platform. It is a stand-alone system for
allocating network resources by controlling and configuring both physical and
virtual network devices using plugins. The system seamlessly integrates with
cloud platforms, such as OpenNebula and OpenStack. Its core functionality
consists of network topology management, network hardware device
configuration, and the ability to create inter-cloud connections by negotiating with a
remote system.
The system has the following strong points:
• It provisions VLANs only in the parts of the network where they are required. This greatly increases scalability compared to static VLAN configurations.
• It is very extensible. The system configures network hardware with plugins, and supporting additional networking hardware is just a matter of creating a suitable plugin.
• It can connect resources residing in different cloud sites.
The system is not a complete solution on its own, but rather a foundation that can
easily be extended with features. This is achieved through the system’s modularity; it supports extensions such as:
• lightpath inter-cloud connections
• configuration of advanced switches
• transition from 802.1Q to Q-in-Q or other network isolation mechanisms
• algorithms that manipulate the topology and provide connections in ways tailored to the needs of the cloud infrastructure architect
Table of Contents
Foreword ..... i
Preface ..... iii
Acknowledgements ..... v
Executive Summary ..... vii
Table of Contents ..... ix
List of Figures ..... xi
List of Tables ..... 13
1 Introduction ..... 1
1.1 Purpose ..... 1
1.2 Context ..... 1
1.3 Stakeholders ..... 3
1.4 Outline ..... 3
2 Domain Analysis ..... 5
2.1 IaaS Clouds ..... 5
2.2 Grid to cloud ..... 10
2.3 Network Provisioning in IaaS Clouds ..... 12
2.4 Open-source cloud platforms ..... 14
2.5 Issues with network provisioning in IaaS Clouds ..... 22
2.6 Data-center inter-connect technologies ..... 24
2.7 Recent developments in cloud networking ..... 25
3 System Requirements ..... 27
3.1 The Network Resource Management System ..... 27
3.2 Use Cases ..... 28
3.3 NRS Logic and Interfaces ..... 32
3.4 Requirements List ..... 35
4 System architecture ..... 37
4.1 Prototype Overview ..... 37
4.2 Service Interface ..... 38
4.3 Topology ..... 41
4.4 Graph representation of the network topology ..... 45
4.5 Network Isolation ..... 51
4.6 Algorithms for operations on the topology ..... 52
4.7 Device Plugins ..... 57
4.8 Request Manager ..... 60
4.9 External networks and inter-cloud ..... 64
4.10 Administrator Access ..... 70
4.11 Conclusion ..... 75
5 Implementation ..... 76
5.1 Python ..... 76
5.2 System components ..... 76
5.3 Code analysis ..... 78
5.4 Concurrency ..... 80
6 Deployment ..... 83
6.1 NRS cloud deployment ..... 83
6.2 Device plugins ..... 84
6.3 Inter-cloud with OpenVPN ..... 85
6.4 Deployment at the Nikhef cluster ..... 87
7 Verification and Validation ..... 90
7.1 Functional Validation ..... 90
7.2 Non-functional validation ..... 96
7.3 Verification ..... 98
8 Conclusions ..... 99
8.1 Functional Results ..... 99
8.2 Design criteria ..... 101
8.3 Future development ..... 102
9 Project Management ..... 104
9.1 Management Process ..... 104
9.2 Project Milestones ..... 104
9.3 Risk Management ..... 107
Bibliography ..... 109
Appendix A. Glossary ..... 111
Appendix B. NRS and VM scheduling ..... 116
Appendix C. NRS CLI and configuration ..... 120
About the Author ..... 123
List of Figures
Figure 1: Visualization of Higgs candidate event ..... 1
Figure 2: LHC experiment dataflow ..... 2
Figure 3: Virtual Machine Host ..... 6
Figure 4: Stacked Avaya switches ..... 7
Figure 5: Virtual and physical switches ..... 8
Figure 6: The Grid service ..... 11
Figure 7: Virtual Network cloud resource ..... 12
Figure 8: OpenNebula components ..... 15
Figure 9: OpenNebula VNM actions ..... 17
Figure 10: VM connectivity with Linux bridge ..... 18
Figure 11: VM connectivity with Open vSwitch ..... 18
Figure 12: OpenStack Nova architecture ..... 20
Figure 13: OpenStack Nova networking ..... 21
Figure 14: OpenStack Quantum functionality ..... 22
Figure 15: OpenStack Quantum Virtual Network ..... 22
Figure 16: Physical and logical connectivity of VMs ..... 28
Figure 17: NRS basic request usage scenario ..... 29
Figure 18: NRS inter-cloud scenario ..... 31
Figure 19: NRS in a cloud context ..... 32
Figure 20: NRS layered architecture ..... 38
Figure 21: NRS Service interface operations ..... 39
Figure 22: NML object model ..... 42
Figure 23: Basic network components ..... 43
Figure 24: Network Node Builders ..... 45
Figure 25: Switch graph model ..... 46
Figure 26: VM host graph model ..... 46
Figure 27: Topology graph ..... 47
Figure 28: Object diagram of the topology ..... 48
Figure 29: L2Network graph ..... 49
Figure 30: Graphs in the class diagram ..... 50
Figure 31: Observer pattern ..... 50
Figure 32: Network Isolation with VLAN ids ..... 52
Figure 33: Growing algorithm operation ..... 54
Figure 34: Algorithm modules ..... 56
Figure 35: Network consistency ..... 57
Figure 36: Device Plugins class diagram ..... 59
Figure 37: Device Plugin instantiation ..... 59
Figure 38: Reservation sequence diagram ..... 61
Figure 39: Request manager class diagram ..... 62
Figure 40: Allocation sequence diagram ..... 63
Figure 41: Release sequence diagram ..... 64
Figure 42: Connect sequence diagram ..... 64
Figure 43: Connecting to a gateway sequence diagram ..... 65
Figure 44: Inter-cloud negotiation class diagram ..... 67
Figure 45: Inter-cloud sequence diagram ..... 69
Figure 46: Operations to modify the topology ..... 72
Figure 47: Administrator CLI over Telnet class diagram ..... 74
Figure 48: NRS system state ..... 75
Figure 49: NRS components ..... 77
Figure 50: NRS OpenNebula network driver ..... 78
Figure 51: Overview of source packages ..... 78
Figure 52: Sharing of request manager ..... 81
Figure 53: NRS initialization ..... 82
Figure 54: NRS deployment in a cloud site ..... 84
Figure 55: NRS basic configuration ..... 84
Figure 56: NRS Open vSwitch plugin operation ..... 85
Figure 57: NRS inter-cloud with OpenVPN ..... 86
Figure 58: NRS inter-cloud configuration options ..... 87
Figure 59: Private cloud deployment at Nikhef ..... 88
Figure 60: Topology snapshot of the deployed NRS system ..... 89
Figure 61: NRS in the Grid ecosystem ..... 101
Figure 62: Initial project timeline ..... 106
Figure 63: Milestone trend analysis ..... 107
Figure 64: Actual project timeline ..... 107
Figure 65: Three VM hosts connected to each other ..... 117
Figure 66: Network capacity inside a VM host ..... 117
List of Tables
Table 1: Stakeholders ..... 3
Table 2: OCCI Cloud resources ..... 9
Table 3: OCCI resource links ..... 10
Table 4: OpenNebula cloud resources ..... 16
Table 5: OpenNebula cloud aggregation features ..... 19
Table 6: NRS target users ..... 29
Table 7: Requirements List ..... 35
Table 8: Starting an inter-cloud negotiation ..... 68
Table 9: Cancelling an inter-cloud negotiation ..... 68
Table 10: Source code analysis ..... 79
Table 11: Cyclomatic complexity ..... 79
Table 12: Network connectivity tests ..... 90
Table 13: Network isolation test ..... 91
Table 14: VLAN device restrictions test ..... 92
Table 15: Administrator static topology modification test ..... 93
Table 16: Administrator L2 network modification test ..... 93
Table 17: VLAN administrator restrictions test ..... 94
Table 18: OpenVPN inter-cloud test ..... 95
Table 19: NRS project milestones ..... 105
Table 20: Risk management ..... 108
Table 21: IaaS Cloud terminology ..... 111
Table 22: Virtualization terminology ..... 112
Table 23: Networking terminology ..... 113
Table 24: Links between VMs compared to 1Gb requested capacity ..... 118
Table 25: Rank based on highest value among pairs ..... 118
Table 26: Rank based on lowest value among pairs ..... 118
Table 27: Rank based on choosing one among the pairs that works ..... 118
1 Introduction This chapter introduces the project for the Network Resource Management System. It
presents the background for the project, the project’s stakeholders, and an outline for
the rest of the document.
1.1 Purpose
This document describes the project for the creation of the Network Resource
Management System (NRS), which is software for network provisioning in cloud
infrastructures. It first discusses the needs that the system covers and its envisioned
functionality. It then describes the design decisions that led to the software prototype,
both on an architectural and on an implementation level. The process that led to the
project’s outcome is documented as well.
The project for the NRS system was carried out as the author’s final project in the
Software Technology Program of the Technical University of Eindhoven.
1.2 Context
The project is done on behalf of Nikhef1, the Dutch national institute for subatomic
physics. Nikhef is a contributor and collaborator in the Large Hadron Collider (LHC).
The LHC is the world’s largest particle accelerator, located at CERN, Geneva, and
started its operation in September 2008. Several experiments are being conducted at
LHC that aim to provide answers to fundamental open questions in Physics. Such
experiments typically consist of the acceleration and collision of particles; special
detectors pick up the collision results as raw data, which are analyzed to provide
meaningful results. The results, collected over long periods of time, are interpreted by
physicists to gain insight on the nature of the collision phenomena. The visualization
of a collision event that occurred in the experiment for the discovery of the Higgs
particle is shown in Figure 1.
Nikhef has contributed to constructing LHC detectors and participates in LHC
experiments. LHC experiments produce huge amounts of data (in the order of 20
petabytes per year) that need to be stored and analyzed. Figure 2 shows the dataflow
of an LHC experiment. The storage and analysis of such volumes of data is achieved
with the use of the Grid computing platform.
1 Dutch National Institute for Subatomic Physics, http://www.nikhef.nl
2 Image source: http://www.atlas.ch/photos/events-collision-proton.html
Figure 1: Visualization of Higgs candidate event
Grid computing refers to large-scale distributed data processing across different data-
centers. The computing platform that implements this model and makes it accessible is called the
grid platform. Scientific grid providers are organizations, usually research institutes,
which provide a grid platform for scientific research of various backgrounds. The
biggest grid platform is the Worldwide LHC Computing Grid (WLCG)3, centered
around CERN. The WLCG participants, which are the organizations that provide
infrastructure to the platform, are distributed across various countries. Grid
computing basically evolved during the last 10 years around the WLCG. Nikhef is a
scientific grid provider and a WLCG participant.
Lately, there has been interest among grid providers in exploring whether and how to
incorporate virtualization and cloud computing elements in the grid platform. Cloud
computing refers to the delivery of computing resources as a service; specifically, the
resources are virtual machines, storage, and network services between them. These
resources are provisioned from the cloud site, which is the cluster where the cloud
computing platform is installed. Grid providers are looking into re-using existing
open cloud computing platforms and integrating them into their workflow. Among the
available implementations, one of the areas that has not been adequately developed is
the delivery of network resources. Specifically, the managing of network resources
within a cloud site is only rudimentary, while the ability to connect cloud resources
located in different sites is absent. This is of particular importance to grid providers
and Nikhef, and this is where the project for the NRS system comes in. The NRS
system attempts to provide a basic solution that aids cloud platforms in managing
network resources, which includes dynamic resource provisioning and providing
network services across different data-centers. Cloud computing is elaborated in
Chapter 2.
3 CERN - Worldwide LHC Computing Grid, http://public.web.cern.ch/public/en/lhc/Computing-en.html
4 Image source: Overview of the Computing Fabric at CERN [28]
Figure 2: LHC experiment dataflow
1.3 Stakeholders
Nikhef is the main stakeholder for the project. Nikhef’s Physics Data Processing
Group (PDP) is active in grid research, deployment and software development. This
project was conducted as an activity of the PDP group. Nikhef is also a participant in
the BiG Grid project, a Dutch national initiative to improve grid infrastructure and
assist scientists of all backgrounds to use it. This project resides under the umbrella of
BiG Grid activities. All the stakeholders are shown in Table 1.
Table 1: Stakeholders
Name: Nikhef
Role: Customer, Grid infrastructure provider, BiG Grid participant
Description: Main stakeholder. The project is done for Nikhef. Its interest is to investigate how to incorporate cloud services into its infrastructure.

Name: SARA
Role: Grid/Cloud infrastructure provider, BiG Grid participant
Description: Secondary stakeholder. Has expressed interest in the outcome of a network provisioning project and the prospect of evaluating it for its cloud infrastructure.

Name: BiG Grid
Role: Dutch e-science infrastructure project
Description: Nikhef and SARA are collaborators in the BiG Grid project, whose interests are expressed through them.

Name: Technical University of Eindhoven (TU/e)
Role: Employer
Description: The project is the final project of the two-year Software Technology post-M.Sc. program of TU/e.
Although they are not stakeholders per se, there has been communication with developers of
two popular open-source cloud platforms, namely OpenNebula5 and OpenStack6.
Cloud networking is a developing area, and care was taken to ensure that the project was in
line with how these cloud platforms operate, without providing overlapping
functionality or features that already exist or are under development in the platforms.
In addition, the developers have expressed an interest in seeing the outcome of a
network provisioning project, especially if it is provided in a form that inter-operates
with their software stack.
1.4 Outline
The rest of the document’s chapters are outlined as follows.
Chapter 2: Domain Analysis provides information on the technology domains that are
relevant to the project, namely Infrastructure-as-a-Service (IaaS) clouds and network
provisioning in IaaS clouds. It also presents limitations with current implementations,
which led to the creation of this project and determined its requirements.
Chapter 3: System Requirements describes how the high level needs of the
stakeholders are translated to capabilities of a software system. To show how these
are fulfilled, it gives a high level description of the system and its key functionality
with the aid of use-cases.
5 http://opennebula.org
6 http://www.openstack.org
Chapter 4: System architecture discusses the prototype of the system, its architecture,
the elements that compose it, and its business logic. It explains various design
decisions and choices based on the system requirements.
Chapter 5: Implementation presents information on the implementation of the system
prototype, which includes the outline of the system’s components and code analysis
results. It also describes the different communicating tasks of the system.
Chapter 6: Deployment describes the possible deployment options of the system
within a cloud site, how it interacts with the site’s components, and configuration
options. It also presents the deployed infrastructure for the needs of the project.
Chapter 7: Verification and Validation presents the testing procedures that determine
to what extent the prototype’s functionality and behavior fulfill the system
requirements.
Chapter 8: Conclusions summarizes the outcome of the project, compares it against
the initial project goals, and tries to place it among global developments in cloud
networking. It also discusses recommendations for further development of the
system.
Chapter 9: Project Management presents the progress of the project throughout its
duration and the management processes used to monitor and control it.
This project lies in the technology domain of networking and Infrastructure-as-a-
Service (IaaS) cloud computing, which encompass a variety of domain specific
concepts and terms. Appendix A provides a glossary with definitions for relevant
terms used throughout the document.
2 Domain Analysis The Network Resource Management System deals with network provisioning in
Infrastructure-as-a-Service clouds. This chapter describes the technology domains
related to the system, and an analysis of issues with the current state of available
technologies.
2.1 IaaS Clouds
Cloud computing in general refers to the delivery of computing resources as a
service, provided over a network (e.g., the Internet). The most basic model of cloud
computing is Infrastructure-as-a-Service (IaaS), where the resources offered are
computers, storage, and network services between them. Such a service is called an
IaaS cloud. IaaS clouds utilize hardware virtualization technology, which allows
creation of efficient virtual hardware platforms (virtual machines). The computers
offered by the IaaS service are virtual machines.
2.1.1 Virtualization
Hardware virtualization refers to the virtualization of a computer or an operating
system, i.e. the creation of an abstract platform (the virtual machine or VM) that acts
like a real computer with an operating system, hiding the physical characteristics of
the actual computing platform. Full virtualization enables the virtual machine to
simulate enough hardware, so as to be able to run unmodified “guest” operating
systems as they would run in a real computer. In addition, the guest operating system
is run in isolation from the “host” operating system, which is the operating system
running on the actual computing platform. The software that controls virtualization is
called the hypervisor and runs on physical computers, which are called the virtual
machine hosts (see a rough sketch of a VM host in Figure 3).
There are two types of hypervisors:
1) Bare-metal or native hypervisors, which run directly on the host’s hardware
to control the hardware and manage guest operating systems (KVM, Xen,
VMware ESX, Microsoft Hyper-V)
2) Hosted hypervisors, which run on top of an existing operating system
(VirtualBox, VMware Workstation)
Hardware-assisted virtualization was introduced to the x86 processors in 2005-06
(Intel VT-x and AMD-V). This refers to instruction set extensions that aim to assist
with virtualization, and enable efficient full virtualization with the help of hardware.
Bare-metal hypervisors running on hosts that support hardware virtualization exhibit
the best performance, and are the ones chosen for cloud infrastructures.
2.1.2 Network switches
An IaaS Cloud deploys multiple virtual machine hosts, which are hosted in some
data-center. To offer network services between virtual machines, the hosts need to be
connected to each other. They also need to be connected to a central management
point in order to be controlled and monitored.
Data-centers typically deploy network switches to connect hosts. Network switches
are network devices that connect different devices or network segments. They have
multiple ports (50-port switches are common) and forward network traffic in the link
layer of the TCP/IP stack. A switch can have ports that support different link layer
technologies (Ethernet, FibreChannel). Switches offer more sophisticated
functionality than simple network hubs. Hubs do not manage the traffic that enters
their ports, but repeat traffic coming into one of their ports to all their other ports.
Traffic is transferred over a single medium, which creates packet collisions and thus
lowers performance. Switches on the other hand do not repeat traffic to all ports, but
maintain tables that associate destination addresses with switch ports, and are able to
inspect incoming packets’ destination address. This makes it possible to avoid blindly repeating
traffic and to forward packets only to the switch ports towards the packet’s
destination. Switches are also implemented in a manner that provides a separate
medium for the traffic that occurs among different ports. They also provide full-duplex
links, which means that there are no collisions for traffic going in opposite
directions between the same pair of ports. Thus, packet collisions are non-existent.
Overall, network switches provide the highest performance in interconnecting
different devices.
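To make the table-based forwarding described above concrete, the following minimal Python sketch (illustrative only, and not part of the NRS code) shows how a switch learns which port each MAC address is reachable through and floods a frame only when the destination is still unknown:

class LearningSwitch:
    """Toy model of the MAC-learning and forwarding behavior of a switch."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.mac_table = {}  # MAC address -> port behind which it was last seen

    def handle_frame(self, in_port, src_mac, dst_mac):
        # Learn: the source address is reachable via the ingress port.
        self.mac_table[src_mac] = in_port
        # Forward: use the table if the destination is known, otherwise flood.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in self.ports if p != in_port]

switch = LearningSwitch(num_ports=4)
print(switch.handle_frame(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # unknown destination: flood to [1, 2, 3]
print(switch.handle_frame(1, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # learned destination: forward to [0]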
7 Image source: http://en.wikipedia.org/wiki/File:Hardware_Virtualization_(copy).svg
Figure 3: Virtual Machine Host
Switches forward traffic using specialized hardware, which is called the switch
backplane. The forwarding rate of a switch is determined by the implementation of
the switch backplane. Some switch models can be “stacked”, i.e. connected to each
other via a special configuration which effectively makes them behave as one unit
with increased backplane bandwidth. An image with multiple stacked switches is
shown in Figure 4.
Apart from forwarding traffic, different switch models can have different and
advanced functionalities. Switches typically support various protocols and/or
methods for management and configuration (e.g., web interface, CLI, SNMP [1]),
monitoring (e.g., Netflow, sFlow), as well as advanced configuration options that
provide Quality of Service (QoS) guarantees.
QoS in computer networks refers to the transport of traffic with special requirements.
It aims to address two network issues: limited network bandwidth and time sensitivity. Bit-rate
and latency are two common QoS guarantees. QoS is primarily
achieved with the ability to identify and discriminate among data traffic from
different applications that have different QoS requirements. Various policies can be
defined and applied on the different types of traffic (traffic classes). Applying a
policy means that the traffic may be shaped (i.e., bit-rate limited) or given
priority in processing and forwarding. In general, support for QoS varies considerably
among switch models and vendors.
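As an aside on what "shaping" means in practice, the sketch below shows the token-bucket idea that underlies most bit-rate limiting. It is a generic, hedged illustration in plain Python, not tied to any particular switch or to the NRS code, and the rate and burst values are hypothetical:

import time

class TokenBucket:
    """Toy traffic shaper: allows bursts up to `burst` bytes, `rate` bytes per second on average."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, packet_size):
        now = time.monotonic()
        # Refill tokens according to the configured rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True      # forward the packet now
        return False         # queue or drop: sending now would exceed the policy

shaper = TokenBucket(rate=125000, burst=10000)     # roughly 1 Mbit/s with 10 kB bursts
print(shaper.allow(1500), shaper.allow(20000))     # True False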
2.1.3 Virtual switches
Virtual Machines get network connectivity through their virtual network interfaces
(VIFs), which are seen as regular network interfaces from the guest OS as well as the
host OS. The VM can see only its own VIFs, while the host can see all VIFs that
belong to VMs that reside on the host. In order for VMs to get access to any external
network, their VIF(s) need to be connected to the VM host’s physical network
interface (NIC). This is achieved with the use of virtual switches.
Virtual (or software) switches are the equivalent of hardware network switches, but
they operate on the interfaces seen within the VM host, which include the host’s
NIC(s) and any existing VIFs. Typically the host has one NIC assigned to provide
connectivity to the “production” network. VIFs are connected to that NIC, which is
often referred to as the VM gateway. A virtual switch is depicted conceptually in
Figure 5, where eth0 is the VM host’s interface acting as the VM gateway and vnet0-
2 are the VIFs.
8 Image source: http://en.wikipedia.org/wiki/File:Avaya_ERS-2500_Stack.jpg
Figure 4: Stacked Avaya switches
The simplest implementation of a virtual switch is the Linux bridge [2]. It is a
software bridge that can connect two or more network interfaces (and thus bridge two
or more Ethernet segments). Its functionality is simple; it forwards incoming
Ethernet frames among the interfaces that are attached to it. It is available in all Linux
distributions.
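As an illustration of the mechanism (and of the kind of work a device plugin performs), the sketch below configures a Linux bridge from Python by shelling out to the standard iproute2 tools. It must run as root on the VM host; it is a hedged example rather than the actual NRS plugin code, and the bridge and interface names are hypothetical:

import subprocess

def create_bridge(bridge, interfaces):
    """Create a Linux bridge and attach the given interfaces to it."""
    subprocess.check_call(["ip", "link", "add", "name", bridge, "type", "bridge"])
    subprocess.check_call(["ip", "link", "set", bridge, "up"])
    for iface in interfaces:
        # Attaching an interface bridges its Ethernet segment with the others.
        subprocess.check_call(["ip", "link", "set", iface, "master", bridge])

# Example: bridge the host NIC eth0 with a VM's virtual interface vnet0.
# create_bridge("br0", ["eth0", "vnet0"])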
Apart from the Linux bridge, other virtual switches exist that may provide
functionality equivalent to hardware switches, including support for management and
monitoring protocols, QoS, etc. Well known virtual switches are Cisco Nexus
1000V9, VMware vDS10, and the open-source Open vSwitch [3]. Section 2.4 shows
typical usage of the Linux bridge and Open vSwitch as deployed and used by cloud
platforms.
2.1.4 Cloud deployment models
The infrastructure of an IaaS Cloud consists of virtual machine hosts, switches that
connect the hosts, optionally specialized storage, and some computers that run the
cloud software platform. The cloud platform is the software that implements the IaaS
service and orchestrates the entire operation: It controls and provisions the resources
of the cloud infrastructure, it provides one or more interfaces and APIs through which
it services requests from the cloud users, and it provides tools for monitoring and
administration. Typically, the cloud platforms have different classes of users that
determine which resources the user is allowed to request. There are various open-
source cloud platform implementations; see section 2.4 for more.
IaaS Clouds come in different deployment models. Although the cloud users do not
experience different behavior, the different deployment models matter to the cloud
infrastructure maintainer.
Public clouds: These are cloud services offered to the general public by some
provider, typically with a pay-per-use model. Well known public clouds are Amazon
Elastic Compute Cloud (EC2)11 and Rackspace public cloud12. The internal
implementation of these cloud platforms is closed-source. These services have
become very popular for web application deployment due to the speed, flexibility and
potential for scalability they provide. The popularity of cloud computing skyrocketed through
such public cloud services.
9 http://www.cisco.com/en/US/products/ps9902/index.html
10 http://www.vmware.com/products/vnetwork-distributed-switch/overview.html
11 http://aws.amazon.com/ec2/
12 http://www.rackspace.com/cloud/public/
Figure 5: Virtual (or software) and physical (or hardware) switches
Private clouds: Private clouds refer to cloud services built and operated for exclusive
use by a company or organization to support its business operation. The cloud
infrastructure may be hosted internally or by a third party. The big difference from
public clouds is that the organization using a private cloud has to build and manage
the cloud infrastructure, but also has full control over how it works.
Community clouds: Community clouds are similar to private clouds, with the difference
that they are shared among various organizations that have a common goal.
Deployment and maintenance are shared as well, incurring lower costs than if each
organization had to maintain its own private cloud.
Hybrid clouds: A hybrid cloud aggregates private with public clouds, offering
resources that may originate from either cloud. Each cloud is still a separate domain
of deployment and administration. A hybrid cloud needs certain mechanisms that can
seamlessly integrate the different types of clouds it aggregates.
2.1.5 Cloud resources
In general, different cloud platform implementations name the resources they provide
to their users differently, and there is no standardization of APIs or interfaces.
Virtually every cloud platform offers its own platform-specific API. An API that has
become a de-facto standard is the Amazon EC2 query API [4], due to the popularity of the
Amazon service. Despite API and interface differences, in the end the resources
offered are the same: compute, storage, and network.
There has been an attempt to create a standard specification for requesting IaaS cloud
resources, called the Open Cloud Computing Interface (OCCI) [5]. OCCI has not
been widely adopted by cloud platforms as of the writing of this report, but it is
essentially the sole effort to unify cloud resource interfaces. OCCI specifies three
types of resources, and two types of links that connect resources to each other. The
OCCI resources are shown in Table 2, together with the Amazon Web Services
(AWS13) resource equivalent to indicate how naming can change between different
cloud services. Amazon was chosen because it is the most popular cloud service.
Table 2: OCCI Cloud resources
Resource name: Compute
AWS equivalent: EC2 Instance
Details: Information processing resource. Typically, a virtual machine. Has CPU speed, memory and other attributes.

Resource name: Storage
AWS equivalent: EBS14 Volume or S315 object
Details: Information recording resource. Used for block storage or object storage service. The storage can be persistent, i.e. it will remain after a VM has been deleted.

Resource name: Network
AWS equivalent: VPC16 subnet
Details: A link layer networking entity, or a virtual switch. Represents a single broadcast domain for resources attached to it. Has a 802.1Q VLAN tag, and a name attribute.
The Network and Storage cloud resources need to be attached to a Compute resource
to become usable. The connections are modeled by Links in OCCI, as shown in Table
3.
13 Amazon Web Services, umbrella for all Amazon cloud services (http://aws.amazon.com/)
14 Elastic Block Storage, Amazon cloud block storage (http://aws.amazon.com/ebs/)
15 Simple Storage Service, Amazon specialized storage service (http://aws.amazon.com/s3/)
16 Virtual Private Cloud, private networks with AWS resources (http://aws.amazon.com/vpc/)
Table 3: OCCI resource links
Link name: NetworkInterface
Details: Aptly named, the NetworkInterface connects a Compute resource to a Network. Has a MAC address and optionally an IP address as attributes.

Link name: StorageLink
Details: Mounts a Storage resource to a Compute resource. One attribute is the mount point in the guest OS.
The three OCCI resources aim to fully describe what is provided by cloud platforms,
and they seem to be sufficient for the cloud platforms analyzed for the needs of this
project (section 2.4). The resource of interest is the Network resource, which is
further explained in section 2.3.1.
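As a schematic illustration of how these resources and links relate, the fragment below models them as plain Python data structures. This is only a sketch for readability, not the OCCI rendering or an actual cloud platform API, and all identifiers and attribute names are hypothetical:

compute = {"kind": "compute", "id": "vm-1", "cores": 2, "memory_gb": 4}
network = {"kind": "network", "id": "net-1", "label": "private", "vlan": 42}
storage = {"kind": "storage", "id": "disk-1", "size_gb": 20}

# Links attach the Network and Storage resources to a Compute resource.
links = [
    {"kind": "networkinterface", "source": compute["id"], "target": network["id"],
     "mac": "02:00:00:00:00:01"},
    {"kind": "storagelink", "source": compute["id"], "target": storage["id"],
     "mountpoint": "/dev/vdb"},
]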
2.2 Grid to cloud
The technologies referred to as grid computing were developed a decade prior to
cloud computing. Both computing models fulfill approximately the same goal, i.e.
deliver computing as a service. The key missing ingredient in grid implementations is
virtualization, which brings benefits such as improved reliability and usability
(discussed in section 2.2.2).
2.2.1 The Grid
A grid computing platform enables aggregation of heterogeneous and geographically
dispersed computing resources and provides transparent access to them. It is a form
of distributed computing composed of loosely coupled resources across different
administrative domains. Its operation is facilitated by grid middleware, software that
is responsible for managing and distributing workload across computing resources.
Most production grids focus on assisting scientific applications. The Grid refers to the
Worldwide LHC Computing Grid, which is the largest grid platform, is distributed
internationally, and is focused on storing and analyzing data produced by LHC
experiments.
The Grid computing service can be described as a Platform-as-a-Service computing
model. Such a service typically provides its users with a platform which they can use
to execute software. Using the grid platform, scientists are able to access and analyze
data by submitting data analysis “jobs.” A scientist can access the grid
platform from anywhere; both the storage location of the data he or she requests and
the decision where to execute the jobs are determined by the grid platform (see Figure
6 for the overview). After the jobs are distributed to the selected destinations, they are
executed using technologies specific to the local deployment (job schedulers etc.).
2.2.2 Transition to cloud
There has been interest among grid providers in exploring a transition from the grid
service model to a model that involves IaaS clouds (or at the very least hardware
virtualization) and the implications of such a transition. In the current grid service,
jobs are scheduled for execution on physical machines using various job schedulers.
With the introduction of cloud technologies, a “job” would be run inside virtual
machines. There are several perceived benefits for such a move: The virtualization
layer that is added provides an exclusive environment (the virtual machine) for a job
to be executed in. This means that the environment can be configured extensively: a
job may require specific privileges (e.g., root access) or a specific OS to run in (e.g.,
Windows). These are not possible to achieve with the current grid infrastructure, and
this is considered a portability barrier for using the grid as utility computing.
Apart from configurability, using virtualization creates isolated environments where
the user is confined, which improves security and reliability; in case something goes
wrong inside the VM, the host is unaffected. In addition, VMs can be easily moved to
different hosts while running without disrupting their operation (an action called VM
live migration). Upgrading the hardware and software of VM hosts can be performed
without worrying about potential incompatibilities for the VM user (since guest OSes
are unaffected) or disrupting the operation of running VMs (since they can be
migrated).
17 Image source: Overview of the Computing Fabric at CERN [28]
Figure 6: The Grid service
Moving to a cloud service brings a drawback: An administrator no longer has control
over what a user is running in his or her VM, and more importantly over what
network traffic the user can generate. This poses a concern when it comes to
performance and security, since the machine running the VM is possibly connected to
dozens of other machines, and uncontrolled network traffic can disrupt or damage the
operation of other users’ VMs. Therefore, there is a need to be able to restrict and
fully control the network provided to VMs, and to be able to map network traffic to
specific users for administrative and control purposes.
2.3 Network Provisioning in IaaS Clouds
Part of the IaaS cloud service is to offer network services between the requested
virtual machines. The basic network service is connectivity in local-area networks,
but can also include wide-area connectivity, both to the Internet or to virtual private
networks (VPN). Virtual machines are hosted in computers that run other users'
virtual machines, and these computers are connected to several other computers
running virtual machines as well. It is mandatory that the networks offered between
virtual machines are isolated from the rest on a per-user basis. That is, a private and
isolated Ethernet segment is required for each different cloud user that requests a
private network. This network is modeled and presented to the user as the virtual
network cloud resource (the OCCI Network resource). The term virtual network will
be used to refer to this resource.
2.3.1 The Virtual Network cloud resource
Virtual networks are resources available to cloud users. From the user perspective,
they represent a private LAN, which is equivalent to a network switch, as seen in
Figure 7. The user is able to connect his or her VMs to a virtual network, an action that is often referred to as attaching or plugging a VM into the network. The expected behavior
is that all attached VMs have connectivity to each other, and that the network is
private. The implementation details that achieve this behavior are not relevant for the
user.
2.3.2 Virtual Network implementation
To isolate the network traffic inside a Virtual Network, certain configuration is
required at some points along the network path that connects the different virtual
machines that belong to the network. The path includes both the VM hosts and
hardware switches: the VM gateway on the hosts (the virtual switch that connects the
VM to its host) and the ports of the switches that connect the hosts' physical
interfaces. The straightforward way to segment Ethernet uses 802.1Q VLANs [6],
which is a networking standard that supports virtual local area networks.
Figure 7: Virtual Network cloud resource
2.3.3 VLANs
In general, a Virtual Local Area Network (VLAN) refers to the partitioning of a
physical network to create logically separated broadcast domains. IEEE 802.1Q is the
networking standard that supports VLANs on Ethernet networks, with up to 4094
different VLANs. Each VLAN has a unique identifier (from 1 to 4094).
The VLAN partitioning is performed at the data link layer (L2) by switch
devices. Each switch port can be assigned to be a member of a specific VLAN. The
switch will forward traffic only among ports that are members of the same VLAN,
effectively separating the traffic at the link layer level. These ports are called access
ports to the specific VLAN.
In case more than one VLAN needs to be transferred over a single port, the concept
of a trunk port is used. A trunk port can carry multiple VLANs, and differentiates
among them with the aid of VLAN headers. 802.1Q specifies an extra field in the
Ethernet frame that includes the VLAN identifier; that field is commonly called the
VLAN tag. A trunk port may carry traffic tagged for multiple VLANs and at most one untagged VLAN.
802.1Q VLANs are supported universally by all switch vendors, although some
switch models support fewer than 4094 VLANs. This limited identifier space creates a hard scalability limit. Going above 4094 VLANs (which can be
equivalent to 4094 different cloud users, if each user has a virtual network) requires
the adoption of a different technology.
802.1Q Alternatives
Various 802.1Q alternatives are emerging that may solve the VLAN scalability issue.
A protocol similar in nature to 802.1Q is 802.1ad, also known as Q-in-Q [7], which works
by stacking VLANs. It allows for more than one VLAN header to be inserted in the
Ethernet frame, effectively creating nested VLANs. This allows for many more than
4094 VLAN combinations.
Multiprotocol Label Switching (MPLS) [8] is a different mechanism that can be used
to create VLANs. MPLS is a traffic routing protocol that uses special short labels to direct data from one network node to the next. It can be used to carry Ethernet frames or IP packets, thus replacing the traditional mechanism of IP routing tables determining how a packet will be forwarded. The MPLS labels form a virtual path
between distant nodes, thus creating the equivalent of a VLAN.
Virtual Extensible LAN (VXLAN) [9] [10] is a method that tunnels L2 frames over
IP to create VLANs. It uses a 24-bit network identifier, the VXLAN id, which allows
for many more than 4094 VLANs. VXLAN requires special VXLAN gateways that
encapsulate L2 frames and forward them to their destination using the existing
common network infrastructure. VXLAN also requires IP multicast support in order
to work. VXLAN is a new protocol and is supported by various companies (VMware,
Cisco, Arista), but at the moment the only implemented VXLAN gateways are a few
Cisco virtual switches. As for performance, VXLAN requires slightly higher bandwidth than 802.1Q VLANs due to the larger size of its encapsulated packets, and
higher computing capacity to perform the encapsulation at the VXLAN gateways.
At this point in time, 802.1Q is the only universally deployed and working method
that achieves the goal of network isolation, and a substitution of it does not seem to
be emerging in the immediate future. The alternatives mentioned above all require
hardware and software support, and none of them have been widely adopted or have
even started to become so. However, if large-scale cloud services are the future,
802.1Q VLAN technology will have to be replaced somehow.
2.3.4 Beyond L2 networks
Apart from local and Internet connectivity, more advanced network services are
considered for cloud services, for example firewall or load-balancer services. The
umbrella term Network-as-a-Service (NaaS) refers to all such services. As of the
writing of this report, this is an area of active development, but nothing concrete has
reached available cloud platforms. See section 2.7 for more on recent developments
in cloud network services.
2.4 Open-source cloud platforms
There are various cloud platforms that implement a cloud service in a computer
cluster. Typically, they require a few machines to run the cloud platform software,
and a number of machines to act as VM hosts. Popular open-source cloud platforms
are OpenStack [11], OpenNebula [12], and Eucalyptus [13].
2.4.1 OpenNebula
OpenNebula is a fairly mature (dating from 2008) open-source cloud software platform that can deploy IaaS services. It was created and is still developed by the DSA (Distributed Systems Architecture) Research Group18 at Complutense University of Madrid, with contributions from various EU projects and organizations19. It has spawned the C12G Labs20 private company, which provides a commercial version of OpenNebula, named OpenNebulaPro, and commercial support for cloud deployments. OpenNebula is very modular, easily extensible, and amply documented21. In addition, it is used to power SARA's HPC cloud22, which was accessible for the needs of evaluating OpenNebula. These are the main reasons it was chosen as the main cloud platform to experiment with and to base the NRS system on.
Architecture
The OpenNebula system components and interfaces are shown in Figure 8. The
OpenNebula software is installed on one machine, the OpenNebula front-end, from
which it orchestrates the cloud operation. The front-end controls the set of VM hosts
via SSH communication; for that purpose, a special Unix user is required. The front-end processes are run under ownership of this user, and the VM hosts must allow this
user to connect to them passwordlessly over SSH and control their hypervisor.
18 http://dsa-research.org
19 http://opennebula.org/about:sponsors
20 www.c12g.com
21 http://opennebula.org/documentation:documentation
22 https://www.cloud.sara.nl/
The front-end software core consists of two C++ binaries:
1) the VM scheduler, which is responsible for choosing the hosts where the
VMs will be instantiated. VMs may have certain requirements (CPU and
memory) which are taken into account by the scheduler.
2) the main program, which accepts requests and manages VMs as they
proceed through their possible states.
An OpenNebula VM’s status is represented by various states. The basic ones are
Pending (right after being requested), Prolog (VM images being transferred to the
VM host), Boot (the hypervisor has just launched the VM), Running (guest OS ready
to use), and Done (VM no longer available). As a VM changes states, certain actions
need to be performed; for example copying VM images to the VM host, or triggering
certain hypervisor actions. This functionality is realized by OpenNebula’s drivers,
which are sets of scripts that perform the required actions. For example, the TM
(Transfer Manager) driver is responsible for moving and copying VM images and the
VM (Virtual Machine Manager) driver makes calls to the hypervisor to launch or
destroy a VM.
Most drivers are directly assigned to VM hosts, and different hosts may have
different drivers. That allows seamless integration of hosts that use different
technologies, for example to have one host using the KVM hypervisor with the KVM
driver, and a different host using VMware with the VMware driver. New drivers are
very easy to create; all that needs to be done is creation of a new set of scripts that
comprise the driver. The drivers that OpenNebula comes out of the box with are
Ruby modules.
Resources
Figure 8: OpenNebula components (Image source: http://opennebula.org/documentation:rel3.6:introapis)
OpenNebula provides various resources to its users (and calls them Virtual Resources), and every resource is owned by a specific user. Depending on configuration and permissions, users are free to create new resources of their own, or re-use existing ones.
re-use existing ones. OpenNebula resources are represented by text files that use an
OpenNebula specific template. The resources are shown in Table 4 with their OCCI
equivalents.
Table 4: OpenNebula cloud resources

OpenNebula Virtual Resource | OCCI equivalent | Details
Virtual Machine Template | Compute | A VM template contains all the information related to a VM, including CPU architecture, memory size, and links to all other resources the VM uses (Images and Networks). A VM template can be instantiated a number of times to create Virtual Machine Instances.
Virtual Machine Instance | Compute | A usable Virtual Machine that can be in any valid VM state, and can receive actions to trigger state transitions (shutdown, start, and others)24.
Virtual Machine Image | Storage | A file that can be mounted to the VM. Can be attached to the VM as an OS image from which the VM will boot, a persistent data block, or a cd-rom.
Virtual Network | Network | An L2 network that VMs can be connected to.
Interfaces
OpenNebula supports Amazon EC2, OCCI and an OpenNebula-specific XML-RPC
interface to provision resources.
Networking
OpenNebula Virtual Networks can be created by a user or admin, and can be attached
to a VM (the attachment takes place in the VM template). Each attached network
corresponds to a new VIF in the VM. When a Virtual Network is first created, it
receives a unique identifier from OpenNebula which is used to identify it throughout its existence. This network id is not related to the VLAN id or any other such id. A Virtual Network template25 may look as follows.
#OpenNebula virtual network template
NAME = vlan6
BRIDGE = br0
TYPE = RANGED
VLAN = YES
VLAN_ID = 6
#PHYDEV = eth0
NETWORK_ADDRESS = 10.10.17.0/24
IP_START = 10.10.17.2
24 A state machine model describing the VM states is available at http://opennebula.org/documentation:rel3.6:vm_guide_2
25 http://opennebula.org/documentation:documentation:vnet_template
There are a few interesting options: the BRIDGE attribute must be specified, and it is the name of the bridge that the VM has to be connected to in order to gain access to that
network. There is a one-to-one correspondence between a Virtual Network and a
bridge name, and all VM hosts must have a bridge of the same name in order to be
able to connect VMs to the specific network. The VLAN and VLAN_ID options are
related to the Virtual Network Manager driver (see next paragraph). The Virtual
Network also provides options to set the IP address range or fixed IP addresses of the
VMs in this network.
Virtual Networks are implemented by the OpenNebula Virtual Network Manager
(VNM) driver, similar in concept to the rest of the drivers. The VNM driver is a set of scripts that are run on specific VM state transitions to provide network configuration. The transitions that trigger a VNM configuration action are
shown in Figure 9. Each host can have its own VNM driver. When VMs are launched
on the host, the VNM driver will be used to provide network configuration only if the
VLAN option in the VM’s networks is set to YES. The VNM driver may also use the
VLAN_ID option, depending on its implementation.
OpenNebula comes with two VNM drivers that implement 802.1Q VLANs, the
802.1Q VLAN driver and the OpenvSwitch driver.
The 802.1Q VLAN driver uses linux bridges and the PHYDEV option in the network
template. PHYDEV specifies the host NIC that acts as the VM gateway. For each
Virtual Network, the driver creates a VLAN device26
on top of the PHYDEV and a
bridge that connects the VLAN device to any VIFs that need to be connected to the
network. This functionality is shown in Figure 10, where three VMs are connected to
two networks, and each network is accessible through its bridge. The VLAN id is
either taken from the Virtual Network template, or chosen with a predefined method
in the drivers’ scripts. The driver can create bridges, but it does not remove them or
do any garbage collection whatsoever. It also requires all VM hosts to use a NIC with
the same name (e.g., eth0) as a VM gateway. The driver only performs configuration
on VM hosts; in order to work, it requires that the switches connecting the hosts
allow all possible VLANs to go through.
26 A VLAN device is a logical network interface that can be created on top of an existing NIC.
It receives tagged VLAN traffic and removes the tag, acting as an access port to the VLAN. It
is often named as <nic>.<vlan-id>. For example, eth0.15 will receive all tagged VLAN traffic
from VLAN 15 that reaches eth0 and will remove the tag.
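The host-side configuration performed by such a driver can be pictured with a short sketch. The Python fragment below is not part of OpenNebula; the interface names, bridge name, and VLAN id are made-up illustrations. It creates a VLAN device on top of the physical NIC (the PHYDEV) and bridges it together with a VM's VIF, which is roughly what the 802.1Q VLAN driver scripts do with Linux bridges.

import subprocess

def run(cmd):
    """Run a shell command and fail loudly; helper for illustration only."""
    subprocess.run(cmd, shell=True, check=True)

def attach_vif_to_vlan(phydev, vlan_id, vif):
    """Create <phydev>.<vlan_id> and a bridge, then plug the VIF into it.

    Hypothetical example values: phydev='eth0', vlan_id=15, vif='vnet0'.
    """
    vlan_dev = f"{phydev}.{vlan_id}"
    bridge = f"onebr{vlan_id}"

    # VLAN device acting as an access port for the tagged traffic of vlan_id
    run(f"ip link add link {phydev} name {vlan_dev} type vlan id {vlan_id}")
    run(f"ip link set {vlan_dev} up")

    # Linux bridge that connects the VLAN device and the VM's virtual interface
    run(f"ip link add name {bridge} type bridge")
    run(f"ip link set {bridge} up")
    run(f"ip link set {vlan_dev} master {bridge}")
    run(f"ip link set {vif} master {bridge}")

if __name__ == "__main__":
    attach_vif_to_vlan("eth0", 15, "vnet0")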
Figure 9: OpenNebula VNM actions
The OpenvSwitch driver uses an Open vSwitch bridge, which is a much more
sophisticated version of the linux bridge. The name of the Open vSwitch bridge is
specified in the network template in the BRIDGE option. Interfaces can be attached
to the bridge as trunks or access ports, as in regular hardware switches, and no VLAN
devices are required. The Linux bridge scenario from Figure 10 is shown in Figure 11
with the Open vSwitch driver. Similar to the 802.1Q driver, the Open vSwitch driver requires all switches to allow all possible VLAN ids to go through, and the Open vSwitch bridge has to have the same name on all hosts.
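For comparison, the equivalent host-side configuration with Open vSwitch amounts to adding the VIF to the Open vSwitch bridge as an access port carrying a VLAN tag. The sketch below is illustrative only (the bridge, interface, and VLAN id are assumptions) and uses the standard ovs-vsctl commands.

import subprocess

def plug_vif_ovs(bridge, vif, vlan_id):
    """Plug a VIF into an Open vSwitch bridge as an access port of vlan_id.

    Illustrative values: bridge='ovsbr0', vif='vnet0', vlan_id=15.
    """
    # Create the bridge if it does not exist yet (idempotent thanks to --may-exist)
    subprocess.run(["ovs-vsctl", "--may-exist", "add-br", bridge], check=True)
    # Adding a port with tag=<id> makes it an access port for that VLAN
    subprocess.run(["ovs-vsctl", "add-port", bridge, vif, f"tag={vlan_id}"],
                   check=True)

plug_vif_ovs("ovsbr0", "vnet0", 15)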
The OpenNebula mechanism that deals with network configuration, which is the
VNM drivers, has a few limitations. One is that the scope of driver actions is VM
specific, with no global state information being kept related to the status of network
resources (bridges and VLAN ids, for example). This is why bridges cannot be
cleaned up; there is no information on whether more VMs are using a bridge or not.
In addition, there is no actual managing of the 802.1Q VLAN ids, as in keeping a
state of used and unused VLAN ids, and being able to select and release VLAN ids
from and to that pool. At best, VLAN ids can be statically assigned to Virtual
Networks based on a calculation involving the network id, but there is no control over
the network id. Lastly, the VNM drivers require the switches to trunk all possible
VM-supporting VLANs.
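To illustrate the kind of VLAN id bookkeeping that is missing, the following small sketch (not taken from any cloud platform) keeps a pool of usable VLAN ids, hands out a free id when a Virtual Network is created, and releases it when the network is torn down.

class VlanIdPool:
    """Minimal illustration of VLAN id management: allocate and release ids."""

    def __init__(self, usable_ids):
        # usable_ids could be restricted by the administrator, e.g. range(100, 200)
        self._free = set(usable_ids)
        self._in_use = {}          # network id -> VLAN id

    def allocate(self, network_id):
        if network_id in self._in_use:
            return self._in_use[network_id]
        if not self._free:
            raise RuntimeError("no free VLAN ids left")
        vlan_id = min(self._free)
        self._free.remove(vlan_id)
        self._in_use[network_id] = vlan_id
        return vlan_id

    def release(self, network_id):
        vlan_id = self._in_use.pop(network_id)
        self._free.add(vlan_id)

pool = VlanIdPool(range(100, 200))   # administrator-defined usable range
vid = pool.allocate("user-net-42")   # e.g. 100
pool.release("user-net-42")          # 100 becomes available again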
Chapter 6 describes how the VNM mechanism was used to interface with
OpenNebula.
Figure 10: VM connectivity with Linux bridge
Figure 11: VM connectivity with Open vSwitch
Cloud Aggregation
OpenNebula supports methods for aggregating different cloud sites and presenting
their resources in a uniform manner to their users. They are listed in Table 5.
Table 5: OpenNebula cloud aggregation features

Feature Name | Purpose | Details
oZones | Aggregate remote OpenNebula cloud sites | Provides centralized management and provisioning of Virtual Resources located in different cloud sites.
Hybrid Cloud | Aggregate an OpenNebula cloud with different cloud technologies | Allows to add and use Amazon EC2 and deltacloud instances from an OpenNebula cloud.
Public Cloud | Expose an OpenNebula cloud using common public cloud interfaces | Allows to use OpenNebula resources using the EC2 and/or OCCI interfaces.
oZones is the most interesting feature, as it allows the oZones user to access and use
Virtual Resources that are available in different and remote OpenNebula installations
that may belong to different administrative domains. The oZones feature uses
abstractions called zones and Virtual Data Centers, which group together resources
belonging to the same OpenNebula installation. The user may have access to multiple
zones and Virtual Data Centers, which means that he or she has access to resources
that belong to different OpenNebula installations. The resources available through
oZones are the typical OpenNebula resources; this includes Virtual Networks, but a
user is not able to connect Virtual Machines belonging to different zones (i.e. to
remote OpenNebula installations) to a common Virtual Network (something that
would require an inter-cloud connection, which is a concern of the project’s
stakeholders).
2.4.2 OpenStack
OpenStack is a cloud platform that came into existence with the merging of NASA’s
Nebula project, an IaaS cloud platform, with Rackspace’s Cloud Files project, a cloud
storage service similar to Amazon S3. OpenStack is a rather new project, launched in
2010, and has been joined by more than 150 companies (including AMD, Intel, all commercial Linux support companies and distributors, Cisco, Yahoo, and others). Compared to
OpenNebula, it is less stable and less mature, harder to extend, with more obscure
documentation, and with no cloud aggregation features. However, it is under rapid development and promotion, and with the amount of support it receives, it seems likely to dominate at some point in the future.
OpenStack has two main components (and a few lesser ones). They are named Nova (or
Compute) and Swift (or Storage). The Nova component provides the IaaS service
similar to OpenNebula and is of interest for the project, while the Swift component is
not relevant.
Architecture
OpenStack Nova's architecture is based on shared-nothing message passing. Nova consists of multiple components, called Controllers, that can be run on different machines and are each assigned a specific task. There are Controllers for providing
Virtual Machines (Compute Controller), block storage (Volume Controller), and
networking between VMs (Network Controller). There is a Cloud Controller that
keeps the global state and interacts with all the other components. All components
communicate with each other using RabbitMQ, an implementation of the Advanced Message Queuing Protocol (AMQP)27. AMQP is a protocol for message-oriented middleware that implements the message broker architecture. OpenStack Nova
architecture is shown in Figure 12. Compared to OpenNebula, Nova’s architecture
can scale better due to the use of distributed Controllers.
Interfaces
OpenStack supports Amazon EC2 and an OpenStack-specific RESTful interface.
Networking
Instead of Virtual Networks, Nova has the concept of Projects, which are isolated
resource containers that have VM instances, a VLAN id, and an IP range. Although
the semantics seem to be different (a Project contains VMs, instead of a VM being
attached to a Virtual Network), Projects are equivalent to Virtual Networks.
Nova uses a “VLAN DHCP networking mode” to implement VLANs. For each
Project, a bridge and a VLAN device are created on every host that contains VM
instances of the project, in a similar fashion to the 802.1Q OpenNebula driver. Nova
provides some additional utilities compared to OpenNebula; it automatically creates a
VPN box instance (a VM running VPN) inside the Project, so that the user can
connect to his VMs through the VPN. It also supports providing Internet connectivity
and floating IPs to the Project’s VMs. These features are illustrated in Figure 13.
27 http://www.amqp.org
28 Image source: http://docs.openstack.org/developer/nova/nova.concepts.html
Figure 12: OpenStack Nova architecture28
Apart from providing Internet connectivity, which is a welcome addition, Nova's networking is even less flexible than OpenNebula's when it comes to VLAN provisioning, and it has the same limitations. However, as a testament to the rapid
development of OpenStack, a new way of dealing with networking will have been
introduced to OpenStack by the release of this report: OpenStack Quantum29
will be a
new component available with the 27-9-12 release of the next version of OpenStack,
and will replace the previous networking modes.
OpenStack Quantum
Quantum is, in essence, an API specification that needs to be implemented by
aspiring network services that will implement Network Virtualization. When Virtual
Machines need to be connected to a network, Nova sends a request to the Quantum
Network Controller containing information on what needs to be connected, which is
a Virtual Network id and a set of VIFs (with the introduction of Quantum, the
terminology apparently changes from Project to Virtual Network). Quantum
translates the request to various API calls that manipulate the Virtual Network object
as needed (for example, attaching or detaching VIFs to it). These API calls have to be
implemented by a specific network service, called a Quantum plugin, that deals with
how the network provisioning is achieved. This provides a clean interface against
which third parties can write their network services to easily interoperate with the
OpenStack IaaS service. Quantum’s functionality is shown in Figure 14.
Quantum enhances the Virtual Network concept with Virtual Ports, which are logical ports that belong to the Virtual Network object itself and can be set administratively UP or DOWN (enabled or disabled). The VIFs that need to be
connected to the Virtual Network are connected to the Virtual Network’s Virtual
Ports. The Quantum API calls revolve around manipulating these three objects. The
Virtual Network as envisioned by the Quantum team is shown in Figure 15.
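The plugin contract can be pictured roughly as follows. This is a simplified, hypothetical sketch; the method names below are illustrative and do not reproduce the actual Quantum plugin API. A plugin implements network and port operations, and Quantum calls them when Nova asks for a VIF to be connected.

class QuantumPluginSketch:
    """Hypothetical outline of a Quantum-style plugin: networks, ports, VIFs."""

    def create_network(self, tenant_id, name):
        """Create a Virtual Network and return its id."""
        raise NotImplementedError

    def create_port(self, network_id):
        """Create a Virtual Port on the network; a port can be set UP or DOWN."""
        raise NotImplementedError

    def plug_interface(self, network_id, port_id, vif_id):
        """Attach a VIF to a port and perform the underlying network configuration."""
        raise NotImplementedError

    def unplug_interface(self, network_id, port_id):
        """Detach whatever VIF is on the port and clean up the configuration."""
        raise NotImplementedError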
29 http://wiki.openstack.org/Quantum
Figure 13: OpenStack Nova networking28
OpenStack Quantum comes with a few implemented Quantum plugins. Two of them
are a Linux bridge plugin, and an Open vSwitch plugin, which essentially have the
same functionality as the two respective OpenNebula VNM drivers.
2.4.3 Eucalyptus
Eucalyptus31
is the oldest open-source cloud platform, dating from 2008. It is
developed and sponsored by Eucalyptus Systems. Its architecture is similar to
OpenStack Nova, using distributed Controllers to manage Virtual Machines and
storage. Its networking support is similar to the previous two cloud platforms. It
supports 802.1Q VLANs using linux bridges (called “Host-managed mode”), and
carries the same limitations as the other two platforms. Eucalyptus supports Amazon
EC2.
Eucalyptus was not explored further, since it neither exhibits any particular differences from the rest of the cloud platforms nor is it friendlier to extension.
2.5 Issues with network provisioning in IaaS
Clouds
As seen in section 2.4, all major open-source cloud platforms implement network
provisioning with similar limitations when it comes to network configuration and
VLAN management.
30 Image source: http://wiki.openstack.org/Quantum
31 http://www.eucalyptus.com/
Figure 14: OpenStack Quantum functionality30
Figure 15: OpenStack Quantum Virtual Network30
The most important limitation is that cloud platforms do not have any control over the network switches. They only control the VM host environment, which is the
virtual switches, the host NICs and the VIFs, and this is the only point where Ethernet
traffic is segregated (with 802.1Q VLANs). To make networking possible in this
manner, cloud platforms require that the entire switching infrastructure is configured
as “VLAN-clean,” which means that all switch ports should be configured as trunk
ports that allow all VLAN ids (or at least all VLAN ids that will be used to support
VM connections). This allows all generated traffic to go through the switches and
reach the VM gateways and virtual switches; every VLAN is one large broadcast
domain that contains all the switches and all the virtual switches. Network isolation
and partitioning is provided only at the VM gateway endpoint. This approach is far
from optimal, due to the following issues:
1. VLAN management
Network hardware (NICs, switches) has limitations on the number of VLANs it supports. The limit can be static and/or dynamic (a limit on the maximum number of different concurrent VLANs at runtime). This knowledge is absent from existing implementations.
In addition, the network administrator may want to apply arbitrary
limitations to the VLANs available for use, perhaps because he wants to
partition his network using different VLANs.
Cloud platforms do not support dealing with such limitations, especially
dynamic ones. The network administrator would have to manually configure
complex VLAN setups. In addition, cloud platforms do not provide a
mechanism that can optimally manage available VLANs.
2. Performance and Scalability
Since every VLAN is a large broadcast domain containing all the switches
and VM hosts, broadcasts done within a VLAN will reach all these devices.
From a scalability perspective, the whole network operates as one large
broadcast domain. This creates issues such as MAC address exhaustion in
the switch MAC address tables, and performance problems due to the high
volume of broadcast traffic. (A broadcast is the transmission of packets to all reachable destinations.) These are significant scalability problems. They can
be alleviated by administrators if they manually partition their network, but
that leads back to problem one.
3. Security
If a VM host is compromised, the attacker can gain access to other users’
private network traffic.
4. Ability to use network features (QoS, ACLs)
Network features such as Quality of Service and access lists cannot be
provided, since they would require configuration of the network hardware.
Although the situation is not optimal, private networks within a cluster are still possible. The scalability potential, however, is kept to a minimum. Another important shortcoming is the absence of QoS support, which keeps cloud solutions from offering a whole range of network services.
The issues mentioned above refer to conditions within the same cluster. There is one
more issue that greatly increases the need for a network provisioning system, and it
lies elsewhere: There is currently no support for private networks that span different
cloud sites, a setup that would require an “inter-cloud” connection. Such a connection
would connect private networks located in different cloud sites in one network
segment. The management of such an inter-cloud connection is not trivial: There are
many different technologies that could implement it and they are managed in
different ways (see section 2.6).
These issues indicate the need for a network provisioning service that can provide
VLAN id management, QoS, and cross-site connections, or to phrase it more broadly,
that can manage network resources.
2.6 Data-center inter-connect technologies
As seen in section 2.4, providing inter-connections between different cloud sites is not supported by any of the explored cloud platforms. Such a connection is
referred to as an “inter-cloud” connection. If the cloud sites are located in different
data-centers, an inter-cloud connection requires traffic to be sent out of the local
computer cluster into an external network (possibly the Internet) and reach a remote
cluster in a safe manner. This can be achieved by various technologies.
2.6.1 Virtual Private Networks
Virtual Private Network (VPN) technology allows local networks to be extended over an intermediate network, e.g., to connect computers to a remote isolated network over the Internet. The traffic that goes over the intermediate network is secured so that it stays isolated within the intermediate network. A VPN allows remote access to resources that are located in an otherwise unreachable remote network, and is often deployed in companies' private networks so that employees can access the company's network remotely, e.g., from home.
There are three types of VPN implementations, based on the mechanism they use to implement security:
SSL/TLS32, a set of cryptographic protocols that provide encryption of packets at the Application layer before they are passed to the Transport layer.
IPsec33, a security extension of the IP protocol that provides encryption at the Network layer.
PPTP34, which encapsulates packets in a GRE tunnel and uses a control channel over TCP. GRE35 is a tunneling protocol that can encapsulate various network protocols over IP. PPTP is available mainly in the Microsoft Windows PPTP stack, which includes various encryption methods.
An open-source implementation is OpenVPN36, which creates secure remote connections with SSL/TLS. OpenVPN offers two types of networking using TUN
and TAP devices, which are virtual network interfaces. A TUN device is a network-
layer device with an IP address, and can be used for routing. When OpenVPN is used
with a TUN device, it creates an IP tunnel that can route between two different IP
subnets (L3 networking). A TAP device is a link-layer Ethernet device. When used with OpenVPN, it allows remote LANs to be bridged into a single broadcast domain (L2 networking). OpenVPN is one of the few SSL/TLS-capable VPN products that provide L2 networking capability.
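As an illustration of L2 bridging with a TAP device, the following sketch starts an OpenVPN point-to-point tunnel on a TAP interface and adds that interface to a local Linux bridge, so the remote LAN becomes part of the local broadcast domain. The hostname, key file, and interface names are placeholders, and a real deployment would normally use SSL/TLS certificates rather than a static pre-shared key.

import subprocess

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

def bridge_remote_lan(bridge="br0", tap="tap0",
                      remote="cloudsite2.example.org", key="static.key"):
    """Bridge a remote LAN into the local bridge via an OpenVPN TAP tunnel."""
    # Create a persistent TAP device that OpenVPN will use (L2 tunnelling)
    run(f"openvpn --mktun --dev {tap}")
    run(f"ip link set {tap} up")
    # Make the TAP device a member of the local bridge
    run(f"ip link set {tap} master {bridge}")
    # Start the tunnel itself; --secret uses a pre-shared static key for brevity
    run(f"openvpn --dev {tap} --remote {remote} --secret {key} --daemon")

bridge_remote_lan()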
2.6.2 Lightpaths
Lightpaths are optical end-to-end connections implemented via Wavelength-division
multiplexing fiber links. It is possible to transfer various link layer protocols through
them, for example Ethernet. They are independent from external networks (such as
the Internet) and thus provide security, reliability and guaranteed bandwidth capacity.
They also have high bandwidth capacity and minimal latency due to the nature of the
technology.
Lightpaths come in two types: fixed and dynamic. Fixed lightpaths are permanent connections with a guaranteed bandwidth, and are assigned to a "permanent" customer. Dynamic lightpaths can be set up on demand by users, with bandwidth customized to their needs.
32 Transport Layer Security Protocol, http://tools.ietf.org/html/rfc5246
33 Internet Protocol Security, http://tools.ietf.org/html/rfc4301
34 Point-to-Point Tunneling Protocol, http://tools.ietf.org/html/rfc2637
35 Generic Routing Encapsulation, http://tools.ietf.org/html/rfc1701
36 http://openvpn.net/
Lightpaths are not generally available in public networks, but they are available to
BiG Grid providers via SURFnet37
, a collaborating institute.
2.7 Recent developments in cloud networking
Cloud networking is a domain that is undergoing innovation and development. To
illustrate, this section shows some recent developments in networking, most of which
took place while this project was underway.
Software-Defined Networking (SDN) is a networking paradigm that has been
receiving attention while cloud services are becoming more widespread. SDN in
general refers to the control of the routing of network traffic by software. In
traditional networking, the traffic routing is determined by the routing tables
maintained by each router. Routing tables are populated by each router individually,
using protocols such as Open Shortest Path First (OSPF) and the Border Gateway Protocol (BGP) for intra-domain and inter-domain routing respectively. This makes traditional routing fully decentralized. The routing tables determine how the switch forwarding tables are populated, which in turn determine how traffic is forwarded. SDN, on the other hand, influences routing and/or forwarding with software. SDN is a broad term that encompasses all protocols or software that attempt to influence
traffic forwarding.
The leading SDN architecture is OpenFlow38
. OpenFlow is a communications
protocol that can modify the switch forwarding tables using a remote software
controller; the switch forwarding tables are no longer populated based on the routing
tables. This is referred to as separating the control plane (the decision on how traffic
will be routed) from the forwarding plane (the act of forwarding). OpenFlow routes
traffic by defining network flows. A packet may be assigned to a flow using arbitrary rules based on packet inspection, including source/destination MAC and IP addresses, IP subnets, and TCP ports. Each flow is assigned a specific action (e.g.,
forward packet to port 3, drop packet) that is performed on the packets of the flow.
This information is used to determine how to populate the switch’s forwarding tables
in order to achieve the desired behavior, which could be creating a specific
connection path between two endpoints (another way of creating VLANs). SDN with
OpenFlow allows for great flexibility in defining network flows. In order to work, OpenFlow requires support from the switches. When it comes to performance, the existence of the software controller means that the processing performance of the controller must be exceptional in order to compare to traditional, fully distributed routing.
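The flow abstraction can be illustrated with plain data structures. The sketch below is not tied to any specific OpenFlow controller; it merely shows the match/action shape of flow entries that would, for example, forward traffic between two VM MAC addresses over a dedicated path. The addresses and port numbers are invented.

# Illustrative flow entries: match fields plus an action, as a controller
# might push them to a switch's forwarding table via OpenFlow.
flow_table = [
    {
        "match": {"eth_src": "52:54:00:aa:bb:01", "eth_dst": "52:54:00:aa:bb:02"},
        "action": {"output_port": 3},        # forward towards the peer VM
    },
    {
        "match": {"eth_src": "52:54:00:aa:bb:02", "eth_dst": "52:54:00:aa:bb:01"},
        "action": {"output_port": 7},        # reverse direction
    },
    {
        "match": {},                          # default: anything else
        "action": "drop",                     # keep the path isolated
    },
]

def lookup(flow_table, packet):
    """Return the action of the first flow whose match fields all agree with the packet."""
    for flow in flow_table:
        if all(packet.get(k) == v for k, v in flow["match"].items()):
            return flow["action"]
    return "drop"

print(lookup(flow_table, {"eth_src": "52:54:00:aa:bb:01",
                          "eth_dst": "52:54:00:aa:bb:02"}))  # {'output_port': 3}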
Nicira is a new network virtualization start-up company that launched its Network
Virtualization Platform (NVP39
) in February 2012. NVP claims to be able to create
Virtual Networks on top of an existing network infrastructure. NVP shares similar
concepts with VXLAN (section 2.3.3), but uses Open vSwitch extensively. Ethernet
frames are tunneled over IP and transferred between the Open vSwitches, which are
essentially the gateways that perform the encapsulation and transmission of the
packets. The main (and practically sole) contributor to the Open vSwitch project is
Nicira. The main difference with VXLAN is that NVP also has a central software
controller that centrally manages and controls the Open vSwitches using OpenFlow.
This allows for flexibility in forming and provisioning Virtual Networks, and makes
NVP an SDN platform. Nicira was bought by VMware in July 2012, which may be an indication of the value of the NVP approach.
37 http://www.surfnet.nl/
38 http://www.openflow.org/
39 http://nicira.com/en/network-virtualization-platform
OpenStack Quantum is a brand new OpenStack component that was introduced in September 2012. Quantum aims to create a clean interface that opens the OpenStack platform to aspiring network services. Quantum's major contributor is also Nicira,
who aims to use OpenStack as the vehicle for its flagship NVP product.
Embrane is another network virtualization start-up company that announced its
heleos40
platform in December 2011. heleos provides L4-7 network services (such as
load balancing, firewall) to cloud resources. The services themselves are
implemented in Virtual Machines and are exposed using the Distributed Virtual
Appliance logical container, which may aggregate multiple Virtual Machines. When the services become too CPU-intensive, the number of VMs implementing the service can increase on demand, thus making the service scalable. This is a problem in the traditional implementation of such services, which uses single Virtual Machines that effectively cannot scale.
OpenNaaS41
is a newly announced (2012) research project that aims at providing an
open-source NaaS platform. The platform is conceived as a service to different
network domains, allowing them to provide their network infrastructure to the cloud.
It gives them the ability to abstract arbitrary network infrastructure resources (routers,
switches, more complex systems) and provide them as virtualized resources to end-
users. The scope of the OpenNaaS project is quite large; it aims to serve everyone from simple cloud users to telecom operators. OpenNaaS is at a very early stage of implementation.
Edge Virtual Bridging (IEEE 802.1Qbg, EVB) [14] and the Multiple VLAN Registration Protocol (IEEE 802.1ak, MVRP) [15] are IEEE standards that may help overcome the "VLAN-clean" requirement imposed on a cloud site's switching infrastructure. These protocols deal with automatic configuration of switch trunks so that they allow only the VLANs that are needed for the nearby VMs, and not the whole range of possible VLANs. This is achieved by providing switches with the ability to be informed of which VLANs are required; this information is provided
by the hypervisors. The protocols are work-in-progress, and need software support
from switches and hypervisors. They are currently not available in implementations.
All these developments indicate that network virtualization and NaaS form an area under active development, at both the research and the commercial level. It is important to note that all the presented solutions are either closed-source, in an early conceptual or implementation stage, and/or require support from the switches' software or hardware.
40 heleos white paper available at http://www.embrane.com/resources/documents/embrane-architecture-white-paper
41 http://www.opennaas.org/
3 System Requirements
This chapter describes the purpose and the scope of the system, as well as its key
functionality in the form of use cases. From these, a list of system requirements is
extracted.
3.1 The Network Resource Management System
To address the issues raised in section 2.5, a service is needed that is able to provide
network services by managing all the network “resources”, which are the network
switches, the various means for inter-cloud connections, and the VM gateways. For
this purpose, the Network Resource Management System (NRS) has been envisioned.
NRS is designed to be a service to a cloud platform, responsible for accommodating
networking among virtual machines, either in the same cluster or between different
clusters. However, NRS is not restricted to serving a cloud platform. Any user or
service that needs to provide network connections should find the NRS service
useful.
To accomplish its purpose, NRS has three main functionalities. It can:
1. accept a connection request and correctly map it onto the existing network
infrastructure.
The network request comes from a cloud platform (but is not limited to that).
The request involves one or more (virtual) network interfaces that should be
put in a specific VLAN. NRS should be able to find how the given interfaces
are connected to each other over the physical network.
2. configure the network hardware involved to accommodate the desired
connection.
Once the request is mapped to the physical network, the switches that
connect the network interfaces to each other should be configured to allow
traffic from the VLAN of the request.
3. negotiate and manage inter-cloud connections by talking to an NRS located
in a different cloud site.
The envisioned usage of NRS is illustrated by the following example: A user has
requested four VMs and one local network that the VMs should be connected to. The
VMs reside in three different physical machines (VM hosts), which are connected via
switches as shown on the right side of Figure 16. The left side of the figure represents
the logical isolated network that the user should experience (connected with dashed
lines).
NRS will:
1. Receive a request from the cloud platform, asking to connect the four VIFs (one for each VM) to an isolated network segment. The VMs are shown in blue in Figure 16.
2. Find out how the underlying physical network connects the interfaces with
each other. In this example, the connection among all VIFs includes all the
switches shown in Figure 16.
3. Configure the devices along the connection, so that an isolated network is created (when using 802.1Q VLANs, select an unused VLAN id and allow it through the network ports in the connection path). A minimal sketch of steps 2 and 3 is given below.
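The sketch below assumes the network topology is kept as a graph (here using the networkx library) and uses hypothetical node names: the connecting subgraph is taken as the union of shortest paths between the VM hosts, and a free VLAN id is then allowed on every switch along it.

import networkx as nx

# Hypothetical topology: three VM hosts behind two edge switches and a core switch
topology = nx.Graph()
topology.add_edges_from([
    ("host1", "switch-A"), ("host2", "switch-A"),
    ("host3", "switch-B"),
    ("switch-A", "switch-core"), ("switch-B", "switch-core"),
])

def connecting_subgraph(graph, terminals):
    """Union of shortest paths between all pairs of terminals (hosts carrying VIFs)."""
    nodes = set(terminals)
    for i, src in enumerate(terminals):
        for dst in terminals[i + 1:]:
            nodes.update(nx.shortest_path(graph, src, dst))
    return graph.subgraph(nodes)

def pick_vlan(used_ids, usable=range(2, 4095)):
    """Select the first unused VLAN id from the administrator-defined usable range."""
    for vid in usable:
        if vid not in used_ids:
            return vid
    raise RuntimeError("no free VLAN ids")

hosts = ["host1", "host2", "host3"]           # hosts carrying the requested VIFs
path = connecting_subgraph(topology, hosts)
vlan_id = pick_vlan(used_ids={2, 3})          # e.g. 4
switches = [n for n in path.nodes if n.startswith("switch")]
# For each switch in `switches`, the respective switch plugin would now be asked
# to allow `vlan_id` (tagged) on the ports that lie on the connection path.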
3.2 Use Cases
The basic NRS functionality within a cloud environment can be sufficiently
demonstrated by a few use-case scenarios. They are described in the following
section. Each scenario has an entity sending a request to the NRS. As the NRS is
mainly a service for a cloud platform, that entity would be the cloud platform. That
means that a user of the cloud platform has requested resources from the cloud beforehand. These resources include virtual machines and networks between them.
The cloud platform has accepted the user request and has created its own request to
the NRS to realize the connection between the requested VMs.
3.2.1 Target users
The NRS service is intended primarily as a service to a cloud platform, but it is not
necessarily limited to that. As a network service, it should be usable by any program
or individual that needs to provision networks. Besides the user of the provisioning service, NRS has an additional user, the network administrator. The administrator should be able to control the behavior of NRS and to request information about the current state of the network(s).
Figure 16: Physical and logical connectivity of VMs
Table 6: NRS target users

Name | Description
Cloud platform | Requests network connections between network interfaces of Virtual Machines that the cloud platform is serving.
Arbitrary NRS service user | Requests network connections between any network interfaces that the user needs to connect.
Network administrator | Configures and monitors the NRS service.
3.2.2 Basic request scenario
The simplest usage scenario is that in which simple network connectivity is requested
between virtual machines located in the local cloud site. In that case NRS has to deal
exclusively with configuring the internal network of the cloud site. A sequence diagram of the request is shown in Figure 17.
For this scenario I have also included the cloud user in order to portray the bigger
picture of the usage scenario. The sequence of steps goes roughly as follows:
1. The cloud user requests a set of VMs, which he would like to be in the same
local area network.
2. The cloud platform picks VM hosts to instantiate the VMs requested and
then provides the NRS with the hosts and the network interfaces that should
be connected in the same private network.
3. NRS finds how the connection can be mapped onto the internal network
infrastructure, i.e. finds a path that connects all VIFs to each other.
4. NRS configures switches along the path (found in step 3) to allow a new private network (allow a new VLAN on the switch ports).
5. NRS informs the cloud platform whether its request was accommodated successfully.
6. The cloud platform can continue its normal operation (launch the VM, inform the user, etc.).
Figure 17: NRS basic request usage scenario
Once this sequence of actions is done, the user’s VMs should be connected to a
private and isolated network.
This concludes the basic request, and the concept behind it is the basic functional
block of NRS. It is about creating isolated networks and connecting network
interfaces to them. A request can be enriched with various additions, as described in
the following sections.
3.2.3 Internet Request scenario
It is desired that a network request can include Internet connectivity, which means
that the user would want the virtual machines on his network to have access to the
Internet. There are various possible ways to expose a private network to the Internet.
A simple one would be for the network administrator to provide an Internet gateway,
a network node (perhaps a router) that provides Internet connectivity. If then a user
needs Internet connectivity, all that needs to be done is to connect his private network
to the Internet gateway and configure the gateway appropriately. For NRS to
accommodate this request, it needs to be able to connect an existing private network
to a specific node in the network (the gateway). This is similar to the basic request
scenario in section 3.2.2 above.
3.2.4 Quality of Service Request scenario
Another type of request would be one that includes certain network Quality of
Service (QoS) features. The QoS feature that is mostly of interest is bandwidth
guarantees.
To satisfy any QoS request between two VMs, the network path chosen to connect
the VMs must consist of switches that support this specific QoS functionality. This
means that for NRS to accommodate QoS requests, it needs to know what kind of QoS
is supported by the switches of the internal network, to find a network path that
consists of switches that can support the QoS request, and to be able to configure the
QoS features of the switches appropriately.
3.2.5 Inter-cloud Request scenario
A more complicated scenario is when the network requested needs to contain VMs
that span more than one cloud site, which means that the virtual machines are hosted
in different sites but have to be in the same private network for the end-user. This
scenario can occur in two different cases:
when the original site cannot service the request (possibly due to lack of
resources), and it is desired that it will “fetch” resources from other sites, or
when the user can actually specify on which cloud site he wants each of his
requested resources to run.
Those scenarios require certain negotiation to take place between the cloud platforms
that run on each cloud site. Such negotiation will possibly include resource-related
information and has to include which cloud site will be the initiator of the inter-cloud
connection. It can also be that there is a different mechanism that decides there is a
need for an inter-cloud connection. The NRS’s part in it is to accept the inter-cloud
request from one cloud site that chooses to initiate the inter-cloud connection and
make the connection happen. This is the assumption of the inter-cloud scenario. The
negotiation mentioned here is not part of the NRS project.
After the request is received from a cloud platform, NRS needs to be able to create
and manage a connection that bridges two different local networks (one for each site)
and to negotiate with the opposite side’s NRS on the connection specifics. There are
different types of connections that can be used for inter-cloud (VPNs, lightpaths), and
NRS should not be restricted to one technology. For more see section 2.6 above.
The sequence of actions that would accomplish the inter-cloud scenario is shown in Figure 18.
In this scenario, the cloud platform will request the creation of an inter-cloud
connection to a specific site (site #2). The connection is directly related to a local private network: this network is to be bridged to a network in site #2 over the inter-
cloud connection. The sequence of actions to make this happen would be:
1. Cloud platform #1 requests a connection to site #2 and specifies the type of
connection.
2. NRS #1 connects to NRS #2 and negotiates the connection: This includes
determining whether both sites support the chosen connection type and
exchanging configuration details.
3. Each NRS configures its end-point for the inter-cloud connection. After this
the connection is live.
4. A response is sent to each cloud platform (or to the requester).
Once the sequence is successfully complete, the two networks are bridged. That
means that they forward traffic to each other as one Ethernet domain. The networks
may still be managed separately by their respective cloud site’s NRS and cloud
platform.
This scenario introduces some security concerns: the communication between the two NRS's and the traffic sent over the connection need to be secure.
Figure 18: NRS inter-cloud scenario
3.3 NRS Logic and Interfaces
To accomplish the above scenarios, we can see that the NRS system has certain
requirements, which can be categorized into three main functionalities. NRS needs to
be able to
1. Find (or decide) how a network request can be fulfilled. To do that, NRS
needs a software representation of the internal network topology and the
ability to map a network request on it by applying logic rules or algorithms.
2. Know how to configure internal network hardware devices to create isolated
networks.
3. Negotiate an inter-cloud connection with a different cloud site's NRS and
manage the connection.
To perform the above, NRS needs to interface with various distinct components. It
needs to
1. Provide an interface to the requester of the NRS service. In the case of a
cloud platform being the requester, a modification or extension of the cloud
platform will be needed that allows it to use the NRS service.
2. Provide an interface to the network administrator of the cloud site, through
which the network topology can be inserted or modified, and status
information given. (This interface can also be used for a network discovery
mechanism)
3. Have administrative access to the internal network, in order to configure
devices.
4. Start or listen to connections from different sites' NRS's and control the
means through which the connection is achieved (for example in the case of
VPN, manage a VPN box)
The components with which NRS communicates in a cloud context are shown in Figure 19. Each part is discussed in the following sections.
The logic and interfaces mentioned in this section comprise the key functionality of
the NRS system. Each one is discussed in the following sections, in the order they are
encountered when a request comes into the system.
Figure 19: NRS in a cloud context
3.3.1 Requests and resources in cloud platforms
As mentioned in section 2.4, each cloud platform provides several interfaces and
APIs through which it provides cloud services, and there is hardly any standardization among them.
However, all APIs request the same types of resources: VMs and networks. All cloud platforms use the same concepts. Since the resources are modeled in a similar fashion, the lack of API standardization is of no concern for the purposes of NRS.
3.3.2 NRS interface for requests
After a cloud platform receives a user request, the information relevant to the network
has to be sent to the NRS. The cloud platform will choose which physical machines
are going to host the virtual machines and knows which of their network interfaces
are to be connected to the requested network. The cloud platform needs to pass this
information to the NRS. To receive the request, NRS has to expose an interface for it.
Apart from a cloud stack, someone else may want to use the NRS service, which
could be another application or a human directly requesting a network connection.
These are to be taken into account when designing the interface and deciding on its
form.
3.3.3 Representation of the network
Software model of the network topology
Once NRS receives a network request, certain processing is required for NRS to
determine how to accommodate the request. Essentially, the request needs to be
mapped on a representation of the existing network (the network available to the
cloud service). In order to do that, NRS needs to maintain and operate on a software
model of the topology of the network. This topology needs to contain static
information about switches and hosts, their inter-connections and special features of
the devices (e.g., switch capabilities). In addition, it needs to contain dynamic
information to represent which private networks have been formed at any time, which
VLANs are in use, and other relevant information. See section 4.3 below for more on
the topology model.
Topology acquisition
NRS needs to provide an interface through which it can receive the network topology
and accept modifications to it. The simplest way to receive the topology is the
network administrator inserting and modifying it. Automatic network discovery can
also be an option, but one that is 100% complete is not easily achieved, especially
when the network contains devices from different vendors, which often pose
incompatibilities when it comes to auto-discovery. Investigating auto-discovery
further was considered out of the scope of this project.
It should be possible to modify the topology inserted in NRS. For example, one may
want to remove some nodes from the network, or to add new ones. The nodes may or
may not be used by existing private networks. Such modifications should not disturb
ongoing operation or affect existing connections whenever possible.
Apart from the topology, the administrator also needs to influence NRS decisions
when it comes to mapping a request and choosing network isolation parameters. For
example the admin may want to restrict how VLANs are chosen or to influence an
algorithm that maps the request (if an algorithm is used that allows it). The admin
interface should provide these options as well.
3.3.4 Network Isolation
One of the requirements is that the private networks created for each user must be
isolated from the rest of the network. As discussed in section 2.3.3, this is trivially
achieved with 802.1Q VLANs. This protocol is enforced in switches and guarantees
isolation in the link layer of the TCP/IP protocol stack. Despite its hard limit on
scalability, it is the only universally working method that achieves the goal of
network isolation. As such, NRS will work with 802.1Q VLANs as the main method
of isolating networks. However, its architecture should allow substituting the 802.1Q
VLAN isolation method with a different one (Q-in-Q for example).
3.3.5 Switch configuration
Another requirement is to make the service agnostic of the underlying hardware,
which means being able to configure network switches from a wide variety of
vendors. All switches come with a command-line interface (CLI), through which they
can be configured. It is not straightforward to achieve switch configuration in a
model-agnostic manner, since CLIs are largely vendor-specific, and are often model-
specific among different models from the same vendor. To cope with this variation,
the concept of a switch plugin can be used. Those plugins are associated with a set of
switches that have the same interface (for example same device model or same
vendor family), and have the knowledge of how to perform configuration on their
respective model. NRS can call the plugins to perform configuration. There are
several configuration actions that are required to satisfy network requests. Those
actions can be configuring VLANs (e.g., allow tagged VLAN 3 on Ethernet port 5),
providing QoS, etc. The set of actions will comprise an interface that needs to be
supported by each plugin. Creating a plugin for a specific switch should be as simple
as mapping the specific switch commands to this set of actions.
Switches do not need to be only physical, since the configuration may need to be
applied on virtual switches as well.
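A hedged sketch of what such a plugin interface might look like is given below. The action names and the CLI commands of the example plugin are illustrative assumptions (the commands merely resemble a typical switch CLI) and are not taken from a specific vendor.

from abc import ABC, abstractmethod

class SwitchPlugin(ABC):
    """Set of configuration actions every switch plugin has to support."""

    @abstractmethod
    def allow_tagged_vlan(self, port, vlan_id):
        """Allow tagged traffic of vlan_id on the given port (trunk configuration)."""

    @abstractmethod
    def remove_tagged_vlan(self, port, vlan_id):
        """Stop allowing tagged traffic of vlan_id on the given port."""

class ExampleCliPlugin(SwitchPlugin):
    """Toy plugin that maps the actions to made-up CLI commands sent over a session."""

    def __init__(self, cli_session):
        self.cli = cli_session          # e.g. an SSH or telnet session to the switch

    def allow_tagged_vlan(self, port, vlan_id):
        self.cli.send(f"interface {port}")
        self.cli.send(f"switchport trunk allowed vlan add {vlan_id}")

    def remove_tagged_vlan(self, port, vlan_id):
        self.cli.send(f"interface {port}")
        self.cli.send(f"switchport trunk allowed vlan remove {vlan_id}")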
CLI Alternatives
Since CLI commands are vendor-specific, there have been attempts at providing
universal device configuration mechanisms. A major such attempt is the Simple
Network Management Protocol (SNMP) [1], which was developed over a decade ago. This is a protocol aimed at allowing network management in a uniform manner across all network devices. The protocol is widely used for network monitoring, but it
has not proven to be successful for configuration. The reasons for that are
documented in RFC3535 [16]. There have been attempts at replacing SNMP with
different protocols, but as of today no such mechanism has been widely adopted.
Since using a method different from CLI leads to the same problems (no universal
vendor support, vendor-specific intricacies), configuration through the CLI is considered the straightforward way to configure switches.
3.3.6 Communication with a different cloud site
In order to initiate cross-site networks, negotiation between cloud managers is
required. This negotiation is about requesting resources, so it needs a common way to
model and request resources between (different) cloud managers. Currently, this
negotiation is not supported by all cloud managers. OpenNebula has the oZones
feature (see section 2.4.1) that can aggregate multiple OpenNebula clouds, which
could make use of the inter-cloud connection feature. In any case, cloud manager
negotiation is out of scope of the NRS project. We assume that the cloud managers
reach a decision on how many resources are going to be hosted on each site, and they
want to get a connection started. This is where NRS comes in.
NRS may receive a request to connect a local network to a network of a different
cloud site. A simple way to do that would be for the other site to have NRS running
as well, to which the NRS of site #1 needs to connect and negotiate connection
information. After the negotiation, both NRS services need to jointly manage the
connection. Using this method, there is no need to know any internal network
information of the opposite site.
It is required that the network connection can be hosted on various technologies (see the inter-cloud request scenario in section 3.2.5). The NRS is the one in charge of the connection
capabilities: it must have the knowledge of what kinds of connections are possible
and have the ability to manage them.
There is no guarantee that an NRS will be installed on the opposite cloud site. It may be that site #2 provisions its resources in its own chosen way, or that it does not want to use the NRS service's way of handling the internal network. In that case, what is needed is just the logic to negotiate and manage the inter-cloud connection: an end-point to which the NRS of site #1 can connect. For that reason NRS needs to be available in an "inter-cloud only" mode that has only the cross-site connection functionality, so that it can be installed on site #2 and perform that part only.
Security Considerations
There are two security concerns with the above suggestion. The first is how to verify the identity of the remote NRS: the NRS should make sure it connects to an actual opposite NRS service and not to a malicious entity claiming to be one. The second is to make sure that the information exchanged between the NRS services is secure, since it travels over insecure networks.
3.4 Requirements List
The previous sections described the components and the key functionality of the NRS service, and exposed the points of variability of the NRS system that have to be taken into account. These variabilities must be realized through modules with clear responsibilities and through the interfaces they offer; they are directly related to non-functional requirements and dictate the software architecture of the system. All requirements are summarized in Table 7 and associated with an identifier that is used to refer to them in the rest of the document.
Table 7: Requirements List
Basic functionality:
FR1 NRS should, for a given set of network interfaces, provision network
connectivity among them.
Connectivity within one cluster:
FR2 The network provisioned that way should be isolated from the rest of the
networks existing on the network infrastructure.
FR3 NRS should take into account VLAN id restrictions of network devices when
connecting network interfaces.
FR4
The network administrator should be able to insert the network topology. He
should be able to modify it while NRS is running without requiring a restart
or disturbing existing connectivity.
FR5 The network administrator should be able to set restrictions on which
isolation ids (e.g. VLAN ids) are available to the service.
NFR1
The network interfaces can be in the same computer cluster. In order to
connect them in the same private network, knowledge of the network
topology of the cluster is required. Any arbitrary topology should be
supported.
NFR2 To provision networks, NRS needs to be able to configure network switches.
Any switch model should be supported.
NFR3 The network isolation method should not be limited to one technology (e.g.,
VLAN only), but should be easily exchanged with any other.
NFR4
NRS should provide an interface that can be easily used with any existing
cloud manager (with a plugin), or even a different application that needs the
NRS service.
Connectivity outside the cluster:
FR6 NRS should be able to provide Internet connectivity to a network.
FR7 NRS should provide connectivity to networks that lie in a different cloud site
by negotiating with the NRS service installed there (inter-cloud connection).
FR8 NRS should be usable in a mode that only services inter-cloud connections.
NFR5 NRS should support inter-cloud connections over various means (e.g., VPN,
lightpath).
Advanced functionality:
FR9
Apart from basic network connectivity, NRS should be able to provide more
features over the requested network:
bandwidth
QoS
ACLs
Performance:
NFR6
The amount of time required from the moment of receiving a network request
until its allocation is complete should be deemed “small enough” by the user.
Requesting a virtual machine from the cloud can take from a few seconds to a
few minutes, depending on various configuration choices. The network
configuration overhead of the NRS service should not increase that time by a
substantial amount. As such, an upper limit of 10 seconds to service a request
is considered sufficient.
4 System architecture
A prototype of the NRS service was built for the purposes of this project. This chapter describes the architecture of the prototype, which consists of modules with different roles and responsibilities. Each module description is accompanied by class and sequence diagrams in UML (Unified Modeling Language [17], a modeling language for describing object-oriented software). The architecture is the result of several design choices, the motivation for which is documented as well.
4.1 Prototype Overview
The prototype implements the basic functionality, which provides network
connectivity for network interfaces in the same cluster. The networks provided in this
manner are isolated, i.e. separated from each other. The prototype also supports inter-
cloud connections. The prototype does not have support for requests for bandwidth or
for being able to map requests on the topology in “clever” ways, i.e., it employs
simple path-finding in order to map requests (see section 4.6). The extent to which
the requirements from section 3.4 are satisfied is discussed in Chapter 7.
The prototype was implemented in Python (implementation details in Chapter 5). Its design is kept as independent from the implementation language as possible, although a few Python features have influenced it: the dictionary data structure (an associative array) is used extensively, as are function objects and closures. In addition, because Python is not statically typed, there is less need for the (pure) abstract classes that are prevalent in designs of statically typed languages such as Java and C++; such classes have nevertheless been used to improve the clarity of the design.
The goal for the prototype was to show the feasibility of the basic functionality in a
manner that encompasses all the concepts that were envisioned in the NRS system,
among which are:
1) providing an interface suitable for cloud platforms
2) having knowledge of the network in the form of a topology and being able to
operate on it to infer whether and how connectivity is feasible
3) realizing a connection by configuration of devices
4) negotiating inter-cloud connections
While creating the above, the goal was to explore the design options and create an
architectural basis which can be further developed to add more features. The design
decisions taken for the prototype were based on the requirements.
The architecture is presented in Figure 20 and is arranged in layers which group
together different levels of abstraction. At the top lie the modules that provide the
interface and high level functionality to the service’s users, in the middle lie the
modules that provide the logical model of the network, and at the bottom lie the
modules for hardware configuration. In the diagram of Figure 20, we identify eight
modules with the following responsibilities:
NRS, which implements the interface that exposes the operations available
to NRS service users
inter_cloud, which deals with the inter-cloud negotiation with a remote
service
admin_CLI, which provides the administrator with access to the system at
run-time
request_manager, which deals with the high-level logic of reserving and allocating requests
network_isolation, which manages network isolation (such as VLAN
assignments)
topology, which contains the software model of the network topology
algorithms, which operate on the topology to infer how network interfaces can be connected
device_plugins, which deals with device plugins that perform device
configuration
4.2 Service Interface
The NRS interface was decided based on how cloud stacks deal with networking in
conjunction with what NRS wants to achieve. The interface provides operations on
two basic resources: virtual networks and network interfaces.
The virtual network resource is offered by all cloud stacks, and has a unique identifier
known to the cloud stack. The virtual network represents the isolated layer 2 network
segment that can be requested by users of the cloud. Specifically, the cloud users can
request attaching their virtual machines to such a network, which will result in a new
network interface created for the virtual machine, through which the virtual machine
will connect to the network (this is performed by the cloud platform). All the service
calls offered by NRS always contain an identifier that uniquely identifies the network
segment which the user requests connectivity to, as well as the name of the network
interface(s) that are involved in the operation.
The exposed interface operations are shown in Figure 21. The operations can be
conceptually grouped in four distinct functionalities. These are presented in the
following sections as sets of function declarations (omitting types).
Figure 20: NRS layered architecture
The interface is implemented by the NRS object, whose methods are invoked by the
TCPServer. The server listens to incoming connections for requests, and translates
them into the NRS operations. The implementation of these constitutes the ‘NRS
service’. Implementation details can be found in Chapter 5.
4.2.1 Connecting network interfaces
connect(network_id, network_interfaces, isolation_id=None)
disconnect(network_id, network_interfaces)
These operations will connect/disconnect the given network interfaces to/from the virtual
network identified by the network id. The network interfaces are expected as (‘host
name’, ‘interface name’) tuples that uniquely identify an interface in the network
topology (see 4.3.2). The outcome of these operations is that the interfaces will
receive all layer 2 traffic from the L2 broadcast domain that is represented by this
network segment. In other words, all network interfaces that are part of the specific
network have L2 connectivity to each other.
The precise way in which the interfaces are "put" in the network depends on the plugin
that performs the configuration for the device. In general, this operation is meant for
interfaces that can receive traffic from multiple virtual networks (trunk ports). For
example, if the network isolation method is 802.1Q VLANs and the interfaces are
switch ports, then connecting the interfaces to the network means that the interfaces
become “trunk ports” for the network’s VLAN id. If the interfaces belong to a host
using Linux bridges, the outcome is a creation of a VLAN device for the specific
VLAN id. For more on plugins see sections 4.7 and 6.2.
Isolation id is an optional argument that is related to how virtual networks are
deployed. If an isolation id is specified, NRS will attempt to associate it with the
virtual network. This argument is meant to be used the first time a network interface
is connected to a network, so that the virtual network is deployed with the given
isolation id. For instance, if the isolation method is 802.1Q VLANs, the isolation id
corresponds to the VLAN id that NRS will attempt to use for the specific network.
This may fail for several reasons (VLAN id already in use, network already
associated with a different VLAN id).
For the calls to be successful, the network interfaces passed to these calls must be part
of the network topology, which means that they must have already been inserted into it at an earlier point in time (more about the topology in section 4.3).
Figure 21: NRS Service interface operations
The connect call is internally implemented in two stages that happen in sequence:
First the connection request is reserved in the internal topology, which means that the
resources that make the connection possible have been “reserved” for a subsequent
allocation. Then the allocation will simply perform the configuration of the network
devices (see 4.8 for more on how each step works). In case there is a need for the
service user to reserve resources without performing the actual allocation at the same
time (but sometime later), two additional operations are provided that perform these
two steps:
reservation_id = reserve(network_id, network_interfaces,
isolation_id=None)
The reserve call returns an identifier for the reservation, which can be used later to
allocate it, using the operation
allocate(reservation_id)
To ensure network consistency, each reservation created this way has a dependency on the previous reservation made on the same network. Each allocate must be called on reservations sequentially, in the order in which the corresponding reserve calls created them; otherwise the allocation calls are not accepted.
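For illustration, a hypothetical sequence of calls is shown below; the network id and interface names are example values only:

#connect two interfaces to virtual network 5 in one step
connect(5, [("host 1", "eth0"), ("host 2", "eth0")])

#or reserve now and allocate later (e.g., when the VM is actually launched)
rid = reserve(5, [("host 3", "eth0")])
#... some time passes ...
allocate(rid)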
4.2.2 Creating virtual network interfaces
The operations in the previous section can be used to connect network interfaces that
are already part of the network topology, but this is not the case when cloud platforms
are launching or shutting down virtual machines. For each new VM launched, new
virtual network interfaces are created that need to be connected to a network. In order
for them to be connected, they need to be put in the network topology first. Similarly
when a VM is shut down, its network interface(s) needs to be removed from the
topology. Allowing any service user to modify the topology in order to insert new
interfaces is not desired, since the topology modification is an action restricted to the
administrator. To alleviate that, two new calls are provided that are to be used by
users needing to launch new virtual machines and connect them to a network.
vif_connect(network_id, host_interface, virtual_interface,
isolation_id=None)
vif_disconnect(network_id, host_interface, virtual_interface)
reservation_id = vif_reserve(network_id, host_interface,
virtual_interface, isolation_id=None)
The vif_connect call will first create a new virtual machine in the topology with the
virtual interface provided, and then connect it to the host interface. Then it will
connect it to the network with the provided id, as in 4.2.1 above. The
vif_disconnect call will first disconnect the virtual interface from the network and
then delete it from the topology. The vif_reserve is also provided and will return a
reservation id that can later be used to allocate it, identical to the reserve functionality
in 4.2.1.
These calls are meant to be used by cloud platforms or other software that creates and deletes virtual machines in hosts existing in the topology.
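A hypothetical usage sketch follows; the host interface uses the (host name, interface name) convention of section 4.2.1, while the exact shape of the virtual interface argument is an assumption made here for illustration:

#a cloud platform launches a VM whose interface is seen as vnet0 on host 3
#and attaches it to virtual network 5
vif_connect(5, ("host 3", "eth0"), ("vm1", "vnet0"))
#when the VM is shut down, its interface is removed again
vif_disconnect(5, ("host 3", "eth0"), ("vm1", "vnet0"))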
4.2.3 Inter-cloud connections
This family of operations is related to setting-up and tearing-down inter-cloud
connections.
start_inter_cloud(local_network_id, remote_network_id,
remote_service_address,
remote_service_port,
connection_type)
stop_inter_cloud(local_network_id, remote_network_id,
remote_service_address,
remote_service_port,
connection_type)
The inter-cloud connection bridges the local network with the remote network over a
connection of the provided type (e.g., VPN). To do that, the local NRS service
negotiates with the remote service that is listening for connections on the remote
address provided. This directly implies that there needs to be a service running on the
remote location that will accept and process the negotiation. In addition, both services
must support the connection method requested. See more on inter-cloud connections
in sections 4.9 and 6.3.
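As an illustration, a hypothetical invocation is shown below; the address, port, and connection-type name are placeholders:

#bridge local network 5 with network 12 of a remote cloud site over a VPN
start_inter_cloud(5, 12, "nrs.remote-site.example.org", 9000, "vpn")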
4.2.4 Connecting to a gateway
The last family of operations gives the possibility to connect virtual networks to
gateways, which are network nodes that provide a special service or functionality and
are a static part of the topology. These nodes are associated with unique names that
indicate their functionality. One such node could be an Internet gateway that would
provide a virtual network with Internet connectivity. If the gateway is identified by
the name “Internet”, in order for a virtual network to gain/lose Internet access, the
following operations must be issued.
connect_gateway(network_id, “Internet”)
disconnect_gateway(network_id, “Internet”)
4.3 Topology
The Topology is a software model of the cloud site’s network topology, i.e. the
network of the cluster hosting the cloud service. It contains all the information related
to the network devices and how they are connected to each other. This information
must be sufficient to be able to make decisions on how to allocate network
connections. The NRS administrator can insert or modify this information to reflect a
change in the cluster network.
The model of the topology implemented for the prototype has the basic concepts that
are needed to describe both the static and the dynamic nature of the topology. The
static part is the set of devices that are part of a cloud cluster and the static (physical)
network connections between them. The dynamic part consists of the (logical)
network segments that are formed on top of this static structure.
The modeling of the topology is done in two levels: The first one is the creation of
abstractions that represent the network components, and the second is the modeling
of the components as graphs, in order to facilitate finding connections between them.
This section describes the first level.
4.3.1 Network Markup Language
To describe a topology, the first thing needed is the creation of abstractions of the
basic components of the network devices that are of interest for networking. These
abstractions come in the form of classes and objects. The modeling of the topology
was heavily based on Network Mark-up Language (NML) [17] [18]. NML is an
“effort to define a schema for description of hybrid network topologies.” There have
been various efforts to create data models that describe networks and network paths,
and NML is trying to converge the data models and pick the best pieces from each
one. In NML, networks are described at the level of intra-network resources, which
means that NML is suitable to describe the network of a cloud cluster at a level that is
of interest to NRS. In addition, NML is an OGF44
standard and SARA, which is a
research institute proximate to Nikhef, employs NML collaborators which were
available to talk to. These are the reasons NML was chosen to use for the NRS
topology representation. NML’s data model, with elements and relations between
them, is shown in Figure 22.
Despite its suitability, NML is at an early development stage. Its model and
implementations were changing constantly for the duration of the project, and therefore there was no implementation of the model that could be used in a software
product; NML provides XML and RDF schemas, but no software implementations.
In addition, NML tries to capture any arbitrary network and therefore contains a lot of
details that were of no interest for the network description and functionality required
by NRS. Choosing to follow all NML conventions would lead to a more bloated and
harder to manage NRS model and implementation and would take away its
simplicity, which was essential to be able to produce the NRS prototype in time. As
such, the NML concepts deemed useful were borrowed to create the NRS network
model, and the rest of them were ignored. When reasonable, there is a direct
correspondence between NML objects and objects in the NRS network model. In
other cases, NRS has constructs that are not an explicit part of the NML data model
but can be described by a combination of its objects. In some cases, NML seems to be
missing descriptions for certain concepts (VLANs); these are expected to be added as NML develops further. Ultimately, it should be possible to describe the NRS network model in NML, but the NRS model cannot be used to describe an arbitrary NML network.

Figure 22: NML object model (image source: Network Markup Language [18])
4.3.2 Network components
The class model of the topology that contains all network components is shown in
Figure 23. The building block of the network topology in NRS is the Network Node,
which represents a device that can be connected to a network. In the cloud
infrastructure setting, these can be virtual machine Hosts and Switches. A Network
Node is a static member of the network topology and has a unique IP address through
which it can be reached by NRS. NRS needs to be able to reach all the nodes in order
to perform configuration, as will be seen in later sections. Each Network Node can
have Ports, an object which represents a network interface of a host, or a port of a
switch. Ports can have a connection to a single other Port, which is represented by the
Link object. The Link does not refer to a cross-connect in the same node, but to what
is commonly referred to as a network link, which is a physical medium (e.g., Ethernet
cable) connecting different devices. Ports that belong to the same Network Node have
unique names among each other (e.g., ‘eth0’). Therefore, a port can be uniquely
identified by its name combined with the name of its node owner.
A Host object can be specialized to be a Virtual Machine. This specialization is
meant to be used when a Virtual Machine needs to be a static part of the topology and
exhibit behavior similar to the rest of the Network Nodes, i.e. have a unique IP
address through which it can be reached and configured by NRS. The usefulness of
such static Virtual Machines can be seen when they are needed to provide a special
functionality such as being gateways to external networks (see a VPN gateway in
section 6.3). This specialization was not introduced to represent the virtual machines launched by cloud platforms; those virtual machines do not need to
be reachable by an IP address from NRS in the first place, since NRS only attempts to
provide layer 2 connectivity for them.
When virtual machines reside in hosts, the host sees the virtual machine’s network
interfaces with specific unique names (depending on the naming convention, often
seen as ‘vnet0’, ‘vnet1’). The same interfaces are seen with different names from
within the virtual machine (most likely as ‘eth0’, ‘eth1’, etc.). A correspondence of
these names is kept in the Host object with the vifs hash table. This is needed to be
able to connect the virtual machine to any network, as seen in section 4.8. When a
virtual machine is added to a host, its interfaces are connected to a selected host
interface. That interface will act as the VM gateway.
All the objects mentioned in the previous paragraphs are contained in a Topology
object, which represents the sum of network devices and links that comprise the cloud
cluster. These objects comprise all the static information of the network topology and
are a sufficient basis to describe cloud cluster networks.

Figure 23: Basic network components
Besides the static information there is also the Layer 2 Network object (L2Network),
a logically isolated private network that corresponds to the Virtual Network available
to cloud users. These objects are created and modified dynamically to reflect changes
coming in from user requests. An L2Network has a list of ports, which represents all
the interfaces that are connected to the network. Essentially, the L2 Network is a
‘star’ topology that connects all the ports that belong to it to each other, but without
consideration of its implementation. Ports can belong to multiple L2Networks at the
same time (trunk ports). Each network is uniquely identified by its uid number. The
information on how these L2Networks are logically isolated is noticeably absent from
the object itself, for example there is no 802.1Q VLAN id or something equivalent.
This allows the implementation of the logical isolation of the networks to be replaced
with different ones, without changing the L2Network object or its port membership.
The network isolation information is managed by the NetworkIsolationManager
(section 4.5). For information on how a network is created, assigned a VLAN id, and
populated with ports, see section 4.8.
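A minimal sketch of what such an object could look like is shown below; the class and attribute names are assumptions based on the description above:

class L2Network:
    #A logically isolated virtual network. It deliberately carries no isolation
    #information (e.g., no VLAN id); that mapping is kept by the
    #NetworkIsolationManager (section 4.5).
    def __init__(self, uid):
        self.uid = uid      #unique network identifier
        self.ports = []     #ports currently connected to this network

    def add_port(self, port):
        if port not in self.ports:
            self.ports.append(port)

    def remove_port(self, port):
        self.ports.remove(port)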
The objects of the topology have multiple attributes that need to be defined on their
creation, apart from their ports (the additional attributes are explained in the later
sections of this chapter). The need to specify multiple attributes at the moment of the
nodes’ creation makes it beneficial to have them created by Builder objects [19, pp.
97-106], shown in Figure 24. The NodeBuilder has various methods that can
incrementally create NetworkNode objects. In addition, the NodeBuilder’s build()
method is a Factory method [19, pp. 107-116] that defers the actual creation to its
subclasses, SwitchBuilder and HostBuilder, which are the ones that create the
concrete NetworkNode in the form of a Switch or a Host. The builder objects can be
useful to create duplicates of a complicated node, for example a switch with multiple
ports and other settings, greatly reducing code duplication. The NodeBuilder also
uses the “fluent interface” API style [20], which allows for method chaining until the
object is ready to be constructed. For example, one could do
cisco_cata_2960_builder = SwitchBuilder().\
    add_ports('0/0/', 1, 24).\
    add_ports('1/0/', 1, 4).\
    add_plugin(plugins.cisco_cata).\
    set_alloc_prio(5)
cisco_cata_2960_1 = cisco_cata_2960_builder.\
    set_name('cata_2960_1').set_ip('10.80.80.32').build()
cisco_cata_2960_2 = cisco_cata_2960_builder.\
    set_name('cata_2960_2').set_ip('10.80.80.33').build()
This would first specify a switch builder with two sets of ports (0/0/1-24 and 1/0/1-4)
and some additional information (explained in later sections). This builder object can
be used to instantiate multiple switches later. The actual switch object is not created
until the build() is called, which allows for flexibility in creating the objects.

Figure 24: Network Node Builders
4.4 Graph representation of the network topology
As described in section 4.3, the network topology contains a lot of information; this
information needs to be somehow operated upon to be able to find paths between
ports. Since the devices that are connected to each other form a graph, it is natural to
use graphs from graph theory to describe all these objects and their inter-connections,
and to use graph algorithms to perform any required operation on them. There are
various graph algorithms (shortest path, maximum flow) [21] that are perfectly suited
for the tasks of finding the right connection paths among the network nodes. This
brings us to the graph modeling of the network: Each network node (VM hosts and
switches) was modeled by a graph, in a way that captures how network traffic flows
through the node and its ports.
4.4.1 Graphs
A graph is an abstract representation of connected objects. It consists of vertices,
which represent the interconnected objects, and edges, which are the links that
represent the relation that connects the objects. Vertices are also called nodes. To
avoid confusion with the network nodes from the topology, we will use the term
vertex to refer to the graph nodes and the term nodes to refer to the network nodes.
An edge must always have two vertices as its endpoints (i.e., an edge cannot exist
with an unconnected end). An edge can be directed, which means that the relation
that connects the objects is asymmetric, and undirected, which means that the relation
is symmetric. An edge may have a weight associated with it, which is a number that
quantifies the connection that the edge represents in the context of a chosen semantic,
e.g., the weight can have the meaning of distance, or difficulty to ‘cross’ the edge.
Weights are useful when trying to find paths between vertices. A path between two
vertices exists if there is a sequence of edges, each edge starting from a vertex that
the previous edge left off, that starts from one vertex and ends at the other.
In networks, the unit through which network connections are achieved is the network
interface, or port. The goal is to find “connections” between the ports, which are the
endpoints of the connections, so it seems natural that ports should be represented by
graph vertices and “connections” by edges. More precisely, an edge between two
vertices implies that the ports can have layer 2 connectivity to each other, possibly
after a certain configuration takes place. This is used to find paths between ports; if
there is an edge between two ports, the “physical” link between the ports is already
there, and the ports can be put in the same network by just applying certain
configuration. The edges are undirected; this represents that we are dealing only with full-duplex traffic.
Switch graph
These choices are better illustrated if we look at how each network node is mapped to
graphs. Figure 25 shows the graph for a switch. A switch consists of a set of ports
that have connectivity to each other through the switch “backplane” (if they belong in
the same VLAN). The backplane is represented by an extra vertex, the “backplane”
vertex. There is an edge between every vertex that represents a switch port and the
backplane vertex. That way, all switch ports can be connected to each other through
the path that passes through the backplane vertex.

Figure 25: Switch graph model
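The construction of such a graph can be sketched with the networkX library used by the prototype; the function name and vertex convention below are assumptions for illustration:

import networkx as nx

def build_switch_graph(switch_name, port_names):
    #Model a switch as a star: every port vertex has an edge to a single
    #"backplane" vertex, so any two ports are connected through it.
    g = nx.Graph()
    backplane = (switch_name, "backplane")
    g.add_node(backplane)
    for port_name in port_names:
        port_vertex = (switch_name, port_name)  #a port is identified by (node, port)
        g.add_node(port_vertex)
        g.add_edge(port_vertex, backplane, weight=1)  #default edge weight of 1
    return g

switch_graph = build_switch_graph("switch 1", ["1/0/%d" % i for i in range(1, 25)])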
Host graph
The graph for the virtual machine host in Figure 26 looks similar to the switch graph,
although it represents slightly different concepts. A host can have a number of
network interfaces, unconnected to each other. The interfaces are not connected
following the assumption that if you need more than one host interface for network
provisioning, the second interface is most likely there to connect to a separate
network from the one that the other interface connects to. (This does not apply to interface bonding; bonded interfaces should appear as one in the graph model, see Chapter 7 for more.) Typically hosts are created with only one interface, which will be
used for providing network to virtual machines.
A host can also have virtual machines, which have their own network interfaces.
These are connected to the host’s interface which plays the role of the VM gateway
for the specific VM. The edges connecting the host to the VM interfaces represent the
connections that are possible through the virtual switch installed on the host. In
Figure 26 for example, the VMs’ interfaces are connected to the eth0, which is the
VM gateway for this host.

Figure 26: VM host graph model
The exact way that the VM’s interfaces are connected to the host’s interfaces depends
on the chosen virtual switch implementation for the specific host. That is why we
chose to model it as a simple direct edge and hide the actual connection method as an
implementation detail of the host configuration plugin. For more on host plugins see
section 6.2.
Topology graph
The switch and host graphs represent standalone network devices, unconnected to
others. To represent a device’s ports connected to each other, the device graphs are
contained in a larger graph, the graph of the Topology. This graph also includes edges
between different network devices, which represent a physical connection (commonly
an Ethernet cable) between the ports of different network devices. These edges are
created when the Topology object’s attach function is called. An illustrative
Topology graph is shown in Figure 27.
The topology in Figure 27 consists of two switches and three hosts. The switches are
identical. Each host has one interface, which is connected to a specific switch port,
and one host has a virtual machine. This topology corresponds to the object diagram
of Figure 28 and would be instantiated with the following code block:
#Create four identical hosts
host_b = HostBuilder().add_port("eth0")
host1 = host_b.set_name("host 1").build()
host2 = host_b.set_name("host 2").build()
host3 = host_b.set_name("host 3").build()
vm = host_b.set_name("vm").build()
#Attach the last as a vm and to host3's eth0
#host 3 should see vm's eth0 as vnet0
host3.add_vm(vm, 'eth0', [('vnet0', 'eth0')])
#Create two switches
switch_b = SwitchBuilder().add_ports('1/0/', 1, 24)
switch1 = switch_b.set_name("switch 1").build()
switch2 = switch_b.set_name("switch 2").build()
#Create and populate the topology
top = Topology()
top.add_host(host1).add_host(host2).add_host(host3).\
    add_switch(switch1).add_switch(switch2)
top.attach(host1.port("eth0"), switch1.port("1/0/1"))
top.attach(host2.port("eth0"), switch1.port("1/0/3"))
top.attach(host3.port("eth0"), switch2.port("1/0/1"))

Figure 27: Topology graph
It can be seen that, unlike a Link, the cross-connects in the same network nodes (e.g.,
switch ports being connected to each other, or host interfaces connected to virtual
machine ones) are not explicitly modeled. Rather, they are inferred from the graph
models. As the prototype was being developed, a pragmatic approach was used to
create the topology design, and during the iterations it went through, there was no
need to model cross-connects explicitly (see Chapter 9 for the project development
timeline). See Chapter 7 for a discussion on the link concept when it comes to
bandwidth provisioning.
L2 Network graph
The L2Network graph is populated dynamically as ports are added or removed from
the L2Network. This graph has no pre-defined static structure. It is a subgraph of the
topology, it can contain vertices that belong to different network nodes, and it is
always a tree, which means that it is acyclic (see more on network consistency in
section 4.6.4). In Figure 29 you can see an L2Network graph, overlaid on top of the
topology graph from Figure 27. Multiple L2Networks can coexist independently on
top of the topology.
Figure 28: Object diagram of the topology of Figure 27
4.4.2 Graph classes
Each network node graph is represented by its own class in the class model. This is
required because the graphs are constructed and populated with vertices in different
ways depending on what type of node they represent. Each time a network node is
instantiated, or a port is added to it, its corresponding graph object is properly
updated as well. The graphs were not implemented as part of the existing network
node classes to avoid strong coupling.
The library used to implement the graphs is networkX [22], a Python graph library. It
was chosen based on its popularity, its intuitive interface and the fact that it provides
a wealth of graph algorithms. The network node classes do not depend directly on the library, but on a graph object interface that implements a few basic graph functions, such as add_node() and add_edge(). This makes it possible to replace
networkX with a different library if required, by wrapping the new library’s graph
objects around these functions, although this would mean that the algorithms should
be replaced as well (see next section). The new graph classes are shown in Figure 30,
where you can also see a few details on how graphs are populated when the network
node methods are called.
There is one more addition to the graph classes, added to overcome what could be
considered a limitation of the chosen graph library. The library allows grouping a subset of vertices and edges of a graph into a new graph (a subgraph of the original), but the graph and the subgraph maintain no relationship to each other. They become two independent graphs, and changing the membership of vertices or edges in one of them is not reflected in the other. In the case of the topology, the
topology graph is a graph containing the smaller graphs of hosts and switches (which
are its subgraphs). Every time one of these network node graphs is modified (new
ports added, for example), the topology graph needs to be updated. Therefore, the
observer pattern [19, pp. 282-292] is used to notify and update the topology graph
each time one of its network nodes is modified. The pattern as applied to the graph
classes is shown in Figure 31.
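A minimal sketch of how this notification could look is shown below; the class and method names are assumptions:

class NodeGraph:
    #Graph of a single network node; it notifies its observers (the topology
    #graph) whenever a vertex is added.
    def __init__(self):
        self._observers = []

    def register_observer(self, observer):
        self._observers.append(observer)

    def add_port_vertex(self, vertex):
        #...update the node's own graph here...
        for observer in self._observers:
            observer.node_graph_changed(self, vertex)

class TopologyGraph:
    def node_graph_changed(self, node_graph, vertex):
        #mirror the change into the overall topology graph
        pass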
Figure 29: L2Network graph (overlaid with red)
The representation of network objects with graphs allows the straightforward use of graph algorithms to find paths among them. Before moving to the algorithms, however, there is an additional module that needs to be explained and that is used by the algorithms: the network isolation module, which manages how L2 networks are logically isolated. It is presented in the following section, and the algorithms are presented in the section after that.
Figure 30: Graphs in the class diagram
Figure 31: Observer pattern
4.5 Network Isolation
The L2 networks formed on top of the topology need to be logically separated from
each other, and the network isolation module represents the mechanism that
implements this isolation. To isolate networks, a piece of information needs to be
associated with each network, such that when it is used to configure devices, it results
in isolated networks. VLAN identifiers fit with this concept perfectly. In order to be
able to replace such an isolation mechanism with a different one (e.g., replace the
802.1Q VLAN ids with the Q-in-Q id tuples), the isolation information is not part of
the network object itself, or present anywhere else in the model, except in the
NetworkIsolationManager object. This object maps Network objects to specific
isolation information, and it is the only object with this knowledge. The
NetworkIsolationManager class is an abstract class; to support an isolation
mechanism, an implementation has to be created. The NetworkIsolationManager
exposes methods to reserve or release isolation information from networks.
Implementations of the abstract class should also provide methods for an
administrator to control its parameters or assignment mechanisms; these are
implementation specific.
The implementation of 802.1Q VLAN isolation is done in the VlanManager class,
where the isolation information being associated with each L2 network is an integer,
the VLAN id. The VlanManager deals with assigning unique VLAN ids to networks
and with managing the available VLAN id pool, where ids are reserved from or
released into. VLANs can be assigned either to network objects identified by their
network id, or assigned to a unique string that identifies that the VLAN is reserved
for a specific purpose (e.g., typically in computer clusters, a VLAN id is reserved for
a VLAN where management of computers and switches is performed. This is done
for security reasons and the network is commonly called the management network).
The string does not serve any purpose other than to remind the administrator of what
the VLAN is reserved for, and it does not occur anywhere else in the model. A few
illustrative VLAN manager operations follow:
reserve_isolation_id(2) #associate an unused VLAN with network with id 2
reserve_isolation_id(3, 5) #(try to) associate VLAN id 5 with network 3
reserve_named_isolation_id(“management”, 1) #associate VLAN 1 with “management”
The class diagram of the network isolation module is shown in Figure 32.
Figure 32: Network Isolation with VLAN ids
In Figure 32 you can see references to ‘restrictions’ and ‘restrictors’. This is the
mechanism used for imposing restrictions on available or allowed VLAN ids (or
isolation info in general) for the NRS service to use. This may be useful to an
administrator that wants to prevent NRS from using an arbitrary set of VLAN ids.
Since VLAN ids are integers, the restrictions are implemented as function objects that take one integer argument (the VLAN id) and return a Boolean, which conveys whether the specific VLAN id fulfills the restriction or not. To illustrate (Python lambdas are anonymous function objects), the following statements
restrictor1 = lambda i: i <= 1024
restrictor2 = lambda i: i != 6
vlan_manager.set_isolation_restrictions([restrictor1, restrictor2])
will make sure the VLAN id is at most 1024 and not equal to 6.
The same VLAN id restriction mechanism is used for NetworkNodes. It is often the
case that a legacy switch model will not support the full range of 4094 VLANs that
the standard supports, but a lesser number. This can be modeled by having
NetworkNodes keep a list of such VLAN id constraints, as seen in Figure 32. Since
NetworkNodes do not have a dependency on VLANs, they keep a dictionary of
isolation constraints that maps isolation types to a list of restrictors.
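For example, a hypothetical legacy switch that only supports VLAN ids up to 255 could carry a restriction like the following; the attribute name and dictionary key are assumptions:

#restrict the node to VLAN ids 1-255 for the 802.1Q isolation type
legacy_switch.isolation_constraints["802.1Q"] = [lambda vlan_id: vlan_id <= 255]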
The information provided by this module is used by the algorithms, the device
plugins and the request manager, as seen in sections 4.6, 4.7, and 4.8 respectively.
4.6 Algorithms for operations on the topology
The reason for maintaining a network topology is to be able to operate on it to infer
how network interfaces can be connected or disconnected from the L2 networks
overlaid on the topology. There are two main operations that are needed: One is
finding how a set of network ports can be connected to a network. This is requested
by a user of the service, and his request is mapped on the topology by finding the
proper path(s). The algorithm that finds the paths is called the growing algorithm,
since it “grows” the network. The second operation is removing ports from the
network segment when the user requests so. After their removal, it is desirable to
garbage collect all the network ports that are now redundant in the network segment.
This operation is performed by the shrinking algorithm.
4.6.1 Network growing algorithm
The growing algorithm operates on the topology graph to find paths among network
ports. It is used after a request arrives to connect a port (the source port) to a specific
L2 network; for example, using the calls from section 4.2.1 one such request would
be
connect(5, [("vm1", "eth0")]) #connect eth0 of vm1 to network with id 5
The network may already have ports, and the goal is to find one path that connects
the source port to one of the ports already in the network. The network graph is a tree,
so the algorithm will effectively look to add a new branch to the tree that ends at the source port. The branch added is the shortest possible. The length of a path is determined by the edge weights, a value attached to every edge of the topology graph with a default of 1. The NRS prototype does not utilize weights, so the paths found are the ones with the fewest edges.
The mapping operation needs to take two additional factors into account:
1) The static isolation constraints of the network nodes. The network has an isolation
identifier (a VLAN id) that may conflict with isolation constraints of nodes.
2) The maximum number of concurrent isolation ids (VLANs) that a network node can support at runtime. If this number has been reached, the network node must be made temporarily unreachable for the mapping operation (see section 4.8 for more on that).
This is why before the topology graph is processed to find paths, it is pruned of all the
nodes whose isolation constraints conflict with the chosen network segment’s
isolation id, as well as the nodes marked as unreachable. This makes sure that the
path found consists of nodes that can accommodate the isolation id.
The growing algorithm receives the topology graph, the source port, and the target
ports (the ports of the network segment) as input. It then performs the following
steps:
1. If the target ports are empty (network is not populated), add the source port
to the path and go to step 5.
2. Get the topology subgraph whose nodes do not conflict with the network’s
isolation id.
3. For each target port: find the shortest path from the source port to the target
port (uses the library implementation of Dijkstra shortest path algorithm).
4. Select the shortest among the paths found in step 3.
5. Collect the additional network isolation constraints from all the devices that
belong in the path.
6. Return the path and the constraints.
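A sketch of these steps is given below, using the networkX library. It assumes that vertices are (node name, port name) tuples and that the nodes conflicting with the network's isolation id (or marked unreachable) have already been collected; all names are illustrative, and steps 5 and 6 (collecting the isolation constraints of the devices on the path) are omitted:

import networkx as nx

def grow_network(topology_graph, source_port, target_ports, excluded_nodes):
    #Step 1: an empty network is started with just the source port.
    if not target_ports:
        return [source_port]
    #Step 2: prune nodes that conflict with the network's isolation id.
    allowed = [v for v in topology_graph.nodes() if v[0] not in excluded_nodes]
    pruned = topology_graph.subgraph(allowed)
    #Step 3: shortest (Dijkstra) path from the source port to each target port.
    candidates = []
    for target in target_ports:
        try:
            candidates.append(nx.dijkstra_path(pruned, source_port, target))
        except nx.NetworkXNoPath:
            continue
    if not candidates:
        return None  #the request cannot be mapped onto the topology
    #Step 4: keep the shortest candidate path.
    return min(candidates, key=len)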
A simple application of the growing algorithm on the topology from section 4.4.1 is shown in Figure 33. Starting with the target network overlaid in red on the left side of the figure, an application of the algorithm with the source port ("vm", "eth0") would extend the path as shown on the right side of the figure.

Figure 33: Growing algorithm operation
4.6.2 Network shrinking algorithm
The second algorithm is the shrinking algorithm, used when a port is to be removed
from a network. The purpose of this algorithm is to determine which network ports
are no longer required in a network and can be removed as well. The algorithm will
be called after such a request:
disconnect(5, [("vm", "eth0")]) #disconnect eth0 of vm from network 5
As we saw in the growing algorithm section above, to connect a port (the source port)
in a network, a path has to be found that places all the ports that lie along it into the
network. This means that, although only one source port was requested, a set of ports
was actually connected to the network. These ports have to be garbage collected
when the source port is removed from the network, as they serve no other purpose
than to connect the source port to the network.
To distinguish between these two types of network memberships, the concept of
explicit and implicit port membership is introduced. A port is an explicit member of a
network if it was directly requested to be part of the network (through a connect()
service call). The rest of the ports that belong to the network only to provide
connectivity to the explicit ports, are implicit members of the network. A user can
only ask to remove an explicit port from a network (one that he has previously asked
to be put in). Implicit ports should be removed automatically once they are no longer necessary.
The shrinking algorithm receives as input a port and the network that the port should
be removed from, and works as follows:
1. Mark the port as implicit, create empty garbage_ports list
2. Get the neighbors of the port in the network graph that are not already in
garbage_ports.
3. If number of neighbors is less than two (so the port has either one or no
neighbors), and
the port is not an explicit port, and
the port is not already in garbage_ports:
a. add port to garbage_ports
b. if there is a neighbor, run step 2 with the neighbor as input port
4. Return garbage_ports
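A sketch of this garbage-collection walk is shown below; the function and argument names are illustrative, and the network graph is the tree described above:

def shrink_network(network_graph, removed_port, explicit_ports):
    #Step 1: the removed port is demoted to an implicit member.
    explicit_ports = set(explicit_ports) - {removed_port}
    garbage_ports = []
    current = removed_port
    while True:
        #Step 2: neighbors in the network graph not already collected.
        neighbors = [n for n in network_graph.neighbors(current)
                     if n not in garbage_ports]
        #Step 3: collect the port if it is a dead end and not explicit.
        if (len(neighbors) < 2
                and current not in explicit_ports
                and current not in garbage_ports):
            garbage_ports.append(current)
            if neighbors:
                current = neighbors[0]  #continue pruning along the branch
                continue
        break
    #Step 4
    return garbage_ports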
The network graph that the algorithm operates on is a tree, and the input port to the shrinking algorithm may or may not be at the end of a branch. If it is, the algorithm will effectively prune the branch starting from the input port and ending either at an intersection, or at another explicit port. (This explicit port needs its own disconnect
operation to be removed). If the input port is not at the end of a branch, which means
that it is an intersection itself, it will only be marked as implicit. If the graph of the
network is a tree, the shrinking algorithm can never split the network graph into
unconnected subgraphs.
If the shrinking algorithm is applied to the graph on the right side of Figure 33 above, with the port ("vm", "eth0") and the network in red as input, it will produce the exact reverse of the growing algorithm's operation and will lead to the network on the
left side of the figure.
4.6.3 Algorithms in the class model
The two algorithms are provided as functions from the algorithms module. The
algorithms’ logic relies on functions provided by the graph library and on the
topology classes. Therefore, they are tightly coupled to both the graph library and the
topology classes, but the two couplings can be split into two different modules. The
algorithms_nx module uses the networkX library but has no knowledge of the
network node classes, and the algorithms module uses the topology classes to infer
the information to pass to the algorithms_nx module as graph information, in order
to obtain results. The modules containing the algorithm functions are shown in Figure
34 (“utility” is a UML stereotype that indicates that the class only has static attributes
and operations).
This grouping of the algorithm modules allows for interesting modular properties.
Firstly, the algorithms are exposed to the system through two interfaces, each
containing a function that invokes the corresponding algorithm. The two algorithm
implementations are simple in function, but the algorithm module can easily be
replaced with a more complicated one that allows for a twist in the algorithms' result
(the growing algorithm is mostly of interest for that), for example, using heuristics to
influence how paths are found. This modular behavior is known as the Strategy
design pattern [19, pp. 303-311].
Secondly, the top algorithm module itself relies on two provided functions of the
bottom module (algorithms_nx). The bottom module is tightly coupled with the
chosen graph library and graph representation. It could however be replaced if a
different graph library or representation is chosen, as long as its two exposed
functions are implemented.
The structure of the algorithm modules allows for flexibility in modifying and
replacing them, if one wishes to do so.

Figure 34: Algorithm modules
4.6.4 Network consistency
The previous sections describe operations on L2 networks. These operations are used
to manipulate the network graph, as described later in section 4.8. In addition,
administrator interference (section 4.10) may make changes to the L2 networks as
well. The network represents an L2 broadcast domain to all the ports that were
requested to be part of it; these are the explicitly requested ports of the network. NRS
must guarantee that the explicit ports are connected to each other. In turn, this
requires that the graph representing the network is always connected, which in graph
theory means that for every pair of vertices of the graph, there is always at least one
path that connects them. This effectively means that all the ports (the vertices) are
indeed in the network, and the network is not split into two or more disconnected
parts.
Apart from consistency semantics regarding connectivity, the networks need to be
consistent from the point of view of the algorithms that expect them as input. In
particular, the shrinking algorithm works by pruning tree branches from the graph,
and requires the leaves to be explicit ports. If a leaf in the graph is a non-explicit port,
it may never be selected for garbage collection by the use of the shrinking algorithm,
since it is not possible for a user to directly remove a non-explicit port. Therefore, the
leaves should be explicit ports.
The algorithms that are used to modify the networks operate in such a way that they
always create a tree graph, as long as the network graph was already a tree graph.
Tree graphs are by definition connected. However, if the administrator makes
changes, he might end up creating a disconnected or otherwise faulty graph. This is
why the network needs a consistency check after changes are made. There are two
consistency criteria:
1) The network graph should be a tree graph. This makes sure that all ports are
connected and that there are no cycles formed.
2) The leaves of the tree graph should be explicit ports. This makes sure that
the network can be fully removed using the shrinking algorithms.
The consistency check method is shown in Figure 35.

Figure 35: Network consistency
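A sketch of such a check with the networkX library; the function and argument names are assumptions:

import networkx as nx

def is_consistent(network_graph, explicit_ports):
    #Criterion 1: the network graph must be a tree (connected and acyclic).
    if network_graph.number_of_nodes() > 0 and not nx.is_tree(network_graph):
        return False
    #Criterion 2: every leaf (vertex of degree 1) must be an explicit port.
    for vertex in network_graph.nodes():
        if network_graph.degree(vertex) == 1 and vertex not in explicit_ports:
            return False
    return True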
4.7 Device Plugins
To apply changes in the logical topology to the physical one, NRS should be able to
configure devices, in particular network switches and virtual machine hosts. Each
switch vendor and model may accept different CLI commands, and integrating all of
the different CLIs into NRS would be impossible. The concept of the device plugin
alleviates this situation. Each plugin is associated with a specific device (or family of
devices with common CLI) and has the knowledge of how to perform certain
configuration actions on the device.
The required configuration actions have to do with what changes in the topology
NRS supports. Currently, the changes are adding or removing ports from L2
networks, so that the networks can be formed, grown, shrunk and removed (section
4.6). The configuration actions that implement adding and removing ports from L2
networks are entirely dependent on the network segregation mechanism that is used.
When using 802.1Q VLANs, the actions that implement these changes are adding and
removing VLAN ids from trunk ports, or adding and removing access ports to a
VLAN. Therefore the plugins must support a set of commands that can perform these
actions. From a device-agnostic point of view, an interface of supported commands
must be created that, if implemented by a device plugin for an arbitrary device, is
fully sufficient to configure VLANs on the device.
In order to identify the commands and abstract them into an interface, various
hardware and virtual switch models were explored. When it comes to hardware
switches, at least one switch model from each of the following companies had its CLI
explored: Cisco (2 switch models), Juniper (1), Dell (1), 3Com (1), Brocade/Foundry (3), Nortel (1). While the list is not exhaustive (Arista Networks is missing, for
example), these companies comprise the majority of switch manufacturers, and it
does not seem likely that another vendor’s CLI would have drastically different CLI
concepts. Besides hardware switches, two software switches’ interfaces were
explored: the Linux bridge and Open vSwitch. Both are open-source software, with
Linux bridge being included by default in most Linux distributions.
The commands identified for 802.1Q VLANs are:
setup (vlan_id, node)
perform setup operations for the node to host the VLAN
cleanup (vlan_id, node)
perform cleanup operations to remove the VLAN from the node
allow (vlan_id, node, port)
adds the VLAN id to the trunk port of the node
disallow (vlan_id, node, port)
removes the VLAN id from the trunk port of the node
Usage example:
#allow vlan id 5 at port 1/0/13 of cisco switch "cata_2960"
cisco_plugin.allow(5, "cata_2960", "1/0/13")
Out of these four total commands, two of them are for placing/removing a port in a
VLAN, and the other two are for creating(‘setup’) or removing(‘cleanup’) the VLAN.
The need for these extra two commands exists because most switches need some
setup/cleanup operations when a new VLAN is introduced. This is needed both in the
case of hardware switches (‘create vlan 5’ is a common CLI command that is needed
before ports can start to be added to the vlan) and of Linux bridges (to create and
remove bridges). More about specific plugin implementations can be read in section
6.2.
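To illustrate how small such a plugin can be, a sketch of a host plugin built around Linux bridge utilities follows. The class shape, the assumption that eth0 is the trunk interface, and the exact command strings are indicative only; the actual plugins are described in section 6.2:

class LinuxBridgePlugin:
    #Sketch: each method returns the shell commands a driver would send
    #(e.g., over SSH) to the host identified by 'node'.

    def setup(self, vlan_id, node):
        #create a VLAN sub-interface on the trunk and a bridge for this VLAN
        return ["ip link add link eth0 name eth0.%d type vlan id %d" % (vlan_id, vlan_id),
                "brctl addbr br%d" % vlan_id,
                "brctl addif br%d eth0.%d" % (vlan_id, vlan_id)]

    def cleanup(self, vlan_id, node):
        return ["brctl delbr br%d" % vlan_id,
                "ip link delete eth0.%d" % vlan_id]

    def allow(self, vlan_id, node, port):
        #attach a (virtual machine) interface to the VLAN's bridge
        return ["brctl addif br%d %s" % (vlan_id, port)]

    def disallow(self, vlan_id, node, port):
        return ["brctl delif br%d %s" % (vlan_id, port)]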
These commands are relevant in the context of 802.1Q VLANs, but they may need to
be changed to support different isolation mechanisms, depending on how different the
configuration of the other isolation mechanism is (Q-in-Q appears to be pretty
similar). Therefore, each plugin has a list of network isolation mechanisms it supports
associated with a list of commands.
The plugins’ classes are shown in Figure 36. The plugins can be made more modular
with the inclusion of drivers, which deal with the communication method to the
device. Each plugin has a set of drivers registered to it, which it can instantiate. The
chosen communication method for the implemented plugins was CLI over SSH, but it
is possible to use a different communication method and/or protocol, such as CLI
over Telnet, NETCONF, SNMP, OpenFlow or other. Each network device has one
DevicePlugin that uses one DeviceDriver which communicates the commands. Such
drivers can become very device- and protocol-specific. Changing the communication
protocol might even mean that the commands themselves have to be different,
essentially requiring a new plugin. Therefore, exploring further design abstractions
related to the drivers was not deemed useful.
Each network node must have a device plugin associated with it. For ease of use, the
plugins can be instantiated by the plugins module, which acts as a façade [19, pp.
174-183] for all the available plugins (shown in Figure 37).
As a proof-of-concept, the NRS prototype has two implemented hardware switch plugins (one for the Brocade FastIron switch family and another for the 3Com 4210 model) and two software switch plugins (Linux bridge and Open vSwitch). All of them use SSH drivers. You can read more on the implementation and deployment of the plugins in sections 5.2 and 6.2.
Figure 36: Device Plugins class diagram
Figure 37: Device Plugin instantiation
4.8 Request Manager
The request manager is responsible for receiving requests to connect or disconnect
interfaces, and uses all modules described in the previous sections to realize these
requests. It makes the appropriate calls to the network isolation module, the topology
and the algorithms to perform its operation.
A request to connect an interface to a network is implemented in two distinct steps, a
reservation and an allocation step. The two steps are different in nature: The
reservation is a logical operation on the topology; it reserves new nodes that are
needed for the request and guarantees that they are available to the requester; it must
be an atomic and fast operation on the logical topology, but performs no actions on
the devices. The allocation on the other hand is performed on already reserved
resources; the configuration of each device that corresponds to the reserved nodes
takes more time than the logical topology operation, but the allocation does not use
the logical topology and does not need a lock on it. Apart from this distinction,
having separate reservation and allocation steps can be useful to a cloud service that
performs these steps at different times (for example, the reservation step could be
done in the VM scheduling phase, while the allocation can happen later when VMs
are actually launched. These two moments could be fairly distant from each other,
depending on how the cloud platform is implemented and on the load that its service
is experiencing).
A request to remove an interface from a network is implemented in two steps as well,
but the operations are not provided separately to the interface. The process is
described in the following paragraphs.
4.8.1 Reservation
The reserve operation first calls the growing algorithm to add the requested network
interface to the network, and then reserves resources. The operation's steps are
outlined below (also shown as a sequence diagram in Figure 38):
reserve(network_id, network_interface)
1. Retrieve the isolation_type used and the isolation_id for the network from
the NetworkIsolationManager
2. Call the growing algorithm to find the path with the new ports to be added to
the network
3. If the network is new, ask the NetworkIsolationManager to associate an
isolation id with the new network
4. Reserve the ports of the path to the target network. The port given in the
reserve call is marked as “explicitly reserved” and the rest of the ports as
“implicitly reserved”
5. Increase the ‘l2_networks’ and ‘isolation_type’ resource usage on the nodes
that own the path’s ports
6. Create and return a new reservation
Figure 38: Reservation sequence diagram
When a reserve operation is successful, it guarantees that the network resources
reserved will be available for allocation. The outcome of the reserve operation needs
to be allocated so that the connections actually take place. The information that is
needed for this allocation is encapsulated in a Reservation object, which is used in the
allocation step. The class diagram is shown in Figure 39.
Apart from adding ports to the network, the reserve operation needs to take into
account a few additional pieces of information that concern network nodes. One is the
maximum number of concurrent VLAN ids that a network node can support
(requirement FR3). There needs to be a way to count every VLAN id added to the
node, so that if the maximum number is reached the node becomes unavailable for
further reservations. The second is the fact that every node needs to perform some
VLAN setup and cleanup actions (as shown in section 4.7) the first time a VLAN id
is added and when it is removed from the node.
This information is modeled as network ‘resources’ maintained in the nodes. Every
node has a resources dictionary that maps a unique string that identifies a resource to
an arbitrary value or object that quantifies it. In this case, the two resources are the
‘isolation_type’ and ‘l2_networks’, both of which use simple counters. The
‘isolation_type’ counts different isolation ids that exist on the node for the chosen
isolation type, thus counting the number of VLAN ids when 802.1Q VLANs are
chosen. When the node reaches the maximum, the node marks itself as unreachable,
which is used to temporarily exclude it from the topology graph in step 1 of the
growing algorithm (section 4.6.1). The ‘l2_networks’ counts the number of ports for
each different network that the node belongs to. When the port count for a network
first becomes non-zero, the 'setup' plugin command is marked to be called in the next
allocation, and when the count drops back to zero the 'cleanup' command is marked.
The resources concept is meant to be used for any values or information related to
network nodes that have reservation semantics and need to be taken into account
when the reservation operation takes place. It is a rather simple concept; it needs to be
extended if the ability to reserve bandwidth is introduced to the system (also see
Chapter 8).
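For illustration, such counters could be kept as follows; the attribute names and the VLAN limit are indicative only and do not necessarily match the prototype.

class Node(object):
    """Illustrative network node carrying reservation-related resource counters."""

    def __init__(self, name, max_vlans=4094):
        self.name = name
        self.max_vlans = max_vlans        # FR3: maximum number of concurrent VLAN ids
        self.unavailable = False          # excluded from the growing algorithm when True
        self.resources = {"isolation_type": {}, "l2_networks": {}}

    def add_isolation_id(self, isolation_id):
        counts = self.resources["isolation_type"]
        counts[isolation_id] = counts.get(isolation_id, 0) + 1
        if len(counts) >= self.max_vlans:
            self.unavailable = True

    def add_port_for_network(self, network_id):
        counts = self.resources["l2_networks"]
        counts[network_id] = counts.get(network_id, 0) + 1
        # True means this is the network's first port on the node, so the 'setup'
        # command must be marked for the next allocation
        return counts[network_id] == 1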
Each reserve operation calls the growing algorithm and increases the reserved ports
of the network. If multiple reservations are made, each reservation “builds” on top of
the previous one. If these reservations are not allocated in the order they were made,
the allocated network can end up inconsistent. Therefore, if there is a pending
(unallocated) reservation for a network, a new reservation will require the first one to
be allocated before it can be allocated itself. This creates dependency links between
pending reservations for the same network, with each one depending on its previous
one. This dependency is modeled in the ReservationManager, an object that
maintains the reservations in a queue, one for each network, and allows only the first
reservation in the queue to be allocated.
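In outline, such a per-network queue could look as follows; locking and error handling are omitted and the method names are illustrative.

from collections import defaultdict, deque

class ReservationManager(object):
    """Keeps pending reservations per network; only the head of a queue may be allocated."""

    def __init__(self):
        self._queues = defaultdict(deque)   # network_id -> pending reservations

    def add(self, network_id, reservation):
        self._queues[network_id].append(reservation)

    def next_allocatable(self, network_id):
        """Return the reservation that may be allocated next, if any."""
        queue = self._queues[network_id]
        return queue[0] if queue else None

    def mark_allocated(self, network_id, reservation):
        queue = self._queues[network_id]
        if not queue or queue[0] is not reservation:
            raise RuntimeError("a previous reservation must be allocated first")
        queue.popleft()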
4.8.2 Allocation
The allocate operation configures network devices to allow traffic from L2 networks
through specified ports. The operation iterates on the ports that belong to a
reservation, and invokes plugin commands as required to configure the network
devices. The allocation steps are the following (also seen as a sequence diagram in
Figure 40).
allocate(reservation_id)
1. Retrieve the reservation and the isolation id associated with the network
contained in the reservation
2. Sort all ports contained in the reservation according to their owners’
allocation priority
3. Iterate on the sorted list of ports to call their owner’s device plugin’s ‘allow’
command.
The allocation priority is an integer associated with each network node. When
compared with a second node, this number specifies which node will be allocated
first (the lower the number, the higher the priority). This is introduced to aid with
situations where certain network nodes need to be configured in a specific order for
the overall configuration to succeed.
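The following sketch shows how the allocation step could apply this ordering; the attributes ports, owner, allocation_priority, device_plugin and needs_setup are assumed names used only for this illustration.

def allocate(reservation, isolation_id):
    """Configure the devices behind all ports of a reservation, honoring node priorities."""
    # a lower allocation_priority means the node must be configured earlier
    ports = sorted(reservation.ports, key=lambda p: p.owner.allocation_priority)
    for port in ports:
        node = port.owner
        plugin = node.device_plugin
        if reservation.needs_setup(node):   # first VLAN id introduced on this node
            plugin.setup(isolation_id, node.name)
        plugin.allow(isolation_id, node.name, port.name)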
Figure 39: Request manager class diagram
After an allocation is complete, the Reservation object is deleted and the next
Reservation for this network (if any) becomes available to be allocated.
4.8.3 Release
When a port is requested to be removed from a network, first the shrinking algorithm
is used to determine how the network will change and collect the ports that can be
garbage collected. Then these ports are de-allocated to undo any configuration that
allows traffic from the network to reach the ports. The steps of the operation are
shown below (a sequence diagram is shown in Figure 41).
release(network_id, network_interface)
1. Call the shrinking algorithm to collect any ‘implicit’ member ports of the
network that can be garbage-collected.
2. Iterate on the ports to invoke the device plugins’ ‘disallow’ command.
3. Remove the garbage ports from the network.
4. Reduce the ‘l2_networks’ and ‘isolation_type’ resource usage of the port’s
nodes
5. Check if the network is empty. If it is, release its VLAN id to be usable by
new networks.
Figure 40: Allocation sequence diagram
4.8.4 Intra-cloud functionality
The reserve, allocate and release operations described in the previous sections
conclude the functionality of the request manager. All the modules described until
this section are used in the context of these three operations. In turn, the request
manager implements the functionality that the NRS system offers for network
connectivity within one cluster, also referred to as the intra-cloud functionality and
related to requirements FR1-FR5. The interface operations from
sections 4.2.1 and 4.2.2 are realized with calls to the request manager. Figure 42 has a
sequence diagram that shows how the connect operation uses the request manager.
This section concludes the intra-cloud functionality. The following sections describe
the support for connectivity to external networks, and the administrator access
interface.
4.9 External networks and inter-cloud
Moving outside of the local cloud site, there are two use-cases that connect local L2
networks to external ones. One is connecting to an external network which is ‘out
there’ and beyond local administrative control; the Internet fits this description. The
second one is bridging a local L2 network with a remote one belonging to a specific
remote site, over an external network. This inter-cloud connection is fully controlled
by the two sides that create it.
Figure 41: Release sequence diagram
Figure 42: Connect sequence diagram
From a local point of view, in both cases local L2 networks need to be connected to
external networks; the local end-point of an inter-cloud connection is a connection to
an external network as well. The NRS system requires each of these external
networks to be associated with a special gateway that is able to provide connectivity
to the external network. Connecting a local L2 network to an external network is then
reduced to first internally connecting the L2 network to the gateway, and then
configuring the gateway appropriately. Inter-cloud connections use the gateway
concept as well. Each type of inter-cloud connection needs a local gateway that is
able to set up and tear down the connection; the L2 networks will connect to that
gateway.
4.9.1 Gateway nodes
A gateway to an external network is just another network node contained in the
logical network topology; it can be whichever of the three node types (switch, host, or
static virtual machine) is best suited to represent it. Each gateway has a unique
name that identifies it; the topology maintains an association of the gateway names
with the ports of the gateway network node objects; each gateway is accessible
through a port.
The sequence of actions to connect a network to a gateway re-uses the mechanisms of
reservation and allocation that are used to connect any regular node to a network.
When a request such as
connect_gateway(2, “Internet”) #connect network 2 to the internet
is received, first the port of the ‘Internet’ gateway node is retrieved from the
topology, and then passed to a reserve operation to the request manager, followed by
an allocation. The reserve operation reserves a path from the gateway to the existing
L2 network, in exactly the same way as ports of regular nodes are added to a network.
The allocation corresponds to the gateway-specific configuration that needs to be
performed to connect the L2 network to the Internet (for example, modifying firewall
rules on the gateway). Gateway-specific plugins need to be associated with the
gateway nodes that can perform the configuration.
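A simplified sketch of how connect_gateway can be expressed in terms of the existing reserve and allocate operations; the get_gateway_port helper is hypothetical.

def connect_gateway(network_id, gateway_name, topology, request_manager):
    """Connect an existing L2 network to a named gateway."""
    gateway_port = topology.get_gateway_port(gateway_name)      # hypothetical lookup
    reservation = request_manager.reserve(network_id, gateway_port)
    # the allocation triggers the gateway-specific plugin (e.g. firewall rules)
    request_manager.allocate(reservation.id)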
A sequence diagram of connecting a network to a gateway is shown in Figure 43. The
operation fits well with the intra-cloud functionality of the system and is performed
with few actions, re-using the existing modules’ operations. This functionality
satisfies requirement FR6.
Figure 43: Connecting to a gateway sequence diagram
4.9.2 Inter-cloud connection
An inter-cloud connection’s purpose is to bridge two L2 Networks that reside on
(possibly) remote cloud sites, each of them managed by its own NRS service. Unlike
the functionality that has been described so far, which can be dealt with locally, an
inter-cloud connection is jointly created by two remote NRS services. An inter-cloud
connection request starts with one NRS service making a connection request to a
remote NRS. A negotiation ensues that determines whether there is an agreement on
the specifics of the requested connection. There are two things that need to be
ensured: 1) both NRS services need to support the same type of inter-cloud
connection (e.g., both need to have VPN gateways configured) and 2) both sites need
to allow the requested L2 networks to be connected over an inter-cloud. The last
point implies administrative control over which local networks can be used in inter-
cloud connections. After the negotiation is complete, some information may have to
be exchanged that is specific to the connection type (e.g., VPN configuration details).
The inter-cloud negotiation is implemented using message passing. A complete
negotiation is composed of a sequence of different message-exchanging operations,
each of them dealing with exchange of a specific piece of information. Each such
operation has two counterparts: One corresponds to the side of the initiator (or client)
and the other to the side of the receiver (or server). The message passing operations
have transactional semantics, i.e. information is exchanged, each side checks the
received information against internal constraints, and verification of the outcome is
exchanged. Both sides must complete their parts successfully for the negotiation to
have an effect on either side. The classes that represent these concepts are shown in
the class diagram of Figure 44.
In the upper part of the diagram of Figure 44 lie the classes that compose the building
blocks of the negotiation and deal with the implementation of message passing. The
NrsNegotiation class represents a complete negotiation. Its handle method is an
abstract method that needs to be implemented to perform the implementation-specific
exchange of information. The method returning successfully means that the
negotiation was successful. NrsNegotiation provides two methods for sending and
receiving messages. These methods are provided to NrsNegotiation by the
MessagePassingImpl interface, which must deal with the implementation of the
message exchange. The implementation for the NRS prototype is the NrsSocket,
which wraps over a low level TCP socket to synchronously exchange messages.
Subclasses of NrsNegotiation must be created to perform concrete negotiations.
Since message passing is not a symmetric relationship, but one side needs to start a
conversation (client) and the other to receive it (server), at least two classes need to
be created that implement the client’s and the server’s conversational behavior. These
subclasses need to implement their own conversational protocol.
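The following sketch illustrates the shape of these classes; the message contents and method names are simplified compared to the actual protocol and are indicative only.

class NegotiationFailed(Exception):
    pass

class NrsNegotiation(object):
    """One complete negotiation; message passing is delegated to an implementation."""

    def __init__(self, messaging):
        self._messaging = messaging          # e.g. an NrsSocket wrapping a TCP socket

    def send(self, message):
        self._messaging.send(message)

    def receive(self):
        return self._messaging.receive()

    def handle(self):
        """Carry out the negotiation; raise NegotiationFailed if it does not succeed."""
        raise NotImplementedError

class ClientStartInterCloud(NrsNegotiation):
    """Client side of the 'start inter-cloud' conversation (first rows of Table 8)."""

    def __init__(self, messaging, connection_type):
        NrsNegotiation.__init__(self, messaging)
        self.connection_type = connection_type

    def handle(self):
        self.send("StartInterCloud")                  # negotiation identifier
        if self.receive() != "OK":
            raise NegotiationFailed("negotiation type rejected by remote site")
        self.send(self.connection_type)               # e.g. 'OpenVPN'
        if self.receive() != "OK":
            raise NegotiationFailed("connection type not supported by remote site")
        # ... the exchange of network ids and VPN details would follow here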
The custom conversational protocol created for the inter-cloud negotiation is simple
and to the point. There are two different conversations that the initiator of the
conversation (the client) may start, which correspond to requesting a new inter-cloud
connection, or stopping an existing one. These are represented by the
ClientStartInterCloud and ClientCancelInterCloud classes. The first message
sent by the initiator must be the identifier of the imminent negotiation, which the
receiver must recognize before the chosen negotiation proceeds. The receiver of the
conversation (the server) needs to start a negotiation before knowing which
conversation will take place. This is modeled as a ServerNegotiation class that has
different states, with the current state determining the conversation taking place. This
behavior is known as the State design pattern [19, pp. 305-314], which allows an
object to alter its behavior at runtime by changing its internal state. The
ServerNegotiation class has a list
of two possible states, the start state and the cancel state, that correspond to the
ServerStartInterCloud and ServerCancelInterCloud classes. These two classes
implement the two different conversations from the side of the server. The
ServerNegotiation first receives the negotiation identifier, and then sets the state
accordingly and proceeds with the chosen negotiation. The actual information
exchange that takes place in the two different negotiations between client and server
is shown in Table 8 and Table 9.
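Continuing the sketch above, the server-side dispatch on the received negotiation identifier could be expressed as follows; the state objects are simply passed in, and all names are illustrative.

class ServerNegotiation(NrsNegotiation):
    """Receives the negotiation identifier first, then delegates to the matching state."""

    def __init__(self, messaging, states):
        NrsNegotiation.__init__(self, messaging)
        # states: negotiation identifier -> negotiation object, e.g.
        # {"StartInterCloud": ServerStartInterCloud(...), "CancelInterCloud": ServerCancelInterCloud(...)}
        self._states = states

    def handle(self):
        identifier = self.receive()
        state = self._states.get(identifier)
        if state is None:
            self.send("REJECT")
            raise NegotiationFailed("unknown negotiation %r" % identifier)
        self.send("OK")
        return state.handle()   # the selected state carries out the rest of the conversation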
Figure 44: Inter-cloud negotiation class diagram
Table 8: Starting an inter-cloud negotiation

Information context | Client actions | Server actions
Negotiation identifier | send negotiation identifier for StartInterCloud | receive negotiation identifier
 | receive confirmation/rejection | send confirmation/rejection
Connection type (e.g., OpenVPN) | send connection type identifier | receive connection type identifier
 | | check if connection type is supported
 | receive confirmation/rejection | send confirmation/rejection
L2 Network identifiers | send local network id | receive remote network id
 | send remote network id | receive local network id
 | | check if local network id is allowed for inter-cloud
 | receive confirmation/rejection | send confirmation/rejection
Connection-specific information (for OpenVPN):
VPN role (server or client) | send VPN server preferences | receive client's VPN preferences
 | | decide on who becomes the VPN server
 | receive VPN role | send client's VPN role
 | check if the assigned VPN role agrees with the VPN settings |
 | send confirmation/rejection | receive confirmation/rejection
VPN configuration | send local VPN configuration | receive remote VPN configuration
 | receive remote VPN configuration | send local VPN configuration

Table 9: Cancelling an inter-cloud negotiation

Information context | Client actions | Server actions
Negotiation identifier | send negotiation identifier for CancelInterCloud | receive negotiation identifier
 | receive confirmation/rejection | send confirmation/rejection
Identifier of the inter-cloud connection to be cancelled | send local network id | receive remote network id
 | send remote network id | receive local network id
The columns of actions in Table 8 and 9 are sequences of actions; the client (or the
server) executes them sequentially from top to bottom. The message passing is
synchronous and the client and server always synchronize at the send/receive action
pairs. Thus, server and client actions that are in the same row can be considered to
happen simultaneously. At the confirmation/rejection actions, if the outcome is a
rejection both sides will finish the negotiation with a failure indication. The actions
for each different information context are grouped in the operations of the negotiation
classes in Figure 44.
There are various configuration options that are associated with each side of the
negotiation, such as connection-specific preferences and available connection types.
These are passed to the negotiation classes when they are instantiated, which happens
when an inter-cloud request is initiated towards a remote NRS service, or when a
request is arriving from a remote service. The NRS object is the one that instantiates
the negotiation objects to start or stop inter-cloud connections, and maintains the
configuration settings. The negotiations are instantiated after a request such as
start_inter_cloud(2, 3, nrs2.nikhef.nl, 10003, ‘OpenVPN’)
#bridge local network 2 with remote network 3 located at the
#service at nrs2.nikhef.nl:10003. Connection type OpenVPN.
NRS also maintains a list of existing inter-cloud connections. When a request to
cancel an inter-cloud comes in, NRS checks the information against the list of
existing inter-cloud connections to identify which connection to tear down.
Once a negotiation is complete, all that is left to do is to connect the local network to
the inter-cloud gateway, as described in section 4.9.1. The configuration of the
gateway will be performed by the gateway's plugin, after which the inter-cloud connection
will be ‘live’. A sequence diagram showing the whole process of the creation of an
inter-cloud connection is shown in Figure 45.
The NRS instance on the left side of Figure 45 represents the ‘local’ NRS service,
while the one on the right side is the ‘remote’ service. It is worth noting that the only
expected functionality from the remote service is to converse correctly during the
negotiation. The manner by which the inter-cloud connection is actually realized on
the remote site is of no concern to the local NRS, as long as it works for the end-user
of course. This means that, if NRS is stripped of all functionality that is not necessary
for inter-cloud negotiations, it can be run in an 'inter-cloud only' mode that can
properly converse with any remote (full-fledged or not) NRS service.
Figure 45: Inter-cloud sequence diagram
4.9.3 Inter-cloud mode NRS
NRS can be configured to run in an ‘inter-cloud’ only mode (requirement FR8). In
this mode, the NRS service can perform only the inter-cloud start and stop family of
operations (from section 4.2.3), putting aside all other functionality related to internal
reservations and allocations. The inter-cloud operations themselves perform only the
inter-cloud negotiation part, without calling the connect_gateway operation (section
4.9.1). The cloud site running the ‘inter-cloud’ mode NRS needs to make use of the
information returned from the inter-cloud negotiation. This information is required to
perform the local configuration of creating an inter-cloud endpoint and connect the
local networks to it; the intra-cloud NRS functionality that can do that is missing. As
an aid to a site choosing ‘inter-cloud only mode’, NRS provides the optional
allocate_gateway and deallocate_gateway operations. These skip the normal
reservation and allocation steps done by connect_gateway (section 4.9.1), and
directly configure a gateway with the passed information. These optional operations
directly invoke the gateway’s device plugin to perform configuration, so that the site
using ‘inter-cloud only mode’ can choose to re-use the gateway plugin to perform
configuration exactly as performed in a full-fledged NRS service.
As seen in the inter-cloud information exchange and with ‘inter-cloud’ mode, there
are plenty of configuration options related to inter-cloud connections. These are
further discussed in Chapter 6. In addition, every NRS service needs to run an inter-
cloud server that listens for inter-cloud requests. More about implementation details
can be found in Chapter 5.
4.10 Administrator Access
The network topology on which NRS operates needs to be inserted in the NRS
somehow. Apart from being inserted, all parts of the topology need to be modifiable
at runtime by an administrator. Based on the topology model created for the NRS
prototype, the modifications that an administrator would make can be grouped into two
families of actions:
1) Modifications on static nodes of the topology: These are modifications such
as creating/deleting a node, adding a port to a node, or connecting and
disconnecting ports of different nodes.
2) Modifications on the L2 networks: An administrator may want to manually
change the port membership of an L2 network. A motivation for this can be
a need to re-arrange the implicit ports that belong to the network, so that the
network traffic no longer goes through a specific node and/or explicitly goes
through a desired node.
Such modifications may conflict with ongoing network operation. We can consider a
snapshot of the NRS topology while the system has been running for some time:
Typically, several L2 networks will have been formed on top of the topology, each of
them having a set of ports belonging to it, possibly overlapping with the ports of the
other L2 networks. If the administrator modifies one of these ports, either by
removing the port or disconnecting two nodes from each other, or if the administrator
adds new ports to a network that are not connected to its old ones, one or more
networks can be put in an inconsistent state (as defined in section 4.6.4). Therefore,
any modifications done must make sure that all networks remain consistent.
Thus, two things are required for administrative access: Firstly, a way to access and
modify the topology that is suitable for a human, and secondly, mechanisms that
make sure that the topology is kept consistent.
4.10.1 Topology access
Initially, a structured textual representation (XML or equivalent) of the network
topology was considered. The administrator would be able to create it and insert it to
the NRS service, or retrieve it from NRS while it is running. Such a feature would
serve two purposes:
1) a human-readable and editable topology
2) a way for the system to save its state offline
This would also lead to two different representations of the topology model: 1) the
software model, as kept in the system’s memory, and 2) the schema that models the
topology in the textual format (the XML schema). The topology model, however, was
not set in stone for the duration of the project; for most of the project it was being
modified as part of exploring different modeling options and design decisions. Using
both the software model and an XML schema would lead to maintenance overhead
not acceptable for the NRS prototype development; not only would the two
implementations of the model need to be modified, but also the parser that would
convert from one to the other.
There is an additional complication in modifying the topology. When an
administrator needs to modify it while the system is running, the system needs to stop
serving requests, or at least requests related to some parts of the topology. For
instance, if the administrator is changing a specific L2 Network, the system cannot be
growing or shrinking it in the background; the network needs to remain unchanged
from the moment the administrator receives the topology in human-readable form,
until he or she submits a modified one. That means that NRS should stop serving
requests, at least the ones concerning the specific network. This introduces the
concept of the topology being a shared resource or containing multiple shared
resources, contested between the administrative access and the NRS service.
An XML schema is not well suited to such functions, but more towards
'offline' viewing. Trying to use it in the scenario described above would require an
additional facility or tool that the administrator would use to indicate which parts of the
topology he will change. This tool would then have to inform NRS which parts of
the topology cannot be modified; the tool would have to be part of NRS itself or
perform inter-process communication to it. From a prototype’s perspective, it would
be much more convenient if the administrator can have direct access to a command
line tool that has access to the topology object as it is kept in memory, and can
present the administrator with options to modify it. If this tool is an extension of the
NRS service, access to shared resources is easily resolved.
This was the chosen mechanism for the prototype; NRS provides a CLI environment
that presents its user with the topology and options to modify it. The CLI tool also fits
well with the common practice of network devices and services to provide an
administrator with a CLI environment to make modifications.
Apart from the topology, the administrator is interested in changing the network
isolation manager at runtime as well. For example, he may need to mark a new set of
802.1Q VLAN ids as unusable. The network isolation manager object is provided to
the administrator through the same CLI tool.
4.10.2 Topology insertion
Apart from the administrator access tool, the topology may need to be inserted or
modified by an automatic network discovery tool or facility. Such a tool was out of
scope for the NRS project; inserting the topology, however, should not be coupled to
the administrator access tool.
The object that operates on the topology is the request manager, and this is where the
topology modifications are applied. The request manager provides an operation to
replace its topology object with a new one. Before the new topology is accepted, all
modified L2 networks are checked for consistency. The consistency checks guarantee
consistency of networks at a logical level of the topology (or as logical reservations).
However, the newly added or removed ports of the networks need to be allocated as
well, i.e. plugin calls need to be performed that will connect them to the L2 networks.
After the allocation step is complete, the new topology replaces the old one. The class
methods that perform these operations are shown in Figure 46.
Figure 46: Operations to modify the topology
4.10.3 Administrator CLI
Command line interfaces can be very elaborate; they provide their own custom
commands and functionality concepts, and issues such as usability and
comprehensiveness are important for a successful CLI. Creating an elaborate CLI tool
was not a goal of the NRS prototype. Its CLI tool is simply a means to achieve the
desired functionality of providing real-time access to the NRS topology.
The NRS system’s administrator CLI is based on the Python interactive interpreter;
an interactive prompt that accepts any Python code, prints expression results and
stores all variables that the user creates for the duration of the session. The NRS CLI
basic concept is that it provides the exact Python API that is used to create the
topology object internally. Thus, the administrator modifies the object in the same
way that the developer creates it programmatically. Using the Python standard
library, such a solution was very easy to create.
The CLI is based on the InteractiveConsole class, provided by the Python library’s
code module. This class directly provides the Python interpreter functionality; the
interpreter’s environment, i.e. the variables and functions that are available to the
user, can be modified before the interpreter is started. This allows the topology object
to be passed to the interpreter’s session, along with any other functions needed to
support the CLI’s functionality. The user can then call the available functions and
object methods directly. Therefore, passing the topology object to the interpreter
makes all of its methods available to the interpreter, which include adding and
modifying nodes and ports.
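A minimal sketch of an interpreter set up this way is shown below. It uses only the standard library code module; the output capturing relies on contextlib.redirect_stdout, which is a Python 3 facility (the prototype targeted Python 2), and the variable names are illustrative.

import code
import io
from contextlib import redirect_stdout

class CachingInterpreter(code.InteractiveConsole):
    """Interactive interpreter that caches its output instead of writing it to stdout."""

    def __init__(self, environment):
        code.InteractiveConsole.__init__(self, locals=environment)
        self.output = io.StringIO()

    def write(self, data):                   # error output from the interpreter
        self.output.write(data)

    def push_line(self, line):
        """Evaluate one line of input and return whatever it printed."""
        self.output = io.StringIO()
        with redirect_stdout(self.output):
            self.push(line)
        return self.output.getvalue()

# usage: expose a (copied) topology object to the administrator's session
# interpreter = CachingInterpreter({"topology": topology_copy})
# reply = interpreter.push_line("topology.add_node('switch-3')")   # hypothetical topology API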
The CLI uses the concept of committing modifications, inspired by Juniper's Junos OS
CLI⁴⁷. The user is free to make any modifications, but none of them are applied
until an explicit commit command is submitted. It is then that the modifications are
checked for consistency, and applied to the running system. This is why the topology
object passed to the CLI session is actually a copy of the topology object, and not the
original one used by the NRS service. The copy can be used by the CLI user to view
or modify, while the NRS service is simultaneously accepting requests on the real
topology object. If the administrator makes changes to the static topology, then the
service can keep accepting requests, since the user requests are not capable of making
changes to the static topology. If the administrator however needs to change the port
membership of an L2 network, then the service must stop until the administrator
commits his changes. This is signified by a special CLI command that needs to be
invoked, the edit_network. The handling of resource locking and the concurrent
tasks of the NRS system are discussed in Chapter 5.
The CLI is made available through a Telnet server; Telnet is a network protocol that
provides an interactive terminal facility. Telnet clients exist for almost every computing
platform. The server was created using the telnetsrvlib library, which is a third-
party Python library that implements a Telnet server. The library provides a
TelnetHandler class, which represents a single Telnet session that has methods to
handle received input. The class was extended to send the received input to the
interpreter class, receive the interpreter’s output and push it as a reply to the Telnet
client. The interpreter itself is implemented by extending the InteractiveConsole
class. The class provides a push method, which expects a line of input as a string,
evaluates it as a Python expression and then prints the outcome to the system’s
stdout. The class was extended to cache the output instead of printing it to stdout. The
cached output is used by the Telnet server as the reply. The class diagram of the
Telnet server and the interactive interpreter is shown in Figure 47.
Each Telnet session will instantiate an AdminShell, which deals with retrieving the
topology and network isolation manager, copying the objects and correctly
initializing the interpreter’s environment. It should be noted that the interpreter gives
the freedom to manipulate its objects in any possible way allowed by the Python
language. This can ruin the Telnet session, since someone can delete the topology
object for example. To alleviate this situation, a reset command is available that will
reload the interpreter’s environment with newly retrieved objects (although the reset
command can be ruined as well). Such improper usage will not affect the NRS
system, unless a commit with malformed arguments can pass the consistency checks.
In general, dealing with such behavior was out of scope for the project’s prototype.
Access to the CLI must be strictly limited to the administrator. This is achieved by
configuration and deployment restrictions, explained in Chapter 6.
47 http://www.juniper.net/techpubs/en_US/junos11.1/information-products/pathway-pages/junos-cli/junos-cli.html
Figure 47: Administrator CLI over Telnet class diagram
4.10.4 System state
Apart from letting the administrator view and modify the system’s topology, the
system needs a way to save its topology offline, so that it can boot or recover from it.
This does not pertain only to the topology, but to the overall system’s state. The NRS
system’s state consists of the state of its topology, its network isolation manager and
its reservation manager. These objects are saved by being serialized (using the Python
pickle module) and stored in files. The operation is shown in Figure 48. When the
system starts, it looks for the serialized objects to load, otherwise it starts with new
empty objects. An administrator also has the option to create and save these objects
‘offline’ using independent Python scripts.
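A sketch of how this serialization can be done with the pickle module; the file names and function names are illustrative only.

import pickle

STATE_FILES = {
    "topology": "topology.pkl",
    "isolation": "isolation_manager.pkl",
    "reservations": "reservation_manager.pkl",
}

def save_state(name, obj):
    """Serialize one of the state-carrying objects to disk."""
    with open(STATE_FILES[name], "wb") as handle:
        pickle.dump(obj, handle)

def load_state(name, default_factory):
    """Load one saved object, or create a fresh one if no file exists yet."""
    try:
        with open(STATE_FILES[name], "rb") as handle:
            return pickle.load(handle)
    except IOError:
        return default_factory()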
4.11 Conclusion
This chapter presented the overall architecture of the system, the modules that
compose it, and the elements and interactions in each module. This is referred to as
the logical view of the system and conveys the system’s functionality and business
logic. Besides the logical view, a description of the system is needed that maps the
system’s elements to active tasks (threads) that share resources and communicate
with each other. In addition, the modules are grouped in components and code source
archives. These depictions of the system are described in Chapter 5.
Figure 48: NRS system state
5 Implementation
This chapter presents implementation details and describes the system's components.
Quality attributes and source code analysis are provided as well. Moreover, this
chapter depicts the system’s concurrency and synchronization aspects.
5.1 Python
The programming language of choice was Python⁴⁸. Python is an interpreted
language that supports the imperative and object-oriented programming paradigms.
Python is dynamically typed and features automatic garbage-collection. Its use has
gained a lot of popularity during the last years, and it is being used extensively both
for scripting as well as for non-scripting purposes. Large projects have been using it
for production software, especially web applications (e.g., OpenStack, Google,
Dropbox, Reddit).
Python's flexibility and dynamic nature make it ideal for rapid prototyping. For the
needs of the NRS project, which glues together various different domains such as
network programming, scripts for controlling devices, and object-oriented design,
Python is a very good candidate. Its biggest asset is the standard library that comes
along with the language; 'batteries included' is an apt description of its
utilities. Various concepts of the prototype, such as the device plugins, the
administrator CLI, and the listening servers, were possible to implement in a short
period of time thanks to Python’s standard library.
5.2 System components
The system’s implementation consists of multiple libraries and extensions through
plugins. These belong to different components of the system, which are units of
replacement that compose the running system.
The system’s components are shown in Figure 49. The main component of the system
is NRS, which lies in the middle of the figure. It is the component whose
functionality was described extensively in Chapter 4. We can identify three Python
libraries that NRS is using directly:
networkX, the graph library
telnetsrvlib, the library that implements the Telnet server
Graphviz, the library that implements the graph visualization⁴⁹
NRS also employs device plugins. At the allocation step, plugins are triggered to
perform device configuration. The plugins are required to implement the plugin
interface, but they are separate components that can have any arbitrary form suiting the
specific device’s needs. For the needs of the NRS prototype, the implemented plugins
are scripted SSH sessions. They transfer and invoke the required CLI commands on
the devices using the pexpect Python library. This library provides the ability to
automate and control other programs, a functionality suited to automate SSH
sessions. The implemented plugins both for VM hosts and for switches use this
functionality.
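The core of such a scripted session might look like the sketch below; the prompts, credentials and CLI commands are indicative only (they follow the 3Com example of section 6.2) and will differ per device.

import pexpect

def allow_vlan_over_ssh(host, user, password, port, vlan_id):
    """Log in to a switch over SSH and permit a VLAN on a trunk port (indicative only)."""
    session = pexpect.spawn("ssh %s@%s" % (user, host), timeout=10)
    session.expect("assword:")                 # matches 'Password:' or 'password:'
    session.sendline(password)
    session.expect(">")                        # user-mode prompt, device-specific
    for command in ["system-view",
                    "interface Ethernet %s" % port,
                    "port link-type trunk",
                    "port trunk permit vlan %d" % vlan_id]:
        session.sendline(command)
        session.expect("]")                    # configuration-mode prompt
    session.sendline("quit")
    session.close()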
48 http://www.python.org/
49 Open-source graph visualization, http://www.graphviz.org/
The NRS service is accessible through the TCPserver component. This component
listens for incoming connections and forwards the requests using the ServiceInterface
that is provided by NRS.
The client for the NRS service is implemented in the NRS CLI component. This tool
provides a command line interface for the user of the service, and provides various
command line arguments that correspond to the ServiceInterface operations. When
the tool is invoked, it translates its arguments into server requests and sends them to
the server. The communication with the main service is done through a custom
messaging protocol.
A few command line usage examples follow:
nrs connect --host NAME --interface NAME --network_id ID --vlan_id ID
nrs gateway disconnect --network_id ID --gateway NAME
nrs allocate --reservation_id ID
The CLI tool is what a cloud platform invokes in order to make use of NRS. In the
case of OpenNebula, this is performed by the NRS OpenNebula
driver, which implements the OpenNebula virtual network driver functionality (as
described in section 2.4.1). The virtual network driver defines three actions that
perform network configuration related to one VM that changes states. The actions
are:
the pre-boot action, invoked before a VM is launched
the post-boot action, invoked after a VM finishes launching
the cleanup action, invoked after a VM is shut down or removed
These actions are implemented as scripts that receive all information related to the
VM as an argument. This includes the VM’s VIF and the id of the virtual network.
The NRS driver uses the post-boot and cleanup actions to call vif connect and vif
disconnect respectively, which result in the VM being connected to or disconnected
from the network as described in the NRS functionality. The required arguments for
these calls are provided by OpenNebula. The NRS driver functionality is shown in
Figure 50.
Figure 49: NRS components
Figure 50: NRS OpenNebula network driver
5.3 Code analysis
The NRS system implementation consists of source code artifacts. The project’s code
archive is similar in structure to the system’s modules as seen in the view of the
system’s architecture. A high-level view of the source code modules, as seen in the
project's root directory, is shown in Figure 51.
Figure 51: Overview of source packages
The source code was analyzed using pylint⁵⁰, which is a tool for static analysis of
Python code. The Python community has specified a very strict coding standard⁵¹,
and pylint can check Python code against the standard and grade the code based on
how well it adheres to it. This includes conventions such as variable and function
names, number of class methods, etc. The score related to the standard conventions is
the pylint convention score. Pylint can also check what may be programming
mistakes such as invoking objects that are not callable or assigning values from
functions that do not return a value. It can also check for indications for a need to
refactor code, mostly in the form of recognizing code duplication. These last two
properties are indicated by the pylint refactors/warnings score. These scores are
indications of the source code quality and in that regard are more important than the
convention score. The statistical data and results of the code analysis are shown in
Table 10.
50 http://www.logilab.org/857
51 PEP 8 -- Style Guide for Python Code, http://www.python.org/dev/peps/pep-0008/
Table 10: Source code analysis
Code | Readability
2507 Python statements | 26% docstrings
17 packages | 4% comments
63 *.py files | pylint category | pylint score
66 classes | refactors/warnings | 9.09/10
303 methods | convention | 7.1/10
The docstrings mentioned in Table 10 are comments that introduce a class or method
and explain its purpose and function (they are similar to Java's javadoc). A total of
30% documentation (docstrings plus comments) combined with a 7.1/10 standard
adherence means that the source code is quite readable by someone other than the
author. The refactors/warnings score of 9.09 is more significant; a low score would be
an indication that something may be wrong in the
program.
Apart from readability and code duplication, an indication of the code’s quality is its
cyclomatic complexity. This is a software metric that measures the number of
different paths in the control flow of a program. In general, a cyclomatic complexity
value below 10 is considered acceptable. A value much higher than 10 indicates that
the part of the code in question could possibly be split into smaller modules or
functions to reduce its perceived complexity. The cyclomatic complexity of the
source code was measured with the pygenie⁵² tool, and the results are shown in Table 11.
The tool measures the complexity of the system's functions and class methods.
On the left side of the table is the complexity and on the right the amount of times it
occurs in the total of ~300 methods of the system.
Table 11: Cyclomatic complexity
Cyclomatic complexity | Number of occurrences
11 | 1
10 | 1
8 | 1
7 | 6
6 | 6
5 | 14
<= 4 | rest (~270)
An aspect of code quality is its readability and complexity, which in turn determine
its maintainability. The metrics performed by the two tools mentioned in the previous
paragraphs show that the source code is in a good condition in this regard.
52 http://traceback.org/2008/03/31/measuring-cyclomatic-complexity-of-python-code/
5.4 Concurrency
In the description of the system’s functionality, it was specified that some of the
system’s elements are concurrent tasks that share resources. In addition, the NRS
system has to expose its functionality to its users. This is done by providing a socket
server implementation that is listening for incoming requests; to handle them, the
server needs to forward requests to the NRS object in the form of the service
operation calls, and receive a reply. Multiple such requests may happen at the same
time. In particular, there are three high-level functionalities that have parallel
execution semantics.
1) The main service operations, which correspond to the service interface
operations presented in section 4.2. These are requests performed by the user
of the service and handling them is the main functionality of the system.
Most of the requests result in changes in the topology’s L2 networks.
Multiple users can make simultaneous requests to the service.
2) The administrator CLI tool. This tool is invoked by the administrator and
runs in parallel with the main service. The tool can perform changes to the
topology.
3) The inter-cloud service. This refers to the server listening to requests from
remote NRS systems. In a similar fashion with the main service, it may
receive multiple simultaneous requests. The communication through this
channel uses its own special conversational protocol as described in section
4.9.2.
The above points show that there is a need for at least three concurrent tasks that
access the NRS service. This means that the service can be heavily contested. All
possible simultaneous actions can end up modifying the system’s state: the topology,
the reservations and the network isolation information. The functionality of the
system however can guarantee its integrity only if changes are atomic. The class
relationships that show the resource contention are shown in Figure 52.
There are various ways to resolve this resource contention. The simplest one is to
identify that all changes in the state of the system are initiated in the operations of one
object: the request manager’s reserve, allocate, release and replace_topology
operations. Making these operations atomic solves the resource sharing problem and
this is the solution chosen for the prototype.
The request manager’s operation atomicity is implemented by a Lock object, the
Python synchronization primitive.
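In outline, this looks as follows; the internal logic of each operation is elided.

import threading

class RequestManager(object):
    def __init__(self, topology):
        self._topology = topology
        self._lock = threading.Lock()      # guards every state-changing operation

    def reserve(self, network_id, network_interface):
        with self._lock:
            # ... growing algorithm, resource checks, Reservation creation ...
            pass

    def allocate(self, reservation_id):
        with self._lock:
            # ... plugin calls for the ports of the reservation ...
            pass

    def release(self, network_id, network_interface):
        with self._lock:
            # ... shrinking algorithm, de-allocation, garbage collection ...
            pass

    def replace_topology(self, new_topology):
        with self._lock:
            # ... consistency checks of the modified L2 networks ...
            self._topology = new_topology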
The atomicity of these operations makes all operations on the topology sequential.
This means that all requests wait for the previous one to be over; however the
prototype is not meant to be scalable in the context of serving multiple requests
simultaneously. Internally, it still has to process every request sequentially and
atomically, at least in the level of performing logical operations on the topology. The
ability to operate on the topology in parallel while ensuring consistency would
complicate the logic, was not relevant for the requirements, and would risk the
feasibility of the project.
Since these operations are atomic, there is no need to create elaborate server
implementations. A synchronous listening server that spawns no threads or processes
to handle requests is sufficient. Such a server processes all requests sequentially and
atomically, which is what the system that receives the requests also supports.
The functionality of the system is mapped to one process and three threads.
One process:
Thread 1: the ‘main’ thread running the server for the main service and the
inter-cloud server. These are multiplexed with the select() system call.
Thread 2: running the Telnet server for the administrator CLI
Thread 3: periodically saving the system’s state
All the threads are created when the system is initialized, and remain running until
the system is shut down. Their initialization is shown in Figure 53.
Figure 52: Sharing of request manager
Figure 53: NRS initialization
6 Deployment
This chapter describes how the NRS prototype can be deployed within a cloud site,
and with which machines and devices it needs to communicate and how. It also
describes configuration options.
6.1 NRS cloud deployment
A cloud site consists of virtual machine hosts, switches, and some machines that run
the cloud software platform. NRS needs to communicate with the cloud platform to
receive requests, and with the hosts and switches to configure them. In addition, NRS
needs to communicate with remote NRS systems, as well as be accessible to an
administrator. This means that the NRS system needs to be connected to several
different networks:
1) the “management network”, which has connectivity to all hosts and switches
of the cloud site for control and monitoring purposes
2) a “public network”, which is an external network over which communication
with a remote NRS system is possible
3) a network where an administrator has easy access to
The services that NRS offers are related to these networks as well. NRS has up to
three configurable network addresses, one for each of its servers:
1) the “main service” address, through which NRS receives requests. This may
or may not be in the management network, but should be reachable from the
machine running the cloud platform
2) the "inter-cloud service" address, through which NRS communicates with
remote NRS systems to set up inter-cloud connections. This address faces
the public network NRS is connected to.
3) the Telnet service address for administrator access. This is configured at the
administrator's discretion. It is important that it is not reachable from
outside the administrative network.
A deployment of NRS in a cloud site is shown in Figure 54. The figure shows
different machines with network interfaces attached to them. It is assumed that NRS
is running on a dedicated machine with a separate network interface for each network
it is connected to, although neither is strictly necessary. The different networks within a
cloud site are shown with different colors in Figure 54. The production network,
which hosts the users’ networks traffic, is a separate domain from the management
network.
The NRS system options such as the server network addresses are configured in the
NRS configuration text file. A sample is shown in Figure 55, where you can see basic
configuration options and the system’s initialization. It can be seen that the three
system’s functionalities are optional and can be selectively turned on or off. The
functionalities are
1) intra-cloud, which includes all internal reservation logic and device
configuration
2) inter-cloud, which includes inter-cloud negotiation with a remote site and
configuration of inter-cloud gateways
3) telnet, which is the Telnet server
Each of the system’s functionalities has several related configuration options that
need to be filled in if the functionality has been enabled. All functionalities being
enabled corresponds to a deployment with functionality similar to Figure 54.
Figure 54: NRS deployment in a cloud site
Figure 55: NRS basic configuration
6.2 Device plugins
Besides the NRS system, the device plugins may have their own deployment and
configuration aspects. These are specific to each device plugin. The plugins
developed for the prototype are scripted SSH sessions. Therefore, the only
configuration required was that the devices be pre-configured to allow SSH
connections with administrative rights.
There were four device plugins implemented and deployed for the NRS prototype,
two for physical switches and two for virtual switches. These configure the following
devices: the 3Com 4210 switch, the Brocade FastIron switch series, hosts with Linux
bridge, and hosts with Open vSwitch.
The switch plugins translate the plugin commands into CLI commands that are
specific for the switch’s CLI, and send them to the device over SSH. The calls
manipulate trunk ports. For example, a plugin call such as
allow(host="3com4210", port="1/0/2", vlan_id=5)
is converted to the following CLI commands, as seen in the 3com’s telnet interface
(including the prompt):
<4210> system-view
[4210] interface Ethernet 1/0/2
[4210-Ethernet1/0/2] port link-type trunk
[4210-Ethernet1/0/2] port trunk permit vlan 5
The host plugins operate in a similar fashion to send commands that manipulate the
virtual switch. The end-result is similar to how cloud platforms use them, as
described in section 2.4.1 and shown in Figure 10 and Figure 11. The difference is
that the VM gateway is not trunked to allow all possible VLANs, but only the ones
that dynamically have been configured by NRS. The Open vSwitch deployment in a
host that needs VMs connected to two networks with VLAN ids 6 and 15 is shown in
Figure 56.
Figure 56: NRS Open vSwitch plugin operation
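For illustration, the kind of commands such a host plugin could issue (shown here via a local subprocess rather than an SSH session) might look as follows; the bridge and interface names are hypothetical.

import subprocess

def ovs_allow(bridge, vm_interface, vlan_id):
    """Attach a VM interface to an Open vSwitch bridge as an access port on a VLAN."""
    subprocess.check_call(["ovs-vsctl", "add-port", bridge, vm_interface,
                           "tag=%d" % vlan_id])

def ovs_disallow(bridge, vm_interface):
    """Detach the VM interface from the bridge again."""
    subprocess.check_call(["ovs-vsctl", "del-port", bridge, vm_interface])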
6.3 Inter-cloud with OpenVPN
The inter-cloud connection type chosen for the NRS prototype uses
OpenVPN. The gateway for the OpenVPN service is a machine with the OpenVPN
software installed that can launch OpenVPN processes. This VPN gateway
(informally, the VPN box) needs to be connected to the external network over which
VPN connections will be created.
The two NRS systems also need to be able to talk to each other over an external
network, so that the inter-cloud negotiation can take place. For it to finish
successfully, both services need to agree that they provide the same inter-cloud
capabilities, as determined in their configurations. After the negotiation is complete,
each service needs to configure the VPN gateway. This is performed with calls to the
device plugin that is associated with the gateway. The VPN configuration is
exchanged during the inter-cloud negotiation.
The deployment of the OpenVPN inter-cloud connection is shown in Figure 57. The
plugin’s functionality for a single inter-cloud connection consists of the following
sequence:
1) launch an OpenVPN process with L2 networking, which creates a tap
network interface in the VPN gateway. The tap interface is the inter-cloud
connection end-point for each side.
2) bridge a local L2 Network to the tap network interface, which allows the L2
traffic to flow through, after which the inter-cloud connection is live and the
two networks are bridged.
Every L2 Network is associated with a bridge in the VPN gateway, so that it can be
connected to multiple tap interfaces which correspond to multiple remote L2
networks. In addition, the VPN gateway can host multiple local L2 networks and
multiple OpenVPN processes.
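A simplified sketch of these two steps as a gateway plugin might script them is given below; the OpenVPN options, key handling and interface names are illustrative only.

import subprocess

def start_inter_cloud(tap_name, bridge, remote_host, port, secret_file, local_iface):
    """Launch an L2 OpenVPN tunnel and bridge it with the local network's interface."""
    # 1) create the tunnel end-point (a tap interface) with an OpenVPN process
    subprocess.check_call(["openvpn", "--daemon", "--dev", tap_name,
                           "--remote", remote_host, "--port", str(port),
                           "--secret", secret_file])
    # 2) bridge the local L2 network with the tap interface
    subprocess.check_call(["brctl", "addbr", bridge])
    subprocess.check_call(["brctl", "addif", bridge, tap_name])
    subprocess.check_call(["brctl", "addif", bridge, local_iface])
    subprocess.check_call(["ip", "link", "set", bridge, "up"])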
The overall deployment for each cloud site is left to the local administrator. The NRS
system of site #1 only knows that the traffic coming through each VPN connection is
part of the private network. On the side of site #2, it is up to the local administrator to
expose the desired private network to the VPN box (either with the use of intra-cloud
NRS or with a method of his choice). The manner that this is done is of no concern to
site #1; the only thing required is that the private network in site #2 is bridged with
the tap interfaces of VPN gateway #2.
Inter-cloud configuration
The inter-cloud functionality has additional configuration options, shown in Figure 58.
These include connection-specific details and limitations on which networks can be
used for inter-cloud. The administrator can specify which local L2 networks are
allowed to take part in inter-cloud connections. In addition, SSL certificate and key
files can be specified to be used in the connection to remote NRS systems in order to
secure the communication.
Figure 57: NRS inter-cloud with OpenVPN
6.4 Deployment at the Nikhef cluster
The NRS system was deployed in a small private cloud site at Nikhef’s data-center.
The machinery that comprised the cloud site was provided by Nikhef for the needs of
the project. The cloud platform of choice was OpenNebula, and its installation and
deployment were performed in the first months of the project. The small cloud site
was used to continuously integrate the software product with the actual cloud
deployment.
The actual deployment is shown in Figure 59; the ‘local’ cloud site consisted of three
virtual machine hosts, three switches and one machine acting both as the OpenNebula
frontend and as the host machine of the NRS service. In addition, a static virtual
machine was used as the OpenVPN gateway for the OpenVPN inter-cloud
implementation.
The ‘remote’ cloud site for the inter-cloud was deployed as a single host that fulfilled
three functionalities at the same time:
1) run NRS in 'inter-cloud only' mode
2) run the KVM hypervisor
3) act as the OpenVPN gateway
This was sufficient to deploy the inter-cloud connection of VMs over OpenVPN, as
shown in the inter-cloud deployment of Figure 57.
If we compare the deployment of the system with the one described in section 6.1
(Figure 54), we can see that the prototype deployment has various machines and
networks overlapping. For example, there is one machine hosting both NRS and the
cloud platform, and the ‘external’ network is actually part of the same cluster. This
deployment is adequate for testing and verifying the prototype. In general, the
system’s deployment can be configured extensively; Figure 54 shows the most
elaborate deployment allowed by the system.
Figure 58: NRS inter-cloud configuration options
Part of the topology of Figure 59 corresponds to the logical topology maintained and
controlled by the full-fledged NRS system that resides in the ‘OpenNebula frontend’
host. A snapshot of the graph of that topology is shown in Figure 60. The graph is
rendered with the Graphviz tool and shows the topology exactly as it is kept in the
NRS system. Apart from the static parts, we can identify that the topology of Figure 60 also
contains parts that have been dynamically added to it at runtime: three virtual
machines that have been launched by OpenNebula, and a logical L2 network formed
between them.
The hosts, switches, and the OpenVPN gateway use the device plugins that have been
described in sections 6.2 and 6.3.
Figure 59: Private cloud deployment at Nikhef
Figure 60: Topology snapshot of the deployed NRS system
7 Verification and Validation
This chapter describes to what extent the system requirements are fulfilled by the
NRS system’s functionality. It also presents the testing procedures used to determine
the correctness of the produced prototype.
7.1 Functional Validation
The validation of the system’s functionality, i.e., determining if the functionality
meets the system requirements, was performed with acceptance tests. These tests
dictate certain user input and determine how the system functions compared to how it
is expected to function. All the tests were performed in the local deployment of the
system at the Nikhef cluster, as described in section 6.4. The tests are applied to a
high-level view of the system; it is the point of view of the user and of the
functionality he expects the system to exhibit. The tests presented in this section are
not exhaustive, but an indication of how the system’s functionality is verified.
The acceptance tests were performed manually; automating such tests would be
beneficial, but because the system integrates many different components, from cloud
platforms to network switches, automating them would have required considerable
development effort.
FR1: Network connectivity
The first requirement states that the NRS system should provide network connectivity
among a set of network interfaces. Network connectivity means that the interfaces are
able to reach each other over the network. This can easily be confirmed by giving the
interfaces IP addresses in the same IP subnet; the machines that own the interfaces
can then use the ping tool to ping each other's IP addresses. The test is shown in Table 12.
Table 12: Network connectivity tests
Summary
Verify that network interfaces ‘connected’ with the NRS connect operation can reach
each other, i.e. are actually connected to each other.
Preconditions
1. At least two virtual machines should have been launched in the same host,
and at least two virtual machines should be in different hosts that are
connected through one or more switches.
2. The virtual machines’ interfaces must have IP addresses in the same subnet
3. NRS must be running and the NRS topology must contain the devices that
will be used for the test.
Steps:
1. Pick an unused L2 network id and call vif connect with all the VMs' interfaces and the network id as arguments.
2. Log in to each virtual machine and ping all the others.
Expected results:
1. The pings should return ping responses.
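A small helper can approximate the ping step of this test. The sketch below is
illustrative only; the IP addresses are placeholders, and the script has to be run on
each VM in turn to cover the full mesh.

import subprocess

# Placeholder addresses of the interfaces connected to the same L2 network.
VM_IPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def ping(ip, count=3):
    # Return True if the address answers the ICMP echo requests.
    result = subprocess.run(["ping", "-c", str(count), ip],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == "__main__":
    for ip in VM_IPS:
        print(ip, "reachable" if ping(ip) else "UNREACHABLE")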
FR2: Isolation of L2 networks
This requirement refers to the fact that L2 networks formed by NRS must be isolated
from each other and from any other networks that may exist in the network
infrastructure. This can be verified with the use of tcpdump53, a tool that can monitor
a network interface for the packets that arrive at it. The tool can be used to show that
packets that belong to one L2 network never reach a network interface that belongs to
a different L2 network, and vice versa. This can be checked by monitoring packets of
protocols that send broadcast traffic, such as ARP. A broadcast always arrives at all
reachable destinations; i.e., if a broadcast packet does not reach a network interface,
the broadcast packet and the interface belong to separate L2 networks.
The implementation of network isolation depends on the isolation method chosen and
the devices that implement it. In the case of 802.1Q VLANs, isolation is enforced by
physical and virtual switches, and NRS uses these as specified in their usage manuals.
Essentially, testing whether the networks are isolated tests
1) whether the physical and virtual switches correctly implement VLAN
separation and
2) whether the NRS system integrates them correctly
The acceptance test of FR2 is shown in Table 13.
Table 13: Network isolation test
Summary
Verify that traffic belonging to different L2 networks is kept separate.
Preconditions
1. NRS must be running and the NRS topology must contain at least two
different L2 networks (networks A and B).
2. The devices that belong to the L2 networks need the tcpdump utility.
Steps:
1. Start tcpdump on all network interfaces that belong to the L2 networks and listen for ARP packets (i.e., tcpdump -i eth0 arp).
2. Perform a ping among machines that belong to network A.
3. Observe the tcpdump output.
4. Stop the ping.
5. Perform a ping among machines in a network external to the L2 networks formed by NRS, i.e., ping among IP addresses that belong to the existing management network between the machines.
6. Observe the tcpdump output.
Expected results:
1. First tcpdump output:
   i. All machines within network A should have received an ARP packet looking for the IP address of the ping recipient (Who has xxx.xxx.xxx.xxx? Tell xxx.xxx.xxx.xxx).
   ii. All machines within L2 network B should have received nothing.
2. Second tcpdump output: machines within both networks A and B should have received nothing.
53 http://www.tcpdump.org
FR3: VLAN id restrictions on devices
This requirement refers to the system’s ability to take into account VLAN id
limitations in network devices, so that it can correctly map connection requests to the
logical topology based on whether the devices can support it. If an L2 network is
associated with a VLAN id that a device does not support, it should not be possible to
add that device to the L2 network. The acceptance test is shown in Table 14.
Table 14: VLAN device restrictions test
Summary
Verify that device VLAN limitations are taken into account by the system when
connecting network interfaces to L2 networks.
Preconditions
1. NRS must be running with 802.1Q as the network isolation method.
2. The NRS topology should contain at least three network nodes that are
connected in a line.
3. The ‘middle’ network node should allow only one VLAN id (e.g., id=65)
4. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Connect a port of a network node without VLAN limitations to a new L2 network. Force the L2 network's association with a VLAN id different from 65.
3. Connect a port of the network node with the VLAN limitation to the L2 network created in the previous step.
4. Connect a port of the third network node to the L2 network of step 2.
5. Connect a port of the network node with the VLAN limitation to a new L2 network.
6. View the NRS VLAN status.
Expected results:
1. The first 'connect' operation should finish successfully.
2. The second 'connect' operation should fail with the error: 'Cannot accommodate chosen VLAN id'.
3. The third 'connect' operation should fail with the error: 'Cannot find path to L2 network'.
4. The new L2 network should be associated with VLAN id 65.
FR4: Administrator access to topology
This requirement refers to the ability of an administrator to insert or modify the
network topology, without restarting the NRS service or affecting existing
connections. This is possible through the telnet service, which allows direct access to
the topology. There are two administrator actions that involve modifying
the topology: one is related to static parts of the topology and modifies nodes and
ports, and the other modifies the L2 networks of the topology. The tests are shown in
Table 15 and Table 16.
Table 15: Administrator static topology modification test
Summary
Verify that an administrator can modify the static topology while the NRS service is
running.
Preconditions
1. NRS must be running and the NRS topology must contain
i. at least one L2 network that spans over 3 network nodes
ii. at least one network node that is not part of the L2 network
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. View the L2 network's member ports.
3. Select a node that belongs to the L2 network, and remove it from the topology.
4. Commit the change.
5. Select the node that is not used in the L2 network, and remove it from the topology.
6. Commit the change.
7. Compare the L2 network member ports with the ones before the commit.
8. Perform the 'network connectivity' test on the machines that belong to the L2 network.
Expected results:
1. The first commit should return an error: 'Inconsistent L2 network'.
2. The second commit should complete successfully.
3. The L2 network ports should remain identical.
4. The pings from the 'network connectivity' test should return ping responses.
Table 16: Administrator L2 network modification test
Summary
Verify that an administrator can modify an existing L2 network in the topology.
Preconditions
1. NRS must be running and the NRS topology must contain
i. at least one L2 network that spans over 3 network nodes in a line, so
that one of the nodes is the ‘middle’ node in the L2 network graph
ii. at least one network node that can replace the existing ‘middle’ node in
the network graph (and become the ‘new middle’ node)
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Remove the ports of the L2 network that belong to the 'middle' node.
3. Commit the change.
4. Identify which ports that belong to the 'new middle' node will make the L2 network consistent, and add them to the L2 network.
5. Commit the change.
6. View the new L2 network member ports.
7. Perform the 'network connectivity' test on the machines that belong to the L2 network.
Expected results:
1. The first commit should return an error: 'Inconsistent L2 network'.
2. The second commit should complete successfully.
3. The L2 network ports should now contain ports from the 'new middle' node.
4. The pings from the 'network connectivity' test should return ping responses.
FR5: Administrator access to network isolation
This requirement refers to the ability of an administrator to configure the network
isolation mechanism used, and more specifically to restrict the usage of a set of
isolation identifiers. These restrictions are easiest to understand when the network
isolation mechanism is 802.1Q VLANs and the administrator wants to prevent the
system from using a set of VLAN identifiers of his/her choice. The acceptance test is shown in
Table 17.
Table 17: VLAN administrator restrictions test
Summary
Verify that the administrator can prevent the system from using a set of VLAN ids of
his/her choice.
Preconditions
1. NRS must be running with 802.1Q as the network isolation method.
2. NRS telnet service must be enabled
Steps:
1. Log in to the NRS admin shell through telnet.
2. Restrict the available VLAN ids to only one (e.g., id=76).
3. Commit the change.
4. Connect a network interface to a new L2 network.
5. View the VLAN status.
6. Connect a network interface to another new L2 network.
Expected results:
1. The VLAN status should show that the new L2 network is associated with VLAN id 76.
2. The second 'connect' operation should fail with the error: 'No available VLAN id found'.
FR6: Internet connectivity
This requirement refers to the ability of the NRS system to provide external network
connectivity to the L2 networks. One such network can be the Internet. This
functionality is supported by the system with the use of gateways: special network
nodes whose device plugins configure them to connect an external network to a local
L2 network (section 4.9). An Internet gateway was not
implemented for the prototype; instead the gateway concept is used by the OpenVPN
gateway that serves inter-cloud connections, discussed in the following section.
FR7: Inter-cloud connections
This requirement refers to the ability of the NRS system to negotiate and create inter-
cloud connections that connect L2 networks located at different cloud sites. As
described in section 4.9, this is supported by the system with a negotiation that takes
place between remote NRS systems, and the configuration of the inter-cloud
gateways in each site.
The inter-cloud connection was implemented with OpenVPN. The network
connectivity provided by the inter-cloud connection can be verified in a similar
fashion to verifying FR1. The acceptance test for OpenVPN inter-cloud is shown in
Table 18.
Table 18: OpenVPN inter-cloud test
Summary
Verify that once two L2 networks are connected over an inter-cloud, network
interfaces that belong to them can reach each other.
Preconditions
1. At least two NRS services with inter-cloud enabled need to be running and
operate on ‘remote’ network topologies. The NRS services should be able to
reach each other over a network.
2. Each NRS service should control a functional OpenVPN gateway.
3. Each topology should contain at least one L2 network, which should be
permitted to be connected to an inter-cloud connection in the NRS
configuration. Each L2 network should contain at least one network
interface.
Steps:
1. Pick one of the NRS services and ask it to connect the two L2 networks over an OpenVPN inter-cloud (nrs inter-cloud start -type OpenVPN ...).
2. Perform the 'network connectivity' test on at least two network interfaces, each belonging to a different L2 network (i.e., each belonging to a different cloud site).
Expected results:
1. The pings should return ping responses.
FR8: Inter-cloud mode
This requirement refers to the system being able to operate in an ‘inter-cloud only
mode’, so that it offers the inter-cloud functionality to a cloud site, but none of the
intra-cloud functionality (section 4.9). This mode can be specified in the NRS
system’s configuration file. In this mode, the system does not maintain any internal
topology but can perform the inter-cloud negotiation, and can optionally configure a
stand-alone gateway. The test for this case is similar to FR7, but with one service’s
intra-cloud functionality disabled. In addition, the ‘inter-cloud only mode’ system
should accept only inter-cloud CLI requests.
FR9: Advanced network features
This requirement refers to the system’s ability to provide network features beyond
simple connectivity, such as bandwidth guarantees, QoS and ACLs. This feature was
not implemented in the NRS prototype: it introduces complexity across all parts of the
system and could not have been implemented in the available time. However, the
system's architecture was built with this feature in mind, so that it can be provided as
a future extension (see the future work discussion in Chapter 8).
7.2 Non-functional validation
Apart from functional expectations, the system is required to have certain qualities
that are related to how the system operates. These are described by the non-functional
requirements of the system. Their fulfillment is determined by the system’s
architecture which is described in Chapter 4. The extent to which the non-functional
requirements are fulfilled is described in this section.
NFR1: Arbitrary network topologies
This requirement refers to the fact that NRS should support operating on any arbitrary
topology, i.e. not be limited to a specific type or form of topology. This is achieved
through the use of graphs; they allow the specification of any arbitrary topology as
long as it contains switches and hosts, with network links between them. The graph
algorithms likewise operate on any graph, so connections can be established on top of
arbitrary topologies.
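To illustrate the idea, the sketch below uses the networkX library (also used by the
prototype) to build a small, arbitrary topology and to find a path between two hosts.
The node names are illustrative and the graph shown is not the actual NRS data
model.

import networkx as nx

# An arbitrary topology: hosts and switches are plain graph nodes, links are edges.
topology = nx.Graph()
topology.add_edge("host1", "switch1")
topology.add_edge("host2", "switch1")
topology.add_edge("switch1", "switch2")
topology.add_edge("host3", "switch2")

# Mapping a connectivity request boils down to finding paths between hosts.
print(nx.shortest_path(topology, "host1", "host3"))
# -> ['host1', 'switch1', 'switch2', 'host3']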
NFR2: Support for configuration of any network device
This requirement refers to the ability to configure any arbitrary switch or network
device in general. This is achieved with the use of device plugins, which encapsulate
the knowledge needed to configure a specific device, as explained in section 4.7.
The creation of the device plugins is left to the cloud site maintainer, but they
allow integration of any device with the system. A few proof-of-concept plugins were
implemented, as described in section 6.2.
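Conceptually, a device plugin is a small adapter class per device type behind a
common interface. The sketch below only conveys this idea; the class and method
names are assumptions and do not necessarily match the actual DevicePlugin
interface of section 4.7.

from abc import ABC, abstractmethod

class SwitchPlugin(ABC):
    # Hypothetical base class: one concrete implementation per switch model.
    @abstractmethod
    def add_port_to_vlan(self, port: str, vlan_id: int) -> None: ...

    @abstractmethod
    def remove_port_from_vlan(self, port: str, vlan_id: int) -> None: ...

class LoggingSwitchPlugin(SwitchPlugin):
    # Stand-in implementation that only prints; a real plugin would drive
    # the device over SSH, telnet, SNMP, or a vendor API.
    def add_port_to_vlan(self, port, vlan_id):
        print(f"configure: add port {port} to VLAN {vlan_id}")

    def remove_port_from_vlan(self, port, vlan_id):
        print(f"configure: remove port {port} from VLAN {vlan_id}")

if __name__ == "__main__":
    LoggingSwitchPlugin().add_port_to_vlan("Gi0/1", 100)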
NFR3: Replaceable network isolation mechanisms
This requirement refers to the system’s support for replacing the existing network
isolation method with a different one. The chosen isolation is 802.1Q VLANs, but the
system should be extensible with new isolation mechanisms. The system supports
replacing 802.1Q VLANs, as long as the new isolation mechanism fits the concepts of
802.1Q: the new mechanism must use some kind of identifier to distinguish isolated
networks, and network separation must be enforced at the level of network devices
through the appropriate plugin calls.
To extend the system with such a new mechanism, a new module that handles the
isolation identifiers and implements the Network Isolation Manager interface
(section 4.5) needs to be created, and new plugin commands that enforce the new
isolation mechanism need to be implemented. These concepts are sufficient to
introduce VXLANs, Q-in-Q, and MAC-over-IP tunneling to the NRS system.
However, no proof-of-concept implementation of a mechanism other than 802.1Q
was made for the prototype.
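The shape of such a replacement can be pictured as a manager for a pool of isolation
identifiers. The following sketch is an assumption about that shape and does not
claim to match the Network Isolation Manager interface of section 4.5; a VXLAN or
Q-in-Q manager would look the same, only with a different identifier space.

class VlanIdManager:
    # Hands out and reclaims isolation identifiers from a configured range.
    def __init__(self, first=2, last=4094, reserved=()):
        self._free = set(range(first, last + 1)) - set(reserved)
        self._in_use = {}  # L2 network name -> identifier

    def allocate(self, l2_network, forced_id=None):
        if forced_id is None:
            if not self._free:
                raise RuntimeError("No available VLAN id found")
            vlan_id = min(self._free)
        elif forced_id in self._free:
            vlan_id = forced_id
        else:
            raise RuntimeError("Cannot accommodate chosen VLAN id")
        self._free.discard(vlan_id)
        self._in_use[l2_network] = vlan_id
        return vlan_id

    def release(self, l2_network):
        self._free.add(self._in_use.pop(l2_network))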
NFR4: Interface to cloud platforms
This requirement refers to NRS being accessible by any cloud platform without being
limited or tightly coupled to only one platform. An OpenNebula plugin was created
for the needs of the prototype, but the NRS interface is not tailored specifically to
OpenNebula. In the case of intra-cloud functionality, NRS only requires the minimal
information needed to provide network connectivity: the network interface names and
the hosts they belong to. As
discussed in section 2.4, both OpenNebula and OpenStack provide this information in
the interfaces they provide for network virtualization services, namely OpenNebula’s
virtual network manager and OpenStack Quantum. Since these two cloud platforms
are currently the only ones providing clean interfaces for this purpose, it is likely that
other cloud platforms will follow suit. Apart from cloud platforms, the NRS interface
can also be used directly by a human through the provided NRS CLI tool.
NFR5: Support for various inter-cloud connection types
This requirement refers to the system’s ability to be extended to support different
inter-cloud connection types. The plugin system of NRS applies to inter-cloud
gateways as well; a plugin configures the gateway. Therefore, extending NRS with
new connection types is reduced to creating the proper plugins. The NRS inter-cloud
negotiation supports exchanging configuration details and passing them to the inter-
cloud gateway plugin. This allows the exchange of connection-specific configuration
details for any connection type.
NFR6: Performance
This requirement refers to the amount of time the NRS system requires to fulfill a
connectivity request. Requesting a virtual machine from a cloud service can take
from a few seconds to a few minutes, depending on configuration choices. The
overhead of network configuration provided by the NRS system should not increase
that time by a substantial amount. Therefore, an upper limit of 10 seconds to service a
request is considered sufficient for the needs of a generic cloud user.
NRS deals with requests in two distinct steps: the reservation and allocation steps
(section 4.8). The reservation step’s execution time depends on the performance of
the algorithm used to operate on the topology. The logical decision of how to map a
connection request can be made very quickly if the algorithm used is simple. On the other hand,
coming up with complicated resource requests and expecting a “good” decision may
take some time (depending on the algorithm used, the complexity of the topology,
and especially if algorithms try to map bandwidth). Different classes of users may
have different needs, trading a fast decision for a better decision or vice versa.
Different algorithms can be used for different usage requirements, and this influences
the balance of response time to decision quality. In any case, introducing new
algorithms in the system is supported by its architecture. The simple path-finding
algorithm implemented for the NRS prototype (it uses Dijkstra's shortest path, which
has O((|V| + |E|) log |V|) complexity) is among the fastest path-finding algorithms, and
is not the bottleneck in the NRS performance; the time-consuming step is the allocation.
The performance of the allocation step, which actually deploys the connection (i.e.,
configures switches), depends on the device plugin implementation. More
specifically, the amount of time required for allocation is bounded from below by the
time it takes to configure a single device with the chosen communication method; e.g., it takes a few
seconds to complete an automated SSH session to a switch. The allocation step
performance can be improved in two ways:
1) optimize the invocation of device plugins per device, i.e., launch automated
sessions only once per device (to minimize session overhead such as
initialization and authentication), and launch sessions for different devices in
parallel (a sketch of this follows after this list)
2) optimize the performance of each separate device plugin. If a chosen
communication method is too slow, it is perhaps advisable to implement one
with less communication overhead (e.g., replace SSH sessions with SNMP
requests)
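The first optimization, one automated session per device with the sessions for
different devices running in parallel, could look roughly like the sketch below;
configure_device is a placeholder for whatever a device plugin does within a single
session and is purely illustrative.

from concurrent.futures import ThreadPoolExecutor

def configure_device(device, commands):
    # Placeholder for a device plugin call: open one session for the device
    # and apply all of its commands within that single session.
    print(f"{device}: applying {len(commands)} commands")

def allocate_in_parallel(per_device_commands):
    # One session per device; sessions for different devices run in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(configure_device, device, commands)
                   for device, commands in per_device_commands.items()]
        for future in futures:
            future.result()  # re-raise any configuration failure

if __name__ == "__main__":
    allocate_in_parallel({"switch1": ["vlan 100", "tag port 1"],
                          "switch2": ["vlan 100", "tag port 7"]})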
The NRS prototype is primarily a proof-of-concept prototype and therefore not
optimized for performance. The SSH device plugins are quite heavy-weight when it
comes to execution time. The prototype implementation still takes less than 10
seconds for requests that involve less than five network devices. With further
optimizations, the system can become much faster.
7.3 Verification
Apart from fulfilling requirements, the system also needs to be checked for
correctness, i.e. whether the system is built according to the design specifications and
whether it contains implementation faults.
The implementation language of choice, Python, uses automatic garbage collection
and does not leave room for common memory management errors associated with
languages such as C and C++. In addition, static analysis of the source code with the
pylint tool gave good scores with respect to possible programming errors
(shown in section 5.4).
A testing suite was developed along with the rest of the system. The suite contains
simple unit tests that test each class and method against their functional
specifications. In addition, the suite contains integration tests between the different
modules. The suite focuses mostly on the software logic related to the topology, the
algorithms, the consistency checks and the reserve/release logical operations. These
form the core of the system's logic, and their correctness is much more important
than that of peripheral components of the system, such as the device plugins. The
suite tests the latter much less, as they have a more pronounced 'proof-of-concept'
nature. The test coverage was measured with the coverage54 tool and was found to be
72%. This is considered a fairly good result for a production system, let alone for a
prototype.
54 http://pypi.python.org/pypi/coverage
8 Conclusions
This chapter presents the outcome of the project for the NRS system and compares it
against the initial project goals. In addition, it puts the NRS system in perspective
with regard to existing cloud networking solutions, as well as with regard to the Grid
software ecosystem. Moreover, it discusses recommendations for further
development of the system.
8.1 Functional Results
The system’s functionality envisioned at the beginning of the project was quite broad
and extended over different domains: cloud platforms, network discovery and
configuration, modeling of network topologies, system administration and
monitoring. A few main functionalities were identified as the pillars of the system,
but a wealth of possible extensions of its functionality was considered. The NRS
system is most correctly viewed as an architectural and functional basis that provides
the main functionalities and is built to be easily extensible to support more.
The project culminated in the design of the NRS system and the creation of its
software prototype. The system combines network topology knowledge and device
configuration ability to essentially create a platform for Network Virtualization. This
platform is the machinery that can dynamically apply network configurations both in
hardware devices and in virtual switches; these configurations are derived from
requests made by an external service or user. In the basic use-case supported by the
prototype, these are requests to create Virtual Networks. The external service
provides the information on what to provision; NRS knows how to provision it. The
services that benefit the most from the functionality provided by NRS are cloud
platforms; NRS is the glue between cloud platforms and network hardware.
8.1.1 A VM-aware cloud networking solution
NRS can be characterized in different ways based on the point of view of its users.
From the perspective of a cloud platform, NRS is a VM-aware networking solution to
creating Virtual Networks. This term refers to the ability of the system to provision
network resources dynamically as required by the network, i.e. as required by the
VMs that need to connect to each other. A VM-aware solution lies in contrast to static
configurations; it makes optimal usage of the limited resources (in this case 802.1Q
VLAN ids), which in turn greatly increases the VLAN scalability potential. RFC
555655 provides an estimate of Ethernet scalability limitations. Based on that
estimate, the NRS system can theoretically increase the scalability potential from
1,000 hosts, which corresponds to the single large broadcast domain of a static
VLAN configuration, to 100,000 hosts spread over 1,000 different VLANs. This is of
course not a precise number, and VM-aware solutions work best when VMs are
spread over a large number of hosts and switches, so that the VLAN distributions do
not overlap. It is however a strong indication of the system's scalability benefits.
Moreover, NRS uses 802.1Q VLANs which guarantee Virtual Network isolation and
privacy due to the nature of the protocol. Maximum performance based on the IP
stack is also achieved; 802.1Q uses regular IP packets inside L2 frames for network
traffic.
NRS, or an equivalent VM-aware solution, can utilize 802.1Q VLANs to the
maximum of their potential. However, there are different Network Virtualization
solutions that do not rely on this networking standard to provide Virtual Networks
(these have been described in sections 2.3.3 and 2.7). These solutions are mostly
based on L2-over-IP tunneling to implement Virtual Networks, and can theoretically
scale to much higher numbers than NRS (with 802.1Q) can provide. However, this
comes at a cost in performance, since the packet encapsulation required by tunneling
demands significant computing power, especially in high-speed networks. On the
other hand, this can be overcome with more advanced (and more expensive)
hardware. In general, the correct choice of network virtualization depends heavily on
the specific network hardware deployment of each cloud site.
55 http://tools.ietf.org/html/rfc5556#section-2.6
Apart from the functionality that NRS offers, the accessibility of its service is
important as well. NRS’s API concepts were based on the cloud interfaces that cloud
platforms provide specifically for such network services: OpenNebula’s virtual
network manager and OpenStack Quantum. This makes the system usable by the two
major cloud stacks that offer specialized networking interfaces. If cloud networking
becomes a prominent issue, it is very likely that such interfaces will become de-facto
standards and will be adopted by other cloud platforms.
8.1.2 A network resource management solution
From the point of view of a network administrator or a maintainer of cloud
infrastructure, NRS is aptly described as a Network Resource Management
solution. Its ability to maintain a representation of the network topology and
dynamically configure devices based on the limitations of each device (VLAN
limitations for example) takes away the burden of manual configuration. NRS will
work with any network hardware, as long as the right plugin is created for it. That
makes it a great choice for sites that use hardware considered 'legacy'. Its use of
802.1Q, a universal network standard, further ensures that all hardware is supported.
On the other hand, non-legacy devices usually do not have VLAN limitations.
In addition, NRS allows a potential cloud provider to install a cloud platform and
utilize part of its infrastructure for the cloud service with minimal re-arrangement and
re-organization of the infrastructure. NRS can make sure that only the proper part of
the infrastructure's network is utilized for the cloud service, leaving the rest
untouched, effectively providing an easy way of partitioning the infrastructure
network. This is not achievable without a system similar to NRS. It should be noted,
however, that not all infrastructure providers choose to mix
infrastructure that serves different purposes.
8.1.3 Beyond cloud platforms
The NRS system’s usefulness is not limited to cloud platforms. It can also play a part
in the general Grid software ecosystem for infrastructure, as the tool that can apply
network configurations that match with user credentials: Grid users are identified by
X.509 “Grid” certificates, which are mapped to hardware privileges and/or
configurations. They can also be mapped to network characteristics such as a network
or VLAN id, and internal or external connectivity which should be attached to the
user's VM(s). The policy engine, which determines what can be provided to the user,
needs a tool such as NRS to enforce the provisioning, and in a similar fashion a tool
such as a cloud platform to provide the actual VMs. This interaction is portrayed in
Figure 61. It should be noted that this is an envisioned interaction and not yet a
reality as of the writing of this report, as the EES56 policy engine system is still under
development.
Figure 61: NRS in the Grid ecosystem
56 http://wiki.nikhef.nl/grid/EES
8.2 Design criteria
There were several factors that influenced the design criteria for the creation of the
system. The development of the idea for the NRS system’s functionality was not a
significant aspect of the NRS project; the idea had already been developed at Nikhef,
and it had a fairly concrete form from the beginning of the project. However, the
development of the idea into a software system design that satisfies the functionality
was a major part of the project. The NRS system did not exist as a design or in any
software form when the project started; it was to be built from the ground up, and
converting the functional idea into the system's design and architectural choices
happened almost exclusively during the project. The NRS system is ambitious and
large in scope, which meant that the project's outcome would most likely be a basic
design intended to be extensible in the future. Therefore, laying
solid architectural foundations for the system was critical. For the project’s outcome
to be considered valuable, the newly developed system would have to be designed in
such a way that it satisfies all its envisioned future functionalities, as well as their
corresponding variabilities.
The major and recurring design factor was the system’s genericity. The system had to
interface with any network device to perform configuration without being limited by
the possible devices’ differences. It also had to expose its service in a manner that it
would be usable by any cloud platform or other service and/or user. In addition, the
system had to be able to easily substitute its network isolation mechanism with any
different one that could become prominent in the future. Lastly, the system involved
operations on the logical topology related to finding network routes (paths); the
algorithms that dictated the outcome of such operations had to be replaceable by ones
with different behavior to support more advanced functionalities. All these refer to
artifacts that could vary extensively. These variabilities were satisfied with extensive
use of interfaces, modules and object-oriented design patterns with the purpose of
abstracting and de-coupling system functionalities. These can be seen in the
DevicePlugin interface, the NetworkIsolationManager interface, the multiple uses of
design patterns, and other details described extensively in Chapter 4.
The inventiveness of the system was also an important aspect. Creating a working
system would entail combining and integrating very different systems. It would
involve setting up a private cloud installation, exploring various network hardware
models, and creating a system that communicates with cloud platforms and remote
versions of itself, operates on topology graphs and programmatically configures
network hardware. In addition, the system would be deployed in a data-center
infrastructure, taking into account intricacies and different configuration conditions
that arise in such environments due to their special security and reliability demands.
Although the realizability of such a system was not in question, successfully putting
everything together to create a functional and working system was important.
Lastly, documentation of the system was deemed important as well. This was for two
reasons: Firstly, the system’s design was brand new and the system was to be
extended in the future after the project’s end. Therefore, the design options had to be
extensively documented together with reasoning on the various design choices made
and what they accomplish. Developer documentation on the source code artifacts was
important as well. This would help with the continuation of the system’s development
and extension of its design and addition of new functionalities. Secondly, the system
is a rather complex technical solution that integrates several other systems and has
plenty of different deployment and configuration scenarios. A user manual that
describes these options is important for the system to be usable.
Realizability and impact were not considered important for the design of the system.
Realizability was not important since the technologies that would compose the
system were known to work. The feasibility of the system was not really in question.
Impact was not considered important as the system was not expected to have any
economic or societal impact or contribution.
8.3 Future development
The NRS system forms a basis that can be extended with various advanced
functionalities. The NRS core consists of its topology representation and the device
plugins concept; these can be viewed as a network resource management platform on
which more advanced or ‘clever’ operations can be automated.
There are several advanced functionalities included in the original inception of the
system, which are also mentioned in the Requirements chapter as factors to take into
account when designing the system. In addition, the system would benefit from
features that make it more ‘production’ oriented. These include:
Automatic network discovery:
NRS could make use of a network discovery tool or mechanism that would
initialize and update the topology automatically. This is an important
addition to make the NRS system more responsive to changes in its
environment, and therefore more appropriate for production environments.
QoS provisioning:
The NRS system can also be further developed to introduce the concept of
network features, such as bandwidth, and the reservation of those. Such an
addition would require extensions of the system in four different levels:
1. User requests
A new class of user requests that include bandwidth guarantees
should be available to the system
2. Topology graph and algorithms.
Bandwidth capacity would need to be associated with Links and
Network Nodes in the topology.
Algorithms would have to be modified to take the bandwidth
capacity of links and network nodes into account as they traverse
the topology graph to find paths (a sketch of such a bandwidth-aware
path search follows after this list).
3. Reservations
Reservation of requests should take into account and modify the
bandwidth capacity available to nodes and links, in a similar
fashion to how the runtime VLANs of devices are manipulated.
4. Device plugins
The device plugins need to be extended to be able to configure QoS
on network devices, such as bit-rate limiting. Proper invocation of
the device plugins would need to be introduced to the allocation
step, so that the bandwidth reserved in a previous reservation step is
correctly translated to device configuration.
Interface with a VM scheduler (also see Appendix B):
The NRS system can be used to influence VM scheduling decisions, with
network characteristics playing a decisive role in the choice of VM hosts by
the scheduler.
Elaborate algorithms:
The algorithms used for the NRS prototype are simple path-finding
algorithms. It may be desirable to replace them with ones that take the
realities of a specific topology into account, e.g., associate an L2 network
with a specific node in the topology and always force it to pass through that node.
As another example, more complex algorithms can implement load
balancing strategies on the local topology.
Administrator web interface:
As an alternative to the CLI admin shell, a web interface would be suitable
to convey visual information on the network topology, the L2 networks, and
their underlying graphs.
Scaling the server implementation:
The synchronous server communication may need to be replaced with an
asynchronous event-driven implementation that can provide scalability when
it comes to multiple simultaneous requests.
System consistency:
To safely recover from crashes, the system would benefit from the usage of
a database to save the NRS state. At the moment the system state is saved in
local storage in the form of serialized Python objects that contain state
information. The state of the system is not guaranteed if something goes
wrong while the state is being saved. Saving them in a database instead can
provide safe data recovery. A NoSQL database can be used that can directly
store XML files. The serialized objects would need to be converted to an
XML representation as well.
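As a rough illustration of the topology-level extension mentioned under QoS
provisioning above, the sketch below finds a path over a networkX topology while
ignoring links that do not have enough free bandwidth left. The attribute name
free_mbps and the request format are assumptions, not the NRS data model.

import networkx as nx

def path_with_bandwidth(topology, src, dst, required_mbps):
    # Keep only the links that still have enough free capacity,
    # then run the usual shortest-path search on what remains.
    usable = nx.Graph((u, v, data) for u, v, data in topology.edges(data=True)
                      if data["free_mbps"] >= required_mbps)
    return nx.shortest_path(usable, src, dst)

if __name__ == "__main__":
    g = nx.Graph()
    g.add_edge("host1", "switch1", free_mbps=1000)
    g.add_edge("switch1", "switch2", free_mbps=100)   # nearly saturated link
    g.add_edge("switch1", "switch3", free_mbps=1000)
    g.add_edge("switch3", "switch2", free_mbps=1000)
    g.add_edge("switch2", "host2", free_mbps=1000)
    print(path_with_bandwidth(g, "host1", "host2", 500))
    # -> ['host1', 'switch1', 'switch3', 'switch2', 'host2']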
9 Project Management
This chapter describes the software development process used for the development of
the system and the accompanying deliverables. It presents the project’s timeline,
major milestones and their completion, and risk assessment and mitigation.
9.1 Management Process
The NRS project management process is based on the Rational Unified Process
(RUP) [23] [24] framework. RUP is an iterative software development process that
divides the development effort into phases with iterations. It has similarities with
iterative waterfall models. The four phases that it defines are:
Inception phase: Understand what to build, identify key system
functionalities and identify risks.
Elaboration phase: Refine requirements, design and implement a skeleton
architecture
Construction phase: Iteratively develop a complete product.
Transition phase: Beta test and prepare for deployment.
The RUP process in general is a very detailed development process that defines a lot
of different artifacts, human roles, activities etc. The NRS project did not follow the
RUP process by the book, but adapted it to the needs of the development of the
prototype. The important phases for the development of a prototype are the Inception
and Elaboration phases. Construction is important as well in order to build a full
product, however parts of it may be omitted depending on the desired extent of the
prototype’s functionality. The Transition phase is not particularly relevant for
prototype development.
The development process and status of the project were managed and monitored for
the duration of the project with two tools: a project management document and a
project management web application57. The RUP phases were divided into
iterations. When the project started, the project timeline was divided among the
different perceived future iterations. For each upcoming iteration, an attempt was
made to detail it into tasks and accurately estimate its duration. The online tool was
used to keep track of detailed tasks of each iteration. After an iteration was over, the
Project Management document was updated with the results, and the next iteration
was detailed into separate tasks that were in turn monitored in detail online.
The project was assisted by Project Steering Group meetings, monthly meetings that
included the project’s supervisor. The meetings were most often arranged to coincide
with the completion of iterations, in order to present the outcome of the previous
iteration and to discuss the following iteration’s purpose and expected outcome.
9.2 Project Milestones
The completion of RUP phase iterations marked the milestones specified for this
project. Each iteration was related to a specific research or development activity that
produced tangible results in the form of deliverables, such as a design document, a
domain research document or source code that implements prototype functionality.
The implementation of the various system requirements was split among the different
iterations. The iterations specified for the project are shown in Table 19.
57 http://www.zoho.com/projects/
Table 19: NRS project milestones
Inception phase
Inception Iteration 1
  Activities: Produce documents: System Concept/Vision; Basic use-cases; Description of system logic and interfaces; Requirements.
  Motivation: Identify the system's key functionality, expressed in use-cases and requirements. Gain familiarity with domain entities such as network switches. Identify the main risks.
Elaboration phase
Elaboration Iteration 1: First intra-cloud prototype implementation
  Activities: Produce documents: Basic architecture; Refined requirements. Implement: Initial prototype with basic intra-cloud functionality.
  Motivation: Show the feasibility of the critical subset of the intra-cloud functionality of the system, as envisioned in the Inception phase. Identify the basic building blocks of the system's architecture and the basic interfaces of the system. Refine requirements. Identify additional risks.
Elaboration Iteration 2: First inter-cloud prototype implementation
  Activities: Extend the prototype with basic inter-cloud functionality. Produce documents: Refined architecture; Refined system interfaces.
  Motivation: Show the feasibility of the critical inter-cloud functionality of the system, as envisioned in the Inception phase. Refine the system's architecture and interfaces.
Construction phase
Construction Iteration 1: Full-fledged service implementation
  Activities: Turn the prototype into a full-fledged service: turn it into a daemon process; provide a CLI tool for controlling it; streamline logging. Produce usage documentation.
  Motivation: Turn the prototype into a real service, i.e., a daemon process with listening servers. Provide typical accessibility features for the service (CLI tool, logging).
Construction Iteration 2: Finalize Topology
  Activities: Extend the logical topology: finalize the form of the logical topology of the system; fully implement the grow and shrink algorithms and the reserve and release operations; check for consistency. Document the topology and algorithms.
  Motivation: Fully develop the topology so that it can support all use-cases (requests for reserving, allocating and releasing network interfaces from networks).
Construction Iteration 3: Device plugins
  Activities: Test several switch models. Finalize the device plugin programming interface.
  Motivation: Explore as many switch models from different vendors as possible, to ensure the validity of the device plugin interface.
Construction Iteration 4: Administrator access
  Activities: Extend the prototype with an administrator access tool to view and modify the topology.
  Motivation: Introduce the administrator actions to the system.
Construction Iteration 5: Network resources (bandwidth)
  Activities: Add bandwidth capacity to the topology nodes. Modify the algorithms to accommodate bandwidth requests. Add device plugin actions that can enforce bandwidth requirements.
  Motivation: Enrich the system with support for provisioning requests that contain bandwidth.
Initially, estimates were made of the amount of time each iteration would need.
These estimates were placed in the project timeline as shown in Figure 62.
The initial estimates were generous; the NRS project had an investigative nature, so
the estimates were expected to change after each milestone was reached.
The milestone trend analysis chart in Figure 63 shows how the completion of
each milestone actually varied throughout the duration of the project. We can identify
two main deviations from the initial plan:
1) It was chosen to complete the milestones of ‘Service Implementation’ and
‘Topology Iteration’ before the ‘first inter-cloud prototype’. This decision
was made because these two milestones would provide a fully working
intra-cloud prototype; without them, the prototype could not have been
considered functional. If something went wrong later in the project, a
functional prototype that does not fulfill all the requirements would be
better than a non-functional one that only has the prospect of fulfilling
them all.
2) The ‘network resources (bandwidth)’ milestone was dropped from the
project plan. As the project progressed and the topology, algorithms and the
system architecture became clearer, it became apparent that implementing
bandwidth requests could not be completed within the duration of the
project. The 'bandwidth' iteration was dropped from the
project in early May.
The real project timeline, which depicts the completion of the milestones as they
actually happened during the project, is shown in Figure 64.
Figure 62: Initial project timeline
Figure 63: Milestone trend analysis
9.3 Risk Management
The purpose of the first iterations of the project was, among other things, to identify
possible risks, assess their impact, and dictate courses of action to avoid them. The
risks identified in the first iterations of the project are outlined in Table 20. The left
column of the table describes the courses of action taken to avoid certain risks. In
general, these actions influenced the development approach of the project, e.g.,
modified the order of iterations as discussed in section 9.2
Figure 64: Actual project timeline
Table 20: Risk management
Inception phase identified risks
Risk: Feasibility of the technical design
  Description: NRS is close to the hardware. The technical design is constrained by the realities of the hardware, the protocols, the TCP/IP layer stack, existing cloud platform capabilities, etc. If the initial design fails to take one of these into account, it can be invalid.
  Mitigation strategy: Talk to experts. Create proofs of concept often and continuously.
Risk: Feasibility of implementing decision making on network topologies
  Description: There is no implementation of NML; it could be complicated or immature to use. The problem of matching network topologies will be hard to solve.
  Mitigation strategy: Create a modular architecture so that more complex algorithms/strategies can be inserted in the future. Start with simple network allocation use-cases that have simple logic. Find experts in graph theory.
Risk: Security risk
  Description: The software under development lies in the cloud networking domain, which is security-sensitive and attack-prone. The proposed solution may have unexpected security flaws.
  Mitigation strategy: Talk with experts early to avoid starting off with a fundamentally wrong solution.
Elaboration phase identified risks
Risk: A requirement may be left out of the final product
  Description: Bandwidth, QoS, etc. provisioning is pushed closer to the project's end.
  Mitigation strategy: Low-impact risk. Inter-cloud connectivity is more important than bandwidth.
Risk: The topology model is being changed in every phase
  Description: The code-base that relies on the topology is increasing. Topology changes will be needed for introducing inter-cloud connections, as well as later when introducing bandwidth and other network features.
  Mitigation strategy: Try to decouple the inter-cloud connection and other modules as much as possible from the topology, so that they remain re-usable even if the topology has to be created from scratch.
Bibliography
References
[1] "RFC 1157: A Simple Network Management Protocol (SNMP)," 1990.
[2] "bridge - The Linux Foundation," [Online]. Available:
http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge.
[3] "Open vSwitch - An Open Virtual Switch," [Online]. Available:
http://openvswitch.org/.
[4] "Amazon Elastic Compute Cloud," [Online]. Available:
http://aws.amazon.com/ec2/.
[5] "Open Cloud Computing Interface," [Online]. Available: http://occi-wg.org/.
[6] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridge Local Area
Networks," [Online]. Available:
http://standards.ieee.org/getieee802/download/802.1Q-2011.pdf.
[7] IEEE, "802.1ad - Provider Bridges," [Online]. Available:
http://www.ieee802.org/1/pages/802.1ad.html.
[8] "Multiprotocol Label Switching IETF Working Group," [Online]. Available:
http://datatracker.ietf.org/wg/mpls/charter/.
[9] "Virtual Extensible LAN Internet Draft," [Online]. Available:
https://datatracker.ietf.org/doc/draft-mahalingam-dutt-dcops-vxlan/.
[10] Arista Networks, "VXLAN: Scaling Data Center Capacity," [Online]. Available:
http://www.aristanetworks.com/en/solutions/whitepapers.
[11] "OpenStack," [Online]. Available: http://openstack.org.
[12] "OpenNebula," [Online]. Available: http://www.opennebula.org.
[13] "Eucalyptus," [Online]. Available: http://www.eucalyptus.com.
[14] IEEE, "802.1Qbg - Edge Virtual Bridging," [Online]. Available:
http://www.ieee802.org/1/files/private/bg-drafts/d2/802-1qbg-d2-2.pdf.
[15] IEEE, "802.1ak - Multiple Registration Protocol," [Online]. Available:
http://www.ieee802.org/1/pages/802.1ak.html.
[16] "RFC 3535: Overview of the 2002 IAB Network Management Workshop,"
2003.
[17] P. Grosso, A. Brown, A. Cedeyn, F. Dijkstra and J. v.d. Ham, "Network Markup
Language - Context," [Online]. Available: https://forge.ogf.org/sf/go/doc14679.
[18] P. Grosso, A. Brown, A. Cedeyn, F. Dijkstra and J. v.d. Ham, "Network Markup
Language - Base Schema," [Online]. Available:
https://forge.ogf.org/sf/go/doc15674.
[19] E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software, 1994.
[20] M. Fowler, "FluentInterface," [Online]. Available:
http://martinfowler.com/bliki/FluentInterface.html.
[21] R. Ahuja, T. Magnanti and J. Orlin, Network Flows: Theory, Algorithms and
Applications, Prentice Hall, 1993.
[22] "networkX graph library," [Online]. Available: http://networkx.lanl.gov/.
[23] P. Kroll and P. Kruchten, The Rational Unified Process Made Easy: A
Practitioner's Guide to the RUP.
[24] P. Kruchten, The Rational Unified Process.
[25] J. J. Keijser, OpenVPN 2 Cookbook, 2011.
[26] OMG, "Unified Modeling Language, Superstructure 2.4," [Online]. Available:
http://www.omg.org/spec/UML/2.4.1/.
[27] R. Buyya, J. Broberg and A. Goscinski, Cloud Computing - Principles and
Paradigms, 2011.
[28] B. Panzer-Steindel, "Overview of the Computing Fabric at CERN," [Online].
Available: http://lcg-computing-fabric.web.cern.ch/lcg-computing-
fabric/fabric_presentations/overview_docs/.
[29] Brocade, "Brocade Ip Primer," [Online]. Available:
http://community.brocade.com/docs/DOC-1826.
[30] SURFnet, "SURFlightpaths," [Online]. Available:
http://www.surfnet.nl/en/hybride_netwerk/surflichtpaden/pages/lichtpaden.aspx.
Appendix A. Glossary
NRS lies in the network domain of Infrastructure-as-a-Service cloud computing,
which encompasses a variety of domain-specific concepts. The definitions of the
concepts related to this project are contained in the tables of this appendix. The
information is presented as follows: the left column contains the concept's term and
the right column its description. If a term has synonyms, they are included in
parenthesis below the term that is chosen to represent the concept throughout the
document. You can find definitions for IaaS cloud concepts in Table 21, for some
virtualization concepts in Table 22, and for networking in Table 23.
Table 21: IaaS Cloud terminology
Term Description
Cloud Computing The delivery of computing as a service, provided over a
network (e.g., the Internet). Three different fundamental
cloud computing services can be provided:
an application: The service model then is called
Software-as-a-Service (SaaS)
a computing platform: Platform-as-a-Service (PaaS)
raw resources (cpu, block storage, network):
Infrastructure-as-a-Service (IaaS)
The term “Cloud computing” most often refers to the IaaS
service model of cloud computing. It is the service model
that the term will refer to in this document.
Infrastructure-as-a-
Service (IaaS)
The most basic cloud model service, where the resources
offered are computers (virtual machines), storage space and
network.
Platform-as-a-Service
(PaaS)
A cloud model service where the user is offered a
computing platform, typically an OS with a language
execution environment that can be used to run programs or
submit jobs.
Software-as-a-Service
(SaaS)
A cloud model service where access to an application is
provided over a thin client, often a web browser.
Network-as-a-Service
(NaaS)
The delivery of network features (such as firewalls, load
balancers) as a service over a network. Usually refers to
delivery of such network features to IaaS resources.
IaaS Cloud
(or simply cloud)
A cloud infrastructure, typically deployed in one or more
computer clusters, that provides IaaS. An IaaS cloud has
“users”, which request and receive virtual machines or other
resources that the cloud provides.
Public Cloud A cloud deployment where resources are available to the
general public, either for free or on a pay-per-use model.
The cloud infrastructure is of no concern to the user.
Private Cloud A cloud deployment where the cloud infrastructure is
operated for a specific organization and the cloud resources
are available to its users. The cloud infrastructure is set up
by/for this specific organization.
Cloud platform
(Cloud stack,
Cloud resource
manager)
A software platform that implements an IaaS cloud on a
computer cluster. It controls and provisions resources in an
IaaS cloud. It can provide several interfaces and APIs
through which it services requests.
The cloud platform typically requires one or more machines
to run its components, and a set of machines to act as virtual
machine hosts.
A few major open-source cloud platforms: OpenStack,
OpenNebula, Eucalyptus.
Cloud site Refers to the infrastructure that hosts an IaaS cloud and the
service itself as a whole. A cloud site is located in one data-center.
Computer cluster A group of computer servers that typically are used to offer
a specific functionality, e.g., to host an IaaS cloud. They are
connected to each other via switches.
Switching Infrastructure
(of a data-center or
computer cluster)
The network switches that interconnect the computers of a
data-center or computer cluster.
Data-center A facility that hosts one or more computer clusters.
Management Network A data-center typically has a separate network used to
manage network devices and computers. This network is
physically separated from the production network.
Production Network The network infrastructure used by cloud users, when they
initiate network traffic. Separated from the management
network for security reasons.
Multi-tenant Network A data center network that is logically divided into smaller,
isolated networks. They share the physical networking
infrastructure, but operate without any visibility to each
other.
Hardware Virtualization
Table 22: Virtualization terminology
Term Description
Hardware Virtualization A technology that allows virtual hardware platforms to be
created within an operating system. These act as a real
computer (a virtual machine): they can be used to install and
operate a new operating system.
Virtual Machine (VM) An isolated operating system that is hosted within another
operating system (guest OS and host OS respectively).
Hypervisor Software that implements hardware virtualization and can
run multiple operating systems (guests) on a host computer.
Virtual Machine Host
(Virtualization Host,
Hypervisor Host)
A computer that runs a hypervisor and is used as the host
operating system for VMs.
Networking
Table 23: Networking terminology
Term Description
Computer Network The interconnection of computers by communication
channels that allow exchange of data. The exchange may be
assisted by special hardware devices.
Network Host A computer connected to a computer network.
OSI Model A (theoretical) protocol stack that attempts to abstract a
communications system into different layers. Each layer is
responsible for handling a specific problem and exchanges
data only with its neighboring layers. Consists of:
1. Physical layer
2. Data link layer
3. Network layer
4. Transport layer
5. Session layer
6. Presentation layer
7. Application layer
TCP/IP Protocol Stack The most popular protocol stack to implement networking.
It shares many similarities with the OSI model, but it is not
identical. The TCP/IP stack consists of layers. Exchanged
information will go from the bottom layer to the top to reach
the user, and vice versa to be sent to a different machine.
Each layer will add additional information to the unit of
exchange.
The stack consists of:
Link layer: contains communication technologies for a
local network (Ethernet)
Internet layer: connects local networks (IP)
Transport layer: handles host-to-host communication
(TCP)
Application layer: handles process-to-process data
exchange (HTTP)
The link layer data is sent over a physical medium to its
destination (e.g., a cable). The physical medium (physical
layer) is decoupled from the TCP/IP stack and has many
different implementations.
Ethernet The most common technology for local area networks.
Divides a stream of data into frames, which contain source
and destination addresses.
Ethernet broadcast
domain
(L2 broadcast domain,
Ethernet segment)
When Ethernet frames need to be sent to a destination, they
will be broadcast over the physical medium (e.g., a cable) to
all reachable devices. That is, every device that is
“connected” to the medium will receive the frames. This
collection of reachable targets is called an Ethernet
broadcast domain (or layer 2 broadcast domain, from OSI
layers).
Ethernet bridging A frame forwarding technique, used by network devices. It
allows connecting multiple Ethernet broadcast domains into
one.
Local Area Network
(LAN)
Represents a network that connects computers in a limited
area. May be used to refer to an Ethernet segment.
Gateway A network point (host or other device) that acts as an
entrance to another network. In the context of Ethernet, a
gateway is an entrance to a different Ethernet segment, or in
the context of LANs, to a different LAN.
Network Interface (IF,
NIC)
(Network Interface
Controller,
Physical Network
Interface)
Computer hardware that connects a computer to an Ethernet
domain and allows exchange of Ethernet traffic. It may
implement processing of the TCP/IP stack in hardware.
Each network interface has a unique 48-bit number, the
MAC address.
The term physical network interface is used to differentiate
from a virtual network interface.
Network Switch
(switch)
A computer network device that connects network devices
or segments. All devices connected to its ports are put in the
same Ethernet broadcast domain.
Different switch models can offer very different and
advanced functionality (e.g., QoS). Virtually all production-grade
switches support 802.1Q VLANs.
Switch Port Switches have a set of ports, which have a similar function
to network interfaces: to exchange Ethernet traffic. Unlike
network interfaces, a switch has no need to further process
the TCP/IP stack.
Not to be confused with TCP ports.
Switch Backplane,
Switching Fabric
Internally, the switch forwards frames to different ports
using specialized hardware. This is called the switch
backplane, and its implementation determines the
forwarding rate of the switch.
802.1Q VLANs Virtual LAN is a networking standard that supports
separation of Ethernet broadcast domains, while using the
same physical medium. This is achieved using a special
header in the Ethernet frames. The header specifies to which
VLAN the frame belongs. The network devices that handle
the frames use this information to forward the frame
correctly, which effectively separates the LAN into VLANs.
The VLAN specification supports up to 4094 different
VLANs.
VLAN access port A port can be assigned to a single VLAN. It is then called
an “access port”, as connecting a device to it will give the
device access to that particular VLAN.
VLAN trunk port A port can be configured to allow multiple VLANs. This is
used when traffic from multiple VLANs has to be passed
over one port, for example when forwarding those VLANs
to a different switch over that specific port. These ports are
called trunk ports.
Quality of Service (QoS) Technology that allows different types of traffic to be
distinguished and transported with special requirements
(e.g., identifying the traffic of a real-time application and
guaranteeing low latency for it).
Network Access Control
List (ACL)
A list of rules that can be applied to ports of a network
device to control outbound and inbound traffic (similar to
firewalls in this context).
Virtual Network
Interface (VIF, vNIC)
Virtual representation of a network interface that
corresponds directly to a physical network interface. Used
by Virtual Machines to connect to networks.
Virtual Machine
Gateway
Term used to describe the device that operates as the
“gateway” of the Virtual Machines to a network lying
outside the VM host. The term may differ from the regular
“gateway” term, since the network up to the VMs may
already be a single Ethernet broadcast domain.
Linux Bridge Software that runs on a Linux host and can “bridge” two or
more network interfaces of the host. All network traffic
destined to one network interface that is a member of a
bridge will be copied to all the other interfaces that are
members of the bridge. This makes it possible to bridge
different Ethernet segments, if these segments are accessible
via different network interfaces of the host.
Virtual Switch Software that simulates the behavior of a hardware switch
for a host’s network interfaces (corresponding to the
hardware switch’s ports). Typically a software switch
supports much more than Ethernet bridging, with features
commonly found on hardware switches (e.g., QoS, access
lists).
Appendix B. NRS and VM scheduling

This appendix discusses Virtual Machine scheduling that takes networking capacity (also known as bandwidth) into account; in other words, the problem of choosing the proper host for a Virtual Machine so that the Virtual Machine’s constraints on CPU, memory, and network capacity are satisfied. The description of the NRS system’s functionality assumes that this decision is taken outside of NRS. However, a system like NRS that maintains a network topology is well suited to provide input to the VM scheduling process. This immediately raises the question of why the NRS prototype is not integrated with a VM scheduler, i.e., why it does not instruct the cloud platform on which hosts to deploy the requested Virtual Machines. This appendix describes the problem and shows why this was not feasible to tackle within the duration of the project.
OpenNebula’s VM scheduler
OpenNebula employs a custom VM scheduler. It allows the user to define
requirements for a VM in the form of a CPU capacity and a Memory capacity
(these are hard constraints). CPU is defined as a ratio, where 0.5 is half a
real CPU, and Memory is defined in MBytes. For each VM host, OpenNebula
maintains information on current CPU and Memory utilization using the above
metrics. In addition, the scheduler can have a Policy; this defines an order of
preference among the available hosts and can take many forms. The policy plays
the role of a soft constraint. For example, one policy is to prefer hosts whose
CPU utilization is below 0.4; another is to spread the VMs over the hosts as
much as possible. The policy associates a Rank with each host that specifies
how well the host suits the preference. The scheduler works roughly as
follows to satisfy VM requests:
Iterate over all Virtual Machines that are pending launch:
1) Satisfy hard constraint: Filter out hosts whose current CPU and Memory
utilization cannot satisfy the requirements.
2) Satisfy soft constraint: Among the remaining hosts, choose the one that has
the highest Rank.
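To make this filter-then-rank loop concrete, here is a minimal Python sketch of the idea; the Host and VMRequest records and the “spread” ranking policy are illustrative assumptions, not OpenNebula’s actual implementation.

    # Minimal sketch of a filter-then-rank scheduler; Host, VMRequest and the
    # ranking policy are illustrative, not OpenNebula's implementation.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Host:
        name: str
        cpu_free: float   # fraction of real CPUs still available (e.g., 0.5)
        mem_free: int     # MBytes still available

    @dataclass
    class VMRequest:
        cpu: float        # requested CPU ratio, 0.5 = half a real CPU
        mem: int          # requested memory in MBytes

    def rank(host: Host) -> float:
        # Example policy: prefer hosts with the most free CPU ("spread" VMs).
        return host.cpu_free

    def schedule(req: VMRequest, hosts: list[Host]) -> Optional[Host]:
        # 1) Hard constraints: filter out hosts that cannot satisfy CPU and memory.
        candidates = [h for h in hosts
                      if h.cpu_free >= req.cpu and h.mem_free >= req.mem]
        if not candidates:
            return None
        # 2) Soft constraint: among the remaining hosts, pick the highest Rank.
        chosen = max(candidates, key=rank)
        chosen.cpu_free -= req.cpu
        chosen.mem_free -= req.mem
        return chosen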
It is important to note that the Rank is a single number, associated with one
host only, that quantifies how well-suited the host is to the specific VM’s
requirements. Trying to apply this idea once network capacity is introduced
appears to be problematic.
A metric for network capacity?
Unlike CPU and memory capacity, which are numbers associated with a single host
and independent of other hosts, a metric for network capacity does not easily
make sense as a standalone, independent number for a host. Network capacity by
definition exists between two or more hosts, and the capacity of a host can
only be defined in relation to the other hosts it can be connected to.
From the scheduler’s perspective, however, it would be convenient if a single
number could be used as the Rank of a host with respect to how well it
satisfies a network capacity request, so that choosing the highest-ranking
hosts results in a network between them that comes as close to the requested
capacity as possible. We will try to come up with such a number in the
following example.
Network Metric Example
We assume three VM hosts available for VM scheduling (Figure 65). They are
connected to each other over a network whose details we do not care about (assume
unlimited bandwidth). All connections are full-duplex. Host 1 is connected to the
network with a 1 Gb link, Host 2 with 500 Mb and Host 3 with 100 Mb.
We also assume that the total bandwidth available to VMs within the same host is up
to 1Gb (Figure 66, the sum of the links is not allowed to exceed 1Gb). If there is 1
VM in the host, it can connect to ‘out there’ with 1Gb (1 link total). If there are 2
VMs and the bandwidth is split equally, they get 0.33 (1/3) Gb each (3 links total:
VM1 to VM2, VM1 to outer network, VM2 to outer network). If there are 3 VMs,
they get 0.17 (1/6) Gb each (6 links: VM1 to VM2, VM1 to VM3, VM2 to VM3 and
three links going out). The equation that calculates this ratio is
1 / (n(n+1)/2) = 2 / (n(n+1)) for n VMs, with n(n+1)/2 being the total number
of links.
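The relation can also be spelled out in a few lines of Python; the only assumptions are the 1Gb per-host total and the equal split described above.

    # Number of links and per-link share for n VMs on one host (1Gb total, equal split).
    def links(n_vms: int) -> int:
        # n_vms*(n_vms-1)/2 links among the VMs plus n_vms links to the outer network
        return n_vms * (n_vms + 1) // 2

    def per_link_gb(n_vms: int, host_total_gb: float = 1.0) -> float:
        return host_total_gb / links(n_vms)

    for n in (1, 2, 3):
        print(n, links(n), round(per_link_gb(n), 2))
    # prints: 1 1 1.0 / 2 3 0.33 / 3 6 0.17, matching the example above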
Now we assume the setup of Figure 65 with no VMs active. A user requests two VMs
and asks for 1 Gb capacity between them. The following table shows the possible
combinations and the percentage of the requested capacity that becomes available.
Figure 65: Three VM hosts connected to each other
Figure 66: Network capacity inside a VM host
Table 24: Links between VMs compared to 1Gb requested capacity

Chosen host pair for the VMs                        Link between the VMs   Percentage of bandwidth satisfied
(Host1, Host1), (Host2, Host2) or (Host3, Host3)    1Gb                    1
(Host1, Host2)                                      500Mb                  0.5
(Host1, Host3)                                      100Mb                  0.1
(Host2, Host3)                                      100Mb                  0.1
We will try to turn this information into a ‘Rank’ that is exclusive to each host by
giving each host the bandwidth percentage that is the highest among all pairs that the
host is involved in. The result of this is shown in Table 25.
Table 25: Rank based on highest value among pairs
Host Rank
Host1 1
Host2 1
Host3 1
Of course this is not correct, because the scheduler sees all hosts as equally
good and may choose the pair (Host1, Host3), which results in only 100Mb of
available capacity. The same problem appears if we rank hosts based on the
lowest percentage among all pairs the host is involved in, as shown in Table 26.
Table 26: Rank based on lowest value among pairs
Host Rank
Host1 0.1
Host2 0.1
Host3 0.1
It seems that an algorithm that ranks hosts will need to selectively rank some
hosts highly and others low, leaving some valid options out, as in the results
shown in Table 27.
Table 27: Rank based on choosing one among the pairs that works
Host Rank
Host1 1
Host2 0.5
Host3 0.1
This ranking leaves the option (Host3, Host3) out of consideration; however,
there is no apparent way to do this differently. (Host1, Host1) gets the
highest rank, while (Host1, Host2) is lower but still acceptable. Apart from
(Host3, Host3), the result of this ranking method corresponds to the network
reality. Such a network rank can be used in conjunction with the CPU and
Memory rankings for the scheduler to come to a balanced decision.
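As an illustration, the following sketch derives the max-based and min-based ranks of Tables 25 and 26 from the pairwise capacities of Table 24 (the capacity values are those of the example); it shows directly that neither variant distinguishes the hosts.

    # Sketch: derive per-host ranks from the pairwise capacities of the example.
    # The capacity values are the fractions of the requested 1Gb from Table 24.
    pair_capacity = {
        ("Host1", "Host1"): 1.0, ("Host2", "Host2"): 1.0, ("Host3", "Host3"): 1.0,
        ("Host1", "Host2"): 0.5, ("Host1", "Host3"): 0.1, ("Host2", "Host3"): 0.1,
    }
    hosts = ["Host1", "Host2", "Host3"]

    def capacities_involving(host: str) -> list[float]:
        return [cap for pair, cap in pair_capacity.items() if host in pair]

    max_rank = {h: max(capacities_involving(h)) for h in hosts}  # Table 25: all 1.0
    min_rank = {h: min(capacities_involving(h)) for h in hosts}  # Table 26: all 0.1
    print(max_rank)
    print(min_rank)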
Retrieving the network capacity information
Apart from trying to reduce the capacity values of host tuples into one scalar
value per host, the capacity associated with each grouping of VMs has to be
retrieved in the first place. With n VMs and m hosts there are m^n different
combinations of putting the VMs onto hosts. For each of these, some information
needs to be calculated that determines how well the VM-to-hosts combination
complies with the requested network capacity. A brute-force algorithm that goes
through all of them therefore has exponential time complexity.
However, the assumption that each VMs-to-hosts mapping is associated with a
single capacity value is naive. In the internal network topology, each of these
mappings can be implemented in not one but multiple different ways. This is
because there may be multiple ways to implement a connection between two hosts,
roughly as many as the number of internal nodes that exist between the two
hosts. So in reality, not only are there m^n different combinations, but each
of them has k^(n(n+1)/2) different network paths that implement it, each of
which may have a different capacity. Here k is the number of different internal
nodes (essentially paths) through which the hosts can be connected, and the
exponent is the number of different links between hosts that are distributed
over these nodes (the same link-count equation as used for the VM links
previously).
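To give a feel for the combinatorial explosion, the sketch below enumerates every VM-to-host placement and every routing of the resulting links over internal nodes; the host and node names are placeholders, and the capacity evaluation is left as a stub because it depends on the topology model.

    # Brute-force sketch: enumerate every VM-to-host placement and every routing of
    # the resulting links over internal nodes. Exponential, so only for tiny inputs.
    from itertools import product

    def enumerate_options(n_vms, hosts, internal_nodes):
        n_links = n_vms * (n_vms + 1) // 2                   # same link count as derived earlier
        for placement in product(hosts, repeat=n_vms):       # len(hosts)**n_vms placements
            for routing in product(internal_nodes, repeat=n_links):  # len(nodes)**n_links routings
                yield placement, routing                     # capacity of this option would be evaluated here

    options = list(enumerate_options(2, ["Host1", "Host2", "Host3"], ["NodeA", "NodeB"]))
    print(len(options))  # 3**2 placements * 2**3 routings = 72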
Open Questions
The situation described in the sections above raises the following topics:
1) Research on schedulers: Has a “distributed” resource like network capacity
been modeled before to use in scheduling? Is the OpenNebula scheduler a
representative sample?
2) Is the mapping of the capacity value of host pairs to a ‘Rank’ scalar value for
one host even necessary from the scheduler’s perspective, or is there another
way around it?
3) It appears that to be able to obtain network capacity information, you need a
graph representation of the network topology and the ability to operate on it
with algorithms. The NRS system, as created for this project, is of great
value for this, although the network topology representation would need
some further research.
4) Brute-forcing the retrieval of network information each time a VM is
requested may lead to combinatorial explosion. Suitable graph algorithms
may exist, or may have to be devised, to deal with the problem in a domain-
specific way (for example, heuristics that exploit the realities of the
networking infrastructure: you may want half of your connections to
always go through a specific internal node). A ‘dumb’ algorithm may
eventually prove to be the better solution.
5) The example used serves a simple request: 1Gb capacity among all VMs.
More intricate topologies may be requested (such as: 1 ‘server’ VM with
1Gb to 3 ‘client’ VMs, and 100Mb among the ‘client’ VMs), which
influences the algorithm for finding paths and ranking hosts.
6) All of the above deals with figuring out the best paths and translating them
into host rankings. Actually enforcing the network capacities (QoS) is a
different topic that involves the ability to configure devices (the NRS system,
as created for this project, is of great value for this as well).
Answers to these questions are well outside the scope of a 9-month project. In
addition, these questions deal mostly with the logic of representing networks
and operating on the logical representation. Dynamically enforcing the decisions
taken at the logical level, i.e., translating them into proper device
configuration, is a different venture altogether, and it was a big part of the
NRS project.
Appendix C. NRS CLI and configuration

This appendix includes parts of the NRS CLI tool documentation, and a sample
configuration file. A more detailed user manual is available at
http://wiki.nikhef.nl/grid/NRS
CLI main usage:
usage: nrs_client.py [-h] [--version]
{status,vif,inter-cloud,allocate,release,reserve,connect}
...
Send action requests to the NRS daemon.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
commands:
{status,vif,inter-cloud,gateway,allocate,release,reserve,connect}
command help:
connect connect a network interface to a network
reserve reserve a network connection to a network interface
allocate allocate a network reservation
release release an interface from a network
vif connect/disconnect a new virtual network interface
inter-cloud start/stop an inter-cloud connection
gateway connect/disconnect to a gateway
status print status information
CLI ‘connect’ command:
usage: nrs_client.py connect [-h] -host NAME -if NAME -nid ID [-vlan ID]
Connect a network interface to a network.
optional arguments:
-h, --help show this help message and exit
network interface:
-host NAME, --host NAME
IP resolvable name of the interface's host
-if NAME, --interface NAME
name of the network interface, e.g eth0
network info:
-nid ID, --network_id ID
network identifier for the network that the interface
will be connected to
-vlan ID, --vlan_id ID
VLAN identifier to associate the network with
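For illustration, a connect request could look as follows (the host name, interface name, and identifiers are placeholders, not values from an actual deployment):

    nrs_client.py connect --host node01.example.org --interface eth1 --network_id 5 --vlan_id 210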
CLI ‘inter-cloud start’ command:
usage: nrs_client.py inter-cloud start [-h] -addr ADDR -port PORT -type TYPE
-local_id ID -remote_id ID
Start an inter-cloud connection. Will attempt to bridge a local to a remote
network over an inter-cloud connection.
optional arguments:
-h, --help show this help message and exit
-addr ADDR, --remote_service_address ADDR
address of the remote service that the inter-cloud
connection will be negotiated with
-port PORT, --remote_service_port PORT
port of the remote service
-type TYPE, --connection_type TYPE
inter-cloud connection type
-local_id ID, --local_network_id ID
local network identifier
-remote_id ID, --remote_network_id ID
remote network identifier
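For illustration, a request to bridge local network 3 with remote network 7 could look as follows (the address and identifiers are placeholders; VPN as the connection type follows the sample configuration below):

    nrs_client.py inter-cloud start --remote_service_address 203.0.113.10 --remote_service_port 50009 --connection_type VPN --local_network_id 3 --remote_network_id 7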
CLI ‘status’ command:
usage: nrs_client.py status [-h]
{reservation,all,vlan,network,inter-cloud} ...
Print status information.
optional arguments:
-h, --help show this help message and exit
status arguments:
{reservation,all,vlan,network,inter-cloud}
status help:
all print all status information
reservation print reservation information
vlan print vlan information
network print network information
inter-cloud print inter-cloud information
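For example, all status information can be requested with:

    nrs_client.py status all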
NRS sample configuration file:
#nrsd.conf

[General]
#Whether to enable inter-cloud functionality.
#default = True
enable_inter-cloud = True
#Whether to enable intra-cloud functionality.
#default = True
enable_intra-cloud = False
#enable_telnet = True
#draw_graphs = True
#state_directory = .
#image_directory = .
#log_directory = .

#Telnet server for administrative access
[Telnet]
host = localhost
port = 50011

#main service requests
[Main Service]
port = 50008
host = 10.80.80.32

[Inter-cloud]
#host for inter-cloud communication with remote services
port = 50009
host = 82.180.120.132
#TLS/SSL
#priv_key = certs/keyfile
#cert = certs/certfile
#ca_certs = certs/ca_certs_file
#Default behavior: all networks allowed.
#specify if allow or deny has higher priority
#rule_priority = allow
#If allow networks is not specified, all networks are allowed.
#If it is specified, only the specified networks are allowed.
allow_networks = 1,2
#Deny inter-cloud to networks.
#If not specified, all networks are allowed.
#deny_networks = 1,2,3
connection method = VPN

[VPN]
#the public ip address of the vpn box
vpn-box_address = 82.180.120.133
#Whether this NRS can be the VPN server.
#Valid options: must, may, no
server capability = must
allow_networks = 3
#deny_networks = 1,2,3
About the Author
Dimitris Theodorou received his Diploma in Computer
Engineering and Informatics from the University of
Patras, Greece in 2009. During his studies he specialized
in Computer Science and Software Engineering,
developing interests in areas such as algorithm analysis
and implementation, mathematical logic, graph and game
theory. His diploma thesis “Extension of PNYKA e-voting
system to combat malicious attacks” was carried out at the
Greek research institute CTI59 and involved extending the
e-voting system’s client to enhance its security features.
In 2010, Dimitris joined the Software Technology PDEng
Program at the Eindhoven University of Technology,
stepping into architectural software design and object-
oriented analysis. His final project was performed at
Nikhef and entailed designing and implementing a system
for network provisioning in cloud infrastructures60. The
system integrates cloud platforms with network hardware
configuration and network topology modeling. Dimitris
received his PDEng degree in October 2012.
59 http://www.cti.gr/en/ , http://www.pnyka.cti.gr/indexEn.php
60 https://wiki.nikhef.nl/grid/NRS